Efficiently Finding Maximum Values and Associated Data in Large SQL Tables
A common query task is to find the maximum value in a column and return the other columns from that same row. This becomes challenging on very large tables. Consider a table where we need the highest version number for each ID, along with the tag stored in that row:
Sample Table:
ID | tag | version
---+-----+--------
1  | A   | 10
1  | A   | 20
3  | B   | 99
4  | C   | 30
5  | F   | 40
Desired Result:
ID | tag | version
---+-----+--------
1  | A   | 20
3  | B   | 99
4  | C   | 30
5  | F   | 40
On a table of around 28 million rows, common approaches such as a correlated nested SELECT, or a GROUP BY with MAX joined back to the table, can be very slow. The ROW_NUMBER() window function is often a more efficient alternative:
    SELECT s.id, s.tag, s.version
    FROM (
        -- Number the rows of each id, highest version first
        SELECT t.*,
               ROW_NUMBER() OVER (PARTITION BY t.id ORDER BY t.version DESC) AS rnk
        FROM YourTable t
    ) s
    -- Keep only the top-ranked row of each id
    WHERE s.rnk = 1;
This query works in two steps:
Inner Query: It assigns a unique row number (rnk) to each row within its ID partition (the group of rows sharing the same ID). Rows are numbered by the version column in descending order, so the highest version gets rnk = 1.
Outer Query: It filters the inner query's output, keeping only the rows where rnk = 1. This leaves the row with the maximum version for each ID.
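The two steps can be observed separately with a minimal Python sketch that runs the query against an in-memory SQLite database. The sample rows are hypothetical (ID 1 appears twice so that the partitioning has something to collapse), and SQLite 3.25 or later is assumed, since earlier versions lack window functions.

```python
import sqlite3

# Hypothetical sample rows: (id, tag, version). ID 1 occurs twice.
rows = [(1, "A", 10), (1, "A", 20), (3, "B", 99), (4, "C", 30), (5, "F", 40)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (id INTEGER, tag TEXT, version INTEGER)")
conn.executemany("INSERT INTO YourTable VALUES (?, ?, ?)", rows)

# Step 1 (inner query): number the rows of each id, highest version first.
inner_sql = """
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY t.id ORDER BY t.version DESC) AS rnk
    FROM YourTable t
"""
ranked = conn.execute(inner_sql + " ORDER BY id, rnk").fetchall()
for row in ranked:
    print(row)  # (id, tag, version, rnk)

# Step 2 (outer query): keep only the row numbered 1 in each partition.
outer_sql = f"""
    SELECT s.id, s.tag, s.version
    FROM ({inner_sql}) s
    WHERE s.rnk = 1
    ORDER BY s.id
"""
latest = conn.execute(outer_sql).fetchall()
print(latest)
```

In the first listing, the two rows for ID 1 receive rnk 1 (version 20) and rnk 2 (version 10); the second listing drops the rnk 2 row and keeps one row per ID.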
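This approach still uses a derived table, but it avoids a correlated subquery or a self-join against a GROUP BY aggregate, and it typically needs only one pass over the table. That often makes it faster on large datasets, although the actual gain depends on the database engine and the available indexes. ROW_NUMBER() provides a clean way to achieve the desired outcome.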
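For comparison, the sketch below runs the window-function query next to one possible GROUP BY formulation (an aggregate joined back to the table) and checks that both return the same rows. The GROUP BY query is an assumed example of the "standard" approach rather than one given in the original. The data is hypothetical and far too small to say anything about speed, so this only illustrates that the results match. SQLite 3.25 or later is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (id INTEGER, tag TEXT, version INTEGER)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [(1, "A", 10), (1, "A", 20), (3, "B", 99), (4, "C", 30), (5, "F", 40)],
)

# Window-function approach from the article.
window_sql = """
    SELECT s.id, s.tag, s.version
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (PARTITION BY t.id ORDER BY t.version DESC) AS rnk
        FROM YourTable t
    ) s
    WHERE s.rnk = 1
    ORDER BY s.id
"""

# Assumed alternative: compute MAX(version) per id, then join back to fetch tag.
groupby_sql = """
    SELECT t.id, t.tag, t.version
    FROM YourTable t
    JOIN (
        SELECT id, MAX(version) AS max_version
        FROM YourTable
        GROUP BY id
    ) m ON m.id = t.id AND m.max_version = t.version
    ORDER BY t.id
"""

window_rows = conn.execute(window_sql).fetchall()
groupby_rows = conn.execute(groupby_sql).fetchall()
print(window_rows == groupby_rows)
```

The two queries can differ when several rows of one ID tie for the highest version: ROW_NUMBER() gives tied rows distinct numbers, so exactly one of them is returned, while the join returns every tied row.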