"If a worker wants to do his job well, he must first sharpen his tools." - Confucius, "The Analects of Confucius. Lu Linggong"
Front page > Programming > Which Composite Index is Optimal for Range Queries: High vs. Low Cardinality Columns?

Which Composite Index is Optimal for Range Queries: High vs. Low Cardinality Columns?

Published on 2024-12-21
Browse:899

Which Composite Index is Optimal for Range Queries: High vs. Low Cardinality Columns?

High Cardinality Column Placement in Composite Indexes with Range Queries

When querying a table with a composite index involving a range condition, the placement of columns within the index can significantly impact performance.

Consider the table files with a primary key (did, filename) and two composite indexes: INDEX(filetime, ext) and INDEX(ext, filetime). Both indexes contain the filetime column, which has higher cardinality than ext.

The query:

WHERE ext = '...'
  AND filetime BETWEEN ... AND ...

requires accessing data based on both ext and filetime. The question arises: which index is optimal for such a query?

Analysis

To determine the optimal index, we can use FORCE INDEX and examine the execution plans:

-- Force range on filetime first
FORCE INDEX(fe) SELECT COUNT(*), AVG(fsize)
FROM files
WHERE ext = 'gif'
  AND filetime >= '2015-01-01'
  AND filetime = '2015-01-01'
  AND filetime 

The output shows that INDEX(ext, filetime) (ef) has a significantly lower row count, indicating a more efficient scan.

Optimizer Trace

To further analyze the optimizer's behavior, we can use the optimizer trace:

SELECT explain_format = 'JSON';

SELECT COUNT(*), AVG(fsize)
FROM files
WHERE ext = 'gif'
  AND filetime >= '2015-01-01'
  AND filetime 

The trace reveals that the optimizer chooses INDEX(ext, filetime) because it can use both columns of the index to filter and fetch data. In contrast, INDEX(filetime, ext) can only use the first column (filetime) for filtering.

Conclusions

Based on the analysis, the following conclusions can be drawn:

  • For composite indexes used in range queries, the column involved in the equality predicate (ext in this case) should be placed first in the index definition.
  • Query performance is improved when the columns in an index are ordered according to the order in which they are used in the WHERE clause.
  • Cardinality alone is not a decisive factor in determining the optimal index. In scenarios where the range column has higher cardinality but the equality column is involved in an equality predicate, placing the equality column first yields better performance.
Latest tutorial More>

Disclaimer: All resources provided are partly from the Internet. If there is any infringement of your copyright or other rights and interests, please explain the detailed reasons and provide proof of copyright or rights and interests and then send it to the email: [email protected] We will handle it for you as soon as possible.

Copyright© 2022 湘ICP备2022001581号-3