MySQL Database Index Explained for Beginners

Published 2024-08-01

Core Concepts

  • Primary Key Index / Secondary Index
  • Clustered Index / Non-Clustered Index
  • Table Lookup / Index Covering
  • Index Pushdown
  • Composite Index / Leftmost Prefix Matching
  • Prefix Index
  • Explain

1. [Index Definition]

1. Index Definition

Besides the data itself, the database system also maintains data structures that satisfy specific search algorithms. These structures reference (point to) the data in a certain way, allowing advanced search algorithms to be implemented on them. These data structures are indexes.

2. Data Structures of Indexes

  • B-tree / B+ tree (MySQL's InnoDB engine uses the B+ tree as its default index structure)
  • HASH table
  • Sorted array

3. Why Choose B+ Tree Over B-Tree

  • B-tree structure: Records are stored in the tree nodes.

  • B+ tree structure: Records are stored only in the leaf nodes of the tree; internal nodes hold only index entries.

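  • Assume each record is 1 KB, each index entry is 16 B, and the database reads data in disk pages with a default page size of 16 KB. With the same three I/O operations (one page per tree level):
  1. A B-tree page holds 16 KB / 1 KB = 16 records, so three levels reach 16*16*16 = 4096 records.
  2. A B+ tree internal page holds about 16 KB / 16 B ≈ 1000 index entries, so three levels can address roughly 1000*1000*1000 = 1 billion entries.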
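The arithmetic behind these figures can be checked with a short sketch. The sizes are the ones assumed in the text; the variable names are illustrative, and the last figure adds one assumption of our own: that the third level of the B+ tree is a leaf page holding full 1 KB rows rather than 16 B entries.

```python
# Back-of-the-envelope fan-out for the sizes assumed in the text.
PAGE_SIZE = 16 * 1024       # 16 KB disk page
RECORD_SIZE = 1024          # 1 KB per full record
INDEX_ENTRY_SIZE = 16       # 16 B per index entry
LEVELS = 3                  # one page read (I/O) per tree level

# B-tree: every node stores full records, so fan-out is limited by record size.
btree_fanout = PAGE_SIZE // RECORD_SIZE
btree_reach = btree_fanout ** LEVELS

# B+ tree: internal nodes store only index entries, so fan-out is much larger.
bplus_fanout = PAGE_SIZE // INDEX_ENTRY_SIZE
bplus_entries = bplus_fanout ** LEVELS

# Assumption: if the third level is a leaf page of full records, the number
# of rows reachable is two levels of fan-out times the records per leaf page.
bplus_rows = bplus_fanout ** (LEVELS - 1) * (PAGE_SIZE // RECORD_SIZE)

print(btree_fanout, btree_reach)      # 16 4096
print(bplus_fanout, bplus_entries)    # 1024 1073741824
print(bplus_rows)                     # 16777216
```

Either way of counting, the B+ tree reaches several orders of magnitude more data than the B-tree for the same number of page reads.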

2. [Index Types]

1. Primary Key Index and Secondary Index

  • Primary Key Index: The leaf nodes of the index are the data rows themselves.
  • Secondary Index: The leaf nodes of the index store the indexed key columns plus the primary key value. Therefore, a query through a secondary index first finds the primary key value, and then InnoDB finds the corresponding data row through the primary key index.
  • In InnoDB, the primary key index stores the data rows directly and is called the clustered index, while secondary indexes store a reference to the primary key.
  • In MyISAM, both primary and secondary indexes point to physical rows (disk positions).

2. Clustered Index and Non-Clustered Index

  • A clustered index physically organizes the table's data on disk in the order of one or more specified columns, so the storage order of the data matches the index order. The primary key is normally the clustered index, and a table can have only one (the data can be stored in only one order). In InnoDB, the primary key index is the clustered index; secondary indexes are non-clustered.
  • The leaf nodes of a clustered index are the data records themselves, whereas the leaf nodes of a non-clustered index hold pointers to the data records. The biggest difference is that the physical order of the data records does not follow the non-clustered index order.

3. Advantages and Disadvantages of Clustered Index

  • Advantage: When querying entries by primary key, it does not need to perform a table lookup (data is under the primary key node).
  • Disadvantage: Inserting rows in non-sequential primary key order can cause frequent page splits.

3. [Extended Index Concepts]

1. Table Lookup

The concept of table lookup involves the difference between primary key index and non-primary key index queries.

  • If the query is select * from T where ID=500, a primary key query only needs to search the ID tree.
  • If the query is select * from T where k=5, a non-primary key index query needs to first search the k index tree to get the ID value of 500, then search the ID index tree again.
  • The process of moving from the non-primary key index back to the primary key index is called table lookup.

Queries based on non-primary key indexes require scanning an additional index tree. Therefore, we should try to use primary key queries in applications. From the perspective of storage space, since the leaf nodes of the non-primary key index tree store primary key values, it is advisable to keep the primary key fields as short as possible. This way, the leaf nodes of the non-primary key index tree are smaller, and the non-primary key index occupies less space. Generally, it is recommended to create an auto-increment primary key to minimize the space occupied by non-primary key indexes.
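The difference between the two query paths can be sketched with a toy model. Plain dictionaries stand in for the two B+ trees here, and the table contents, the `name` column, and the function names are all hypothetical; the point is only to count how many index trees each query has to search.

```python
# Toy model of table T(ID primary key, k indexed, name).
# Dicts stand in for the B+ trees; all data and names are illustrative.

# Clustered (primary key) index: ID -> full row.
primary_index = {
    100: {"ID": 100, "k": 1, "name": "a"},
    300: {"ID": 300, "k": 3, "name": "c"},
    500: {"ID": 500, "k": 5, "name": "e"},
}

# Secondary index on k: k -> list of primary key values.
k_index = {}
for row in primary_index.values():
    k_index.setdefault(row["k"], []).append(row["ID"])


def select_by_id(id_value):
    """select * from T where ID=?  Searches only the ID tree."""
    searches = 1
    row = primary_index.get(id_value)
    return ([row] if row else []), searches


def select_by_k(k_value):
    """select * from T where k=?  Searches the k tree, then the ID tree."""
    searches = 1                                  # search the k index tree
    rows = []
    for id_value in k_index.get(k_value, []):
        rows.append(primary_index[id_value])      # table lookup by primary key
        searches += 1
    return rows, searches


print(select_by_id(500)[1])   # 1
print(select_by_k(5)[1])      # 2
```

Both queries return the same row, but the second one pays for an extra search of the primary key tree for every matching index entry.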

2. Index Covering

  • If the WHERE clause filters on a non-primary key index, the query first finds the primary key value through that index (the primary key is stored in the leaf nodes of the non-primary key index tree), and then fetches the requested columns through the primary key index. Moving back to the primary key index tree in this process is the table lookup.
  • However, when the query only needs the primary key value (for example, select ID from T where k=5), the result can be returned directly from the k index without a table lookup. In other words, the non-primary key index already "covers" everything the query needs, hence it is called a covering index.
  • A covering index returns query results straight from the secondary index without a table lookup to the primary index. This reduces the number of tree searches (no move from the secondary index tree to the clustered index tree) and reduces I/O (secondary index entries are smaller, so more of them fit in each page read from disk), thereby improving performance.
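The covering rule can be expressed as a simple set check. This is a minimal sketch under the description above (a secondary index leaf holds the indexed columns plus the primary key); the function name `is_covered` and the `name` column are hypothetical, and real optimizers consider more than this.

```python
def is_covered(needed_columns, index_columns, primary_key_columns):
    """Return True if a secondary index alone can answer the query.

    needed_columns: every column the query reads (SELECT list, WHERE, ORDER BY).
    A secondary index leaf holds the indexed columns plus the primary key,
    so a query that needs only those columns avoids the table lookup.
    """
    available = set(index_columns) | set(primary_key_columns)
    return set(needed_columns) <= available


# Table T(ID primary key, k, name) with a secondary index on (k).
print(is_covered(["ID", "k"], ["k"], ["ID"]))    # select ID from T where k=5    -> True
print(is_covered(["name", "k"], ["k"], ["ID"]))  # select name from T where k=5  -> False
```

Adding `name` to the index, for example as (k, name), would make the second query covered as well, at the cost of a larger index.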

3. Composite Index

A composite index refers to indexing multiple columns of a table.

Scenario 1:

A composite index (a, b) is sorted by a, b (first sorted by a, if a is the same then sorted by b). Therefore, the following statements can directly use the composite index to get results (in fact, it uses the leftmost prefix principle):

  • select … from xxx where a=xxx;
  • select … from xxx where a=xxx order by b;

The following statement cannot use the composite index, because it skips the leading column a:

  • select … from xxx where b=xxx;

Scenario 2:

For a composite index (a, b, c), the following statements can directly get results through the composite index:

  • select … from xxx where a=xxx order by b;
  • select … from xxx where a=xxx and b=xxx order by c;

The following statement can use the composite index to filter on a, but not to sort by c (b is skipped), so it requires a filesort operation:

  • select … from xxx where a=xxx order by c;

Summary:

Using the composite index (a, b, c) as an example, creating such an index is equivalent to creating the indexes (a), (a, b), and (a, b, c). Having one index replace three is certainly beneficial, as each additional index increases the overhead of write operations and disk space usage.
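Why only leftmost prefixes work can be seen by modelling the index as a sorted list of tuples. The sketch below is an illustration, not how InnoDB is implemented: the sample entries and the helper `prefix_range` are made up, and binary search over a list stands in for descending the B+ tree.

```python
from bisect import bisect_left, bisect_right

# Entries of a composite index (a, b, c), kept sorted like a B+ tree leaf level.
entries = sorted([
    (1, 2, 9), (1, 1, 5), (2, 1, 7), (1, 2, 3), (2, 3, 1), (3, 1, 4),
])


def prefix_range(prefix):
    """Return the contiguous run of entries whose leading columns equal prefix."""
    n = len(prefix)
    keys = [e[:n] for e in entries]       # leading columns are sorted too
    lo = bisect_left(keys, prefix)
    hi = bisect_right(keys, prefix)
    return entries[lo:hi]


a1 = prefix_range((1,))
print([e[1] for e in a1])    # where a=1 order by b: already sorted -> [1, 2, 2]
print([e[2] for e in a1])    # where a=1 order by c: not sorted     -> [5, 3, 9]

a1b2 = prefix_range((1, 2))
print([e[2] for e in a1b2])  # where a=1 and b=2 order by c: sorted -> [3, 9]

# where b=1: matching entries are scattered, so no single range can be located.
b1_positions = [i for i, e in enumerate(entries) if e[1] == 1]
print(b1_positions)          # [0, 3, 5]
```

Rows matching a leftmost prefix sit next to each other and are ordered by the next column, which is exactly what lets the index serve both the filter and the sort.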

4. Leftmost Prefix Principle

  • From the above composite index example, we can understand the leftmost prefix principle.
  • A query does not have to match the full index definition: as long as it matches a leftmost prefix, the index can be used to speed up retrieval. The leftmost prefix can be the leftmost N columns of a composite index or the leftmost M characters of a string index. Relying on the leftmost prefix to locate records avoids redundant index definitions.
  • Therefore, based on the leftmost prefix principle, it is crucial to consider the field order within the index when defining composite indexes! The evaluation criterion is the reusability of the index. For example, when there is already an index on (a, b), there is generally no need to create a separate index on a.

5. Index Pushdown

MySQL 5.6 introduced the index pushdown optimization, which can filter out records that do not meet the conditions based on the fields included in the index during index traversal, reducing the number of table lookups.

  • Create table
CREATE TABLE `test` (
   `id` bigint(20) NOT NULL AUTO_INCREMENT COMMENT 'Auto-increment primary key',
   `age` int(11) NOT NULL DEFAULT '0',
   `name` varchar(255) CHARACTER SET utf8 NOT NULL DEFAULT '',
   PRIMARY KEY (`id`),
   KEY `idx_name_age` (`name`,`age`)
 ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
  • SELECT * from test where name like 'Chen%': by the leftmost prefix principle, this query uses the idx_name_age index.
  • SELECT * from test where name like 'Chen%' and age=20
    • Before version 5.6, MySQL would first match records on the name part of the index (say 2 records, ignoring the age=20 condition at this point), find the corresponding 2 IDs, perform a table lookup for each, and only then filter on age=20.
    • From version 5.6 on, index pushdown applies. After matching the 2 records on name, MySQL checks the age=20 condition against the age value stored in the index before performing any table lookup, so only entries that pass both conditions are looked up. This reduces the number of table lookups and improves query performance.
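The saving can be illustrated with a toy model that counts table lookups with and without pushdown. The three sample rows are invented, and the loop scans every index entry for simplicity where a real engine would seek to the 'Chen' range; only the lookup count matters here.

```python
# Toy model of the clustered index (id -> row) and idx_name_age (name, age).
# The sample rows are hypothetical.
rows = {
    1: {"id": 1, "name": "Chen Yi", "age": 20},
    2: {"id": 2, "name": "Chen Er", "age": 35},
    3: {"id": 3, "name": "Li San", "age": 20},
}

# Secondary index entries hold (name, age) plus the primary key, in sorted order.
idx_name_age = sorted((r["name"], r["age"], r["id"]) for r in rows.values())


def query(name_prefix, age, pushdown):
    """Return (matching rows, number of table lookups performed)."""
    lookups = 0
    result = []
    for name, idx_age, row_id in idx_name_age:
        if not name.startswith(name_prefix):
            continue                     # outside the name range
        if pushdown and idx_age != age:
            continue                     # rejected inside the index, no lookup
        row = rows[row_id]               # table lookup on the clustered index
        lookups += 1
        if row["age"] == age:            # filter after the lookup
            result.append(row)
    return result, lookups


print(query("Chen", 20, pushdown=False)[1])   # 2
print(query("Chen", 20, pushdown=True)[1])    # 1
```

Both calls return the same single row; pushdown simply avoids fetching the row that the index already shows has the wrong age.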

6. Prefix Index

When the indexed column holds long strings, the index takes up a lot of space and is slow to search. In this case a prefix index can be used: instead of indexing the entire value, index only the first few characters, which saves space while still performing well. A shorter prefix produces more duplicate index values, however, so the uniqueness ratio of the prefix must be evaluated before choosing its length.

  • First, calculate the uniqueness ratio of the current string field: select 1.0*count(distinct name)/count(*) from test
  • Then, calculate the uniqueness ratio for different prefixes:
    • select 1.0*count(distinct left(name,1))/count(*) from test for the first character of the name as the prefix index
    • select 1.0*count(distinct left(name,2))/count(*) from test for the first two characters of the name as the prefix index
    • ...
  • When increasing n no longer raises the ratio for left(name, n) significantly, choose that n as the prefix length.
  • Create the index: alter table test add key(name(n));
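The same selection procedure can be sketched offline on a sample of the column's values. The sample names and the `tolerance` rule (stop at the shortest prefix whose ratio is within a fixed margin of the full column's ratio) are assumptions made for illustration; the text itself only says to stop when the ratio no longer increases significantly.

```python
def prefix_selectivity(values, n):
    """Same ratio as: select count(distinct left(name, n)) / count(*)."""
    return len({v[:n] for v in values}) / len(values)


def choose_prefix_length(values, tolerance=0.05):
    """Shortest n whose ratio is within `tolerance` of the full column's ratio.

    Assumes `values` is a non-empty list of strings.
    """
    full = len(set(values)) / len(values)
    max_len = max(len(v) for v in values)
    for n in range(1, max_len + 1):
        if full - prefix_selectivity(values, n) <= tolerance:
            return n
    return max_len


# Hypothetical sample of the name column.
names = ["chen", "chang", "cheng", "li", "lin", "liu", "wang", "wu"]

for n in range(1, 6):
    print(n, prefix_selectivity(names, n))
# 1 0.375
# 2 0.5
# 3 0.875
# 4 0.875
# 5 1.0

print(choose_prefix_length(names))         # 5
print(choose_prefix_length(names, 0.15))   # 3
```

A looser tolerance trades a few extra duplicate index values for a much shorter prefix, which is usually the point of a prefix index.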

4. [Viewing Indexes]

After adding indexes, how do we view them? Or, if statements are slow to execute, how do we troubleshoot?

Explain is commonly used to check if an index is effective.

After obtaining the slow query log, observe which statements are slow. Add explain before the statement and execute it again. Explain sets a flag on the query, causing it to return information about each step in the execution plan instead of executing the statement. It returns one or more rows of information showing each part of the execution plan and the execution order.

Important fields returned by explain:

  • type: The access method used (for example, full table scan or index scan)
  • key: The index actually used; NULL if no index is used

Common values of explain's type field, from slowest to fastest:

  • ALL: Full table scan
  • index: Full index scan
  • range: Index range scan
  • ref: Non-unique index scan
  • eq_ref: Unique index scan

Source: this article is reproduced from https://dev.to/coder_world/mysql-database-index-explained-for-beginners-3heg?1