Sunday, January 22, 2012

MySQL InnoDB Indexes


MySQL Indexing – InnoDB Indexes
A few days back I was reading about MySQL indexes, more specifically InnoDB indexes, to better understand query performance and optimization. So I thought of sharing some information on this topic. Indexes are basically structures that help the database engine in finding (retreiving) the records faster. An opposite to index lookup is full scan. Think of a full table scan as going through all the rows in a table and selecting the right row.
A common example that is generally given when explaining indexes is a book's index. To look up for a topic in any book, you either look up for the topic in an index or scan the whole book page by page. Obviously if the book has less pages, it is viable to go page by page and scan for a topic but if the book has decent number of pages than using an index is a smart and efficient approach. Same is the case with database indexes.

InnoDB Indexes
MySQL InnoDB Index Structure uses B+ Tree structure to store its data. B+ Tree structure is a different topic. I will be writing a blog on how data in B+ Trees is organized and how insert, update and delete effects the tree

Clustered Index
Clustered Index is an approach to store data. Think of a Clustered Index as a tree structure (index), with data rows as leaves and primary key as nodes above the leaves. InnoDB clusters the data by primary key. Below are some points to remember regarding the Clustered Index:
  • As stated earlier, InnoDB clusters the data by primary key. If the table has a primary key, MySQL will use this primary key for Clustered Index.
  • If the table does not have a primary key, MySQL selects the first UNIQUE and NOT NULL index for Clustered Index.
  • If none of the above applies, MySQL will generate a hidden (6 byte) field that contain row IDs. MySQL will use this hidden field to cluster data (as Clustered Index).
As Clustered Index holds both the data and primary key on the same page, row access is faster because no additional disk I/O is needed. On the other hand, in case of MyISAM, an additional disk I/O is needed as index and data are not on the same page. Below are some points worth mentioning:
  • Insertion speeds depend on how data is inserted into the table. Insertions are fast if data is inserted in primary key order. A bad approach is to insert data randomly (with random keys) but a good approach is to have sequential keys (like AUTO_INCREMENT)
  • Updating primary key may not be a good idea as it forces each updated row to be moved to a different location. Moving rows to a new location may lead to page splits which causes a table to use more disk space.
  • Though Clustered Indexes are efficient in terms of retreival, defining a clustered key having many columns may be a disadvantage. It would be clear why its a disadvantage once you read about the secondary index.
Secondary Indexes
Secondary Indexes are also called non-clustered index. Unlike clustered index, they do not store row data as leaves but they store primary key (clustered index) as leaves. It is due to this reason that it is advised to keep primary key short. The size of primary key will effect the size of a secondary index. As far as lookups are concerned, secondary index look up requires two steps. One to get primary keys matching the secondary key lookup and after that fetching the actual data by looking up the primary keys that are fetched before, from the clustered index.

No comments:

Post a Comment

I appreciate your comments/feedback/questions. Please do not spam or advertise.