How do MySQL indexes work
What is an index?
So, what is an index? Well, an index is a data structure (most commonly a B- tree) that stores the values for a specific column in a table. An index is created on a column of a table. So, the key points to remember are that an index consists of column values from one table, and that those values are stored in a data structure. The index is a data structure – remember that.
Let’s start out our tutorial and explanation of why you would need a database index by going through a very simple example. Suppose that
we have a database table called Employee with three columns – Employee_Name, Employee_Age, and Employee_Address
. Assume that the Employee table has thousands of rows.
Now, let’s say that we want to run a query to
find all the details of any employees who are named ‘Jesus’?
So, we decide to run a simple query like this:
SELECT * FROM Employee
WHERE Employee_Name = 'Jesus'
What would happen without an index on the table?
Once we run that query,
what exactly goes on behind the scenes to find employees who are named Jesus? Well, the database software would literally have to look at every single row in the Employee table to see if the Employee_Name for that row is ‘Jesus’.
And, because we want every row with the name ‘Jesus’ inside it, we can not just stop looking once we find just one row with the name ‘Jesus’, because there could be other rows with the name Jesus. So, every row up until the last row must be searched` – which means thousands of rows in this scenario will have to be examined by the database to find the rows with the name ‘Jesus’. This is what is called a full table scan.
How a database index can help performance
You might be thinking that doing a full table scan sounds inefficient for something so simple – shouldn’t software be smarter? It’s almost like looking through the entire table with the human eye – very slow and not at all sleek. But, as you probably guessed by the title of this article, this is where indexes can help a great deal. The whole point of having an index is to speed up search queries by essentially cutting down the number of records/rows in a table that need to be examined.
What kind of data structure is an index?
B- trees are the most commonly used data structures for indexes. The reason B- trees are the most popular data structure for indexes is due to the fact that they are time efficient – because look-ups, deletions, and insertions can all be done in logarithmic time. And, another major reason B- trees are more commonly used is because the data that is stored inside the B- tree can be sorted. The RDBMS typically determines which data structure is actually used for an index. But, in some scenarios with certain RDBMS’s, you can actually specify which data structure you want your database to use when you create the index itself.
How does an index improve performance?
Because an index is basically a data structure that is used to store column values, looking up those values becomes much faster. And, if an index is using the most commonly used data structure type – a B- tree – then the data structure is also sorted. Having the column values be sorted can be a major performance enhancement – read on to find out why. Let’s say that we create a B- tree index on the Employee_Name column This means that when we search for employees named “Jesus” using the SQL we showed earlier, then the entire Employee table does not have to be searched to find employees named “Jesus”. Instead, the database will use the index to find employees named Jesus, because the index will presumably be sorted alphabetically by the Employee’s name. And, because it is sorted, it means searching for a name is a lot faster because all names starting with a “J” will be right next to each other in the index! It’s also important to note that the index also stores pointers to the table row so that other column values can be retrieved – read on for more details on that.
How to create an index in SQL:
Here’s what the actual SQL would look like to create an index on the Employee_Name column from our example earlier:
CREATE INDEX name_index
ON Employee (Employee_Name)
How to create a multi-column index in SQL:
We could also create an index on two of the columns in the Employee table , as shown in this SQL:
CREATE INDEX name_index
ON Employee (Employee_Name, Employee_Age)
Since then I gained some insight about the downside of creating indexes: if you write into a table (
) with one index, you have actually two writing operations in the file system. One for the table data and another one for the index data (and the resorting of it (and - if clustered - the resorting of the table data)). If table and index are located on the same hard disk this costs more time. Thus a table without an index (a heap) , would allow for quicker write operations. (if you had two indexes you would end up with three write operations, and so on)
Ref link: Stack-Overflow
Comparison of B-Tree and Hash Indexes
Understanding the B-tree and hash data structures can help predict how different queries perform on different storage engines that use these data structures in their indexes, particularly for the
storage engine that lets you choose B-tree or hash indexes.
B-Tree Index Characteristics
A B-tree index can be used for column comparisons in expressions that use the
operators. The index also can be used for
comparisons if the argument to
is a constant string that does not start with a wildcard character. For example, the following
statements use indexes:
SELECT * FROM
key_col LIKE 'Patrick%';
SELECT * FROM
key_col LIKE 'Pat%_ck%';
In the first statement, only rows with
key_col < 'Patricl'
are considered. In the second statement, only rows with
key_col < 'Pau'
statements do not use indexes:
SELECT * FROM
key_col LIKE '%Patrick%';
SELECT * FROM
In the first statement, the
value begins with a wildcard character. In the second statement, the
value is not a constant.
If you use
... LIKE '%
is longer than three characters, MySQL uses the Turbo Boyer-Moore algorithm
to initialize the pattern for the string and then uses this pattern to perform the search more quickly.
A search using
col_name IS NULL
employs indexes if
Any index that does not span all
levels in the
clause is not used to optimize the query. In other words, to be able to use an index, a prefix of the index must be used in every
clauses use indexes:
index = 1 OR
index = 2 */
index=1 OR A=10 AND
/* optimized like "
/* Can use index on
index1 but not on
clauses do not
index_part1 is not used */
/* Index is not used in both parts of the WHERE clause */
index=1 OR A=10
/* No index spans all rows */
Sometimes MySQL does not use an index, even if one is available. One circumstance under which this occurs is when the optimizer estimates that using the index would require MySQL to access a very large percentage of the rows in the table. (In this case, a table scan is likely to be much faster because it requires fewer seeks.) However, if such a query uses
to retrieve only some of the rows, MySQL uses an index anyway, because it can much more quickly find the few rows to return in the result.
Hash Index Characteristics
Hash indexes have somewhat different characteristics from those just discussed:
- They are used only for equality comparisons that use the
<=> operators (but are very fast). They are not used for comparison operators such as
< that find a range of values. Systems that rely on this type of single-value lookup are known as “key-value stores”; to use MySQL for such applications, use hash indexes wherever possible.
- The optimizer cannot use a hash index to speed up
ORDER BY operations. (This type of index cannot be used to search for the next entry in order.)
- MySQL cannot determine approximately how many rows there are between two values (this is used by the range optimizer to decide which index to use). This may affect some queries if you change a
InnoDB table to a hash-indexed
- Only whole keys can be used to search for a row. (With a B-tree index, any leftmost prefix of the key can be used to find rows.)
Ref: MYSQL Official Website