Perf. tuning |
Note that 'execution' here refers to executing the access plan, ie. fetching/sending data from/to the backend database - it does not refer to executing your SQL query. Query execution occurs in the next step (the 'fetching' step).
The reason for using an index is simple - provided we incur an upfront cost of creating (computing) one, runtime lookup costs using the index are vastly cheaper than doing full searches through non-indexed rows.
Eg. given this book, find all the places that discuss table updating..
Example of low sparsity: STU_SEX gender column in a table is either M or F, so indexing this is useless (we'd only have 2 keys, each with 1000s of records!). On the other hand, DOB is a column with relatively higher sparsity, and is worth creating an index for..
A hash index is based on an ordered list of hash values, computed from a key column, using a hashing algorithm - it is much faster to search through the hash values than search the columns. Each hash value then points to the actual (column) data.
A user query (eg. LNAME="Johnson") is converted to a hash 'key' which is then used to search through the pre-computed list of hash values and retrieve an exact match or a small set of values that are all stored with the same key (as a result of 'hash collision').
A bitmap index can use hardware arithmetic to look up actual rows in a table.
We use it when a column can have any of 'N' values, where N is small, eg. a customer's location in the US can only one of 6 values, like so:
In the above case, we'll create 6 bit columns (because there can be one of 6 values in the actual 'CustomerLocation' table column), eg like so: Alaska Hawaii Western Mountain Central Eastern
Then, for each customer, we set the bit to 1 where the customer lives, and the rest to 0. Further, we'll 'pad' with two more 0s, to use 8 bits (a byte). Eg three rows could be
0 0 1 0 0 0 0 0 (lives in CA) 0 1 0 0 0 0 0 0 (lives in Hawaii) 0 0 0 0 0 1 0 0 (lives in Maine) ...
So in the bitmap index, the number of rows will equal the # of rows in the actual table, and the number of columns, and the column names, will equal the possible values for the table column we are indexing.
The DBMS will store each above row as a byte. When we query for just the western region customers, the DBMS will '&' each row with a '00100000' "bitmask", ie efficiently go through each row in the bitmap index to filter just the 00100000 rows, for ex.
Index selectivity: a measure of how likely an index will be used in a query. So we strive to create indexes that have high selectivity..
Indexes are useful when:
Worth creating indexes on single columns that appear in WHERE, HAVING, ORDER BY, GROUP BY and join conditions.
All of the above can be summarized like so: "know what you want, then find a good way to get it" [what do we want to return/compute/generate, how to best go about it (what SQL constructs to use, on what data)].
If you'd like more detail (MySQL-related, but the overall ideas are non-DB-specific), read this doc..