March 03, 2009

Does the order of columns in an index matter?

A single column index is fairly straightforward to understand. You may have heard it compared to the index in the back of a technical book. When you want to find some information in the book, say DBCC INPUTBUFFER, you turn to the index in the back of the book and look up DBCC INPUTBUFFER. The index doesn't actually contain the information on DBCC INPUTBUFFER, it has a pointer to the page in the book where the command is described. So, what do you do? You turn to that page and read all about how DBCC INPUTBUFFER may be used. This is a good analogy for a single column non-clustered index.

Composite Indexes
In Microsoft SQL Server, you can also create an index that contains more than one column. These are known as composite indexes. A good analogy for a composite index is the telephone book.
A telephone book lists every individual in the area who has a publicly available telephone number. It's organized not by one column, but two: last name, first name. (Ignoring the middle initial that is sometimes listed but most often treated as an extension of the person's first name). If you wanted to look up someone in the telephone book, you'd first navigate to the last name and then the first name. For example to find Jake Smith, you'd first locate the Smiths, then within the Smiths, you'd find Jake. The same holds true for a composite SQL Server index.
This is all good to know, but how does it really affect query resolution?
Well, let's consider an example. Let's assume you have a Customers table as described below.CREATE TABLE Customers ( Customer_ID INT NOT NULL IDENTITY(1,1) ,Last_Name VARCHAR(20) NOT NULL ,First_Name VARCHAR(20) NOT NULL ,Email_Address VARCHAR(50) NULL )
It has a clustered index on Customer_ID and composite index on the Last_Name, First_Name columns as shown below.CREATE CLUSTERED INDEX ix_Customer_ID ON Customers(Customer_ID)CREATE INDEX ix_Customer_Name ON Customers(Last_Name, First_Name)
Finding a specific row
To find a specific row, we could execute the following query.SELECT *FROM CustomersWHERE Last_Name = 'smith' AND First_Name = 'Jake'
It should be pretty obvious that the ix_Customer_Name index would work well to satisfy this query. A quick look at the execution plan confirms our expectations.

Finding a last name
Now, let's broaden our search a bit to retrieve all customers whose last name is Smith. The following query may be executed for this. SELECT *FROM CustomersWHERE Last_Name = 'smith'
Looking at the query execution plan, we can see that SQL Server did indeed use the ix_Customer_Name composite index; it performed an index seek to find the rows that satisfied the query, then it used a Key Lookup to retrieve the non-indexed column information. You'll notice that this time, however, more work was expended in the Key Lookup than in the Index Seek.

Returning to our telephone book analogy, we can see why this index was deemed efficient by the Query Optimizer. To find all of the Smiths in the telephone book, we'd navigate to the page that contains the first Smith and keep moving forward until we found something other than Smith.
Finding a first name
Now, let's see what happens if we need to find all people who have a first name of Jake. Let's execute the following query.SELECT *FROM CustomersWHERE First_Name = 'Jake'
This yields the following query execution plan.

Notice that this time, SQL Server used a Clustered Index Scan to resolve the query. This is tantamount to a complete table scan. It did not use our ix_Customer_Name index.
Once again, returning to the telephone book example, we can see why. Think about find all of the Jakes in the telephone book. You'd have to start on the first page of the book and look through every entry. Why? Because it's not organized by first name; it's organized by last name, first name.
Does this mean that composite indexes are worthless? No. They have great value to SQL Server. In our first example we were able to use the ix_Customer_Name index to navigate directly to Smith, Jake. We just need to give considerable forethought to the kinds of queries our applications will be submitting to our server and create the appropriate indexes to handle those queries.
Cheers!

1 comment:

Anonymous said...

get the recover outlook password utility to fix data corruption issues in the files of specified format

Vector Database

  Generative AI processes and applications incorporating Generative AI functionality natively rely on accessing Vector Embeddings. This dat...