Monday, 12 November 2007

SQL Server Query Execution Plan Analysis, Part 3/4

In most cases, the query optimizer will analyze the joins in a query and JOIN the tables using the most efficient join type, in the most efficient order. But not always. In the graphical query plan you will see icons that represent the different types of JOINs used in the query. In addition, each of the JOIN icons will have two arrows pointing to it. The upper arrow pointing to the JOIN icon represents the outer table in the join, and the lower arrow represents the inner table. Follow the arrows back to see the names of the tables being joined.

Sometimes, in queries with multiple JOINs, tracing an arrow back won't reveal a table, but another JOIN. If you place the cursor over the upper and lower arrows pointing to a JOIN, you will see a popup window that tells you how many rows are being sent to the JOIN for processing. The upper arrow should always have fewer rows than the lower arrow. If not, then the JOIN order selected by the query optimizer might be incorrect (see more on this below).
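The same row counts can also be read in text form. The following is only a sketch, assuming two hypothetical tables, dbo.Customers and dbo.Orders, joined on a CustomerID column. With SET STATISTICS PROFILE ON, the query runs and an extra result set is returned with one row per plan operator, including the actual row count (Rows) next to the optimizer's estimate (EstimateRows).

```sql
-- dbo.Customers and dbo.Orders are hypothetical; substitute your own tables.
SET STATISTICS PROFILE ON;

SELECT c.CustomerID, o.OrderID
FROM dbo.Customers AS c
INNER JOIN dbo.Orders AS o
    ON o.CustomerID = c.CustomerID;

-- Turn the extra per-operator result set back off.
SET STATISTICS PROFILE OFF;
```

Comparing Rows for the two inputs of each JOIN operator gives the same information as hovering over the arrows in the graphical plan.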

First of all, let's look at JOIN types. SQL Server can JOIN tables using three different techniques: nested loop, hash, and merge. Generally, the fastest type of join is a nested loop, but if that is not feasible, then a hash JOIN or merge JOIN is used (as appropriate), both of which tend to be slower than the nested loop JOIN.

When very large tables are JOINed, a merge join, not a nested loop join, may be the best option. The only way to know is to try both and see which one is more efficient.

If a particular query is slow, and you suspect it may be because the JOIN type is not the optimum one for your data, you can override the query optimizer's choice by using a JOIN hint. Before you use a JOIN hint, you will want to take some time to learn about each of the JOIN types and how each one works. This is a complicated subject, beyond the scope of this tip.
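As a rough illustration of what a JOIN hint looks like, the sketch below runs the same query three ways: once with the optimizer's own choice, once forced to a merge join, and once forced to a nested loop join. The tables dbo.Customers and dbo.Orders and the CustomerID column are hypothetical. SET STATISTICS IO and SET STATISTICS TIME are turned on so the reads and elapsed time of each variation can be compared.

```sql
-- dbo.Customers and dbo.Orders are hypothetical; substitute your own tables.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- 1) Baseline: let the query optimizer pick the join type.
SELECT c.CustomerID, o.OrderID
FROM dbo.Customers AS c
INNER JOIN dbo.Orders AS o
    ON o.CustomerID = c.CustomerID;

-- 2) Force a merge join with a JOIN hint.
SELECT c.CustomerID, o.OrderID
FROM dbo.Customers AS c
INNER MERGE JOIN dbo.Orders AS o
    ON o.CustomerID = c.CustomerID;

-- 3) Force a nested loop join with a JOIN hint.
SELECT c.CustomerID, o.OrderID
FROM dbo.Customers AS c
INNER LOOP JOIN dbo.Orders AS o
    ON o.CustomerID = c.CustomerID;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Keep in mind that a JOIN hint written in the FROM clause also fixes the join order to the order in which the tables are written, so a hinted query can differ from the baseline in more than just the join type.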

JOIN order is also selected by the query optimizer, which tries to select the most efficient order in which to JOIN tables. For example, for a nested loop join, the upper table should be the smaller of the two tables. For hash joins, the same is true; the upper table should be the smaller of the two tables. If you feel that the query optimizer is selecting the wrong order, you can override it using JOIN hints.

In many cases, the only way to know for sure if using a JOIN hint to change JOIN type or JOIN order will boost or hinder performance is to give them a try and see what happens. [7.0, 2000, 2005] Updated 5-15-2006
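One hint that affects JOIN order is the FORCE ORDER query hint, which asks the optimizer to join the tables in the order they are written in the query. The sketch below again assumes hypothetical dbo.Customers and dbo.Orders tables; it is meant as something to test against the unhinted query, not as a recommendation.

```sql
-- dbo.Customers and dbo.Orders are hypothetical; substitute your own tables.
-- FORCE ORDER keeps the join order as written: Customers first, then Orders.
SELECT c.CustomerID, o.OrderID
FROM dbo.Customers AS c
INNER JOIN dbo.Orders AS o
    ON o.CustomerID = c.CustomerID
OPTION (FORCE ORDER);
```

To try the opposite order, swap the two tables in the FROM clause and compare the resulting plans and run times.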

*****

If your SQL Server has multiple CPUs, and you have not changed the default setting in SQL Server to limit SQL Server's ability to use all of the CPUs in the server, then the query optimizer will consider using parallelism to execute some queries. Parallelism refers to the ability to execute a query on more than one CPU at the same time. In many cases, a query that runs on multiple processors is faster than a query that only runs on a single processor, but not always.
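To check whether that default has been changed, you can look at the "max degree of parallelism" server option. The following is a minimal sketch; note that it turns on "show advanced options", which is itself a server-level configuration change.

```sql
-- 'max degree of parallelism' is an advanced option, so advanced options
-- must be visible before sp_configure will display it.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- Display the current setting. A run_value of 0 (the default) means
-- SQL Server may use all available CPUs for a parallel query.
EXEC sp_configure 'max degree of parallelism';
```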

The Query Optimizer will not always use parallelism, even though it potentially can. This is because the Query Optimizer takes a variety of different things into consideration before it decides to use parallelism. For example, how many active concurrent connections are there, how busy is the CPU, is there enough available memory to run parallel queries, how many rows are being processed, and what is the type of query being run? Once the Query Optimizer collects all the facts, then it decides if parallelism is best for this particular run of the query. You may find that one time a query runs without parallelism, but later, the same query runs again, but this time, parallelism is used.

In some cases, the overhead of using multiple processors is greater than the resource savings of using them. While the query processor does try to weigh the pros and cons of using a parallel query, it doesn't always guess correctly.

If you suspect that parallelism might be hurting the performance of a particular query, you can turn off parallelism for this particular query by using the OPTION (MAXDOP 1) hint.

The only way to know for sure is to test the query both ways, and see what happens. [7.0, 2000, 2005] Updated 5-15-2006
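A simple way to test both ways is to run the query with and without the hint and compare the timings. The sketch below assumes a hypothetical dbo.Orders table with a CustomerID column.

```sql
-- dbo.Orders is hypothetical; substitute your own query.
SET STATISTICS TIME ON;

-- 1) Let the query optimizer decide whether to use parallelism.
SELECT CustomerID, COUNT(*) AS OrderCount
FROM dbo.Orders
GROUP BY CustomerID;

-- 2) Same query, restricted to a single CPU.
SELECT CustomerID, COUNT(*) AS OrderCount
FROM dbo.Orders
GROUP BY CustomerID
OPTION (MAXDOP 1);

SET STATISTICS TIME OFF;
```

The hint only affects the statement it is attached to; other queries on the server can still run in parallel.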

*****

When you review a graphical execution plan, you may notice that the text of an icon is displayed in red, not black, which is the normal color. This means that the related table is missing some statistics that the Query Optimizer would like to have in order to come up with a better execution plan.

To create the missing statistics, you need to right-click on the icon and select "Create Missing Statistics." This will display the "Create Missing Statistics" dialog box, where you can then easily add the missing statistics.

If you are given the option to update missing statistics, you should always take the opportunity to do so as it will most likely benefit the performance of the query that is being analyzed. [7.0, 2000, 2005] Updated 5-15-2006
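Statistics can also be created or refreshed with T-SQL instead of the dialog box. This is a minimal sketch, assuming a hypothetical dbo.Orders table with an OrderDate column that the plan flagged as missing statistics; the statistics name st_Orders_OrderDate is likewise made up for the example.

```sql
-- dbo.Orders, OrderDate, and st_Orders_OrderDate are hypothetical names.

-- Create column statistics on the column the optimizer is missing them for.
CREATE STATISTICS st_Orders_OrderDate
ON dbo.Orders (OrderDate);

-- Refresh all existing statistics on the table.
UPDATE STATISTICS dbo.Orders;
```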

*****

Sometimes, when viewing a graphical query execution plan, you see an icon labeled "Assert." All this means is that the plan includes a step that verifies a referential integrity or check constraint to see whether the query violates it. If not, there is no problem. But if it does, the statement fails and an error is generated when the query runs. [7.0, 2000, 2005] Updated 5-15-2006
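If you want to see an Assert for yourself, a small throwaway table with a check constraint is usually enough. The table dbo.AssertDemo below is hypothetical and exists only for this sketch; viewing the plan for either INSERT should typically show an Assert icon that checks the constraint.

```sql
-- dbo.AssertDemo is a hypothetical throwaway table for this example.
CREATE TABLE dbo.AssertDemo (
    Quantity int NOT NULL
        CONSTRAINT CK_AssertDemo_Quantity CHECK (Quantity > 0)
);

-- Satisfies the check constraint, so the row is inserted.
INSERT INTO dbo.AssertDemo (Quantity) VALUES (5);

-- Violates the check constraint, so the statement raises an error
-- and no row is inserted.
INSERT INTO dbo.AssertDemo (Quantity) VALUES (-1);

-- Clean up.
DROP TABLE dbo.AssertDemo;
```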

*****

Often, when viewing a graphical query execution plan, you see an icon labeled "Bookmark Lookup." Bookmark lookups are quite common to see. Essentially, they are telling you that the Query Processor had to look up the columns it needs from a heap or a clustered index, instead of being able to read them directly from a non-clustered index.

For example, if the columns in the SELECT, JOIN, and WHERE clauses of a query don't all exist in the non-clustered index used to locate the rows that meet the query's criteria, then the Query Optimizer has to do extra work and look at the heap or clustered index to find all the columns it needs to satisfy the query.

Another cause of a bookmark lookup is using SELECT *, which should never be used, as in most cases it will return more data than you really need.

Bookmark lookups are not ideal from a performance perspective because extra I/O is required to look up all the columns for the rows to be returned.
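To make the lookup scenario concrete, here is a small sketch built around a hypothetical table, dbo.CustomerDemo. With only the narrow index in place, a plan that uses it has to go back to the clustered index for FirstName. A wider index that contains every column the query touches lets the query be answered from the index alone. On a very small table the optimizer may simply scan the clustered index instead, so treat this as an illustration of the idea, not a guaranteed plan.

```sql
-- dbo.CustomerDemo and its index names are hypothetical.
CREATE TABLE dbo.CustomerDemo (
    CustomerID int         NOT NULL PRIMARY KEY CLUSTERED,
    LastName   varchar(50) NOT NULL,
    FirstName  varchar(50) NOT NULL,
    City       varchar(50) NOT NULL
);

-- Narrow index: finds rows by LastName, but does not contain FirstName.
CREATE NONCLUSTERED INDEX IX_CustomerDemo_LastName
ON dbo.CustomerDemo (LastName);

-- If IX_CustomerDemo_LastName is used, FirstName must be fetched from
-- the clustered index for each matching row (a bookmark lookup).
SELECT LastName, FirstName
FROM dbo.CustomerDemo
WHERE LastName = 'Smith';

-- Wider index: holds both columns the query needs, so it can satisfy
-- the same query without a lookup.
CREATE NONCLUSTERED INDEX IX_CustomerDemo_LastName_FirstName
ON dbo.CustomerDemo (LastName, FirstName);
```

The trade-off is that the wider index takes more space and adds work to every INSERT, UPDATE, and DELETE on the table.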

If you think that a bookmark lookup is hurting a query's performance, you have four potential options to avoid it: you can create a clustered index that will be used by the WHERE clause, you can take advantage of index intersection, you can create a covering non-clustered index, or you can (if you have SQL Server 2000/2005 Enterprise Edition) create an indexed view. If none of these are possible, or if using one of these will use more resources than using the bookmark lookup, then the bookmark lookup is the
