Hello Everyone,
I have two questions about the query engine. One is easy, one not so much.
First, I am looking into how estimated query costs are calculated. I know the BOL answer of "estimated seconds to complete the query", but I am looking for more detail on how that is calculated. Can anyone throw me a link, etc?
Secondly: I work for a company testing SQL 2005 prior to announcing support as a backend for our calculation. In one of our test databases, we stumbled on the optimizer doing something that concerns me greatly. I have not reproduced this in any other test database with otherwise identical schemas. If anyone can help explain what I am seeing, I would appreciate it.
Here is the query:
select hsi.itemlc.statenum, hsi.itemlc.itemnum... hsi.itemlc.status, hsi.itemdata.itemname, hsi.lcstate.statename, hsi.itemdata.itemnum... hsi.itemdata.institution
from hsi.itemlc, hsi.itemdata with (index(itemdata9)) , hsi.lcstate
where hsi.itemlc.itemnum = hsi.itemdata.itemnum and hsi.itemlc.statenum = hsi.lcstate.statenum and hsi.itemlc.statenum = 211 and hsi.itemlc.status + 0 = 0
Let's see...itemdata9 from the index hint has itemnum as its first column, itemlc.itemnum is covered, statenum is seperately covered in itemlc and lcstate. This query returns 7 rows out of about 7000(unique) in itemdata, 400 (383 itemnums, 65 statenums) in itemlc, and 110 rows(unique) in lcstate.
In the other databases, the estimated and actual query plans does index seeks in the order and way I would expect. This query also works fine on previous versions and other platforms.
In this SQL 2005 database, the estimated and actual plans have the first activity is an index seek for lcstate.statenum 211. Like an algebraic associative quality, it's been described. I realize that this isn't preferred join syntax, but the performance implications of the optimizer deciding to do algebra could affect queries all over our application.Why would it choose to do this?
Then the RID lookup, then table scan of itemlc, then the join with lcstate reports a missing join predicate. Itemdata then seeks on the index from the hint, and does RID lookup on that, etc.
I/O Stats: Table 'itemdata'. Scan count 14, logical reads 42, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'itemlc'. Scan count 2, logical reads 18, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'lcstate'. Scan count 2, logical reads 6, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Thanks for your time, and thanks for any help you can give.
Eric Proegler
Use of query hints should be avoided and this is one of the reasons, One of the things that changes with realeases is the optimiser. The reason being experience, changes to hardware, memory, cpu, disk etc. So the underlying costs associated with a query change. The use of index hints is therefore problematic.
In your case you say the itemdata9 index is a covering index. Given you select clause it can't be because "institution" is not included and therefore your index hint is forcing the engine to read an index but because it doesn't have all the data needed it has to then do a row lookup to get the rest of the data.
You have very small tables and so your performance is likely to always be good.
If you remove the index hint what is the query plan and IO
|||In addition why the need for this line
hsi.itemlc.status + 0 = 0
Is that to handle nulls?
Is so you may be better of doing
hsi.itemlc.status IS NULL OR hsi.itemlc.status = 0
|||In case quoting isn't right, SImon said:
Use of query hints should be avoided and this is one of the reasons, One of the things that changes with realeases is the optimiser. The reason being experience, changes to hardware, memory, cpu, disk etc. So the underlying costs associated with a query change. The use of index hints is therefore problematic.
In your case you say the itemdata9 index is a covering index. Given you select clause it can't be because "institution" is not included and therefore your index hint is forcing the engine to read an index but because it doesn't have all the data needed it has to then do a row lookup to get the rest of the data.
You have very small tables and so your performance is likely to always be good.
If you remove the index hint what is the query plan and IO
In addition why the need for this line
hsi.itemlc.status + 0 = 0
Is that to handle nulls?
Is so you may be better of doing
hsi.itemlc.status IS NULL OR hsi.itemlc.status = 0
Eric says:
Ok, the itemdata9 in covering in the sense that the join condition is in there. I still need columns (excised from the sample query, plus the institution) returned, so I have to do the lookup. And the status + 0 is to force the optimizer to not use a certain index that has been problematic before. Not that I wrote that, but that's why it is there. In this query against this database, removing it doesn't change the plan at all.
The tables in this database are small, but they can be much larger in production systems or other test databases.
Removing the index hint doesn't really change anything. It is still choosing to use the same index:
Table 'itemdata'. Scan count 7, logical reads 21, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'itemlc'. Scan count 1, logical reads 9, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'lcstate'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
My real problem is the questions I asked, about the associative property query parsing and understanding query cost calculation. Anyone have any suggestions?
Thanks Again,
Eric
My understanding of the cost is that it is used as a mechanism for ageing plans. I believe this is discussed in Kalens book or Ken Henderson's. Every sweep of the cache reduces the cost by 1, this means that complex queries (to compile) are kept in the cache the longest.
I often find the cost can be misleading. Where I once worked the unit of cost was referred to as a "Beard" for the plain reason that no one knew what the unit was.
As you have done I prefer to look at the IO output, however this doesn't give an indication of CPU cost (although one could infer that the CPU cost relative to the IO cost was small).
As for the way the optimiser works in producing execution trees etc I suggest going on Kimberly Tripp course she'll explain in depth how its done. I do know quite a bit of work went into this for SQL 2005.
No comments:
Post a Comment