Friday, March 9, 2012

Query Optimizer

Hi,
I have been given a database without any indexes (let's say MyDB1) and asked
to tune its indexes and SPs. I made a copy of MyDB1 and attached it as MyDB2
so I could create indexes there and compare the results with the SPs in
MyDB1.
For a particular query (a LEFT JOIN), the execution plan showed that a table
scan (on the parent table) and a hash join were used. Therefore I created a
PK on the parent table.
This time the Query Optimizer used a clustered index scan and a nested loop,
and overall the execution plan was quite improved.
I used a single connection in Query Analyzer to run two copies of the query
on the two databases (MyDB1 without the index, MyDB2 with the index).
The query cost (relative to the batch) on MyDB1 was 88% and on MyDB2 12%. So
far everything was as I expected.
But when I ran the query on MyDB1 and MyDB2 several times, I noticed that
MyDB2, which had the index, was slower, even though Query Analyzer had
indicated the lower cost (12%)!
I turned STATISTICS TIME on to make sure. Yes, it was slower and took more
time to produce results.
I used the ITW, and it suggested exactly the same index as I had created. I
unchecked the Clustered option for the PK to make it nonclustered. A
bookmark lookup was added to the execution plan, but the query became very
fast!
I tried different join hints, and LEFT HASH JOIN improved query
performance again.
Please help me with these two questions:
1) Can I rely on the query cost (relative to the batch) to discover which
indexes help my query? In my case I would say no, because in practice I got
poor performance.
2) If I get better performance by using join hints, is it wise to use them
in SPs instead of letting the query optimizer choose its own plan?
Any help would be greatly appreciated.
Leila|||1) Can I rely on the query cost (relative to the batch) to discover
which indexes help my query? In my case I would say no, because in practice
I got poor performance.
It's usually not a bad guide.
It doesn't take into account what is in memory though - i.e. reading from
disk gives the same cost as reading from memory, so if you have two queries
on the same data and the first gets all the data into memory, then the first
will take a lot longer but the cost won't take this into account.
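This caching effect can be controlled for by clearing the buffer and plan
caches before each timing run, so both databases start cold. A minimal
sketch of the procedure (the table names are illustrative, not from the
original query):

    -- Flush dirty pages first so DROPCLEANBUFFERS can empty the buffer pool.
    CHECKPOINT;
    DBCC DROPCLEANBUFFERS;   -- cold buffer cache
    DBCC FREEPROCCACHE;      -- cold plan cache

    SET STATISTICS TIME ON;  -- report compile and execution times
    SET STATISTICS IO ON;    -- report logical vs. physical reads per table

    -- Illustrative stand-in for the LEFT JOIN under test.
    SELECT p.ParentID, c.ChildID
    FROM dbo.Parent AS p
    LEFT JOIN dbo.Child AS c ON c.ParentID = p.ParentID;

    SET STATISTICS TIME OFF;
    SET STATISTICS IO OFF;

The physical reads reported by STATISTICS IO show how much of the elapsed
time went to disk rather than to the plan itself.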
2) If I get better performance by using join hints, is it wise to use them
in SPs instead of letting the query optimizer choose its own plan?
I would use hints sparingly. It's better to recode so that the plan the
system arrives at is reasonable - but sometimes you have no choice.
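For reference, a join hint like the one mentioned above is written inline in
the FROM clause; OPTION at the end of the statement is the query-wide form.
A sketch with the same illustrative table names:

    -- Inline hint: forces a hash join for this one LEFT JOIN.
    SELECT p.ParentID, c.ChildID
    FROM dbo.Parent AS p
    LEFT HASH JOIN dbo.Child AS c ON c.ParentID = p.ParentID;

    -- Query-wide form: applies to every join in the statement.
    SELECT p.ParentID, c.ChildID
    FROM dbo.Parent AS p
    LEFT JOIN dbo.Child AS c ON c.ParentID = p.ParentID
    OPTION (HASH JOIN);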
"Leila" wrote:

> Hi,
> I have been given a database without any index to tune its indexes and
> SPs(Let's say MyDB1). I made a copy of MyDB1 and attached it as MyDB2 to
> create indexes in that and compare the results with SPs in MyDB1.
> For a particular query (a LEFT JOIN), the execution plan showed that a tab
le
> scan(on parent table) and hash join was used. Therefore I created a PK on
> parent table.
> This time Query Optimizer used clustered index scan and nested loop and
> totally the execution plan was quiet improved.
> I used a single connection in Query Analyzer to run two copies of the quer
y
> on two different databases (MyDB1 without index, MyDB2 with index).
> The query cost (relative to the batch) on MyDB1 was 88% and on MyDB2 12%.
So
> far everything was as I expected.
> But when I run query on MyDB1 and MyDB2 several times, I noticed that MyDB
2
> which had index, was slower. However Query Analyzer had indicated less
> cost(12%)!
> I turned Statistics Time on to make sure. Yes it was slower and took more
> time to produce results.
> I used ITW, it suggested exactly the same index as I had created. I
> unchecked the Clustered option for the PK to make it nonclustered. A
> bookmark lookup was added in execution plan but the query became very fast
!
> I tried to use different join hints and LEFT HASH JOIN improved query
> performance again.
> Please help me with these two questions:
> 1) Can I rely to Query Cost (relative to the batch) to discover that which
> indexes help my query? In my case, I say NO because practically I got poor
> performance.
> 2) If I get better performance by using join hints, is it wise to use them
> in SPs without letting the query optimizer to choose its own plan?
> Any help would be greatly appreciated.
> Leila
>
>|||Thanks Nigel,
I changed the order of executing the two queries to make sure that the
second query was not simply reusing the buffers the first query had warmed.
Besides, I ran DBCC FREEPROCCACHE and DBCC DROPCLEANBUFFERS before each
query, but the result was always the same.
Do you know why a clustered PK might give poor performance, while changing
it to nonclustered improves it considerably?
Thanks,
Leila
"Nigel Rivett" <sqlnr@.hotmail.com> wrote in message
news:C6599BA9-1E50-41BE-A5F4-E15CAC597E84@.microsoft.com...
> 1) Can I rely to Query Cost (relative to the batch) to discover that which
> indexes help my query? In my case, I say NO because practically I got poor
> performance.
> It's usually not a bad guide.
> It doesn't take into account what is in memory though - i.e. reading from
> disk gives the same cost as reading from memory so if you have two queries
on
> the same data and the first gets all the data into memory then the first
will[vbcol=seagreen]
> take a lot longer but the cost won't take this into account.
> 2) If I get better performance by using join hints, is it wise to use them
> in SPs without letting the query optimizer to choose its own plan?
> I would use hints sparingly. It's better to recode so that the plan the
> system retrieves is reasonable - but sometimes you have no choice.
> "Leila" wrote:
>
table[vbcol=seagreen]
on[vbcol=seagreen]
query[vbcol=seagreen]
12%. So[vbcol=seagreen]
MyDB2[vbcol=seagreen]
more[vbcol=seagreen]
fast![vbcol=seagreen]
which[vbcol=seagreen]
poor[vbcol=seagreen]
them[vbcol=seagreen]|||On Sat, 13 Nov 2004 21:20:28 +0330, Leila wrote:

>Do you know why a clustered PK might give poor performance, while changing
>it to nonclustered improves it considerably?
Hi Leila,
That can happen in some situations. To understand, you need to know what
exactly is stored on the leaf pages (the lowest level in the B-tree) of
each kind of index.
For the clustered index, the leaf pages contain all data from the table.
If the size of one row is 400 bytes, each clustered index leaf page will
hold the information of ~20 rows ((8K - some overhead) / 400 bytes).
For a nonclustered index, the leaf pages contain only the values of the
indexed columns plus a "row locator" - usually the values of the clustered
index's columns, or a physical row locator if there is no clustered index.
If the combined size of all columns in the index plus the row locator is 40
bytes, each nonclustered index leaf page will hold the information of
~200 rows ((8K - some overhead) / 40 bytes).
I think that your query used only columns that participate in the index,
but had a WHERE clause that prevented the required rows from being located
directly in the index, so all rows had to be scanned. If the table has 2000
rows, this would take 100 page reads (2000 / 20) with the clustered index,
but only 10 page reads with the nonclustered index (2000 / 200).
Remember: you'll lose this advantage as soon as you start referring to any
column that is not part of the index. The nonclustered index is cheaper
only because there is no need to locate and read the rest of the columns.
(This is called a "covering" index).
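A minimal sketch of how one might observe this, assuming a hypothetical
400-byte row like the one in the example above; SET STATISTICS IO shows the
difference in logical reads between the two variants:

    -- Hypothetical table: each row is roughly 400 bytes wide.
    CREATE TABLE dbo.Parent (
        ParentID int        NOT NULL,
        Name     nchar(180) NOT NULL,   -- padding that makes the row wide
        Notes    nchar(16)  NOT NULL
    );

    -- Variant 1: clustered PK - leaf pages hold the entire 400-byte rows.
    ALTER TABLE dbo.Parent ADD CONSTRAINT PK_Parent
        PRIMARY KEY CLUSTERED (ParentID);

    -- Variant 2: nonclustered PK - leaf pages hold only the 4-byte key
    -- plus a row locator, so a scan that touches no other column reads
    -- roughly a tenth of the pages. Swap this in for Variant 1 to compare.
    -- ALTER TABLE dbo.Parent ADD CONSTRAINT PK_Parent
    --     PRIMARY KEY NONCLUSTERED (ParentID);

    SET STATISTICS IO ON;
    -- Touches only ParentID, so the nonclustered index covers it, but the
    -- modulo predicate still forces a full scan of the leaf level.
    SELECT COUNT(*) FROM dbo.Parent WHERE ParentID % 2 = 0;
    SET STATISTICS IO OFF;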
Best, Hugo
--
(Remove _NO_ and _SPAM_ to get my e-mail address)|||Hugo,
Thank you very much indeed! Maybe you're right and the problem is the page
density. On Monday I will go and check. As far as I know, the developers
have used some TEXT and NCHAR data types that might cause the low density
in the pages.
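A quick way to check that density, assuming a hypothetical table name: DBCC
SHOWCONTIG reports the average page density for each index on a table.

    -- "Avg. Page Density" near 100% means full pages; wide TEXT/NCHAR
    -- columns tend to drive it (and the rows-per-page count) down.
    DBCC SHOWCONTIG ('dbo.Parent') WITH TABLERESULTS, ALL_INDEXES;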
Thanks again!
Leila
"Hugo Kornelis" <hugo@.pe_NO_rFact.in_SPAM_fo> wrote in message
news:c9vcp09a0hmttg6c9t4nha093190q52314@.
4ax.com...
> On Sat, 13 Nov 2004 21:20:28 +0330, Leila wrote:
>
change[vbcol=seagreen]
> Hi Leila,
> That can happen in some situations. To understand, you need to know what
> exactly is stored on the leaf pages (the lowest level in the B-tree) of
> each kind of index.
> For the clustered index, the leaf pages contain all data from the table.
> If the size of one row is 400 bytes, each clustered index leaf page will
> contain the information of ~20 rows ((8K - some overhead) / 400 bytes).
> For a clustered index, the leaf pages contain the values of the indexed
> columns plus a "row-locator" - usually the values of the clustered index'
> columns; a physical row locator if there is no clustered index. If the
> combined size of all columns in the index plus the row locator is 40
> bytes, each nonclustered index leaf page will contain the information of
> ~200 rows ((8K - some overhead) / 40 bytes).
> I think that your query used only columns that participate in the index
> you used, but used a WHERE clause that prohibited the use of directly
> finding the required rows in the index, so all rows had to be scanned. If
> the table has 2000 rows, this would take 100 page reads (2000 / 20) if the
> index is clustered, but only 10 page reads if the index is nonclustered
> (2000 / 200).
> Remember: you'll lose this advantage as soon as you start referring to any
> column that is not part of the index. The nonclustered index is cheaper
> only because there is no need to locate and read the rest of the columns.
> (This is called a "covering" index).
> Best, Hugo
> --
> (Remove _NO_ and _SPAM_ to get my e-mail address)|||On Sat, 13 Nov 2004 15:58:36 +0330, "Leila" <leilas@.hotpop.com> wrote:
>But when I ran the query on MyDB1 and MyDB2 several times, I noticed that
>MyDB2, which had the index, was slower, even though Query Analyzer had
>indicated the lower cost (12%)!
...
>I tried different join hints, and LEFT HASH JOIN improved query
>performance again.
>Please help me with these two questions:
>1) Can I rely on the query cost (relative to the batch) to discover which
>indexes help my query? In my case I would say no, because in practice I got
>poor performance.
>2) If I get better performance by using join hints, is it wise to use them
>in SPs instead of letting the query optimizer choose its own plan?
>Any help would be greatly appreciated.
You tell an interesting story.
Let's review what the optimizer is all about - no, let's go even
further and consider what optimization is all about! If you have two
tables and no indexes, any join requires the work of a full
cross-join. Optimization suggests you have an index at least on the
PK (SQLServer creates one automatically, though it calls the PK a
constraint; only if you don't even have a PK do you really have no
indexes). Especially since building an index also includes keeping
statistics (as it usually does in SQLServer), an optimizer can then
decide whether a particular query can use the index. In a lot of cases,
this will drop a thirty-second query to thirty milliseconds. It's
those big, fat wins that the *optimizer* is all about.
Now, it so happens that if your query requires some large percentage
of the records to be inspected, the use of any index becomes a tougher
decision. In general, if your query touches more than about a quarter
of a table's records, it turns out to be cheaper to scan the whole
table than to use a (nonclustered) index (if the records that need to
be scanned fit a range in a *clustered* index, you get other
situations). What I'm trying to say is that if it's a relatively close
decision, the optimizer can guess wrong. If the difference is only a
factor of 2x, I'd guess the optimizer will get it wrong fairly often.
The optimizer is about finding 10x and 100x wins, and it's pretty good
at that.
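One way to see where that tipping point falls on your own data is to force
each access path with a table hint and compare the reads; a sketch, where
the table, column, and index names are all illustrative:

    SET STATISTICS IO ON;

    -- Optimizer's choice: with a wide enough range it usually scans.
    SELECT * FROM dbo.Parent WHERE Name < N'M';

    -- Forced nonclustered index: every qualifying row now costs a bookmark
    -- lookup, so logical reads climb steeply as the range widens.
    SELECT * FROM dbo.Parent WITH (INDEX (IX_Parent_Name))
    WHERE Name < N'M';

    SET STATISTICS IO OFF;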
So, I'm guessing your test query involved a large percentage of your
total records, and your database was relatively small, maybe under
100MB or 1M records in the tables involved.
Another thing to try is to put a few more indexes on each table, so
that the optimizer has more statistics to work with. Just a single
index on the left table may not be enough for the optimizer. Would
the ITW tell you if more would help? I'm not sure; I've only played
with it, never tried to use it for real.
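Statistics can also be created without paying for whole indexes; a sketch
(the table and column names are illustrative):

    -- Column statistics give the optimizer cardinality estimates without
    -- the storage and maintenance cost of an index.
    CREATE STATISTICS st_Parent_Name ON dbo.Parent (Name) WITH FULLSCAN;

    -- Or generate single-column statistics for all eligible columns at once.
    EXEC sp_createstats;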
I think I've only found one case where I had to use a hint to improve
performance, and that was instead of adding another index, which might
have been a better solution. If you tell SQLServer about the PKs for
each table and put indices on the obvious fields of the larger
tables, the optimizer is going to give you 90%++ of the available
performance. Um, that assumes the database is reasonably normalized.
Further hand-tuning of queries, semantic splits of tables,
materialized views, tuned physical layouts, and other exotica
are still there for really huge or really critical databases.
I really enjoy tuning SQLServer, and I'm just getting into some of the
available tools and techniques, but so far, in the real world, what I
find on people's sites are the real basics: tables with no indexes on
heavily used fields, and so on. The good news is I put on the index,
show a 99% improvement in time, and everybody is happy. The bad news
is I don't get to use all the fancy stuff.
HTH,
J.|||Thank you very much for your comprehensive explanation.
Actually I'm not too bad at optimization, and I guess the problem is
something beyond basic optimization concepts.
The ITW's suggestion was exactly what I had already done. The query cost was
quite reasonable after creating the index, and the execution plan improved,
but the execution time got worse. The clustered PK, which improved the plan,
caused a slower query, while the nonclustered PK made it faster.
I have to study the two tables' structure further to find out the reason.
Thanks again,
Leila
"JXStern" <JXSternChangeX2R@.gte.net> wrote in message
news:4e3fp0tq3g00vd7k9a64r73pd6ld6rarbe@.
4ax.com...
> On Sat, 13 Nov 2004 15:58:36 +0330, "Leila" <leilas@.hotpop.com> wrote:
MyDB2[vbcol=seagreen]
> ...
which[vbcol=seagreen]
poor[vbcol=seagreen]
them[vbcol=seagreen]
> You tell an interesting story.
> Let's review what the optimizer is all about - no, let's go even
> further and consider what optimization is all about! If you have two
> tables and no indexes, any join requires the work of a full
> cross-join. Optimization suggests you have an index at least on the
> PK (SQLServer does that automatically, though it calls the PK a
> constraint, only if you don't even have a PK do you really have no
> indexes). Especially when building an index also includes keeping
> statistics (as it usually does in SQLServer), an optimizer can then
> decide if a particular query can use the index. In a lot of cases,
> this will drop a thirty-second query to thirty milliseconds. It's
> those big, fat wins that the *optimizer* is all about.
> Now, it so happens that, if your query requires that some large
> percentage of records have to be inspected, that the use of any index
> becomes a tougher decision. In general, if your query requires more
> than about a quarter of a table's records, it turns out to be cheaper
> to scan the whole table than to use a (nonclustered) index (if the
> records that need to be scanned fit a range in a *clustered* index,
> you get other situations). What I'm trying to say here is that if
> it's a relatively close decision, the optimizer can guess wrong. If
> it's only a factor of 2x, I'd guess the optimizer will get it wrong
> fairly often. The optimizer is about finding 10x and 100x wins, and
> it's pretty good at that.
> So, I'm guessing your test query involved a large percentage of your
> total records, and your database was relatively small, maybe under
> 100mb or 1m records in the tables involved.
> Another thing to try is to put a few more indexes on each table, so
> that the optimizer has more statistics to work with. Just a single
> index on the left table may not be enough for the optimizer. Would
> the ITW tell you if more would help? I'm not sure, I've only played
> with that, never tried to use it for real.
> I think I've only found one case where I had to use a hint to improve
> performance, and that was instead of adding another index, which might
> have been a better solution. If you tell SQLServer about the PKs for
> each table, and put indices on the obvious fields of the larger
> tables, the optimizer is going to give you 90%++ of available
> performance. Um, that assumes the database is reasonably normalized.
> Further hand-tuning of queries, doing semantic splits of tables, using
> materialized views, and tuning physical layouts, and other exotica,
> are still there for really huge or really critical databases.
> I really enjoy tuning SQLServer, I'm just getting into some of the
> available tools and techniques, but so far, in the real world, it
> seems what I really find on people's sites are the real basics, tables
> with no indexes on heavily used fields, etc. The good news is I put
> on the index, show a 99% improvement in time, and everybody is happy.
> The bad news is I don't get to use all the fancy stuff.
> HTH,
> J.
>
