
Friday, March 9, 2012

Query Optimizer

Hi,
I'm very interested in query optimization and I really enjoy this work. I
have read everything in BOL, as well as books like "Inside SQL Server 2000",
"Guru's Guide to SQL Server Architecture and Internals", and "Microsoft SQL
Server 2000 Performance Optimization and Tuning Handbook".
But in newsgroups I read some information from MVPs that is not included in
any of the mentioned resources. I'd like to know whether there are other
references (web sites, books, etc.) that give more information about this
artificial intelligence (the Query Optimizer!).
Thanks in advance,
Leila|||Leila,
SQL Server Performance Tuning Tips
http://www.sql-server-performance.com/tips_performance.asp
Lots of info on the MVPs' web sites.
Erland Sommarskog's home page
http://www.sommarskog.se/
http://vyaskn.tripod.com/
http://www.aspfaq.com/
http://www.karaszi.com/SQLServer/default.asp
http://users.drew.edu/skass/sql/
AMB
"Leila" wrote:
> Hi,
> I'm very interested in query optimization and I really enjoy this job. I
> have read everything in BOL and also books like "Inside SQL Server 2000" and
> "Guru's Guide to SQL Server Architecture and Internals" and "Microsoft SQL
> Server 2000 Performance Optimization and Tuning Handbook".
> But in new groups I read some information from MVPs that is not included in
> any of mentioned resources. I'd like to know if there are other references
> ( like web sites, books ,...) that can give more information about this
> Artificial Intelligence (Query Optimizer!).
> Thanks in advance,
> Leila
>
>|||hmmm. the optimizer is pretty smart, but you are smarter.
rather than learning how the optimizer works, I spent time figuring out
how I would like my queries processed, and how to encourage (FORCE!)
the optimizer to do it the fastest way.|||Yes, people are smarter than computers, but computers are usually faster.
Why is it that so few people can actually beat a computer at chess? The
computer can evaluate millions of possible moves in less time than a human
can physically move a pawn. So the SQL Server optimizer can evaluate 1000s
of different alternatives to find ones that you might never have thought of.
Yes, there are times where you can come up with a better plan for one
particular query at one particular time. However, once you decide to FORCE
the optimizer to do something, it becomes static. As your data changes, your
forced choice may no longer be the best, and may actually hurt performance.
Also, as the optimizer is upgraded to add new techniques, you'll never have
a chance to know if those new choices can make your queries run even faster.
I think better than forcing the optimizer is learning how to encourage the
optimizer to do the right thing without having to use force. Having the
right indexes in place, writing your queries in the best way and keeping
your statistics up to date can go a long way. Yes, there are times you have
to use hints to force the optimizer's behavior; that's why there are hints
in the product. But they should be a last resort, not a first.
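Just so the mechanics are clear, a hint is only an extra clause on an
otherwise ordinary query. The table and index names below are purely
illustrative:

SELECT o.OrderID, c.CompanyName
FROM Orders o WITH (INDEX (IX_Orders_CustomerID))
INNER JOIN Customers c
ON o.CustomerID = c.CustomerID
OPTION (MERGE JOIN)

WITH (INDEX (...)) forces a particular index on one table, and
OPTION (MERGE JOIN) restricts the join strategy for the whole statement.
Both override the optimizer's own costing, which is exactly why they
should come last, not first.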
--
HTH
Kalen Delaney, SQL Server MVP
www.solidqualitylearning.com
"Doug" <drmiller100@.hotmail.com> wrote in message
news:1144690845.075086.204670@.t31g2000cwb.googlegroups.com...
> hmmm. the optimizer is pretty smart, but you are smarter.
> rather then learning how the optimizer works, i spent time figuring out
> how I would like my queries processed, and how to encourage (FORCE!)
> the optimizer to do it the fastest way.
>|||Yes, I'm a fan of not forcing the optimizer. I remember that I backed up
the database of a customer to bring it to the office for some query
optimization. In a particular SP I realized that a merge join could be much
faster, but the optimizer did not choose it at all. I forced the merge join.
It became incredibly faster. When I took my changes to the customer's server,
the merge join was very slow. When I removed the merge join hint, it worked
better! The only difference between my computer and the customer's was the
hardware!
"Kalen Delaney" <replies@.public_newsgroups.com> wrote in message
news:OddYysMXGHA.4704@.TK2MSFTNGP02.phx.gbl...
> Yes, people are smarter than computers, but computers are usually faster.
> Why is it that so few people can actually beat a computer at chess? The
> computer can evaluate millions of possible moves in less time than a human
> can physically move a pawn. So the SQL Server optimizer can evaluate 1000s
> of different alternatives to find ones that you might never have thought
> of.
> Yes, there are times where you can come up with a better plan for one
> particular query at one particular time. However, once you decide to FORCE
> the optimizer to do something, it becomes static. As your data changes,
> your forced choice may no longer be the best, and may actually hurt
> performance.
> Also, as the optimizer is upgraded to add new techniques, you'll never
> have a chance to know if those new choices can make your queries run even
> faster.
> I think better than forcing the optimizer, is learning how to encourage
> the optimizer to do the right thing without having to use force. Having
> the right indexes in place, writing your queries in the best way and
> keeping your statistics up to date can go a long way. Yes, there are times
> you have to use hints to force the optimizer's behavior; that's why there
> are hints in the product. But they should be a last resort, not a first.
> --
> HTH
> Kalen Delaney, SQL Server MVP
> www.solidqualitylearning.com
>
> "Doug" <drmiller100@.hotmail.com> wrote in message
> news:1144690845.075086.204670@.t31g2000cwb.googlegroups.com...
>> hmmm. the optimizer is pretty smart, but you are smarter.
>> rather then learning how the optimizer works, i spent time figuring out
>> how I would like my queries processed, and how to encourage (FORCE!)
>> the optimizer to do it the fastest way.
>|||Even if the hardware is identical, there is a big difference between
your own test box and a production box dealing with dozens of
concurrent requests. The optimizer has no way of knowing what the
concurrency might be, but it still assumes some degree of concurrency
when it chooses the plan. For instance, a merge join might require a
sort - and that could be both CPU and memory hungry, causing
bottlenecks and contention for resources. So on an identical but busy
production server a merge join might run way slower.
Even if a hint seems to be useful at the time you make a hot fix to
your urgent problem at hand, it may start causing harm when the
situation has changed, such as:
- a new constraint
- a new index
- changed statistics
Do not use optimizer hints just because you can - they should be your
last resort. Use proper indexes, covering indexes and (only sometimes)
indexed views, and you will almost eliminate the need for hints
altogether.|||"Leila" <Leilas@.hotpop.com> wrote in message
news:uwlTOANXGHA.4924@.TK2MSFTNGP05.phx.gbl...
> Yes, I'm fan of not forcing the optimizer. I remember that I backed up the
> database of a customer to bring it to the office for some query
> optimizations. In a particulare SP I realized that merge join could be
much
> faster but optimizer did not chooose it at all. I forced merge join. It
> became incredibly faster, When I took my changes to the customer's server,
> the merge join was very slow. When I removed the merge join hint, it
worked
> better! The only difference between my computer and customer's, was the
> hardware!
I have had ONE experience where forcing a hint on a query was the "right"
thing, and that was simply due to a "bug" between SP3 and SP4 in SQL 2K.
Since I couldn't upgrade to SP4 right away, and without the hint the query was
doing a table scan of a 45 gig table, I think it was the right call. Once
we moved to SP4 and confirmed that on that edition of SQL (Enterprise with
clustering vs. standard) the hint wasn't needed, it was removed. For the
reason you gave above: so that as conditions change, the optimizer can
adapt.
And yes, I agree with Kalen (big surprise there :-) that "encouraging" the
optimizer is generally the right thing.
>
> "Kalen Delaney" <replies@.public_newsgroups.com> wrote in message
> news:OddYysMXGHA.4704@.TK2MSFTNGP02.phx.gbl...
> > Yes, people are smarter than computers, but computers are usually
faster.
> >
> > Why is it that so few people can actually beat a computer at chess? The
> > computer can evaluate millions of possible moves in less time than a
human
> > can physically move a pawn. So the SQL Server optimizer can evaluate
1000s
> > of different alternatives to find ones that you might never have thought
> > of.
> > Yes, there are times where you can come up with a better plan for one
> > particular query at one particular time. However, once you decide to
FORCE
> > the optimizer to do something, it becomes static. As your data changes,
> > your forced choice may no longer be the best, and may actually hurt
> > performance.
> > Also, as the optimizer is upgraded to add new techniques, you'll never
> > have a chance to know if those new choices can make your queries run
even
> > faster.
> >
> > I think better than forcing the optimizer, is learning how to encourage
> > the optimizer to do the right thing without having to use force. Having
> > the right indexes in place, writing your queries in the best way and
> > keeping your statistics up to date can go a long way. Yes, there are
times
> > you have to use hints to force the optimizer's behavior; that's why
there
> > are hints in the product. But they should be a last resort, not a first.
> >
> > --
> > HTH
> > Kalen Delaney, SQL Server MVP
> > www.solidqualitylearning.com
> >
> >
> > "Doug" <drmiller100@.hotmail.com> wrote in message
> > news:1144690845.075086.204670@.t31g2000cwb.googlegroups.com...
> >> hmmm. the optimizer is pretty smart, but you are smarter.
> >>
> >> rather then learning how the optimizer works, i spent time figuring out
> >> how I would like my queries processed, and how to encourage (FORCE!)
> >> the optimizer to do it the fastest way.
> >>
> >
> >
>|||Ok, fair enough.
When I write code, I write generic code without hints.
Almost always that works. Once in a while you run into an
"opportunity" where something isn't running as fast as someone would
like.
In those cases, I'm not afraid to go try to figure out how the data
should be pulled, using either existing indexes or new and needed
indexes.
In those cases, I'm not afraid to force the engine to do it my way.
Documentation obviously occurs.
If you "hint", sometimes something will happen and one day the engine
will choose a poor path based on statistics, the moon, or whatever.
I expressed myself poorly in my original post. My intent was something
along the lines of suggesting to Leila that rather than understanding
the goofy, weird, and strange processes that a given optimizer uses to
find its idea of optimal, you might be better served to think about the
way queries SHOULD be run.
My argument is along the lines that each optimizer, for each version
of SQL, for each data engine, for different settings, will behave
differently.
The best way to pull the data, however, is probably not that subjective,
and a human's better understanding of how the data REALLY works can
allow opportunities to drastically speed things up.
Get a decent sized set of tables (terabytes, or even gigabytes on old
hardware), and you will KNOW when you get it right.
regards,
doug|||I appreciate everybody. I enjoy discussing experiences with regard to
query optimization, but I would be most grateful if someone could write
something in answer to the main question!
"Doug" <drmiller100@.hotmail.com> wrote in message
news:1144808205.118085.272490@.g10g2000cwb.googlegroups.com...
> Ok, fair enough.
> When I write code, i write generic code without hints.
> Almost always that works. Once in a while you run into an
> "opportunity" where something isn't running as fast as someone would
> like.
> in those cases, i'm not afraid to go try to figure out how the data
> should be pulled, using either existing indexes or new and needed
> indexes.
> in those cases, I'm not afraid to force the engine to do it my way.
> Documentation obviously occurs.
> If you "hint", sometimes something will happen and one day the engine
> will choose a poor path based on statistics, the moon, or whatever.
> I expressed myself poorly in my original post. My intent was something
> along the lines of suggesting to Leila that rather then understanding
> the goofy, weird, and strange processes that a given optimizer uses to
> find its idea of optimal, you might be better served to think about the
> way queries SHOULD be run.
> My argument is along the lines that each optimizer, for each version
> of SQL, for each data engine, for different settings, will behave
> differently.
> The best way to pull the data is however probably not that subjective,
> and a human's better understanding of how the data REALLY works can
> allow opportunties to drastically speed things up.
> get a decent sized set of tables (terrabytes, or even gigabytes on old
> hardware), and you will KNOW when you get it right.
> regards,
> doug
>


Query optimization question...

I'm trying to optimize some queries on an existing system, and I'm
noticing some odd behavior. I'm performing a join between several
tables, the final table being joined by the optimizer has a clustered
index on the field that it is using to join to the rest of the query,
but for some reason SQL Server doesn't seem to actually use this index
(it's doing an index scan instead of an index seek). Is there some
reason why SQL Server would not use a valid Clustered Index? I've
dropped and readded the index, but that doesn't seem to help. I don't
know if it would be relevant, but the tables I'm working on are fairly
fat (2 to 7K bytes/row).

This is happening for several tables. I've been able to get around it
for some of the tables by creating a non-clustered index on all the
fields that are being queried so that the leaf pages don't need to be
loaded, but this isn't a valid solution for all of the tables I'm
struggling with.

Any ideas? (and no, they aren't willing to redesign any of the
tables).|||"Mathew Relick" <ticars@.yahoo.com> wrote in message
news:dd84d8a7.0311181017.39d1c69@.posting.google.com...
> I'm trying to optimize some queries on an existing system, and I'm
> noticing some odd behavior. I'm performing a join between several
> tables, the final table being joined by the optimizer has a clustered
> index on the field that it is using to join to the rest of the query,
> but for some reason SQL Server doesn't seem to actually use this index
> (it's doing an index scan instead of an index seek). Is there some
> reason why SQL Server would not use a valid Clustered Index? I've
> dropped and readded the index, but that doesn't seem to help. I don't

This can happen if your table statistics are out of date. If SQL Server
determines that the join will require more than x % of the table to be
retrieved, then doing a clustered index scan may be faster than doing a
clustered index seek (faster because an index scan can read the rows in a
page sequentially, whereas an index seek has to traverse the B-tree
structure. Sequential read is faster because you cut down on seek time, and
also because you may be able to read more than one page in a single I/O
operation, since the data is sequential.) Try updating statistics and see
if it helps:

UPDATE STATISTICS <table_name> WITH FULLSCAN
GO
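
You can also check first how old the statistics are. STATS_DATE reports
when the statistics for a given index were last updated (index_id 1 is
the clustered index; the table name here is just a placeholder):

SELECT STATS_DATE(OBJECT_ID('YourTable'), 1) AS LastStatsUpdate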

> know if it would be relevant, but the tables I'm working on are fairly
> fat (2 to 7K bytes/row).
> This is happening for several tables. I've been able to get around it
> for some of the tables by creating a non-clustered index on all the
> fields that are being queried so that the leaf pages don't need to be
> loaded, but this isn't a valid solution for all of the tables I'm
> struggling with.

This is because when you create a new index, new statistics are generated
for that index, so you have the most up-to-date statistics with your new
index. Same if you rebuild your existing indexes.
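
For completeness, rebuilding an index (which also refreshes its
statistics with a full scan) looks like this in SQL 2000; both names are
placeholders:

DBCC DBREINDEX ('YourTable', 'IX_YourIndex')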

HTH,
Dave

>
> Any ideas? (and no, they aren't willing to redesign any of the
tables).|||"Dave Hau" <nospam_dave_nospam_123@.nospam_netscape_nospam.net_nospam> wrote
in message news:gCvub.33788$yj4.5497@.newssvr27.news.prodigy.com...
> "Mathew Relick" <ticars@.yahoo.com> wrote in message
> news:dd84d8a7.0311181017.39d1c69@.posting.google.com...
> > I'm trying to optimize some queries on an existing system, and I'm
> > noticing some odd behavior. I'm performing a join between several
> > tables, the final table being joined by the optimizer has a clustered
> > index on the field that it is using to join to the rest of the query,
> > but for some reason SQL Server doesn't seem to actually use this index
> > (it's doing an index scan instead of an index seek). Is there some
> > reason why SQL Server would not use a valid Clustered Index? I've
> > dropped and readded the index, but that doesn't seem to help. I don't
> This can happen if your table statistics are out of date. If SQL Server
> determines that the join will require more than x % of the table to be
> retrieved, then doing a clustered index scan may be faster than doing a
> clustered index seek (faster because an index scan can read the rows in a
> page sequentially, whereas an index seek has to traverse the B-tree
> structure. Sequential read is faster because you cut down on seek time,
and
> also because you may be able to read more than one page in a single I/O
> operation, since the data is sequential.) Try updating statistics and see
> if it helps:
> UPDATE STATISTICS <table_name> WITH FULLSCAN
> GO
> > know if it would be relevant, but the tables I'm working on are fairly
> > fat (2 to 7K bytes/row).
> > This is happening for several tables. I've been able to get around it
> > for some of the tables by creating a non-clustered index on all the
> > fields that are being queried so that the leaf pages don't need to be
> > loaded, but this isn't a valid solution for all of the tables I'm
> > struggling with.
> This is because when you create a new index, new statistics are generated
> for that index, so you have the most up-to-date statistics with your new
> index. Same if you rebuild your existing indexes.

I'm going to chime in because I think Dave has some good points here.
However, as I understand it, the original poster did rebuild the clustered
index, so there may be more to this problem than meets the eye.

One other thing that can happen is that the optimizer decides it's faster to
do a scan instead of a seek. This is particularly true if the result it
expects to return is a large percentage of the index (e.g. if you have 100
rows and will return 80).

I'll be honest, I'm not sure exactly how this applies with a clustered
index.

> HTH,
> Dave
> > Any ideas? (and no, they aren't willing to redesign any of the
> > tables).

Query Optimization NOT IN vs LEFT JOIN

I've read in many places that using "NOT IN" results in worse
performance. I've got two queries producing the same result:
SELECT WS.*
FROM
WHSLocations WS
INNER JOIN WHSLocations20060428D WHD
ON WS.LocationID = WHD.LocationID
WHERE WS.LocationID NOT IN (SELECT LocationID FROM Repository)
SELECT WS.*
FROM
WHSLocations WS
INNER JOIN WHSLocations20060428D WHD
ON WS.LocationID = WHD.LocationID
LEFT JOIN Repository R
ON WS.LocationID = R.LocationID
WHERE R.LocationID IS NULL
The second one is actually just a bit slower than the first. Under
which circumstances should I expect a performance gain, if any?
I appreciate your time reading my post,
Thank you,
Igor
*** Sent via Developersdex http://www.examnotes.net ***Have you tried this
SELECT WS.*
FROM
WHSLocations WS
INNER JOIN WHSLocations20060428D WHD
ON WS.LocationID = WHD.LocationID
WHERE NOT EXISTS (SELECT LocationID FROM Repository Where LocationID =
WS.LocationID )
Also WHERE WS.LocationID NOT IN (SELECT LocationID FROM Repository)
will return nothing if you have NULL values
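The NULL caveat is easy to reproduce, since it is standard SQL semantics rather than anything SQL Server specific. A minimal sketch using Python's sqlite3, with invented tables standing in for the ones in the thread:

```python
import sqlite3

# In-memory database with a NULL in the subquery's column.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE locations (id INTEGER)")
con.execute("CREATE TABLE repository (id INTEGER)")
con.executemany("INSERT INTO locations VALUES (?)", [(1,), (2,), (3,)])
con.executemany("INSERT INTO repository VALUES (?)", [(1,), (None,)])

# NOT IN compares every id against the NULL, yielding UNKNOWN -> no rows.
not_in = con.execute(
    "SELECT id FROM locations WHERE id NOT IN (SELECT id FROM repository)"
).fetchall()

# NOT EXISTS ignores the NULL row and returns the expected ids.
not_exists = con.execute(
    "SELECT id FROM locations l WHERE NOT EXISTS "
    "(SELECT 1 FROM repository r WHERE r.id = l.id)"
).fetchall()

print(not_in)      # []
print(not_exists)  # [(2,), (3,)]
```

`2 NOT IN (1, NULL)` evaluates to UNKNOWN, so the row is filtered out; `NOT EXISTS` has no such trap.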
Denis the SQL Menace
http://sqlservercode.blogspot.com/|||mEmENT0m0RI,
Compare also the version using "NOT EXISTS".
SELECT WS.*
FROM
(
select a.*
from WHSLocations as a
WHERE
NOT EXISTS (SELECT * FROM Repository as r where r.LocationID = a.LocationID)
) as WS
INNER JOIN
WHSLocations20060428D WHD
ON WS.LocationID = WHD.LocationID
go
Remember to clean the proc cache and the data buffer before executing each
statement (do not do it on production).
dbcc freeproccache
dbcc dropcleanbuffers
AMB
"mEmENT0m0RI" wrote:

> I've read in many places that using "NOT IN" results in worse
> performance. I've got two queries producing the same result:
>
> SELECT WS.*
> FROM
> WHSLocations WS
> INNER JOIN WHSLocations20060428D WHD
> ON WS.LocationID = WHD.LocationID
> WHERE WS.LocationID NOT IN (SELECT LocationID FROM Repository)
>
> SELECT WS.*
> FROM
> WHSLocations WS
> INNER JOIN WHSLocations20060428D WHD
> ON WS.LocationID = WHD.LocationID
> LEFT JOIN Repository R
> ON WS.LocationID = R.LocationID
> WHERE R.LocationID IS NULL
>
> The second one actually takes just a bit slower than the first. Under
> which circumstances should I expect a performance gain if any?
> I appreciate your time reading my post,
> Thank you,
> Igor
>
> *** Sent via Developersdex http://www.examnotes.net ***
>|||"Alejandro Mesa" <AlejandroMesa@.discussions.microsoft.com> wrote in message
news:7C075150-2839-4CC2-A023-8C3DB5AEEA00@.microsoft.com...
> mEmENT0m0RI,
> Compare also the version using "NOT EXISTS".
> SELECT WS.*
> FROM
> (
> select a.*
> from WHSLocations as a
> WHERE
> NOT EXISTS (SELECT * FROM Repository as r where r.LocationID =
> a.LocationID)
> ) as WS
> INNER JOIN
> WHSLocations20060428D WHD
> ON WS.LocationID = WHD.LocationID
> go
> Remember to clean the proc cache and the data buffer before executing each
> statement (do not do it on production).
> dbcc freeproccache
> dbcc dropcleanbuffers
>
You should not use DBCC DROPCLEANBUFFERS to test the relative performance of
queries. When you do, you skew the results in favor of whichever query
accesses the fewest number of distinct pages. For instance a nested loops
join (or bookmark lookup) might read the same page over and over, whereas a
clustered index scan (or table scan) would read each page once. With the
page cache flushed the loop join might look faster even though it generates
more reads, and is therefore actually worse.
Instead just set
set statistics io on
and compare the total number of reads.
David|||I was comparing the performance by putting all three queries into same
batch and checking "Query cost (realtive to the batch)" values from the
execution plan window. I'm not sure though if that would be a fair way
to compare...
So, now I have three queries:
dbcc freeproccache
dbcc dropcleanbuffers
SET STATISTICS IO ON
SELECT WS.*
FROM
WHSLocations WS
INNER JOIN WHSLocations20060428D WHD
ON WS.LocationID = WHD.LocationID
WHERE NOT EXISTS (SELECT LocationID FROM Repository WHERE LocationID =
WS.LocationID)
SELECT WS.*
FROM
WHSLocations WS
INNER JOIN WHSLocations20060428D WHD
ON WS.LocationID = WHD.LocationID
WHERE WS.LocationID NOT IN (SELECT LocationID FROM Repository)
SELECT WS.*
FROM
WHSLocations WS
INNER JOIN WHSLocations20060428D WHD
ON WS.LocationID = WHD.LocationID
LEFT JOIN Repository R
ON WS.LocationID = R.LocationID
WHERE R.LocationID IS NULL
Query Costs Relative to the batch:
33.33
33.33
33.34
IO STATISTICS:
(750 row(s) affected)
Table 'Repository'. Scan count 1, logical reads 67, physical reads 0,
read-ahead reads 848.
Table 'WHSLocations'. Scan count 1, logical reads 581, physical reads 0,
read-ahead reads 1283.
Table 'WHSLocations20060428D'. Scan count 1, logical reads 10, physical
reads 0, read-ahead reads 0.
(750 row(s) affected)
Table 'Repository'. Scan count 1, logical reads 67, physical reads 0,
read-ahead reads 0.
Table 'WHSLocations'. Scan count 1, logical reads 581, physical reads 0,
read-ahead reads 0.
Table 'WHSLocations20060428D'. Scan count 1, logical reads 10, physical
reads 0, read-ahead reads 0.
(750 row(s) affected)
Table 'Repository'. Scan count 1, logical reads 67, physical reads 0,
read-ahead reads 0.
Table 'WHSLocations'. Scan count 1, logical reads 581, physical reads 0,
read-ahead reads 0.
Table 'WHSLocations20060428D'. Scan count 1, logical reads 10, physical
reads 0, read-ahead reads 0.
*** Sent via Developersdex http://www.examnotes.net ***|||See
http://groups.google.nl/group/micro...bb?dmode=source
Gert-Jan
mEmENT0m0RI wrote:
> I've read in many places that using "NOT IN" results in worse
> performance. I've got two queries producing the same result:
> SELECT WS.*
> FROM
> WHSLocations WS
> INNER JOIN WHSLocations20060428D WHD
> ON WS.LocationID = WHD.LocationID
> WHERE WS.LocationID NOT IN (SELECT LocationID FROM Repository)
> SELECT WS.*
> FROM
> WHSLocations WS
> INNER JOIN WHSLocations20060428D WHD
> ON WS.LocationID = WHD.LocationID
> LEFT JOIN Repository R
> ON WS.LocationID = R.LocationID
> WHERE R.LocationID IS NULL
> The second one actually takes just a bit slower than the first. Under
> which circumstances should I expect a performance gain if any?
> I appreciate your time reading my post,
> Thank you,
> Igor
> *** Sent via Developersdex http://www.examnotes.net ***|||>I was comparing the performance by putting all three queries into same
> batch and checking "Query cost (realtive to the batch)" values from the
> execution plan window.
Execution plan is far from the whole story.|||David Browne (davidbaxterbrowne no potted meat@.hotmail.com) writes:
> You should not use DBCC DROPCLEANBUFFERS to test the relative
> performance of queries.
And most of all, you should not run that command on a production
machine!

> When you do, you skew the results in favor of whichever query accesses
> the fewest number of distinct pages. For instance a nested loops join
> (or bookmark lookup) might read the same page over and over, whereas a
> clustered index scan (or table scan) would read each page once. With
> the page cache flushed the loop join might look faster even though it
> generates more reads, and is therefore actually worse.
> Instead just set
> set statistics io on
> and compare the total number of reads.
Actually, neither is that accurate, as it does not account for building
hash tables, sorting etc. The best measure is wall-clock time with getdate().
Obviously this requires a server that is not occupied with other things.
As for caching, it is important to test under the same presumptions. That
is, either flush the cache each time, or make sure all data is in cache
before you start testing.
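The same measurement discipline (warm the cache first, then take wall-clock timings over several runs) can be sketched in any environment; a hedged illustration in Python with sqlite3, with an invented table and query:

```python
import sqlite3
import time

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (n INTEGER)")
con.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(10_000)])

def elapsed(sql):
    # Wall-clock timing, the analogue of bracketing a batch with GETDATE().
    start = time.perf_counter()
    con.execute(sql).fetchall()
    return time.perf_counter() - start

query = "SELECT n FROM t WHERE n % 7 = 0"
elapsed(query)                  # warm-up run: make sure the data is cached
runs = [elapsed(query) for _ in range(5)]
print(min(runs))                # report the best (least-disturbed) run
```

Taking the minimum over repeated warm runs keeps scheduler noise out of the comparison, which is the point Erland makes about testing under the same presumptions.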
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx|||mEmENT0m0RI wrote:
> I've read in many places that using "NOT IN" results in worse
> performance. I've got two queries producing the same result:
>
> SELECT WS.*
> FROM
> WHSLocations WS
> INNER JOIN WHSLocations20060428D WHD
> ON WS.LocationID = WHD.LocationID
> WHERE WS.LocationID NOT IN (SELECT LocationID FROM Repository)
>
> SELECT WS.*
> FROM
> WHSLocations WS
> INNER JOIN WHSLocations20060428D WHD
> ON WS.LocationID = WHD.LocationID
> LEFT JOIN Repository R
> ON WS.LocationID = R.LocationID
> WHERE R.LocationID IS NULL
>
> The second one actually takes just a bit slower than the first. Under
> which circumstances should I expect a performance gain if any?
I think that since the subquery in your first query is not correlated,
it only needs to be executed once, whereas in your second query the
Repository table needs to be accessed for every row in the WHSLocations
table.
I suspect that if the subquery were correlated (e.g. NOT IN (SELECT
LocationID FROM Repository where Repository.LocationGroup =
WS.LocationGroup) ) then the LEFT OUTER JOIN would be quicker.
Kris|||Seems you got the same plan for all three queries.
Tibor Karaszi, SQL Server MVP
http://www.karaszi.com/sqlserver/default.asp
http://www.solidqualitylearning.com/
"mEmENT0m0RI" <nospam@devdex.com> wrote in message news:epbpuVWbGHA.628@TK2MSFTNGP04.phx.gbl...
>I was comparing the performance by putting all three queries into same
> batch and checking "Query cost (realtive to the batch)" values from the
> execution plan window. I'm not sure though if that would be a fair way
> to compare...
> So, now I have three queries:
> dbcc freeproccache
> dbcc dropcleanbuffers
> SET STATISTICS IO ON
> SELECT WS.*
> FROM
> WHSLocations WS
> INNER JOIN WHSLocations20060428D WHD
> ON WS.LocationID = WHD.LocationID
> WHERE NOT EXISTS (SELECT LocationID FROM Repository WHERE LocationID =
> WS.LocationID)
>
> SELECT WS.*
> FROM
> WHSLocations WS
> INNER JOIN WHSLocations20060428D WHD
> ON WS.LocationID = WHD.LocationID
> WHERE WS.LocationID NOT IN (SELECT LocationID FROM Repository)
>
> SELECT WS.*
> FROM
> WHSLocations WS
> INNER JOIN WHSLocations20060428D WHD
> ON WS.LocationID = WHD.LocationID
> LEFT JOIN Repository R
> ON WS.LocationID = R.LocationID
> WHERE R.LocationID IS NULL
>
> Query Costs Relative to the batch:
> 33.33
> 33.33
> 33.34
>
> IO STATISTICS:
> (750 row(s) affected)
> Table 'Repository'. Scan count 1, logical reads 67, physical reads 0,
> read-ahead reads 848.
> Table 'WHSLocations'. Scan count 1, logical reads 581, physical reads 0,
> read-ahead reads 1283.
> Table 'WHSLocations20060428D'. Scan count 1, logical reads 10, physical
> reads 0, read-ahead reads 0.
> (750 row(s) affected)
> Table 'Repository'. Scan count 1, logical reads 67, physical reads 0,
> read-ahead reads 0.
> Table 'WHSLocations'. Scan count 1, logical reads 581, physical reads 0,
> read-ahead reads 0.
> Table 'WHSLocations20060428D'. Scan count 1, logical reads 10, physical
> reads 0, read-ahead reads 0.
> (750 row(s) affected)
> Table 'Repository'. Scan count 1, logical reads 67, physical reads 0,
> read-ahead reads 0.
> Table 'WHSLocations'. Scan count 1, logical reads 581, physical reads 0,
> read-ahead reads 0.
> Table 'WHSLocations20060428D'. Scan count 1, logical reads 10, physical
> reads 0, read-ahead reads 0.
>
> *** Sent via Developersdex http://www.examnotes.net ***

Query optimization in query that uses CTE.

Hi Guys,

I want to optimize one query which uses a CTE (a SQL Server 2005 feature).
I am sending you the abstract query.

Currently this query takes 4-5 seconds to execute,
but I want to reduce it to 1 second.
Please help me if you know how to do it.


--
DECLARE @X INT
DECLARE @LowerGradeRange INT
DECLARE @UpperGradeRange INT
DECLARE @Keyword NVARCHAR(500)

SET @X = 11500001
SET @LowerGradeRange = NULL
SET @UpperGradeRange = NULL
SET @Keyword = ''

IF ISNULL(@Keyword,'')=''
SET @Keyword='';


WITH SelPath (path_id,x,y,z,r)
AS
(

-- Anchor member definition (returns base result set)
SELECT path_id,x,y,z,r

FROM tab1 a
INNER JOIN tab2 b ON a.x= b.x

WHERE
a.x = @X
-- AND (a.parent IS NULL OR a.parent = 0)
AND
CASE
WHEN ISNULL(@LowerGradeRange,'')='' THEN 1
WHEN ISNULL(@LowerGradeRange,'')<>'' AND b.lgr >= @LowerGradeRange THEN 1
END=1
AND
CASE
WHEN ISNULL(@UpperGradeRange,'')='' THEN 1
WHEN ISNULL(@UpperGradeRange,'')<>'' AND b.ugr <= @UpperGradeRange THEN 1

END=1
AND
CASE
WHEN @Keyword <>'' AND b.y LIKE @Keyword THEN 1
ELSE 1
END =1


UNION ALL

-- Recursive member definition
-- (returns the direct subordinate(s) of the activity in the anchor member result set)


SELECT path_id,x,y,z,r
FROM SelPath b
INNER JOIN tab1 a ON a.parent = b.path_id
INNER JOIN tab2 c ON a.x = c.x
WHERE
CASE
WHEN ISNULL(@LowerGradeRange,'')='' THEN 1
WHEN ISNULL(@LowerGradeRange,'')<>'' AND c.lgr >= @LowerGradeRange THEN 1
END=1
AND
CASE
WHEN ISNULL(@UpperGradeRange,'')='' THEN 1
WHEN ISNULL(@UpperGradeRange,'')<>'' AND c.ugr <= @UpperGradeRange THEN 1

END=1
AND
CASE
WHEN @Keyword <>'' AND c.y LIKE @Keyword THEN 1
ELSE 1
END =1

)

-- Statement that executes the CTE
SELECT path_id,x,y,z,r

FROM SelPath a
INNER JOIN pce.qq c ON a.r = c.r

ORDER BY x

--

Reply soon...
bye
take care

Regards,
-Surendra


Go to the T-SQL forum. You'll get more help there.

-Jamie

Query optimization help, please.

Below are two versions of a SQL query. The first is the original query, which was not properly returning all of the required records. I modified this query to be what you see in the second SQL statement below. The problem I'm having now is the extremely long query time for this to execute (it keeps timing out or will take at least 20-30 seconds) from what originally took only a second.

The problem is obviously where I added the "AND CONTENTID IN (SELECT DISTINCT ContentID from Contents_view WHERE ApplicationID = @ApplicationID)" to the first portion of the WHERE statement (UserID = @UserID), as it now has to search the associated view twice for each element instead of once.

Thanks in advance for any advice you can provide.

Original Query (did not pull back the proper results, was missing the user specific records):


CREATE PROCEDURE GetContents
(
@UserID int,
@Perms nvarchar(1000) = null,
@UserGroup nvarchar(100) = null,
@ApplicationID int
)
AS
SELECT DISTINCT * FROM Contents_view
WHERE (UserID = @UserID OR (CHARINDEX('''' + PermissionName + '''',@Perms)>0 ))
AND ContentID IN (SELECT DISTINCT ContentID from Contents_view
WHERE PermissionName = @UserGroup
AND ApplicationID = @ApplicationID)
ORDER BY Rank ASC

Modified Query (properly pulls back all results and user specific records):

CREATE PROCEDURE GetContents
(
@UserID int,
@Perms nvarchar(1000) = null,
@UserGroup nvarchar(100) = null,
@ApplicationID int
)
AS
SELECT DISTINCT * FROM Contents_view
WHERE ((UserID = @UserID AND CONTENTID IN (SELECT DISTINCT ContentID from Contents_view
WHERE ApplicationID = @ApplicationID))
OR ((CHARINDEX('''' + PermissionName + '''',@Brands)>0 ))
AND ContentID IN (SELECT DISTINCT ContentID from Contents_view
WHERE PermissionName = @UserGroup
AND ApplicationID = @ApplicationID)
ORDER BY Rank ASC
Not really having any idea exactly what you are doing, a guess is that the use of CHARINDEX() in the WHERE clause of your subquery is what is slowing you down. This will force a table scan, since no index will help. Where is @Brands coming from?

You should rework the database if possible so that there is a direct way to determine what you are trying to determine. I presume you have a field with permission names comma delimited or similar?|||The @Brands should actually be @Perms, like in the first query.

What is happening is that there is a view that combines the related data of two separate tables into one Contents_view that contains each content record in the database with all of its pertinent information.

The query itself is pulling back the content for a particular page based on two things.
1) If the supplied userID and applicationID are present in a single record
2) If the usergroup and associated permissions are present in a single record

CHARINDEX() is searching a comma delimited permissions string that is passed to the SPROC.|||ok. Split up the string passed in, create either a temp table (SQL Server 7) or a user defined function that returns a table object (SQL 2000). Then do a join with the temp table or table object. Then you will not need CHARINDEX() which is being evaluated on each and every record in the table.|||Well ... How many rows are returned would have been a very good indicator ...

1. Since you are using an IN clause you do not have to use the DISTINCT in the inner correlated subquery ... it is not required.

Change the above query to:

SELECT DISTINCT * FROM Contents_view
WHERE ((UserID = @UserID AND CONTENTID IN (SELECT ContentID from Contents_view
WHERE (ApplicationID = @ApplicationID) or (PermissionName = @UserGroup and ApplicationID = @ApplicationID))
OR ((CHARINDEX('''' + PermissionName + '''',@Brands)>0 ))
ORDER BY Rank ASC

PS: Not tested. See if this gives you the desired results and performance?

Note that the use of string operations also consumes valuable CPU cycles and hence hampers performance ...
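The earlier suggestion to split the delimited string into a (temp) table and join can be sketched as follows, using Python's sqlite3 with invented names; in T-SQL the split would typically live in a table-valued function instead:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE contents (content_id INTEGER, permission_name TEXT)")
con.executemany("INSERT INTO contents VALUES (?, ?)",
                [(1, "admin"), (2, "editor"), (3, "viewer")])

perms = "'admin','viewer'"   # the comma-delimited string the proc receives

# Split the string once into a temp table instead of running a
# CHARINDEX-style search against every row of the view.
con.execute("CREATE TEMP TABLE perm_list (name TEXT PRIMARY KEY)")
con.executemany("INSERT INTO perm_list VALUES (?)",
                [(p.strip().strip("'"),) for p in perms.split(",")])

# A plain join can now use the index on perm_list.
rows = con.execute(
    "SELECT c.content_id FROM contents c "
    "JOIN perm_list p ON p.name = c.permission_name "
    "ORDER BY c.content_id"
).fetchall()
print(rows)  # [(1,), (3,)]
```

The split happens once per call rather than once per row, which is the whole gain.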

Query Optimization Assistance w/ Joins

I have a couple of tables that look like this (not excactly but close
enough):

[Contact]
id int
fname varchar(50)
lname varchar(50)

[ContactPhoneNumber]
id int
number varchar(15)
ext varchar(6)
contact_id int
priority int (indicates primary, secondary... numbers)
type int (indicates type of number: fax, cell, land line)

I'm looking for a more optimized method of displaying this information
in this format:

fname, primary business phone

Using a derived column like this works, but seems to be slow with many
records, despite tuning indexes:

SELECT c.fname AS [First Name],
( SELECT TOP 1
number
FROM ContactPhoneNumber cpn
WHERE cpn.type = 1
AND cpn.contact_id = c.id
ORDER BY cpn.priority) AS Number
FROM Contact c

I can get the same results using a join, and it's a lot faster. But I'm
not sure how to select only the primary phone number this way...
basically the first phone number whose priority is either NULL or 1.

Any suggestions?

*** Sent via Devdex http://www.devdex.com ***
Don't just participate in USENET...get rewarded for it!|||On 12 Jul 2004 16:05:51 GMT, Kenneth Courville wrote:

[snip]

> I can get the same results using a join, and it's a lot faster. But I'm
> not sure how to select only the primary phone number this way...
> basically the first phone number whose priority is either NULL or 1.

Select TOP 1 c.fname AS [First Name], cpn.Number as Number
FROM Contact c
INNER JOIN ContactPhoneNumber cpn
ON cpn.contact_id = c.id
WHERE cpn.type=1 AND (cpn.priority = 1 OR cpn.priority IS NULL)
ORDER BY cpn.ID

Since you didn't specify what "first" means if there are multiple phone
numbers matching (priority=1 or priority is null), I took the hint that
cpn.ID might be an autonumber, meaning that lower numbers mean entered into
the table earlier, so the lowest ID would be the first.|||Yes... "first" means if there are multiple phone
numbers matching (priority=1 or priority is null)... then just return
the top one in the list... no matter what the order is... the ORDER BY
cpn.id isn't necessary.

I guess the other thing I should've mentioned is that I'm displaying
contacts that are assigned to a particular client location.

The problem with your modifications is it would only return the first
contact... whereas the output can be:

First Name Number
John 555-1234
Sally 555-7891
Jill 555-9713

*** Sent via Devdex http://www.devdex.com ***|||Kenneth Courville (krcourville@-nospam-msn.com) writes:
> Using a derived column like this works, but seems to be slow with many
> records, despite tuning indexes:
> SELECT c.fname AS [First Name],
> ( SELECT TOP 1
> number
> FROM ContactPhoneNumber cpn
> WHERE cpn.type = 1
> AND cpn.contact_id = c.id
> ORDER BY cpn.priority) AS Number
> FROM Contact c
> I can get the same results using a join, and it's a lot faster. But I'm
> not sure how to select only the primary phone number this way...
> basically the first phone number whose priority is either NULL or 1.

SELECT c.fname AS "First Name", cpn.Number
FROM Contact c
JOIN (SELECT a.contact_id, Number = MIN(a.Number)
FROM ContactPhoneNumber a
JOIN (SELECT contact_id, priority = MIN(priority)
FROM ContactPhoneNumber
WHERE type = 1
GROUP BY contact_id) AS b
ON a.contact_id = b.contact_id
AND a.priority = b.priority
-- (OR a.priority IS NULL AND b.priority IS NULL)
WHERE a.type = 1
GROUP BY a.contact_id) AS cpn
ON c.id = cpn.contact_id

I think this will cut it, but with CREATE TABLE statements and INSERT
statements with sample data, it is difficult to test. (Hint, hint!)

If priority can be NULL, you should uncomment the commented line.
I have assumed that two numbers can have equal priority.

Here I am using a derived table, actually even two. I have found that in
many cases this gives better performance than correlated subqueries in the
SELECT list.
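The derived-table pattern can be exercised end to end on a toy data set. A sketch in Python's sqlite3, with invented sample rows and the same SQL shape as the query above:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE contact (id INTEGER, fname TEXT)")
con.execute("CREATE TABLE phone (contact_id INTEGER, number TEXT, "
            "priority INTEGER, type INTEGER)")
con.executemany("INSERT INTO contact VALUES (?, ?)",
                [(1, "John"), (2, "Sally")])
con.executemany("INSERT INTO phone VALUES (?, ?, ?, ?)",
                [(1, "555-1234", 1, 1), (1, "555-9999", 2, 1),
                 (2, "555-7891", 1, 1), (2, "555-0000", 1, 2)])

# The inner derived table finds the minimum priority per contact; the
# join then fetches the matching number -- no correlated subquery per row.
rows = con.execute("""
    SELECT c.fname, MIN(p.number)
    FROM contact c
    JOIN phone p ON p.contact_id = c.id AND p.type = 1
    JOIN (SELECT contact_id, MIN(priority) AS pri
          FROM phone WHERE type = 1 GROUP BY contact_id) b
      ON b.contact_id = p.contact_id AND b.pri = p.priority
    GROUP BY c.id, c.fname
    ORDER BY c.id
""").fetchall()
print(rows)  # [('John', '555-1234'), ('Sally', '555-7891')]
```

The outer MIN only breaks ties among numbers sharing the minimum priority, matching the assumption that two numbers can have equal priority.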

--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp|||On 12 Jul 2004 19:15:51 GMT, Kenneth Courville wrote:

> Yes... "first" means if there are multiple phone
> numbers matching (priority=1 or priority is null)... then just return
> the top one in the list... no matter what the order is... the ORDER BY
> cpn.id isn't necessary.
> I guess the other thing I should've mentioned is that I'm displaying
> contacts that are assigned to a particular client location.
> The problem with your modifications is it would only return the first
> contact... whereas the output can be:
> First Name Number
> John 555-1234
> Sally 555-7891
> Jill 555-9713
Ah. My brain appears to have conflated your question with someone else's.
Sorry.

How about this:

Select c.fname AS [First Name], Min(cpn.Number) as Number
FROM Contact c
INNER JOIN ContactPhoneNumber cpn
ON cpn.contact_id = c.id
WHERE cpn.type=1 AND (cpn.priority = 1 OR cpn.priority IS NULL)
GROUP BY c.ID, c.Fname

I know you said you explicitly want the "first" one, but since the order
isn't well defined, SQL server is free to return them in any random order
it wishes ... which means that the "first" one won't necessarily be the
same twice in a row. So maybe you can get by with Min?|||duh.. This should help.

Hadn't thought of a derived table... still kind of new to them.

*** Sent via Developersdex http://www.developersdex.com ***

Query Optimization - Suggestion needed

Hi,

I need advice from a SQL guru. I have a cursor (SQL Server 2005) which calculates the monthly employee wages. The outer query selects all the employees and the inner query computes the monthly salary day by day in a loop. For each day in a month, I insert a record in a temporary table. So for 2 employees, there can be 2 x 31 records (31 being the maximum days in a month). Later I rotate the temporary table to convert the 31 rows for each employee into one record, so at the end there are only 2 employee records but with 31 columns. The problem is that the rotation takes a huge performance hit and needs almost 5 minutes to convert rows to columns. I have even tried the PIVOT statement, but that too takes almost the same time.

Could anyone suggest how I can nail down a performance-efficient query for a monthly report? As far as I know, even on mainframes, monthly/annual reports take a long time to compile. So if any guru has a better idea, I would really appreciate it.

I am only asking how any guru/expert would handle monthly/annual reports with huge amounts of data in the tables efficiently.

Regards.......

I suggest that you also post on http://forums.microsoft.com/MSDN/ShowForum.aspx?ForumID=85&SiteID=1, the MSDN forum on TSQL.|||

Thanks for the link, I think it's a better place to post my question.

Regards......

|||

Hi,

Cursors have very poor performance and you should avoid them.

Visit http://www.sql-server-performance.com/default.asp. In the lower portion you will find many tips to improve your query performance.

I suggest you post this question in the sql-server-performance.com forum.

Query Optimization - Please Help

Hi,

Can anyone help me optimize the SELECT statement in the 3rd step? I am actually writing a monthly report. So for each employee (500 employees) in a row, his attendance totals for all days in a month are displayed. The problem is that in the 3rd step, there are actually 31 SELECT statements which are assigned to 31 variables. After I assign these variable, I insert them in a Table (4th step) and display it. The troublesome part is the 3rd step. As there are 500 employees, then 500x31 times the variables are assigned and inserted in the table. This is taking more than 4 minutes which I know is not required :). Can anyone help me optimize the SELECT statements I have in the 3rd step or give a better suggestion.

DECLARE @EmpID, @DateFrom, @Total1 ... // Declaring different variables

SELECT @DateFrom = // Set to start of any month e.g. 2007-06-01 ..... 1st

Loop (condition -- Get all employees, working fine)

BEGIN

SELECT @EmpID = // Get EmployeeID ..... 2nd

SELECT @Total1 = SUM (Abences) ..... 3rd

FROM Attendance

WHERE employee_id_fk = @EmpID (from 2nd step)

AND Date_Absent = DATEADD ("day", 0, Convert (varchar, @DateFrom)) (from 1st step)

SELECT @Total2 ..................... same as above

SELECT @Total3 ..................... same as above

INSERT IN @TABLE (@EmpID, @Total1, ..... @Total31) ..... 4th

Iterate (condition) to next employee ..... 5th

END

It's only the loop which consumes the 4 minutes. If I can somehow optimize this part, I will be most satisfied. Thanks for anyone helping me...
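For what it is worth, a loop of this shape can usually be replaced outright by one set-based statement: a single GROUP BY with one conditional SUM per day. A sketch in Python's sqlite3 with invented column names (in T-SQL the same shape is SUM(CASE WHEN ... THEN ... ELSE 0 END) per day column):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE attendance (emp_id INTEGER, day INTEGER, total INTEGER)")
con.executemany("INSERT INTO attendance VALUES (?, ?, ?)",
                [(1001, 1, 1), (1001, 1, 2), (1001, 3, 4),
                 (1002, 2, 1)])

# One GROUP BY replaces the 31 per-employee SELECTs: each day becomes a
# conditional SUM. Generated here for a 3-day "month" for brevity.
day_cols = ", ".join(
    f"SUM(CASE WHEN day = {d} THEN total ELSE 0 END) AS d{d}"
    for d in range(1, 4)
)
rows = con.execute(
    f"SELECT emp_id, {day_cols} FROM attendance "
    "GROUP BY emp_id ORDER BY emp_id"
).fetchall()
print(rows)  # [(1001, 3, 0, 4), (1002, 0, 1, 0)]
```

One pass over the table produces every employee's full row at once, so the 500 x 31 variable assignments disappear.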

What does the Attendance table look like? I have some ideas for you but I need to know how the attendance is stored. Can you give us the schema of that table please? Thanks!

|||

See this sample example-->

========================================================================================

Declare @fromDate datetime,
@toDate datetime

Set @fromDate = '1-Aug-2007'
Set @toDate = '4-Aug-2007'


Select distinct a.dtAttendate,

'STATUS'=(select
(case
when lv_status='EL' then 'EL'
when lv_status='CL' then 'CL'
when lv_status='SL' then 'SL'
when lv_status='ML' then 'ML'
when (wk_status='S' AND log_status='P') then 'SB'
when (wk_status='S' AND log_status='A') then 'Absent (SB)'
when wk_status='N' then 'Weekend'
when bIsHoliday=1 then 'Holiday'
when log_status='A' then 'Absent'
--when mnyLateHrs>0 then 'Late'
when log_status='P' then 'Present'
end)
from tblLogAbsent
where dtAttendate = a.dtAttendate and intEmpCode=a.intEmpCode),

'TIMEIN'=(select dtEmpTimeIn from tblLogStatus where dtAttendate=a.dtAttendate and intEmpCode=a.intEmpCode),

'TIMEOUT'=(select dtEmpTimeOUT from tblLogStatus where dtAttendate=a.dtAttendate and intEmpCode=a.intEmpCode)

from tblLogStatus a
where a.dtAttendate between @fromDate and @toDate

group by a.dtAttendate, intEmpCode

========================================================================================

Here I use two different tables, "tblLogAbsent" for the attendance status and "tblLogStatus" for additional information. This should not be a problem... if you can't understand any line of the code, ask me again... hopefully this will be helpful to you...

|||

Hmmm... I'm not sure how to use this information. In your initial code, you do a SELECT from a table called Attendance, which seems to have columns like Abences, employee_id_fk and Date_Absent. Can you give us more details on this table please? Thanks.

|||

Hi johram,

Thanks for replying. Yes you are right. The attendance table for employees has columns like EmpID_fk, Attendance_Date, ... , Attendance_Total. The Attendance_Total column is dependent on our business rules which include reason for signing in/out. e.g. If an employee has signed out for some official task, 1 is added to his Attendance_Total column. If he is going away for a business tour, 2 may be added to his Attendance_Total column. So in one day, an employee can have more than one record. The records for an employee may look like the following:

EmpID Attendance_Date ... Attendance_Total

1001 04/28/2006 1

1001 04/28/2006 2

1001 04/28/2006 1

So on 28th April, the total for Emp (1001) = 4. There are other columns in there but im only concerned with Attendance_Total. I need to display the SUM (Attendance_Total) for each day in a month for each employee.

If I further elaborate my report based on the above example, it may look something like:

EmpID EmpName D1 D2 D3 D4 D5 D6 ......

1001 ABC 1 0 4 4 1 0

I have tried a few techniques (with my limited experience :)), but when it comes to computing the sum for each day, it takes almost 4-5 minutes, which I am sure nobody wants. Also, this report can be accessed anytime within a day and employees keep coming and going for business, so I can't store the records and need to compute them every time the report is accessed.

Thanks to you all for helping me...

|||

Hi patuary,

Thanks for your reply as well. I have tried your technique and I understand it :) Anyway, after applying it, there are the following two problems:

1. All records are being returned as rows, e.g. the records for Emp 1001 in a month are not in a single row; rather, separate rows are returned for each day

2. Also, this query only returns data for days on which attendance was marked. But if there was a weekend or the employee was absent, his attendance record for that day is not computed, as there are no records; whereas in my case, if he was absent or no record is found on a given day, his attendance record must be marked as 0.

Thanks again for your time...
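The "missing days must show as 0" requirement is usually handled in the database with a calendar (or numbers) table: LEFT JOIN from the days onto the attendance rows so that every day yields a row. A sketch in Python's sqlite3 with invented names:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE attendance (emp_id INTEGER, day INTEGER, total INTEGER)")
con.executemany("INSERT INTO attendance VALUES (?, ?, ?)",
                [(1001, 1, 1), (1001, 3, 4)])

# Days table: one row per day of the month (here days 1-5 for brevity).
con.execute("CREATE TABLE days (day INTEGER)")
con.executemany("INSERT INTO days VALUES (?)", [(d,) for d in range(1, 6)])

# LEFT JOIN from the days table guarantees a row (with 0) for every day,
# including weekends and absences that have no attendance records at all.
rows = con.execute("""
    SELECT d.day, COALESCE(SUM(a.total), 0)
    FROM days d
    LEFT JOIN attendance a ON a.day = d.day AND a.emp_id = 1001
    GROUP BY d.day ORDER BY d.day
""").fetchall()
print(rows)  # [(1, 1), (2, 0), (3, 4), (4, 0), (5, 0)]
```

With the gaps filled on the SQL side, the GUI no longer needs special logic for empty days.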

|||

Guys, please provide your valuable feedback...

|||

Is it important that you get a result with all the days, even if the sum is zero? Cause that will make it a bit more complicated in the SQL. You can have the SQL report back all days that actually have a total (greater than zero), and then in your GUI you can render the rest of the days as empty. In that case, I think we can work out a solution for you. At least that's what we'll start with ;-) I'll see what I can do!

Also, what's the datatype of your Attendance_Date column?

|||

If you are using SQL 2005, there should be a new statement called PIVOT, although it is nowhere to be found in the T-SQL reference manual on MSDN. Maybe you are luckier than me ;-) Pivot is the term for when you shift the layout of a table so that you look on it from a different perspective. In this case, you want to pivot the table on the date so that each date represents a column rather than a row.

Now, this can be done with a function called Crosstable, which was developed by the legendary Rob Volk. The source code for this function can be found here. Note that you need to change the column "pivot" to "tpivot" or something similar, since "pivot" is a keyword in SQL 2005.

This is the modified version of Crosstable that will work in SQL 2005:

ALTER PROCEDURE crosstab
    @select varchar(8000),
    @sumfunc varchar(100),
    @pivot varchar(100),
    @table varchar(100)
AS
DECLARE @sql varchar(8000), @delim varchar(1)
SET NOCOUNT ON
SET ANSI_WARNINGS OFF

EXEC ('SELECT ' + @pivot + ' AS tpivot INTO ##pivot FROM ' + @table + ' WHERE 1=2')
EXEC ('INSERT INTO ##pivot SELECT DISTINCT ' + @pivot + ' FROM ' + @table + ' WHERE ' + @pivot + ' Is Not Null')

SELECT @sql = '', @sumfunc = stuff(@sumfunc, len(@sumfunc), 1, ' END)')

SELECT @delim = CASE Sign(CharIndex('char', data_type) + CharIndex('date', data_type))
                WHEN 0 THEN '' ELSE '''' END
FROM tempdb.information_schema.columns
WHERE table_name = '##pivot' AND column_name = 'tpivot'

SELECT @sql = @sql + '''' + convert(varchar(100), tpivot) + ''' = '
            + stuff(@sumfunc, charindex('(', @sumfunc) + 1, 0,
                    ' CASE ' + @pivot + ' WHEN ' + @delim + convert(varchar(100), tpivot) + @delim + ' THEN ')
            + ', '
FROM ##pivot

DROP TABLE ##pivot

SELECT @sql = left(@sql, len(@sql) - 1)
SELECT @select = stuff(@select, charindex(' FROM ', @select) + 1, 0, ', ' + @sql + ' ')

EXEC (@select)
SET ANSI_WARNINGS ON

Now, to demonstrate the power of this function I made a quick sample for you to push you in the right direction:

EXECUTE crosstab
    'SELECT EmpId FROM Attendance GROUP BY EmpId',
    'SUM(Attendance_total)',
    'Attendance_date',
    'attendance'

This will give you a matrix with all the employees vertically, and horizontally you will have all unique dates, with the respective attendance total for each employee on that day. As I said earlier, this will not give you all the days of the month, unless there are data for each day. So you might need to do some logic in your GUI to render "empty" days correctly. Good luck!

|||

Hi Johram,

Once again, thanks for your time. I really appreciate all your help. Yes, you're right, we do need to handle all days in a month. But as you mentioned, this can be handled in my logic, so I'm less concerned about the days without any data.

Anyways, I'm aware of the PIVOT function. It's basically used to convert rows into columns. Indeed, the functionality I have in my stored procedure does the same job. What I do is that I have a temporary table into which I insert 31 rows for each employee, so for 500 employees, I insert 500x31 rows. Later I convert the rows into columns and display them. Although I do not use the PIVOT function, I did once give it a try and the processing time was similar to what I have in there right now.

Still, I'm not ready to back out and will definitely give your solution a try. Let me see what I can get out of it. By the way Johram, if you have dealt with any monthly or annual report in the past, usually how much time does it take to display such a report? Do you think that I'm being overambitious in displaying such a report, or do such reports just take their time...

Once again, thanks a lot for every bit of help you have provided...

|||

Sorry, I haven't done exactly this kind of report before. But it will depend on the amount of data you are trying to cover. Is it relevant to show ALL employees in a list/report? Maybe you should restrict it by region, or last name, or something. Try to do a selection out of the 500 if possible.

Although I haven't been able to compare this crosstable approach with your first query, I still think that it might be faster. Try implementing it and see for yourself. Good luck!

|||

Hi Johram

Thanks a lot for your time and patience. I really appreciate your efforts and the help you have provided.

keep up the good work...

Query optimization - joining a view and a Table

I have the following situation: there is a view that aggregates and computes some values, and a table that I need the details from, so I join them, filtering on the primary key of the table. The execution plan shows that the view is executed without any filtering, so it returns 140,000 rows which are later filtered by the join operation, a hash match. This hash match takes 47% of the query cost. I tried selecting from the same view but giving a WHERE clause directly, without the join; it gave a completely different execution plan. The second method is at least four times faster and goes only through index seeks and nested loops.

So I tried modifying the query with a third version. It gave almost the same execution plan as version 1 with the join operation.

It seems that by giving the WHERE clause directly, the execution plan chosen by the query optimizer is completely different: it filters the view as the results are produced, in contrast to the first version, where the view is fully executed and only later filtered. Is it possible to change the query somehow so that it filters the view before it is joined to the table?

Any suggestions will be appreciated greatly

Stoil Pankov

"vHCItemLimitUsed" - this is the view

"tHCContractInsured" - this is the table

"ixHCContractInsuredID" - is the primary key of the table

Here is a simple representation of the effect:

Version 1:

select *
from dbo.vHCItemLimitUsed
inner join tHCContractInsured on
vHCItemLimitUsed.ixHCContractInsuredID = tHCContractInsured.ixHCContractInsuredID
where tHCContractInsured.ixHCContractInsuredID in (9012,9013,9014,9015)

Version 2:

select *
from vHCItemLimitUsed
where ixHCContractInsuredID in (9012,9013,9014,9015)

Version 3:

select *
from dbo.vHCItemLimitUsed
where ixHCContractInsuredID in
    (select ixHCContractInsuredID
     from tHCContractInsured
     where ixHCContractInsuredID in (9012,9013,9014,9015))

Are we talking milliseconds, seconds, minutes, or hours? If seconds, then it just might be one of those cases where it is too costly to do the whole optimization process and it is just faster to execute the query. Also, what version/edition of SQL Server are you using?

Another version that might work for you is:

select *
from (select *
from vHCItemLimitUsed
where ixHCContractInsuredID in (9012,9013,9014,9015)) as limitUsed
join tHCContractInsured on
limitUsed.ixHCContractInsuredID = tHCContractInsured.ixHCContractInsuredID

If we are talking about minutes then it might be a bug. Can you post some more information:

The plans of the queries
The structure of the view
The structure of the tables
The amount of data

Query Optimization

IS there any way to rewrite this Query in optimized way?

SELECT dbo.Table1.EmpId AS E
FROM dbo.Table1
WHERE EmpId IN (
    SELECT dbo.Table1.EmpId
    FROM (SELECT PersonID, MAX(dtmStatusDate) AS dtmStatusDate
          FROM dbo.Table1
          GROUP BY PersonID) derived_table
    INNER JOIN dbo.Table1
        ON derived_table.PersonID = dbo.Table1.PersonID
       AND derived_table.dtmStatusDate = dbo.Table1.dtmStatusDate
)

Thanks...j

|||

Don't know about being faster, but I think this is what you are trying to do (get the EmpIds with MAX(dtmStatusDate) for each person):

SELECT t1.EmpId
from dbo.Table1 t1
where t1.dtmStatusDate =
(select max(dtmStatusDate) from dbo.Table1 t2 where t1.PersonID = t2.PersonID)

also try

SELECT t1.EmpId
from dbo.Table1 t1
where not exists ( select * from dbo.Table1 t2 where t1.PersonID = t2.PersonID and t1.dtmStatusDate < t2.dtmStatusDate)

Query Optimization

Question regarding performance.

If I have a query such as:

Select UserId, Firstname,lastname from members where country='can'

If I were to call this from an .net executable as straight SQL to the Database vs. encapsulating the command in a stored procedure and calling the procedure.

What would be the performance differences? Would there be any issues (outside of security) that would make me choose to place the call in a procedure?

Thanks

It probably doesn't matter in this case. Executing the SELECT statement using a parameterized command object will provide the same benefit as calling an SP, with slightly more overhead. RPC execution for SP calls provides better performance than sending the entire SQL text. The answer is that it depends on your needs. Changing SQL statements embedded in applications is often harder than modifying an SP. You can also take a look at the whitepaper below for more information:

http://www.microsoft.com/technet/prodtechnol/sql/2005/recomp.mspx
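The stored-procedure route mentioned above could look like the sketch below, wrapping the exact query from the question (the procedure name is my invention):

```sql
-- Minimal sketch: the query from the question wrapped in a procedure,
-- so the application calls it by name with a parameter instead of
-- sending the full SQL text each time
CREATE PROCEDURE dbo.GetMembersByCountry
    @country varchar(3)
AS
SELECT UserId, Firstname, lastname
FROM members
WHERE country = @country
```

From .NET you would then call it with CommandType.StoredProcedure and a @country parameter; a parameterized inline command gets most of the same plan-reuse benefit, as the reply notes.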

Query Optimization

Hello all,

Please bear with the long explanation of my scenario.

As I'm relatively new to the query world, I like to write my queries
using the visual tools such as the "View" option in SQL Server or in MS
Access. If I have a complicated query with sub-queries, I create a
query (view1) as (for example):

select ID,count(ID) as NumberOfUsers from tblContact
where Type > 3
GROUP BY ID

then create another query (view2) which uses the first view in another
select statement:

select count(NumberOfUsers) from view1 where NumberOfUsers > 1

What I've noted above is a very simple example, but you get the idea (I
hope).
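For what it's worth, the two views above can also be collapsed into a single statement with a derived table, which is roughly what SQL Server itself does when it expands view references:

```sql
-- Equivalent single-statement form of view2: view1 becomes a derived table
SELECT COUNT(NumberOfUsers)
FROM (SELECT ID, COUNT(ID) AS NumberOfUsers
      FROM tblContact
      WHERE Type > 3
      GROUP BY ID) AS view1
WHERE NumberOfUsers > 1
```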

My question, however, is regarding a very complex query with
cascading/nested views.

Scenario:

View1 is joining 5 tables

View2 is using a join between View1 and 3 more tables

View3 is using a join between View2 and 5 more tables

View4 is using a join between view3 and 2 more tables

When I run VIEW3, it executes within 10 seconds. When I run VIEW4, it
takes 4 hours! What I did to get around this problem was this:

1. I renamed View3 to something else - like View3_Test
2. I then exported the new View3_Test into a table called View3

After this modification, when I run View4, it executes in 15 seconds!

It seems to me that SQL Server is bundling the joins used in
View4...all as 1 view.

Is there any way, I can make the execution plan in such a way that it
executes (the original) view3 FIRST and then proceeds to execute the
rest of the joins in View4?

This would be extremely helpful for me because in the interim, I
will have to schedule a DTS package to export the View3_Test to the
table called View3 first...upon completion, run the rest of the report.
That's just cheating. I'd like to be able to provide a better
solution to my employer.

Any help will be much appreciated.

Thank you in advance for your expert advice,

-Umar Farooq.

|||

You can't really control how MSSQL processes the views, because the
query optimizer never 'sees' them - when you query a view, its name is
replaced by its definition, and the optimizer then finds a plan for the
expanded query. Unfortunately, it seems that in your case it isn't
finding a very good plan.

You can use Query Analyzer to check the query plan, and see which joins
or other operations are the 'heavy' ones. That may give you a clue as
to how to improve your queries.

Alternatively, if your views are used for reporting, and if the data in
the tables doesn't change very much, you could consider using indexed
views. These are more like tables, and they do store the data in the
view, so the optimizer doesn't need to go to the base tables. But this
will make changes to data much slower, so it's probably most useful in
a reporting scenario.

Simon|||Hello Simon,

Thanks for your reply.

I guess my next question would be "How do I set up an indexed view?" I
looked in the Enterprise Manager and when I right-click on the view,
under "All Tasks" it gives me "Manage Triggers" and "Permissions" but
"Manage Indexes" is grayed out.

I downloaded the Production.BAK file to the QA and restored it and ran
these views on it. Another point to note is that on the QA server, the
QA implementation of the same database, this query runs just
fine...max, 35 seconds. But on the Production database, it chokes.

I tried doing a DBCC "soft reindexing" on all the tables and then took
a SQL trace and ran the Query Optimizer on the Production database on
the QA server. Still no luck.

I'd like to learn more about the "Indexed Views" as that sounds like a
viable solution. You said the data will be stored in the view? I'm
not sure I understand how that works. It's a bit of an "overload" for
my Microsoft brain.

While on that subject, would anyone happen to know if I can set up
defaults in views? I know how to set it up on the table but I have a
scenario where I'd like for different people to see a table through a
view and on their "UserID" column, for example, I'd like to default
their UserID value on the view.

Thank you again for your help,

-Umar Farooq.|||"Umar Farooq" <UmarAlFarooq@.gmail.com> wrote in message
news:1112795648.697684.308450@.g14g2000cwa.googlegr oups.com...
> Hello Simon,
> Thanks for your reply.
> I guess my next question would be "How do I set up an indexed view?" I
> look in the Enterprise manager and when I right-click on the view,
> under "All Tasks" it gives me "Manage Triggers" and "Permissions" but
> "Manage Indexes" is grayed out.

I don't use EM much myself, so I don't know if it's possible to create an
indexed view from there. However, not all views are indexable, and you have
to make sure all the conditions are met, so it may simply be that your view
is not indexable without some changes - see "Creating an Indexed View" in
Books Online for a list of requirements.

> I downloaded the Production.BAK file to the QA and restored it and ran
> these views on it. Another point to note is that on the QA server, the
> QA implementation of the same database, this query runs just
> fine...max, 35 seconds. But on the Production database, it chokes.
> I tried doing a DBCC "soft reindexing" on all the tables and then took
> a SQL trace and ran the Query Optimizer on the Production database on
> the QA server. Still no luck.

I'm not really sure I follow this - you mean you copied your DB from
Production to QA, and the same query runs in 35 seconds in QA but 4 hours in
Production? If so, there should be something fairly obviously different in
the query plans to suggest what's going on.

> I'd like to learn more about the "Indexed Views" as that sounds like a
> viable solution. You said the data will be stored in the view? I'm
> not sure I understand how that works. It's a bit of an "overload" for
> my Microsoft brain.

See "Designing an Indexed View" in Books Online. Basically, you first create
a clustered index on the view (you can create additional nonclustered
indexes after that); since a clustered index always contains data in its
leaf level, the new index now has the view data in it. You can check Books
Online for more details.
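To make that concrete, here is a hedged sketch using the view1 example from earlier in this thread (names taken from that example; among the requirements, an indexed view must use WITH SCHEMABINDING, two-part object names, and COUNT_BIG(*) when grouping):

```sql
-- Sketch of an indexable version of view1 (assumes dbo.tblContact exists)
CREATE VIEW dbo.view1
WITH SCHEMABINDING
AS
SELECT ID, COUNT_BIG(*) AS NumberOfUsers
FROM dbo.tblContact
WHERE Type > 3
GROUP BY ID
GO
-- The unique clustered index is what actually materializes the view's data
CREATE UNIQUE CLUSTERED INDEX IX_view1 ON dbo.view1 (ID)
```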

An important point to remember with indexed views is that you can create
them in any edition of MSSQL, but only Enterprise Edition will automatically
use them - other editions will continue to expand the view and ignore the
indexes. To force them to use the indexes, you need to use the WITH
(NOEXPAND) hint in your queries - see the sections on hints under "FROM" in
BOL.
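For example, on a non-Enterprise edition, a query against an indexed view would need the hint spelled out (view name borrowed from the sketch of view1 above):

```sql
-- Force use of the view's own index instead of expanding the view definition
SELECT ID, NumberOfUsers
FROM dbo.view1 WITH (NOEXPAND)
WHERE NumberOfUsers > 1
```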

> While on that subject, would anyone happen to know if I can set up
> defaults in views? I know how to set it up on the table but I have a
> scenario where I'd like for different people to see a table through a
> view and on their "UserID" column, for example, I'd like to default
> their UserID value on the view.

I don't know what you mean by a default, but if you want to show different
users different data based on their login, then something like this may
work:

create view dbo.MyCustomers
as
select
CustID,
CustName,
...
from
dbo.Customers
where
CustomerAccountRep = system_user -- see BOL

> Thank you again for your help,
> -Umar Farooq.

You're welcome.

Simon|||/*
> I downloaded the Production.BAK file to the QA and restored it and
ran
> these views on it. Another point to note is that on the QA server,
the
> QA implementation of the same database, this query runs just
> fine...max, 35 seconds. But on the Production database, it chokes.

> I tried doing a DBCC "soft reindexing" on all the tables and then
took
> a SQL trace and ran the Query Optimizer on the Production database on

> the QA server. Still no luck.

I'm not really sure I follow this - you mean you copied your DB from
Production to QA, and the same query runs in 35 seconds in QA but 4
hours in
Production? If so, there should be something fairly obviously different
in
the query plans to suggest what's going on.
*/

I mean that I copied the DB from Production to QA and ran the query and
it took just as long as it did on production. However, the same tables
and structures and most of the data is in the QA DB on the QA server.
On that DB, the query runs super fast. Since the QA DB and the
Production DB are identical in structure, I thought re-indexing the
Production DB will make it run faster. No such luck. Not only that,
once I have the query running in 20 seconds on the QA DB, even if I
make a slight modification to the query (like adding some additional
fields, etc.) the query starts to bog down again.

I wish there was a way by which I could tell the query to treat the
steps I want as a self-contained query and not to combine the execution
plan with the other queries in the view.

By default, I mean for inserts. On a table, if I wanted to enter a
default date, for example, I'd just use the "getdate()" function in the
default property of the date field and it will insert a date in the
field by default. I'd like a view to insert a default value in a
column just like the table default. For example:

tblSuggestion
ID, SuggestorID, Category, Suggestion, SuggestionDetails,
DateSubmitted, DateModified

Let's say there are 3 Suggestors in tblSuggestor: James, John and Mike

I want to create 3 views:

View1
ID, 1, Category, Suggestion, SuggestionDetails, DateSubmitted,
DateModified

View2
ID, 2, Category, Suggestion, SuggestionDetails, DateSubmitted,
DateModified

View3
ID, 3, Category, Suggestion, SuggestionDetails, DateSubmitted,
DateModified

I give James View1 which will automatically enter a 1 in the
SuggestorID field so he doesn't have to do it every time. All he has
to enter will be Category, Suggestion and SuggestionDetails.
DateSubmitted is a table default of "getdate()" and any changes to the
record will fire up a trigger to update DateModified with "getdate()"
value.

The problem is that the scenario I've listed above is only for Select
views and not for update views. I know I can accomplish this task with
a stored procedure which will accept values and hard-code the
information into the table but I wanted to do that via a view if
possible.

Thanks,

-Ashwin Sharma.|||Although it is generally not recommended, you could try to add the query
hint "OPTION (FORCE ORDER)" to the query, and see how that works for you.
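Applied to the query from the original post, that would be something like the following (FORCE ORDER tells the optimizer to preserve the join order as written, which for a view means the order in the expanded definition):

```sql
-- Keep the join order of the expanded view definition as written
SELECT *
FROM View4
OPTION (FORCE ORDER)
```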

HTH,
Gert-Jan

Umar Farooq wrote:
> Hello all,
> Please bear with the long explanation of my scenario.
> As I'm relatively new to the query world, I like to write my queries
> using the visual tools such as the "View" option in SQL Server or in MS
> Access. If I have a complicated query with sub-queries, I create a
> query (view1) as (for example):
> select ID,count(ID) as NumberOfUsers from tblContact
> where Type > 3
> GROUP BY ID
> then create another query (view2) which uses the first view in another
> select statement:
> select count(NumberOfUsers) from view1 where NumberOfUsers > 1
> What I've noted above is a very simple example but, you get the idea (I
> hope).
> My question, however, is regarding a very complex query with
> cascading/nested views.
> Scenario:
> View1 is joining 5 tables
> View2 is using a join between View1 and 3 more tables
> View3 is using a join between View2 and 5 more tables
> View4 is using a join between view3 and 2 more tables
> When I run VIEW3, it executes within 10 seconds. When I run VIEW4, it
> takes 4 hours! What I did to get around this problem was this:
> 1. I renamed View3 to something else - like View3_Test
> 2. I then exported the new View3_Test into a table called View3
> After this modification, when I run View4, it executes in 15 seconds!
> It seems to me that SQL Server is bundling the joins used in
> View4...all as 1 view.
> Is there any way, I can make the execution plan in such a way that it
> executes (the original) view3 FIRST and then proceeds to execute the
> rest of the joins in View4?
> This would be extremely helpful for me because in the interim (sp?), I
> will have to schedule a DTS package to export the View3_Test to the
> table called View3 first...upon completion, run the rest of the report.
> That's just cheating. I'd like to be able to provide a better
> solution to my employer.
> Any help will be much appreciated.
> Thank you in advance for your expert advice,
> -Umar Farooq.|||<snip>
> I wish there was a way by which I could tell the query to treat the
> steps I want as a self-contained query and not to combine the execution
> plan with the other queries in the view.

Gert-Jan's FORCE ORDER suggestion is probably the closest thing there is
without using indexed views - generally you don't want to impose a certain
plan on the optimizer, but sometimes it just gets it wrong. You might also
want to make sure you have the latest servicepack installed, as they may
include changes to the optimizer.

> By default, I mean for inserts. On a table, if I wanted to enter a
> default date, for example, I'd just do a "getdate()" function on the
> default property of the date field and it will insert a date in the
> field by default. I'd like a view to to insert a default value in a
> column just like the table default. For example:

<snip>
> The problem is that the scenario I've listed above is only for Select
> views and not for update views. I know I can accomplish this task with
> a stored procedure which will accept values and hard-code the
> information into the table but I wanted to do that via a view if
> possible.

A stored procedure is usually a better way to modify data than a view, since
it gives you more flexibility in terms of validating input, performing other
actions, cascading changes to other tables, applying procedural logic and so
on.

In your example, I don't really see why you need a view - you could just
SELECT/INSERT on the table directly. Unless tblSuggestion is in fact a view
and not a table? One view per user is not a very scalable solution anyway,
and maintenance would be an issue.

If you can replace SuggestorID with a user's login name, then you can simply
make SYSTEM_USER the default for that column, as you've already done with
GETDATE() elsewhere. If that isn't possible, then I would use a stored
procedure; if you absolutely have to use a view for some reason, then you
could create an INSTEAD OF trigger on the view. That assumes you have
SQL2000 - I don't think you've mentioned which version you have.
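As a rough sketch of the INSTEAD OF route (view and trigger names are invented here; tblSuggestion and SuggestorID = 1 for James come from the example above), the trigger supplies the default during the insert:

```sql
-- Hypothetical per-user view plus an INSTEAD OF trigger that
-- fills in SuggestorID = 1 so the user never has to supply it
CREATE VIEW dbo.vwSuggestion1
AS
SELECT ID, SuggestorID, Category, Suggestion, SuggestionDetails,
       DateSubmitted, DateModified
FROM dbo.tblSuggestion
WHERE SuggestorID = 1
GO
CREATE TRIGGER dbo.trSuggestion1_Insert ON dbo.vwSuggestion1
INSTEAD OF INSERT
AS
INSERT INTO dbo.tblSuggestion (SuggestorID, Category, Suggestion, SuggestionDetails)
SELECT 1, Category, Suggestion, SuggestionDetails
FROM inserted
```

The table defaults (GETDATE() on DateSubmitted) still apply, since the trigger's INSERT simply omits those columns.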

If this isn't helpful, I suggest you post CREATE TABLE and INSERT statements
to show exactly what you're trying to achieve.

http://www.aspfaq.com/etiquette.asp?id=5006

Simon

Query Optimization

I have a query... if you look at the bottom of the WHERE clause you'll
see a "NOT IN" statement that is really hanging up the query. I'm
trying to replace it with a "NOT EXISTS" but it doesn't appear to
work. I need to get a result set where the email address of the
outermost query is not in that subquery. Thanks:

-->Code Begins Here

SELECT
DISTINCT
'5367' AS SURVEYID,
ISNULL(B.EMAIL_ADDR,'') AS C_EMAIL_ADDR,
A.CASE_ID,
A.SITE_ID,
C.SITE_NAME,
CONVERT(VARCHAR(12), A.ROW_ADDED_DTTM, 101) AS C_OPENED_DT,
CONVERT(VARCHAR(12), A.ROW_ADDED_DTTM, 108) AS C_OPENED_TM,
CONVERT(VARCHAR(12), A.CLOSED_DTTM, 101) AS C_CLOSED_DT,
CONVERT(VARCHAR(12), A.CLOSED_DTTM, 108) AS C_CLOSED_TM,
A.RC_STATUS,
A.SOLUTION_ID,
A.RC_SOURCE,
CASE WHEN LEN(ISNULL(A.PRODUCT_GROUP, '')) = 0 THEN '[blank]' ELSE
D.PRODUCT_GROUP_DESCR END AS PRODUCT_GROUP,
B.FIRST_NAME AS C_FIRST_NAME,
B.LAST_NAME AS C_LAST_NAME,
B.TITLE AS C_TITLE,
B.PHONE AS C_PHONE,
CASE WHEN LEN(ISNULL(B.EXTENSION, '')) = 0 THEN '[blank]' ELSE
B.EXTENSION END AS EXTENSION,
CASE WHEN LEN(ISNULL(A.LOCATION, '')) = 0 THEN '[blank]' ELSE
A.LOCATION END AS LOCATION,
CASE WHEN LEN(ISNULL(B.CITY, '')) = 0 THEN '[blank]' ELSE B.CITY END
AS CITY,
CASE WHEN LEN(ISNULL(B.STATE, '')) = 0 THEN '[blank]' ELSE B.STATE
END AS STATE,
CASE WHEN LEN(ISNULL(B.POSTAL, '')) = 0 THEN '[blank]' ELSE B.POSTAL
END AS POSTAL,
CASE WHEN LEN(ISNULL(B.REGION_ID, '')) = 0 THEN '[blank]' ELSE
B.REGION_ID END AS REGION_ID,
CASE WHEN LEN(ISNULL(C.COUNTRY, '')) = 0 THEN '[blank]' ELSE
C.COUNTRY END AS COUNTRY,
CASE WHEN LEN(ISNULL(C.SITE_TYP_ID, '')) = 0 THEN '[blank]' ELSE
C.SITE_TYP_ID END AS SITE_TYPE,
CASE WHEN LEN(ISNULL(A.CASE_TYPE, '')) = 0 THEN '[blank]' ELSE
A.CASE_TYPE END AS CASE_TYPE,
CASE WHEN LEN(ISNULL(A.COMPETENCY, '')) = 0 THEN '[blank]' ELSE
A.COMPETENCY END AS DEVICE_TYPE,
CASE WHEN LEN(ISNULL(A.PROVIDER_GRP_ID, '')) = 0 THEN '[blank]' ELSE
A.PROVIDER_GRP_ID END AS PROVIDER_GRP,
A.CLOSED_BY_OPRID,
A.ROW_ADDED_OPRID,
CASE WHEN LEN(ISNULL(A.PX_LEVEL, '')) = 0 THEN '[blank]' ELSE
A.PX_LEVEL END AS PX_LEVEL,
CASE WHEN LEN(ISNULL(A.RC_PRIORITY, '')) = 0 THEN '[blank]' ELSE
A.RC_PRIORITY END AS PRIORITY,
CASE WHEN LEN(ISNULL(A.RC_SEVERITY, '')) = 0 THEN '[blank]' ELSE
A.RC_SEVERITY END AS SEVERITY,
CASE WHEN LEN(ISNULL(E.SO_ID,'')) = 0 THEN '[blank]' ELSE E.SO_ID END
AS SO_ID,
CASE WHEN LEN(ISNULL(E.EMAIL_ADDR,'')) = 0 THEN '[blank]' ELSE
E.EMAIL_ADDR END AS S_EMAIL_ADDR,
CASE WHEN LEN(ISNULL(E.OPENED_DT,'')) = 0 THEN'[blank]' ELSE
E.OPENED_DT END AS S_OPENED_DT,
CASE WHEN LEN(ISNULL(E.OPENED_TM,'')) = 0 THEN'[blank]' ELSE
E.OPENED_TM END AS S_OPENED_TM,
CASE WHEN LEN(ISNULL(E.CLOSED_DT,'')) = 0 THEN'[blank]' ELSE
E.CLOSED_DT END AS S_CLOSED_DT,
CASE WHEN LEN(ISNULL(E.CLOSED_TM,'')) = 0 THEN'[blank]' ELSE
E.CLOSED_TM END AS S_CLOSED_TM,
CASE WHEN LEN(ISNULL(E.FIRST_NAME,'')) = 0 THEN'[blank]' ELSE
E.FIRST_NAME END AS S_FIRST_NAME,
CASE WHEN LEN(ISNULL(E.LAST_NAME,'')) = 0 THEN'[blank]' ELSE
E.LAST_NAME END AS S_LAST_NAME,
CASE WHEN LEN(ISNULL(E.TITLE,'')) = 0 THEN'[blank]' ELSE E.TITLE END
AS S_TITLE,
CASE WHEN LEN(ISNULL(E.PHONE,'')) = 0 THEN '[blank]' ELSE E.PHONE END
AS S_PHONE,
CASE WHEN LEN(ISNULL(A.RC_SUMMARY,'')) = 0 THEN '[blank]' ELSE
A.RC_SUMMARY END AS CASE_SUMMARY,
CASE WHEN LEN(ISNULL(E.SERIAL_ID,'')) = 0 THEN '[blank]' ELSE
E.SERIAL_ID END AS S_SERIAL_ID,
CASE WHEN LEN(ISNULL(E.CONTACT_PERSON_ID,'')) = 0 THEN '[blank]' ELSE
E.CONTACT_PERSON_ID END AS S_PERSON_ID,
CASE WHEN LEN(ISNULL(A.CASE_CONTACT,'')) = 0 THEN '[blank]' ELSE
A.CASE_CONTACT END AS C_PERSON_ID,
CASE WHEN LEN(ISNULL(E.LOCN,'')) = 0 THEN '[blank]' ELSE E.LOCN END
AS S_LOCN,
CASE WHEN LEN(ISNULL(E.TECH_NAME,'')) = 0 THEN '[blank]' ELSE
E.TECH_NAME END AS S_TECH_NAME,
'N' AS SERVICEORDER_FLG,
CASE WHEN C.STATE = 'QB' THEN 1 ELSE 3 END AS FC_LANGUAGE_FLG
FROM
FCT_CASE A
-->JOINS LU_SITE_REP TO GET THE CASE CONTACT NAME
INNER JOIN
LU_SITE_REP B
ON
A.CASE_CONTACT = B.PERSON_ID
AND A.SITE_ID = B.SITE_ID
-->JOINS DIM_SITE TO GET THE SITE INFORMATION FOR THE CASE
INNER JOIN
DBO.DIM_SITE C
ON
A.SITE_ID = C.SITE_ID
-->LEFT JOIN TO GET THE PRODUCT GROUP NAME IF THE PRODUCT GROUP FOR
THE CASE IS NOT BLANK
LEFT JOIN
LU_PRODUCT_GROUP D
ON
A.PRODUCT_GROUP = D.PRODUCT_GROUP

LEFT JOIN
-->BEGIN DERIVED TABLE E
-->E RETRIEVES ALL THE INFO FROM THE SERVICE ORDER IF A SERVICE ORDER
EXISTS FOR THE CASE
(
SELECT
S.SO_ID,
P.EMAIL_ADDR,
S.CASE_ID,
CONTACT_PERSON_ID,
(
SELECT
FIRST_NAME + ' ' + LAST_NAME
FROM
DIM_WORKER W
WHERE
W.PERSON_ID = S.PERSON_ID

) AS TECH_NAME,

CONVERT(VARCHAR(12), S.ROW_ADDED_DTTM, 101) AS OPENED_DT,
CONVERT(VARCHAR(12), S.ROW_ADDED_DTTM, 108) AS OPENED_TM,
CONVERT(VARCHAR(12), S.PX_COMPLETED_DTTM, 101) AS CLOSED_DT,
CONVERT(VARCHAR(12), S.PX_COMPLETED_DTTM, 108) AS CLOSED_TM,
P.FIRST_NAME,
P.LAST_NAME,
P.TITLE,
P.PHONE,
S.SERIAL_ID,
S.LOCN
FROM
DIM_SO_HDR S,
LU_SITE_REP P
WHERE
S.CONTACT_PERSON_ID = P.PERSON_ID
AND S.CASE_ID <>0
AND S.ROW_ADDED_DTTM BETWEEN DATEADD(dd, DATEDIFF(dd,0,GETDATE()-2),
0) AND DATEADD(dd, DATEDIFF(dd,0,GETDATE()), 0)
) E -->END DERIVED TABLE E
ON
A.CASE_ID = E.CASE_ID
WHERE
(
-->RESTRICTIONS FOR RC_STATUS=CLOSR
(ISNULL(A.RC_STATUS,'') = 'CLOSR'AND ISNULL(A.SOLUTION_STATE,'1') =
'1' AND ISNULL(A.SOLUTION_ID,0) NOT IN (319852, 319716, 319825,
319775, 319776) )
-->RESTRICTION FOR RC_STATUS= OPENC
OR(ISNULL(A.RC_STATUS,'')='OPENC')
)
AND CAST(A.SITE_ID AS INT) NOT IN (909, 900, 903)
AND B.EMAIL_ADDR <> 'noc.eon@e.pagenet.ca'
AND B.EMAIL_ADDR NOT LIKE '%cardinal.com'
AND B.EMAIL_ADDR LIKE '%@.%'
AND A.CUSTSAT_EXCLUDE_FLG <> 'Y'
AND CASE WHEN LEN(ISNULL(A.PRODUCT_GROUP, '')) = 0 THEN '[blank]'
ELSE D.PRODUCT_GROUP_DESCR END <> 'PYXISSTATION'
AND DATEADD(dd, DATEDIFF(dd,0,A.ROW_ADDED_DTTM), 0) = DATEADD(dd,
DATEDIFF(dd,0,getdate()-1), 0)
-->RESTRICTION THAT ONLY PULLS IN CASES WHERE THE EMAIL ADDRESS HAS
NOT BEEN INCLUDED IN A SURVEY IN THE LAST 30 DAYS
AND B.EMAIL_ADDR NOT IN
(
SELECT
DISTINCT
H.EMAIL_ADDR
FROM
FCT_CASE G,
LU_SITE_REP H
WHERE

G.CASE_CONTACT = H.PERSON_ID
AND(
-->RESTRICTIONS FOR RC_STATUS=CLOSR
(ISNULL(G.RC_STATUS,'') = 'CLOSR'AND ISNULL(G.SOLUTION_STATE,'1') =
'1' AND ISNULL(G.SOLUTION_ID,0) NOT IN (319852, 319716, 319825,
319775, 319776) )
-->RESTRICTION FOR RC_STATUS= OPENC
OR(ISNULL(G.RC_STATUS,'')='OPENC')
)
AND CAST(G.SITE_ID AS INT) NOT IN (909, 900, 903)
AND G.CUSTSAT_EXCLUDE_FLG <> 'Y'
AND DATEADD(dd, DATEDIFF(dd,0,G.ROW_ADDED_DTTM), 0) BETWEEN
DATEADD(dd, DATEDIFF(dd,0,getdate()-30), 0)AND DATEADD(dd, DATEDIFF(dd,
0,getdate()-2), 0)
AND H.EMAIL_ADDR IS NOT NULL

)
ORDER BY A.CASE_ID DESC

|||

Jimbo (jim.ferris@.motorola.com) writes:

Quote:

Originally Posted by

I have a query..if you look at the bottom of the where clause you'll
see an "NOT IN" statement that is really hanging up the query..i'm
trying to replace with a "NOT EXISTS" but it isnt appearing to
work...I need to get a result set where the email address of the
outter most query is not in that sub query...thanks:


It's a little unclear what you are asking for. Your subject line talks about
performance, but the question sounds more like you are having problems
getting the desired result.

In any case, without knowledge of the tables, it's difficult to say what
you need to do to get the right result, and to advise on performance I would
need to know the indexes and an indication of table sizes. However, the initial
DISTINCT is an indication that you have insufficient join conditions.
Or that you are just using it as a matter of routine. DISTINCT is something
for which, in my experience, there is rarely a need. But since DISTINCT calls
for a sorting operation, it can be costly.

Another reflection:

Quote:

Originally Posted by

CASE WHEN LEN(ISNULL(B.EXTENSION, '')) = 0 THEN '[blank]' ELSE
B.EXTENSION END AS EXTENSION,


You could write this as :

coalesce(nullif(B.EXTENSION, ''), '[blank]') AS EXTENSION

Not that it is a performance winner, but just that it is a little briefer.

As for rewriting your NOT IN with NOT EXISTS, I would not expect that to
improve performance, as they usually result in the same plan. But NOT IN
can cause undesired results when NULL is involved, and NOT EXISTS
circumvents that. Also, you have a second DISTINCT in the subquery, and that
may be bad for performance.
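The NULL behaviour is easy to reproduce on its own (a standalone snippet, using no tables from this thread):

```sql
-- NOT IN yields no row at all once the subquery produces a NULL,
-- because 1 <> NULL evaluates to UNKNOWN:
SELECT 'found' WHERE 1 NOT IN (SELECT 2 UNION ALL SELECT NULL)
-- returns an empty result set

-- NOT EXISTS with an explicit equality test is unaffected:
SELECT 'found'
WHERE NOT EXISTS (SELECT * FROM (SELECT 2 AS v UNION ALL SELECT NULL) t
                  WHERE t.v = 1)
-- returns 'found'
```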

Here is a rewrite of your subquery:

AND NOT EXISTS
(
SELECT *
FROM FCT_CASE G, LU_SITE_REP H
WHERE G.CASE_CONTACT = H.PERSON_ID
AND( -->RESTRICTIONS FOR RC_STATUS=CLOSR
(ISNULL(G.RC_STATUS,'') = 'CLOSR'AND ISNULL(G.SOLUTION_STATE,'1') =
'1' AND ISNULL(G.SOLUTION_ID,0) NOT IN (319852, 319716, 319825,
319775, 319776) )
-->RESTRICTION FOR RC_STATUS= OPENC
OR(ISNULL(G.RC_STATUS,'')='OPENC')
)
AND CAST(G.SITE_ID AS INT) NOT IN (909, 900, 903)
AND G.CUSTSAT_EXCLUDE_FLG <> 'Y'
AND DATEADD(dd, DATEDIFF(dd,0,G.ROW_ADDED_DTTM), 0) BETWEEN
DATEADD(dd, DATEDIFF(dd,0,getdate()-30), 0) AND
DATEADD(dd, DATEDIFF(dd, 0,getdate()-2), 0)
AND H.EMAIL_ADDR = B.EMAIL_ADDR)

I cannot vouch for that it gives the correct result.

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx