Showing posts with label worseperformance. Show all posts
Showing posts with label worseperformance. Show all posts

Friday, March 9, 2012

Query Optimization NOT IN vs LEFT JOIN

I've read in many places that using "NOT IN" results in worse
performance. I've got two queries producing the same result:
SELECT WS.*
FROM
WHSLocations WS
INNER JOIN WHSLocations20060428D WHD
ON WS.LocationID = WHD.LocationID
WHERE WS.LocationID NOT IN (SELECT LocationID FROM Repository)
SELECT WS.*
FROM
WHSLocations WS
INNER JOIN WHSLocations20060428D WHD
ON WS.LocationID = WHD.LocationID
LEFT JOIN Repository R
ON WS.LocationID = R.LocationID
WHERE R.LocationID IS NULL
The second one actually takes just a bit slower than the first. Under
which circumstances should I expect a performance gain if any?
I appreciate your time reading my post,
Thank you,
Igor
*** Sent via Developersdex http://www.examnotes.net ***Have you tired this
SELECT WS.*
FROM
WHSLocations WS
INNER JOIN WHSLocations20060428D WHD
ON WS.LocationID = WHD.LocationID
WHERE NOT EXISTS (SELECT LocationID FROM Repository Where LocationID =
WS.LocationID )
Also WHERE WS.LocationID NOT IN (SELECT LocationID FROM Repository)
will return nothing if you have NULL values
Denis the SQL Menace
http://sqlservercode.blogspot.com/|||mEmENT0m0RI,
Compare also the version using "NOT EXISTS".
SELECT WS.*
FROM
(
select a.*
from WHSLocations as a
WHERE
NOT EXISTS (SELECT * FROM Repository as r where r.LocationID = a.LocationID)
) as WS
INNER JOIN
WHSLocations20060428D WHD
ON WS.LocationID = WHD.LocationID
go
Remember to clean the proc cache and the data buffer before executing each
statement (do not do it on production).
dbcc freeproccache
dbcc dropcleanbuffers
AMB
"mEmENT0m0RI" wrote:

> I've read in many places that using "NOT IN" results in worse
> performance. I've got two queries producing the same result:
>
> SELECT WS.*
> FROM
> WHSLocations WS
> INNER JOIN WHSLocations20060428D WHD
> ON WS.LocationID = WHD.LocationID
> WHERE WS.LocationID NOT IN (SELECT LocationID FROM Repository)
>
> SELECT WS.*
> FROM
> WHSLocations WS
> INNER JOIN WHSLocations20060428D WHD
> ON WS.LocationID = WHD.LocationID
> LEFT JOIN Repository R
> ON WS.LocationID = R.LocationID
> WHERE R.LocationID IS NULL
>
> The second one actually takes just a bit slower than the first. Under
> which circumstances should I expect a performance gain if any?
> I appreciate your time reading my post,
> Thank you,
> Igor
>
> *** Sent via Developersdex http://www.examnotes.net ***
>|||"Alejandro Mesa" <AlejandroMesa@.discussions.microsoft.com> wrote in message
news:7C075150-2839-4CC2-A023-8C3DB5AEEA00@.microsoft.com...
> mEmENT0m0RI,
> Compare also the version using "NOT EXISTS".
> SELECT WS.*
> FROM
> (
> select a.*
> from WHSLocations as a
> WHERE
> NOT EXISTS (SELECT * FROM Repository as r where r.LocationID =
> a.LocationID)
> ) as WS
> INNER JOIN
> WHSLocations20060428D WHD
> ON WS.LocationID = WHD.LocationID
> go
> Remember to clean the proc cache and the data buffer before executing each
> statement (do not do it on production).
> dbcc freeproccache
> dbcc dropcleanbuffers
>
You should not use DBCC DROPCLEANBUFFERS to test the relative performance of
queries. When you do, you skew the results in favor of whichever query
accesses the fewest number of distinct pages. For instance a nested loops
join (or bookmark lookup) might read the same page over and over, whereas a
clustered index scan (or table scan) would read each page once. With the
page cache flushed the loop join might look faster even though it generates
more reads, and is therefore actually worse.
Instead just set
set statistics io on
and compare the total number of reads.
David|||I was comparing the performance by putting all three queries into same
batch and checking "Query cost (realtive to the batch)" values from the
execution plan window. I'm not sure though if that would be a fair way
to compare...
So, now I have three queries:
dbcc freeproccache
dbcc dropcleanbuffers
SET STATISTICS IO ON
SELECT WS.*
FROM
WHSLocations WS
INNER JOIN WHSLocations20060428D WHD
ON WS.LocationID = WHD.LocationID
WHERE NOT EXISTS (SELECT LocationID FROM Repository WHERE LocationID =
WS.LocationID)
SELECT WS.*
FROM
WHSLocations WS
INNER JOIN WHSLocations20060428D WHD
ON WS.LocationID = WHD.LocationID
WHERE WS.LocationID NOT IN (SELECT LocationID FROM Repository)
SELECT WS.*
FROM
WHSLocations WS
INNER JOIN WHSLocations20060428D WHD
ON WS.LocationID = WHD.LocationID
LEFT JOIN Repository R
ON WS.LocationID = R.LocationID
WHERE R.LocationID IS NULL
Query Costs Relative to the batch:
33.33
33.33
33.34
IO STATISTICS:
(750 row(s) affected)
Table 'Repository'. Scan count 1, logical reads 67, physical reads 0,
read-ahead reads 848.
Table 'WHSLocations'. Scan count 1, logical reads 581, physical reads 0,
read-ahead reads 1283.
Table 'WHSLocations20060428D'. Scan count 1, logical reads 10, physical
reads 0, read-ahead reads 0.
(750 row(s) affected)
Table 'Repository'. Scan count 1, logical reads 67, physical reads 0,
read-ahead reads 0.
Table 'WHSLocations'. Scan count 1, logical reads 581, physical reads 0,
read-ahead reads 0.
Table 'WHSLocations20060428D'. Scan count 1, logical reads 10, physical
reads 0, read-ahead reads 0.
(750 row(s) affected)
Table 'Repository'. Scan count 1, logical reads 67, physical reads 0,
read-ahead reads 0.
Table 'WHSLocations'. Scan count 1, logical reads 581, physical reads 0,
read-ahead reads 0.
Table 'WHSLocations20060428D'. Scan count 1, logical reads 10, physical
reads 0, read-ahead reads 0.
*** Sent via Developersdex http://www.examnotes.net ***|||See
http://groups.google.nl/group/micro...bb?dmode=source
Gert-Jan
mEmENT0m0RI wrote:
> I've read in many places that using "NOT IN" results in worse
> performance. I've got two queries producing the same result:
> SELECT WS.*
> FROM
> WHSLocations WS
> INNER JOIN WHSLocations20060428D WHD
> ON WS.LocationID = WHD.LocationID
> WHERE WS.LocationID NOT IN (SELECT LocationID FROM Repository)
> SELECT WS.*
> FROM
> WHSLocations WS
> INNER JOIN WHSLocations20060428D WHD
> ON WS.LocationID = WHD.LocationID
> LEFT JOIN Repository R
> ON WS.LocationID = R.LocationID
> WHERE R.LocationID IS NULL
> The second one actually takes just a bit slower than the first. Under
> which circumstances should I expect a performance gain if any?
> I appreciate your time reading my post,
> Thank you,
> Igor
> *** Sent via Developersdex http://www.examnotes.net ***|||>I was comparing the performance by putting all three queries into same
> batch and checking "Query cost (realtive to the batch)" values from the
> execution plan window.
Execution plan is far from the whole story.|||David Browne (davidbaxterbrowne no potted meat@.hotmail.com) writes:
> You should not use DBCC DROPCLEANBUFFERS to test the relative
> performance of queries.
And most of all, you should not run that command on a production
machine!

> When you do, you skew the results in favor of whichever query accesses
> the fewest number of distinct pages. For instance a nested loops join
> (or bookmark lookup) might read the same page over and over, whereas a
> clustered index scan (or table scan) would read each page once. With
> the page cache flushed the loop join might look faster even though it
> generates more reads, and is therefore actually worse.
> Instead just set
> set statistics io on
> and compare the total number of reads.
Actually, neither that is accurate, as this does not account for building
hash tables, sorting etc. The best is wallclock time with getdate().
Obviously this requires a server that is not occupied with other things.
As for caching, it is important to test under the same presumptions. That
is, either flush the cache each time, or make sure all data is in cache
before you start testing.
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx|||mEmENT0m0RI wrote:
> I've read in many places that using "NOT IN" results in worse
> performance. I've got two queries producing the same result:
>
> SELECT WS.*
> FROM
> WHSLocations WS
> INNER JOIN WHSLocations20060428D WHD
> ON WS.LocationID = WHD.LocationID
> WHERE WS.LocationID NOT IN (SELECT LocationID FROM Repository)
>
> SELECT WS.*
> FROM
> WHSLocations WS
> INNER JOIN WHSLocations20060428D WHD
> ON WS.LocationID = WHD.LocationID
> LEFT JOIN Repository R
> ON WS.LocationID = R.LocationID
> WHERE R.LocationID IS NULL
>
> The second one actually takes just a bit slower than the first. Under
> which circumstances should I expect a performance gain if any?
I think that since the subquery in your first query is not co-related,
it only needs to be executed once, whereas in your second query the
Repository table needs to be accessed for every row in the WHSLocations
table.
I suspect that if the subquery was co-related (e.g. NOT IN (SELECT
LocationID FROM Repository where Repository.LocationGroup =
WS.LocationGroup) ) then the LEFT OUTER JOIN would be quicker.
Kris|||Seems you got the same plan for all three queries.
Tibor Karaszi, SQL Server MVP
http://www.karaszi.com/sqlserver/default.asp
http://www.solidqualitylearning.com/
"mEmENT0m0RI" <nospam@.devdex.com> wrote in message news:epbpuVWbGHA.628@.TK2MSFTNGP04.phx.gb
l...
>I was comparing the performance by putting all three queries into same
> batch and checking "Query cost (realtive to the batch)" values from the
> execution plan window. I'm not sure though if that would be a fair way
> to compare...
> So, now I have three queries:
> dbcc freeproccache
> dbcc dropcleanbuffers
> SET STATISTICS IO ON
> SELECT WS.*
> FROM
> WHSLocations WS
> INNER JOIN WHSLocations20060428D WHD
> ON WS.LocationID = WHD.LocationID
> WHERE NOT EXISTS (SELECT LocationID FROM Repository WHERE LocationID =
> WS.LocationID)
>
> SELECT WS.*
> FROM
> WHSLocations WS
> INNER JOIN WHSLocations20060428D WHD
> ON WS.LocationID = WHD.LocationID
> WHERE WS.LocationID NOT IN (SELECT LocationID FROM Repository)
>
> SELECT WS.*
> FROM
> WHSLocations WS
> INNER JOIN WHSLocations20060428D WHD
> ON WS.LocationID = WHD.LocationID
> LEFT JOIN Repository R
> ON WS.LocationID = R.LocationID
> WHERE R.LocationID IS NULL
>
> Query Costs Relative to the batch:
> 33.33
> 33.33
> 33.34
>
> IO STATISTICS:
> (750 row(s) affected)
> Table 'Repository'. Scan count 1, logical reads 67, physical reads 0,
> read-ahead reads 848.
> Table 'WHSLocations'. Scan count 1, logical reads 581, physical reads 0,
> read-ahead reads 1283.
> Table 'WHSLocations20060428D'. Scan count 1, logical reads 10, physical
> reads 0, read-ahead reads 0.
> (750 row(s) affected)
> Table 'Repository'. Scan count 1, logical reads 67, physical reads 0,
> read-ahead reads 0.
> Table 'WHSLocations'. Scan count 1, logical reads 581, physical reads 0,
> read-ahead reads 0.
> Table 'WHSLocations20060428D'. Scan count 1, logical reads 10, physical
> reads 0, read-ahead reads 0.
> (750 row(s) affected)
> Table 'Repository'. Scan count 1, logical reads 67, physical reads 0,
> read-ahead reads 0.
> Table 'WHSLocations'. Scan count 1, logical reads 581, physical reads 0,
> read-ahead reads 0.
> Table 'WHSLocations20060428D'. Scan count 1, logical reads 10, physical
> reads 0, read-ahead reads 0.
>
> *** Sent via Developersdex http://www.examnotes.net ***