Query longer in SP: query performance question regarding ISNULL

I have a sql statement:
SELECT A1.C_JOBBIDID, A2.ACCOUNTID A2_ACCOUNTID, A1.PROD_PR_GRP_C, A1.WON,
A2.DESCRIPTION A2_DESCRIPTION, A1.OPPORTUNITYID, A3.USERFIELD5 A3_USERFIELD5
,
A2.STATUS A2_STATUS, A1.SPEC_PR_PROD_C, A1.SPR_ITEM_ALLOW_Q, A1.DISTPRICE1ST
,
A1.SUBPRICE1ST, A1.EXPDATE1ST, A1.DISTPRICE2ND, A1.SUBPRICE2ND, A1.EXPDATE2N
D,
A1.DISTPRICE3RD, A1.SUBPRICE3RD, A1.EXPDATE3RD, A1.DISTPRICE4TH, A1.SUBPRICE
4TH,
A1.EXPDATE4TH
FROM C_JOBBID A1 INNER JOIN OPPORTUNITY A2 ON (A1.OPPORTUNITYID = A2.OPPORTU
NITYID)
INNER JOIN C_OPPORTUNITY_EXT A3 ON (A2.OPPORTUNITYID = A3.OPPORTUNITYID)
INNER JOIN C_OPPTOACCOUNT A4 ON (A4.OPPORTUNITYID = A1.OPPORTUNITYID)
WHERE A4.ACCOUNTID = 'A6UJ9A0069NH'
AND A3.METRO IN (SELECT PRICINGMETRO FROM C_USERMETROS WHERE USERID = 'ADMIN
')
AND A1.PROD_PR_GRP_C = 'C167'
ORDER BY A1.OPPORTUNITYID ASC, A1.SPEC_PR_PROD_C ASC, A1.EXPDATE1ST ASC
This query runs slower than:
SELECT A1.C_JOBBIDID, A2.ACCOUNTID A2_ACCOUNTID, A1.PROD_PR_GRP_C, A1.WON,
A2.DESCRIPTION A2_DESCRIPTION, A1.OPPORTUNITYID, A3.USERFIELD5 A3_USERFIELD5
,
A2.STATUS A2_STATUS, A1.SPEC_PR_PROD_C, A1.SPR_ITEM_ALLOW_Q, A1.DISTPRICE1ST
,
A1.SUBPRICE1ST, A1.EXPDATE1ST, A1.DISTPRICE2ND, A1.SUBPRICE2ND, A1.EXPDATE2N
D,
A1.DISTPRICE3RD, A1.SUBPRICE3RD, A1.EXPDATE3RD, A1.DISTPRICE4TH, A1.SUBPRICE
4TH,
A1.EXPDATE4TH
FROM C_JOBBID A1 INNER JOIN OPPORTUNITY A2 ON (A1.OPPORTUNITYID = A2.OPPORTU
NITYID)
INNER JOIN C_OPPORTUNITY_EXT A3 ON (A2.OPPORTUNITYID = A3.OPPORTUNITYID)
INNER JOIN C_OPPTOACCOUNT A4 ON (A4.OPPORTUNITYID = A1.OPPORTUNITYID)
WHERE A4.ACCOUNTID = 'A6UJ9A0069NH'
AND ISNULL(A3.METRO, '') IN (SELECT PRICINGMETRO FROM C_USERMETROS WHERE
USERID = 'ADMIN')
AND A1.PROD_PR_GRP_C = 'C167'
ORDER BY A1.OPPORTUNITYID ASC, A1.SPEC_PR_PROD_C ASC, A1.EXPDATE1ST ASC
As you can see, the only difference is checking the field for null and conve
rting
it to a blank string. You'd think the second one involving "more work" woul
d
take longer. Can someone explain why this is so?
Thanks in advance.
Jiho Han
jihohan@.yahoo.comWhat do you mean, slower? Considerably, or minimally? Can you check the
plan, or post the plan using SET SHOWPLAN_TEXT ON? It will probably reveal
the reason if it is considerable.
You might also try rewriting

> AND ISNULL(A3.METRO, '') IN (SELECT PRICINGMETRO FROM C_USERMETROS WHERE
> USERID = 'ADMIN')
as
AND EXISTS ( SELECT 1
FROM C_USERMETROS
WHERE C_USERMETROS.USERID = 'ADMIN'
AND PRICINGMETRO = A3.METRO)
assuming that there is not a A3.METRO that actually equals ''
----
Louis Davidson - http://spaces.msn.com/members/drsql/
SQL Server MVP
"Arguments are to be avoided: they are always vulgar and often convincing."
(Oscar Wilde)
"Jiho Han" <jihohan@.yahoo.com> wrote in message
news:a19ab9b65ef08c806917750fbb2@.msnews.microsoft.com...
>I have a sql statement:
> SELECT A1.C_JOBBIDID, A2.ACCOUNTID A2_ACCOUNTID, A1.PROD_PR_GRP_C, A1.WON,
> A2.DESCRIPTION A2_DESCRIPTION, A1.OPPORTUNITYID, A3.USERFIELD5
> A3_USERFIELD5,
> A2.STATUS A2_STATUS, A1.SPEC_PR_PROD_C, A1.SPR_ITEM_ALLOW_Q,
> A1.DISTPRICE1ST,
> A1.SUBPRICE1ST, A1.EXPDATE1ST, A1.DISTPRICE2ND, A1.SUBPRICE2ND,
> A1.EXPDATE2ND,
> A1.DISTPRICE3RD, A1.SUBPRICE3RD, A1.EXPDATE3RD, A1.DISTPRICE4TH,
> A1.SUBPRICE4TH, A1.EXPDATE4TH FROM C_JOBBID A1 INNER JOIN OPPORTUNITY A2
> ON (A1.OPPORTUNITYID = A2.OPPORTUNITYID)
> INNER JOIN C_OPPORTUNITY_EXT A3 ON (A2.OPPORTUNITYID = A3.OPPORTUNITYID)
> INNER JOIN C_OPPTOACCOUNT A4 ON (A4.OPPORTUNITYID = A1.OPPORTUNITYID)
> WHERE A4.ACCOUNTID = 'A6UJ9A0069NH' AND A3.METRO IN (SELECT PRICINGMETRO
> FROM C_USERMETROS WHERE USERID = 'ADMIN')
> AND A1.PROD_PR_GRP_C = 'C167'
> ORDER BY A1.OPPORTUNITYID ASC, A1.SPEC_PR_PROD_C ASC, A1.EXPDATE1ST ASC
> This query runs slower than:
> SELECT A1.C_JOBBIDID, A2.ACCOUNTID A2_ACCOUNTID, A1.PROD_PR_GRP_C, A1.WON,
> A2.DESCRIPTION A2_DESCRIPTION, A1.OPPORTUNITYID, A3.USERFIELD5
> A3_USERFIELD5,
> A2.STATUS A2_STATUS, A1.SPEC_PR_PROD_C, A1.SPR_ITEM_ALLOW_Q,
> A1.DISTPRICE1ST,
> A1.SUBPRICE1ST, A1.EXPDATE1ST, A1.DISTPRICE2ND, A1.SUBPRICE2ND,
> A1.EXPDATE2ND,
> A1.DISTPRICE3RD, A1.SUBPRICE3RD, A1.EXPDATE3RD, A1.DISTPRICE4TH,
> A1.SUBPRICE4TH, A1.EXPDATE4TH FROM C_JOBBID A1 INNER JOIN OPPORTUNITY A2
> ON (A1.OPPORTUNITYID = A2.OPPORTUNITYID)
> INNER JOIN C_OPPORTUNITY_EXT A3 ON (A2.OPPORTUNITYID = A3.OPPORTUNITYID)
> INNER JOIN C_OPPTOACCOUNT A4 ON (A4.OPPORTUNITYID = A1.OPPORTUNITYID)
> WHERE A4.ACCOUNTID = 'A6UJ9A0069NH' AND ISNULL(A3.METRO, '') IN (SELECT
> PRICINGMETRO FROM C_USERMETROS WHERE USERID = 'ADMIN') AND
> A1.PROD_PR_GRP_C = 'C167'
> ORDER BY A1.OPPORTUNITYID ASC, A1.SPEC_PR_PROD_C ASC, A1.EXPDATE1ST ASC
> As you can see, the only difference is checking the field for null and
> converting it to a blank string. You'd think the second one involving
> "more work" would take longer. Can someone explain why this is so?
> Thanks in advance.
> Jiho Han
> jihohan@.yahoo.com
>|||Hello Louis,
I am attaching the plan. It's the first time I've used that option so, I di
dn't know if this will be in the right format.
Also, I've tried the EXISTS approach and the result is same (I mean the perf
ormance). The plain option takes ~150ms(duration) vs ISNULL takes ~30ms. S
o it's not a huge difference but
it's significant enough that it's noticeable.
Thank you
Jiho Han
Senior Software Engineer
Infinity Info Systems
The Sales Technology Experts
Tel: 212.563.4400 x216
Fax: 212.760.0540
jhan@.infinityinfo.com
www.infinityinfo.com
> What do you mean, slower? Considerably, or minimally? Can you check
> the plan, or post the plan using SET SHOWPLAN_TEXT ON? It will
> probably reveal the reason if it is considerable.
>
> You might also try rewriting
>
> as
> AND EXISTS ( SELECT 1
> FROM C_USERMETROS
> WHERE C_USERMETROS.USERID = 'ADMIN'
> AND PRICINGMETRO = A3.METRO)
> assuming that there is not a A3.METRO that actually equals ''
>
> "Jiho Han" <jihohan@.yahoo.com> wrote in message
> news:a19ab9b65ef08c806917750fbb2@.msnews.microsoft.com...
>|||ISNULL (the function) is not equivalent to IS NULL (the syntax construct) in
SQL 2000 or 2005 to its optimizer.
You might try to use the latter since it is more completely supported.
Essentially, we're not reasoning about the output distribution on ISNULL().
That may cause suboptimal plans in complex queries.
Conor Cunningham
SQL Server Query Optimization Development Lead
"Jiho Han" <jihohan@.yahoo.com> wrote in message
news:a19ab9b661698c806aa7485348c@.msnews.microsoft.com...
Hello Louis,
I am attaching the plan. It's the first time I've used that option so, I
didn't know if this will be in the right format.
Also, I've tried the EXISTS approach and the result is same (I mean the
performance). The plain option takes ~150ms(duration) vs ISNULL takes
~30ms. So it's not a huge difference but it's significant enough that it's
noticeable.
Thank you
Jiho Han
Senior Software Engineer
Infinity Info Systems
The Sales Technology Experts
Tel: 212.563.4400 x216
Fax: 212.760.0540
jhan@.infinityinfo.com
www.infinityinfo.com
> What do you mean, slower? Considerably, or minimally? Can you check
> the plan, or post the plan using SET SHOWPLAN_TEXT ON? It will
> probably reveal the reason if it is considerable.
> You might also try rewriting
>
> as
> AND EXISTS ( SELECT 1
> FROM C_USERMETROS
> WHERE C_USERMETROS.USERID = 'ADMIN'
> AND PRICINGMETRO = A3.METRO)
> assuming that there is not a A3.METRO that actually equals ''
> "Jiho Han" <jihohan@.yahoo.com> wrote in message
> news:a19ab9b65ef08c806917750fbb2@.msnews.microsoft.com...
>|||Ok, that one just went over me completely.
I understand that ISNULL and IS NULL may not be equivalent - I would hope
not.
What I wanted to know was why
A3.METRO IN (SELECT PRICINGMETRO FROM C_USERMETROS WHERE USERID = 'ADMIN')
runs faster than
ISNULL(A3.METRO, '') IN (SELECT PRICINGMETRO FROM C_USERMETROS WHERE USERID
= 'ADMIN')
What I did notice when viewing the execution plan was that while using ISNUL
L
as above caused the query engine to use a hash match, the plain one caused
the engine to use a distinct sort followed by a nested loop.
I am very curious as to why it works this way and to determine whether I
need to employ ISNULL in all my other queries for query performance improvem
ent.
Thanks for your assistance!
Jiho Han
Senior Software Engineer
Infinity Info Systems
The Sales Technology Experts
Tel: 212.563.4400 x216
Fax: 212.760.0540
jhan@.infinityinfo.com
www.infinityinfo.com
> ISNULL (the function) is not equivalent to IS NULL (the syntax
> construct) in SQL 2000 or 2005 to its optimizer.
> You might try to use the latter since it is more completely supported.
> Essentially, we're not reasoning about the output distribution on
> ISNULL(). That may cause suboptimal plans in complex queries.
> Conor Cunningham
> SQL Server Query Optimization Development Lead
> "Jiho Han" <jihohan@.yahoo.com> wrote in message
> news:a19ab9b661698c806aa7485348c@.msnews.microsoft.com... Hello Louis,
> I am attaching the plan. It's the first time I've used that option
> so, I didn't know if this will be in the right format.
> Also, I've tried the EXISTS approach and the result is same (I mean
> the performance). The plain option takes ~150ms(duration) vs ISNULL
> takes ~30ms. So it's not a huge difference but it's significant
> enough that it's noticeable.
> Thank you
> Jiho Han
> Senior Software Engineer
> Infinity Info Systems
> The Sales Technology Experts
> Tel: 212.563.4400 x216
> Fax: 212.760.0540
> jhan@.infinityinfo.com
> www.infinityinfo.com|||It would be interesting to see what the execution plan would by if you
changed that portion of the query to
A3.METRIO IN (SLECT PRICINGMETRO FROM C_USERMETROS WHERE USERID = 'Admin'
AND PRICINGMETRO IS NOT NULL)
This is is just a wild guess, but maybe SQL is optimizing based on the fact
that ISNULL can't return a null result, and therefore can exclude null
results from the sub-query.
> What I wanted to know was why
> A3.METRO IN (SELECT PRICINGMETRO FROM C_USERMETROS WHERE USERID = 'ADMIN')
> runs faster than
> ISNULL(A3.METRO, '') IN (SELECT PRICINGMETRO FROM C_USERMETROS WHERE USERI
D
> = 'ADMIN')
> What I did notice when viewing the execution plan was that while using ISN
ULL
> as above caused the query engine to use a hash match, the plain one caused
> the engine to use a distinct sort followed by a nested loop.
> I am very curious as to why it works this way and to determine whether I
> need to employ ISNULL in all my other queries for query performance improvement.[/
color]|||Apologies - let me try to rephrase.
The query optimizer uses a tree model of operations that represent your SQL
statement. Additionally, statistical information (such as a histogram of
data for a column) is recorded for base tables and then pushed up through
the tree, being modified during each step. So, if you scan rows from a base
table and then filter, the filter operation would have an estimated output
distribution *after* the filter. This information is used to estimate
cardinality for each operator and to eventually cost various alternatives.
If the cardinality estimate is high, we may pick things like the hash join.
If it's low, we're more likely to pick a nested loops join. So, this
information is very important to picking an efficient plan.
Some constructs do not have full support in the optimizer. When we do not
have that information, we may not be able to come up with a good output
distribution for it, and thus the cardinality and cost may be incorrect.
ISNULL() is such an operator. When this happens, we may pick the loops join
when the hash join would have been better. This is when you see performance
issues.
In this case, you are using it to join with a subquery, and the distribution
is quite important here. I would recommend that you try to avoid this if
you are seeing plan issues. Once you start using the result of the function
is some other operation (a join, a subquery, a filter, etc), then I would
recommend that you consider avoiding it, if possible.
I hope that helps.
Thanks,
Conor
"Jiho Han" <jihohan@.yahoo.com> wrote in message
news:a19ab9b662c18c80742337e046d@.msnews.microsoft.com...
> Ok, that one just went over me completely.
> I understand that ISNULL and IS NULL may not be equivalent - I would hope
> not.
> What I wanted to know was why
> A3.METRO IN (SELECT PRICINGMETRO FROM C_USERMETROS WHERE USERID = 'ADMIN')
> runs faster than
> ISNULL(A3.METRO, '') IN (SELECT PRICINGMETRO FROM C_USERMETROS WHERE
> USERID = 'ADMIN')
> What I did notice when viewing the execution plan was that while using
> ISNULL as above caused the query engine to use a hash match, the plain one
> caused the engine to use a distinct sort followed by a nested loop.
> I am very curious as to why it works this way and to determine whether I
> need to employ ISNULL in all my other queries for query performance
> improvement.
> Thanks for your assistance!
> Jiho Han
> Senior Software Engineer
> Infinity Info Systems
> The Sales Technology Experts
> Tel: 212.563.4400 x216
> Fax: 212.760.0540
> jhan@.infinityinfo.com
> www.infinityinfo.com
>
>|||The option you specified came back slower actually. ~250 ms.
I've done several variations using IS NULL:
A3.METRO IS NOT NULL AND A3.METRO IN (SELECT PRICINGMETRO FROM C_USERMETROS
WHERE USERID = 'ADMIN')
A3.METRO IN (SELECT PRICINGMETRO FROM C_USERMETROS WHERE USERID = 'ADMIN'
AND PRICINGMETRO IS NOT NULL)
A3.METRO IS NOT NULL AND IN (SELECT PRICINGMETRO FROM C_USERMETROS WHERE
USERID = 'ADMIN' AND PRICINGMETRO IS NOT NULL)
They are all slower and are around ~250 ms. It seems to me that by adding
IS NOT NULL check, it's actually adding more work.
Nothing beats:
ISNULL(A3.METRO, '') IN (SELECT PRICINGMETRO FROM C_USERMETROS WHERE USERID
= 'ADMIN')
which consistently executes around ~30ms.
Jiho Han
Senior Software Engineer
Infinity Info Systems
The Sales Technology Experts
Tel: 212.563.4400 x216
Fax: 212.760.0540
jhan@.infinityinfo.com
www.infinityinfo.com
> It would be interesting to see what the execution plan would by if you
> changed that portion of the query to
> A3.METRIO IN (SLECT PRICINGMETRO FROM C_USERMETROS WHERE USERID =
> 'Admin' AND PRICINGMETRO IS NOT NULL)
> This is is just a wild guess, but maybe SQL is optimizing based on the
> fact that ISNULL can't return a null result, and therefore can exclude
> null results from the sub-query.
>|||Thanks Conor, that was very helpful although I can't say that I can absorb
everything you've said.
Just to be clear, and since I am wondering whether there is a misunderstandi
ng
here, I am reporting that the use of ISNULL() is performing better than the
lack of, or using IS NULL. If you see my other reply, you'll see some numbe
rs
I got trying to use IS NULL instead and they are all slower, even more so
than not using IS NULL.
Also, could you rephrase your last paragraph? Are you saying that I should
avoid using subqueries? or ISNULL()?
Are you suggesting that I use ISNULL() only in SELECT clause in your last
statement?
Thanks
Jiho
> Apologies - let me try to rephrase.
> The query optimizer uses a tree model of operations that represent
> your SQL statement. Additionally, statistical information (such as a
> histogram of data for a column) is recorded for base tables and then
> pushed up through the tree, being modified during each step. So, if
> you scan rows from a base table and then filter, the filter operation
> would have an estimated output distribution *after* the filter. This
> information is used to estimate cardinality for each operator and to
> eventually cost various alternatives. If the cardinality estimate is
> high, we may pick things like the hash join. If it's low, we're more
> likely to pick a nested loops join. So, this information is very
> important to picking an efficient plan.
> Some constructs do not have full support in the optimizer. When we do
> not have that information, we may not be able to come up with a good
> output distribution for it, and thus the cardinality and cost may be
> incorrect. ISNULL() is such an operator. When this happens, we may
> pick the loops join when the hash join would have been better. This
> is when you see performance issues.
> In this case, you are using it to join with a subquery, and the
> distribution is quite important here. I would recommend that you try
> to avoid this if you are seeing plan issues. Once you start using the
> result of the function is some other operation (a join, a subquery, a
> filter, etc), then I would recommend that you consider avoiding it, if
> possible.
> I hope that helps.
> Thanks,
> Conor
> "Jiho Han" <jihohan@.yahoo.com> wrote in message
> news:a19ab9b662c18c80742337e046d@.msnews.microsoft.com...
>|||Xref: TK2MSFTNGP08.phx.gbl microsoft.public.sqlserver.programming:587012
Jiho Han wrote:
> Thanks Conor, that was very helpful although I can't say that I can absorb
> everything you've said.
> Just to be clear, and since I am wondering whether there is a misunderstan
ding
> here, I am reporting that the use of ISNULL() is performing better than th
e
> lack of, or using IS NULL. If you see my other reply, you'll see some num
bers
> I got trying to use IS NULL instead and they are all slower, even more so
> than not using IS NULL.
> Also, could you rephrase your last paragraph? Are you saying that I shoul
d
> avoid using subqueries? or ISNULL()?
> Are you suggesting that I use ISNULL() only in SELECT clause in your last
> statement?
> Thanks
> Jiho
I am not Conor, but I will answer it anyway.
The advice of Conor is to avoid using functions/expressions in
combination with subqueries, because the optimizer will have better
information if you only use the column.
However, if I understand you correctly, in your case, the optimizer is
picking a faster plan for the ISNULL() query. If I undestand you
correctly, the query
AND ISNULL(A3.METRO, '') IN (SELECT PRICINGMETRO FROM C_USERMETROS
WHERE USERID = 'ADMIN')
uses a hash match to finish in ~30ms
and the query
AND A3.METRO IN (SELECT PRICINGMETRO FROM C_USERMETROS WHERE USERID =
'ADMIN')
uses a distinct sort and nested loop to finish in ~150ms
So although the optimizer has better information for the second query,
it actually picks a query plan that performs worse. Aparently, in your
situation, the hash match is the fastest solution. Maybe the optimizer
is unable to accurately estimate the output of the subquery (maybe it
returns more distinct values than expected).
You could also try to rewrite it differently, and see if that helps.
For example, you could try this:
AND EXISTS (
SELECT *
FROM C_USERMETROS
WHERE USERID = 'ADMIN'
AND PRICINGMETRO = A3.METRO
)
If the subquery returns unique values for PRICINGMETRO, then you could
also try this:
INNER JOIN C_USERMETROS
ON USERID = 'ADMIN'
AND PRICINGMETRO = A3.METRO
If the optimizer is actually misjudging the result from the subquery,
then it could help if you added an index on C_USERMETROS (USERID,
PRICINGMETRO).
HTH,
Gert-Jan

Friday, March 23, 2012

query performance question regarding ISNULL

No comments:

Post a Comment

Query longer in SP

Blog Archive

About Me