Wednesday, March 28, 2012

Query problem

I am trying the following query in an OleDbCommand:
SELECT PartLocations.LocationName, Sum(PartsJournal.Quantity) AS
SumOfQuantity, PartsJournal.PartsLotNumber
FROM PartLocations INNER JOIN PartsJournal ON PartLocations.LocationID
= PartsJournal.LocationID
GROUP BY PartLocations.LocationName, PartsJournal.PartsLotNumber,
PartsJournal.PartNumber, PartsJournal.LocationID
HAVING (((Sum(PartsJournal.Quantity))>0) AND
((PartsJournal.PartsLotNumber) Is Not Null) AND
((Min(PartsJournal.Date))>'5/1/2001') AND
((PartsJournal.PartNumber)='030020') AND
((PartsJournal.LocationID)<>1))
ORDER BY PartsJournal.PartNumber, MIN(PartsJournal.Date);

It generates an exception. If I take out the MIN(PartsJournal.Date) in
the ORDER BY clause it works fine. But when I check the query as is
in SQL Query Analyzer it runs fine with the MIN(PartsJournal.Date).
Why the difference?|||First, clean up the posting. DATE is a reserved word in SQL as well as
too vague to be a name. All those silly parens did was make the code
almost impossible to read. You need to learn about alias names for
tables, ISO-8601 for dates and ISO-11179 for data element names. Is
this what you meant to post? This looked like machine generated code,
not something meant for a human programmer to maintain. Try this
version:

SELECT L1.location_name, J1.parts_lot_nbr,
SUM(J1.part_qty) AS total_qty,
MIN(J1.foo_date) AS first_foo_date
FROM PartLocations AS L1,
PartsJournal AS J1
WHERE P1.location_id = J1.location_id
AND J1.parts_lot_nbr IS NOT NULL
AND J1.part_nbr = '030020'
AND J1.location_id <> 1
GROUP BY L1.location_name, J1.parts_lot_nbr
HAVING SUM(J1.part_qty) > 0
AND MIN(J1.foo_date) > '2001-05-01'
ORDER BY J1.parts_lot_nbr, first_foo_date;

The extra columns in your original GROUP BY would work, but would
produce rows without complete information. A typo?

>> It generates an exception. If I take out the MIN(J1.foo_date) in
the ORDER BY clause it works fine. But when I check the query as is
in SQL Query Analyzer it runs fine with the MIN(J1.foo_date). Why the
difference? <<

Dialect versus Standards. In Standard SQL, the ORDER BY clause is part
of a cursor. The cursor uses the result set of the query and converts
it into a sequential file structure for the host language. It can only
use column names that appear in the SELECT list. This is one reason
why I tell people not to write proprietary code.|||Thank you very much for the help.
--CELKO-- wrote:
> First, clean up the posting. DATE is a reserved word in SQL as well as
> too vague to be a name.

Yes, sadly this is used as a column name throughout the database tables,
and I am reluctant to change it due to the amount of code that would
have to be changed, and since I have not found any queries that give
problems because of it.

> All those silly parens did was make the code
> almost impossible to read.

Actually I didn't post the actual code, but did post a machine
generated version that I was playing around with for testing. The
actual code wasn't quite as bad. The difficulty I have is that the
actual code was in VB and is terrible to read:
"SELECT TOP 1 PartLocations.LocationName, Sum(PartsJournal.Quantity)," _
& " PartsJournal.PartsLotNumber, MIN(PartsJournal.Date) AS Minofdate" _
...etc

When coding, is there a way to keep code such as the above more readable?

>You need to learn about alias names for

I'll definitely keep this in mind. I occasionally use aliases but often
feel that even though they keep the query shorter, they actually make it
harder to read. For instance, it took me a little while to realize you
used P1 by mistake in your post instead of L1.

> tables, ISO-8601 for dates and ISO-11179 for data element names.

Thanks - I just studied both and will try to incorporate some of that
into all future designs. I only wish I had the time to redo what we
have already set up.

> this what you meant to post? This looked like machine generated code,
> not something meant for a human programmer to maintain. Try this

No, I didn't; see above.

> version:
> SELECT L1.location_name, J1.parts_lot_nbr,
> SUM(J1.part_qty) AS total_qty,
> MIN(J1.foo_date) AS first_foo_date
> FROM PartLocations AS L1,
> PartsJournal AS J1
> WHERE P1.location_id = J1.location_id
> AND J1.parts_lot_nbr IS NOT NULL
> AND J1.part_nbr = '030020'
> AND J1.location_id <> 1
> GROUP BY L1.location_name, J1.parts_lot_nbr
> HAVING SUM(J1.part_qty) > 0
> AND MIN(J1.foo_date) > '2001-05-01'
> ORDER BY J1.parts_lot_nbr, first_foo_date;
> The extra columns in your original GROUP BY would work, but would
> produce rows without complete information. A typo?

Not a typo, just a mistake.

> >> It generates an exception. If I take out the MIN(J1.foo_date) in
> the ORDER BY clause it works fine. But when I check the query as is
> in SQL Query Analyzer it runs fine with the MIN(J1.foo_date). Why the
> difference? <<
> Dialect versus Standards. In Standard SQL, the ORDER BY clause is part
> of a cursor. The cursor uses the result set of the query and converts
> it into a sequential file structure for the host language. It can only
> use column names that appear in the SELECT list. This is one reason
> why I tell people not to write proprietary code.

Your query worked. After some playing around I found that the fix to
my query was to move most of my HAVING clause to a WHERE clause. I'm
still not quite sure I understand what was wrong with having it there.
Can you explain? The final query with minimum change that worked is:

SELECT TOP 1 PartLocations.LocationName, Sum(PartsJournal.Quantity),
PartsJournal.PartsLotNumber, MIN(PartsJournal.Date) AS Minofdate
FROM PartLocations, PartsJournal
WHERE PartLocations.LocationID = PartsJournal.LocationID
AND PartsJournal.PartNumber = '030020'
AND PartsJournal.PartsLotNumber IS NOT NULL
AND PartsJournal.LocationID <> 1
GROUP BY PartsJournal.PartsLotNumber, PartLocations.LocationName
HAVING MIN(PartsJournal.Date) > '5/1/2001'
AND Sum(PartsJournal.Quantity) > 0
ORDER BY PartsJournal.PartsLotNumber, Minofdate|||>> After some playing around I found that the fix to
my query was to move most of my HAVING clause to a WHERE clause. I'm
still not quite sure I understand what was wrong with having it there.
Can you explain? <<

Here is how a SELECT works in SQL ... at least in theory. Real
products will optimize things, but the code has to produce the same
results.

a) Start in the FROM clause and build a working table from all of the
joins, unions, intersections, and whatever other table constructors are
there. The <table expression> AS <correlation name> option allows you
to give a name to this working table, which you then have to use for the
rest of the containing query.

b) Go to the WHERE clause and remove rows that do not pass criteria;
that is, that do not test to TRUE (i.e. reject UNKNOWN and FALSE). The
WHERE clause is applied to the working set in the FROM clause.

c) Go to the optional GROUP BY clause, make groups and reduce each
group to a single row, replacing the original working table with the
new grouped table. The rows of a grouped table must be group
characteristics: (1) a grouping column (2) a statistic about the group
(i.e. aggregate functions) (3) a function or (4) an expression made up
of those three items.

d) Go to the optional HAVING clause and apply it against the grouped
working table; if there was no GROUP BY clause, treat the entire table
as one group. In your case, you had columns that were in violation of
rule (c) above.

e) Go to the SELECT clause and construct the expressions in the list.
This means that the scalar subqueries, function calls and expressions
in the SELECT are done after all the other clauses are done. The
"AS" operator can also give names to expressions in the SELECT
list. These new names come into existence all at once, but after the
WHERE clause, GROUP BY clause and HAVING clause have been executed; you
cannot use them in the SELECT list or the WHERE clause for that reason.

If there is a SELECT DISTINCT, then redundant duplicate rows are
removed. For purposes of defining a duplicate row, NULLs are treated
as matching (just like in the GROUP BY).

f) Nested query expressions follow the usual scoping rules you would
expect from a block structured language like C, Pascal, Algol, etc.
Namely, the innermost queries can reference columns and tables in the
queries in which they are contained.

g) The ORDER BY clause is part of a cursor, not a query. The result
set is passed to the cursor, which can only see the names in the SELECT
clause list, and the sorting is done there. The ORDER BY clause cannot
have expressions in it, or references to other columns, because the
result set has been converted into a sequential file structure and that
is what is being sorted.
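
Steps (b), (c), (d) and (g) can be tried on a toy version of the tables from this thread. This is only a sketch under assumptions: Python's built-in sqlite3 module stands in for SQL Server, the rows are invented, and the date column is called entry_date here rather than Date. The WHERE conditions throw out rows before any groups exist, the HAVING conditions test the aggregates of each group, and the ORDER BY refers to a name from the SELECT list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PartLocations (LocationID INTEGER PRIMARY KEY, LocationName TEXT);
    CREATE TABLE PartsJournal (
        LocationID INTEGER, PartNumber TEXT, PartsLotNumber TEXT,
        Quantity INTEGER, entry_date TEXT);
    INSERT INTO PartLocations VALUES (1, 'Receiving'), (2, 'Shelf A'), (3, 'Shelf B');
    INSERT INTO PartsJournal VALUES
        (2, '030020', 'LOT1', 10, '2001-06-01'),
        (2, '030020', 'LOT1', -4, '2001-07-01'),
        (3, '030020', 'LOT2',  5, '2001-06-15'),
        (3, '030020', 'LOT2', -5, '2001-08-01'),
        (2, '999999', 'LOT3',  7, '2001-06-01'),
        (1, '030020', 'LOT4',  9, '2001-06-01'),
        (2, '030020', NULL,    3, '2001-06-01');
""")

rows = conn.execute("""
    SELECT L1.LocationName, J1.PartsLotNumber,
           SUM(J1.Quantity) AS total_qty,
           MIN(J1.entry_date) AS first_date
    FROM PartLocations AS L1
    JOIN PartsJournal AS J1 ON L1.LocationID = J1.LocationID
    WHERE J1.PartNumber = ?              -- row filters: applied before grouping
      AND J1.PartsLotNumber IS NOT NULL
      AND J1.LocationID <> 1
    GROUP BY L1.LocationName, J1.PartsLotNumber
    HAVING SUM(J1.Quantity) > 0          -- group filters: need the aggregates
       AND MIN(J1.entry_date) > '2001-05-01'
    ORDER BY J1.PartsLotNumber, first_date
""", ("030020",)).fetchall()

print(rows)  # [('Shelf A', 'LOT1', 6, '2001-06-01')]
```

LOT2 survives the WHERE clause but its quantities sum to zero, so it is the HAVING clause that removes it; the other rejected rows never reach the grouping step at all.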

As you can see, things happen "all at once" in SQL, not "from left to
right" as they would in a sequential file/procedural language model. In
those languages, these two statements produce different results:
READ (a, b, c) FROM File_X;
READ (c, a, b) FROM File_X;

while these two statements return the same data:

SELECT a, b, c FROM Table_X;
SELECT c, a, b FROM Table_X;

Think about what a confused mess this statement is in the SQL model.

SELECT f(c2) AS c1, f(c1) AS c2 FROM Foobar;

That is why such nonsense is illegal syntax.|||Thanks again. I noticed in the query you posted that you did not
include INNER JOIN notation but just did the join through the where
clause. Isn't the ANSI standard to include the JOIN instructions in
the table selection?|||One other question I wondered if you had any insight into? When does
it make sense to do the query definition on the server side and just
reference this view from code? For example, in my query above I could
have created a parameter query on the server and just provided the
parameters from code. Would this be superior? Should I do this for
all queries? It certainly seems as though it would make for more
readable code and testing of the queries would be better.|||(signaturefactory@.signaturefactory.com) writes:
> Thanks again. I noticed in the query you posted that you did not
> include INNER JOIN notation but just did the join through the where
> clause. Isn't the ANSI standard to include the JOIN instructions in
> the table selection?

Both:

SELECT ...
FROM a, b
WHERE a.col = b.col

and

SELECT ...
FROM a
JOIN b ON a.col = b.col

are compliant with ANSI standards. However, if you need to do an outer
join, you can only use the latter form. So many people feel that they
could just as well go with the latter form always. It's also good because
it separates the join conditions from the filter conditions. That form
also makes it a little harder to write unintended cross-joins:

SELECT ...
FROM a, b, c
WHERE a.col = b.col

Oops! Forgot to take "c" out of the query.
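
The effect of a leftover table is easy to see on made-up data. A minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for SQL Server and three tiny invented tables a, b and c:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (col INTEGER);
    CREATE TABLE b (col INTEGER);
    CREATE TABLE c (col INTEGER);
    INSERT INTO a VALUES (1), (2);
    INSERT INTO b VALUES (1), (2);
    INSERT INTO c VALUES (10), (20), (30);
""")

# Intended query: one row per matching pair in a and b.
intended = conn.execute(
    "SELECT a.col FROM a JOIN b ON a.col = b.col").fetchall()

# Leftover table c in the comma list: every match repeats once per row of c.
leftover = conn.execute(
    "SELECT a.col FROM a, b, c WHERE a.col = b.col").fetchall()

print(len(intended), len(leftover))  # 2 6
```

With the JOIN ... ON form, a table that has no ON condition stands out (or fails to parse), whereas the comma list accepts it silently.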

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp|||(signaturefactory@.signaturefactory.com) writes:
> I am trying the following query in and oleDbCommand:
> SELECT PartLocations.LocationName, Sum(PartsJournal.Quantity) AS
> SumOfQuantity, PartsJournal.PartsLotNumber
> FROM PartLocations INNER JOIN PartsJournal ON PartLocations.LocationID
>= PartsJournal.LocationID
> GROUP BY PartLocations.LocationName, PartsJournal.PartsLotNumber,
> PartsJournal.PartNumber, PartsJournal.LocationID
> HAVING (((Sum(PartsJournal.Quantity))>0) AND
> ((PartsJournal.PartsLotNumber) Is Not Null) AND
> ((Min(PartsJournal.Date))>'5/1/2001') AND
> ((PartsJournal.PartNumber)='030020') AND
> ((PartsJournal.LocationID)<>1))
> ORDER BY PartsJournal.PartNumber, MIN(PartsJournal.Date);
> It generates an excpetion. If I take out the MIN(PartsJournal.Date) in
> the ORDER BY clause it works fine. But when I check the query as is
> in SQL Query Analyzer it runs fine with the MIN(PartsJournal.Date).
> Why the difference?

And the text of the exception was? I have no idea why the query
failed, but if the problem is in SQLOLEDB or OleDb Client, it does not help
us if you keep the error message a secret.

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se

|||(signaturefactory@.signaturefactory.com) writes:
> One other question I wondered if you had any insight into? When does
> it make sense to do the query definition on the server side and just
> reference this view from code? For example in my query above I could
> have created a parameter query on the server and just provided the
> parameters from code. Would this be superior? Should I do this for
> all queries? It certainly seems as though it would make for more
> readable code and testing of the queries would be better.

In many shops, direct SQL statements from the client are not permitted,
but all calls have to be made from stored procedures. There are a
couple of advantages with this. For a longer discussion of these
advantages, see the first section "Why Stored Procedures" in my
article on dynamic SQL on http://www.sommarskog.se/dynamic_sql.html.

The drawback with stored procedures is that for applications that
have a moderate amount of SQL code, having all the SQL code in the
application makes deployment a little easier, since there is one
less tier to install to. This breaks down, though, if there are plenty
of changes to the tables, since in this case you have a deployment
to do on the SQL Server side anyway.

As for whether you should write:

sql = "SELECT ... FROM tbl WHERE col = '" & Value & "'"

or

sql = "SELECT ... FROM tbl WHERE col = ?"

and supply a value for ? with OleDbParameters, there is little to
discuss: you should do the latter. Then the client library handles the
formatting of difficult data types such as dates and decimal values that
are subject to different interpretations because of regional settings. But
the most important issue is something known as SQL injection. What if
Value above contains "'; DROP TABLE tbl --"? Don't laugh, this is a very
popular technique to gain access to systems.
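
To make the difference concrete, here is a small sketch. It assumes Python's built-in sqlite3 module rather than OleDb (sqlite3 happens to use the same ? marker), and the table tbl and the hostile input are invented for illustration. The concatenated string is only printed, never executed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl (col TEXT);
    INSERT INTO tbl VALUES ('abc');
""")

# Hypothetical hostile input typed into a search box.
value = "'; DROP TABLE tbl --"

# Concatenation: the input becomes part of the SQL text itself.
unsafe_sql = "SELECT col FROM tbl WHERE col = '" + value + "'"
print(unsafe_sql)  # SELECT col FROM tbl WHERE col = ''; DROP TABLE tbl --'

# Placeholder: the input travels separately and is only ever treated as data.
matches = conn.execute("SELECT col FROM tbl WHERE col = ?", (value,)).fetchall()
remaining = conn.execute("SELECT COUNT(*) FROM tbl").fetchone()[0]
print(matches, remaining)  # [] 1
```

In the parameterized version the odd string is simply a value that matches no row, and the table is untouched.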

Finally, since you are using SQL Server, why do you use OleDB client
and not SqlClient? If your app needs to connect to other engines,
for instance Oracle or Access, that's a good reason to use OleDB
Client, but else go for SqlClient.

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se

|||Thank you very much.

>In many shops, direct SQL statements from the client are not permitted,
>but all calls have to be made from stored procedures. There are a
>couple of advantages with this. For a longer discussion of these
>advantages, see the first section "Why Stored Procedures" in my
>article on dynamic SQL on http://www.sommarskog.se/dynamic_sql.html.

I am a definite convert on this. I am not going to allow direct SQL
statements any longer. I have been working with this today and the
advantages of stored procedures are very apparent. Fortunately
deployment is not an issue for our application.

>Finally, since you are using SQL Server, why do you use OleDB client
>and not SqlClient? If your app needs to connect to other engines,
>for instance Oracle or Access, that's a good reason to use OleDB
>Client, but else go for SqlClient.

This was a legacy app that was originally written for Access. We will
definitely use SqlClient in the future. Thanks.|||(signaturefactory@.signaturefactory.com) writes:
> I am a definite convert on this. I am not going to allow direct SQL
> statements any longer. I have been working with this today and the
> advantages of stored procedures are very apparent. Fortunately
> deployment is not an issue for our application.

Glad to hear that you got the message!

> This was a legacy app that was originally written for Access. We will
> definitely use SqlClient in the future. Thanks.

OK, that's at least a half-good reason.

For ADO .Net 1.1 the advantages of SqlClient over OleDb Client are
moderate. You can expect somewhat better performance, and there are a
few issues with OleDb Client that SqlClient does not have.

For ADO .Net 2.0, currently in beta, the picture changes, as SqlClient
gets more features that are not in OleDb Client.

Thankfully, porting is fairly easy as a large part of the work is a
search/replace thing. The one other difference is that with OleDb Client
you use ? as placeholder for parameters, with SqlClient, you use named
parameters like in T-SQL:

SELECT ... FROM tbl WHERE col = @value

instead of

SELECT ... FROM tbl WHERE col = ?
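
The same positional-versus-named distinction can be tried without either .Net provider. A sketch, assuming Python's built-in sqlite3 module, which accepts ? markers and, as its documented named style, :name markers (not the @name spelling of T-SQL); the table and the values are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl (col TEXT, qty INTEGER);
    INSERT INTO tbl VALUES ('x', 1), ('y', 2);
""")

# Positional: values are matched to the ? markers by their order.
positional = conn.execute(
    "SELECT qty FROM tbl WHERE col = ? AND qty > ?", ("y", 0)).fetchall()

# Named: values are matched by name, so their order does not matter.
named = conn.execute(
    "SELECT qty FROM tbl WHERE col = :value AND qty > :min_qty",
    {"min_qty": 0, "value": "y"}).fetchall()

print(positional, named)  # [(2,)] [(2,)]
```

When porting, the thing to watch is exactly this: with positional markers the parameters must be added in the order the markers appear, while named parameters can be added in any order.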

I should also add that OleDb Client has one feature that SqlClient does
not: OleDb Client has methods for dealing with recordsets from ADO, which
is useful if you have an app with a mix of native code and .Net code for
legacy reasons.

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se

|||No, that is optional and I find it to be confusing and crowded most of
the time. You have to use it for OUTER JOINs, tho. Think about trying
to write this with infixed operators:

SELECT *
FROM T1, T2, T3
WHERE T1.a BETWEEN T2.b AND T3.c;

Do you see how the "between-ness" is hidden from the reader?|||--CELKO-- (jcelko212@.earthlink.net) writes:
> No, that is optional and I find it to be confusing and crowded most of
> the time. You have to use it for OUTER JOINs, tho. Think about trying
> to write this with infixed operators:
> SELECT *
> FROM T1, T2, T3
> WHERE T1.a BETWEEN T2.b AND T3.c;

Yes, the good thing about that is just that. That you have to think.
Because while the above may make sense at first glance, well, assuming
that T1, T2 and T3 have a hundred rows each, what real business problem
would be described this way?

(Of course, if at least one of T2 and T3 is a one-row table, then the above
makes perfect sense. But then you could easily write it with JOIN operators
as well.)

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se

|||>>.. what real business problem would be described this way? <<

1) Make T1, T2 and T3 into subqueries on a Sales table

Let T1 = My sales in month (m)
T2 = Worst sales for month (m) by someone else
T3 = Best sales for month (m) by someone else

Now do a little math in the SELECT to see where I fall in the range for
each month as a percentage.

2) Make T2 and T3 the same look-up table for postal code ranges (the
ZIP code in the US is a geographical hierarchy, so this works; I do not
know about Sweden) -- T1.zip_code BETWEEN T2.low_zip_code AND
T2.high_zip_code. This is VERY common in reports that use any kind of
report hierarchy.
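
The range look-up in (2) might be sketched like this. Everything here is an assumption for illustration: Python's built-in sqlite3 module stands in for SQL Server, the table and column names are hypothetical, and the three ranges are made up rather than real postal regions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (cust_name TEXT, zip_code TEXT);
    CREATE TABLE ZipRanges (region_name TEXT, low_zip_code TEXT, high_zip_code TEXT);
    INSERT INTO Customers VALUES ('Ann', '10001'), ('Bob', '73301'), ('Cy', '94105');
    INSERT INTO ZipRanges VALUES
        ('East',    '00000', '39999'),
        ('Central', '40000', '79999'),
        ('West',    '80000', '99999');
""")

# Each customer is matched to the one range that contains its code.
regions = conn.execute("""
    SELECT C.cust_name, R.region_name
    FROM Customers AS C, ZipRanges AS R
    WHERE C.zip_code BETWEEN R.low_zip_code AND R.high_zip_code
    ORDER BY C.cust_name
""").fetchall()

print(regions)  # [('Ann', 'East'), ('Bob', 'Central'), ('Cy', 'West')]
```

The codes are stored as fixed-length text, so the BETWEEN comparison is a plain string comparison; this only works as long as the ranges do not overlap.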

Another example of the ease of reading the "old way"

SELECT *
FROM T1, T2, T3
WHERE 42 IN (T1.a, T2.b, T3.c);

Versus something like this:

SELECT *
FROM T1
INNER JOIN
T2
ON (1 = 1) -- actually a cross join in disguise
INNER JOIN
T3
ON 42 IN (T1.a, T2.b, T3.c);

ORM recognized that there are n-ary relationships where (n > 2), but
when you have an infixed notation, you are stuck with binary
relationships. Newbies like this notation for two reasons:

1) It looks like ACCESS output and that is where they started
2) It looks like an ER diagram, rather than an ORM diagram.|||--CELKO-- (jcelko212@.earthlink.net) writes:
>>>.. what real business problem would be described this way? <<
> 1) Make T1, T2 and T3 into subqueries on a Sales table
> Let T1 = My sales in month (m)
> T2 = Worst sales for month (m) by someone else
> T3 = Best sales for month (m) by someone else
> Now do a little math in the SELECT to see where I fall in the range for
> each month as a percentage.

SELECT ...
FROM (SELECT m, sales = SUM(sales)
      FROM tbl
      WHERE salesperson = @me
      GROUP BY m) me
JOIN (SELECT m, sales = MIN(sales)
      FROM tbl
      WHERE salesperson <> @me
      GROUP BY m) bad ON me.m = bad.m
JOIN (SELECT m, sales = MAX(sales)
      FROM tbl
      WHERE salesperson <> @me
      GROUP BY m) good ON me.m = good.m
WHERE me.sales BETWEEN bad.sales AND good.sales

Call it cheating if you like. I say that the BETWEEN condition is a
filter condition - not a join condition.

> 2) Make T2 and T3 the same look-up table for postal code ranges (the
> ZIP code in the US is a geographical hierarchy, so this works; I do not
> know about Sweden) -- T1.zip_code BETWEEN T2.low_zip_code AND
> T2.high_zip_code. This is VERY common in reports that use any kind of
> report hierarchy.

Either T1 or T2 is a single-row table, or, I suspect, there is some
additional join condition.

> Another example of the ease of reading the "old way"
> SELECT *
> FROM T1, T2, T3
> WHERE 42 IN (T1.a, T2.b, T3.c);
> Versus something like this:
> SELECT *
> FROM T1
> INNER JOIN
> T2
> ON (1 = 1) -- actually a cross join in disguise
> INNER JOIN
> T3
> ON 42 IN (T1.a, T2.b, T3.c);

You can write:

SELECT ...
FROM T1
CROSS JOIN T2
CROSS JOIN T3
WHERE 42 IN (T1.a, T2.b, T3.c)

And I might have written a few queries like this. The situations
where the join condition does not fit JOIN ... ON are so rare that
the extra CROSS JOIN is not a problem. And you see very clearly that the
cross join is intended.
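
That the three spellings discussed here are interchangeable can be checked on a few invented rows. A sketch, assuming Python's built-in sqlite3 module as a stand-in for SQL Server:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (a INTEGER);
    CREATE TABLE T2 (b INTEGER);
    CREATE TABLE T3 (c INTEGER);
    INSERT INTO T1 VALUES (42), (1);
    INSERT INTO T2 VALUES (2), (42);
    INSERT INTO T3 VALUES (3);
""")

queries = {
    "comma": "SELECT * FROM T1, T2, T3 WHERE 42 IN (T1.a, T2.b, T3.c)",
    "inner": "SELECT * FROM T1 INNER JOIN T2 ON (1 = 1) "
             "INNER JOIN T3 ON 42 IN (T1.a, T2.b, T3.c)",
    "cross": "SELECT * FROM T1 CROSS JOIN T2 CROSS JOIN T3 "
             "WHERE 42 IN (T1.a, T2.b, T3.c)",
}

# Sort each result so the comparison does not depend on row order.
results = {name: sorted(conn.execute(sql).fetchall())
           for name, sql in queries.items()}

print(results["comma"])  # [(1, 42, 3), (42, 2, 3), (42, 42, 3)]
print(results["comma"] == results["inner"] == results["cross"])  # True
```

So the choice between them is about what the reader sees, not about what the query returns.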

Now, if only this had been outlawed:

SELECT *
FROM A
JOIN B ON A.col = A.col

I would have been saved some stupid errors.
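
What that typo does is visible on a toy example. A sketch with invented data, again assuming Python's built-in sqlite3 module in place of SQL Server:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (col INTEGER);
    CREATE TABLE B (col INTEGER);
    INSERT INTO A VALUES (1), (2);
    INSERT INTO B VALUES (1), (2), (3);
""")

# The typo: the condition compares A with itself, so it is true for
# every non-NULL A.col and each row of A pairs with every row of B.
typo = conn.execute("SELECT * FROM A JOIN B ON A.col = A.col").fetchall()

# What was meant.
meant = conn.execute("SELECT * FROM A JOIN B ON A.col = B.col").fetchall()

print(len(typo), len(meant))  # 6 2
```

The statement is perfectly legal, which is why nothing flags it.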

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
