Wednesday, March 21, 2012

Query performance on partitioned table with partitioned-index

Hi all,

I noticed that there are some big query performance difference when using partitioned-index on partitioned table vs. normal index on normal table. I posted this question again with more details and hope some gurus from Microsoft can help.

I have this huge table CHECKS with over 300million rows running on SQL 2000. Since we are going to upgrade to SQL 2005, so I am thinking of partition it to make it more scalable. So I copied it over to a SQL 2005 box and partition it into about 20 pieces based on the column PDATE and named this new partitioned table as CHECKS_NEW.

I created same indexes on this CHECKS_NEW table and used partitioned index for all non-unique indexes. Then I ran some queries on both tables and noticed the performances are comparable (differences are 1 or 2 seconds) for some queries. But for some queries, the partitioned table gives a much worse performance. For example, for following query, (I have partitioned index on both checknum and account column based on pdate column)

Select*fromchecks_new Where checknum ='1743'and account ='8888888888'

it can take several minutes to finish while it only take a couple of seconds on CHECKS table. So I took a look at the query plan and I noticed that when running this query on CHECKS table, it uses both indexes to do "index seek" and then do a "hash match", however it only uses index on checknum column to do index scan when running on the partitioned table even though the DENSITY value on checknum is even higher than on account column. All the statistics are newly created and no data changes have been made on both tables except building indexes.

Another interesting sympton is when I set "1743" to another value, sometimes the query plan can suddenly become smarter and uses both indexes to do index scan.

Could somebody help me with this?

thanks.

New update:

I just applied SQL 2005 SP2 since in its fixed bug list that there are a couple related with query optimizer.

Out of 6 or 7 query samples I collected which have a slower performance in SQL 2005, only 1 or 2 queries now performs as good as in SQL 2000.

Before this experience with SQL Server, I was always thinking MSSQL is sub-par with Oracle or other databases is more related with the windows OS, now I have realized that the database itself still has a long way to go.

|||Could you please provide more specifics? The table/index creation scripts and the partitioning function/schema would be good as well as some rough data on the cardinality/density and the exact queries.|||

Table partitioning in SQL Server 2005 is really more for ease of administration and maintenance than for performance or scalability. Table partitioning is a fairly advanced technique that you may not want to use until you are more comfortable with SQL Server 2005. A table with 300 million rows is not very large by SQL Server 2005 standards, especially with proper indexing.

You should give us your table and index creation scripts for this table, along with your partition function and partition schema. Regarding the query, you should not use SELECT *. Do you actually need all of the columns in the table for the query? Even if you do, it is better to explicitly list them. You should also schema qualify your table name, i.e. SELECT col1, col2, col3 FROM dbo.checks_new WHERE CheckNum = 1743 AND Account = 888888 (the default schema in SQL Server 2005 is dbo).

What are the datatypes for checknum and account? Are they int, bigint or varchar? If they are int datatypes, the way you have specified '1743' with single quotes causes an implicit data conversion, which will make your query much less efficient.

I would also like to see your query plans and the results from SET STATISTICS IO ON for these two queries.

|||

In my case, it is more for scalability. With over 400million rows and loading half million rows every day, the loading process takes longer and longer.

I am just looking for query performance same or at least comparable after partitioning, not tens or hundreds times slower!

Here are two samples I did comparison between original (non-partitioned) and partitioned table in SQL 2005 SP2.

For sample 1, the time difference is between 0.5 seconds to around 18 seconds. It may not sound much, but the query is generated through web-access whenever a customer try to get their check info. So it is very significant to the application.

Here is the definition for the CHECKS table (have non-unique,non-cluster index on each column of account,checknum,amount,bdate,pdate):

Checks and checks_old tables are exactly same except checks table is partitioned with partitioned index on PDATE column.

CREATE TABLE [tower].[checks] (

[npages] [int] NULL ,[ifnds] [int] NULL ,[ifnid] [int] NULL ,[dtype] [varchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

[account] [varchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

[checknum] [varchar] (6) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

[amount] [varchar] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

[branch] [smallint] NULL ,[bdate] [datetime] NULL ,[routetr] [varchar] (9) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

[pdate] [datetime] NULL ,

[wdate] [datetime] NULL ,[seqnum] [int] NULL ,[dinnum] [int] NULL ,[status] [varchar] (12) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

[slevel] [smallint] NULL ,[ddaseq] [int] NULL ,[towid] [int] IDENTITY (1, 1) NOT NULL

) ON pDateRangePS (pdate);

CREATE PARTITION FUNCTION pDateRangePF (datetime) AS RANGE RIGHT FOR VALUES (

convert(datetime,'20021001',112),convert(datetime,'20030101',112),convert(datetime,'20030401',112),convert(datetime,'20030701',112),convert(datetime,'20031001',112),

convert(datetime,'20040101',112),convert(datetime,'20040401',112),convert(datetime,'20040701',112),convert(datetime,'20041001',112),convert(datetime,'20050101',112),

convert(datetime,'20050401',112),convert(datetime,'20050701',112),convert(datetime,'20051001',112),convert(datetime,'20060101',112),convert(datetime,'20060401',112),

convert(datetime,'20060701',112),convert(datetime,'20061001',112),convert(datetime,'20070101',112)

);

CREATE PARTITION SCHEME pDateRangePS AS PARTITION pDateRangePF TO (

CHKS200205,CHKS200210,CHKS200301,CHKS200304,CHKS200307,CHKS200310,CHKS200401,CHKS200404,CHKS200407,CHKS200410,CHKS200501,

CHKS200504,CHKS200507,CHKS200510,CHKS200601,CHKS200604,CHKS200607,CHKS200610,CHKS200701

);

Query:

select npages,ifnds,ifnid,checknum,amount,pdate,status from tower.checks_old
where
account = 'XXXXXXXXXX' and checknum = '4876' and bdate = '12-13-2006' and amount = '20.00' and status not in ('invalid acct', 'uncollected', 'pending')

Execution time(mi:ss): 00:00 - GOOD!

IO stats:
(1 row(s) affected)
Table 'checks_old'. Scan count 2, logical reads 121, physical reads 8, read-ahead reads 112, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Query plan:
select npages,ifnds,ifnid,checknum,amount,pdate,status from tower.checks_old --with (index(checks_idx_acct,checks_idx_bdate)) where account = 'XXXXXXXXXX' and checknum = '4876' and bdate = '12-13-2006' and amount = '20.00' and status not in ('invalid acct', 'uncollected', 'pending')
|--Filter(WHERE:([towerdb].[tower].[checks_old].[bdate]='2006-12-13 00:00:00.000' AND [towerdb].[tower].[checks_old].[amount]='20.00' AND ([towerdb].[tower].[checks_old].[status]<'invalid acct' OR [towerdb].[tower].[checks_old].[status]>'invalid acct') AND ([towerdb].[tower].[checks_old].[status]<'pending' OR [towerdb].[tower].[checks_old].[status]>'pending') AND ([towerdb].[tower].[checks_old].[status]<'uncollected' OR [towerdb].[tower].[checks_old].[status]>'uncollected')))
|--Nested Loops(Inner Join, OUTER REFERENCES:([Bmk1000]))
|--Hash Match(Inner Join, HASH:([Bmk1000])=([Bmk1000]), RESIDUAL:([Bmk1000] = [Bmk1000]))
| |--Index Seek(OBJECT:([towerdb].[tower].[checks_old].[INFXKSKT]), SEEK:([towerdb].[tower].[checks_old].[account]='XXXXXXXXXX') ORDERED FORWARD)
| |--Index Seek(OBJECT:([towerdb].[tower].[checks_old].[INFXKSKU]), SEEK:([towerdb].[tower].[checks_old].[checknum]='4876') ORDERED FORWARD)
|--RID Lookup(OBJECT:([towerdb].[tower].[checks_old]), SEEK:([Bmk1000]=[Bmk1000]) LOOKUP ORDERED FORWARD)

Statistics for INDEX 'INFXKSKT'.
--

Updated Rows Rows Sampled Steps Density Average Key Length
--
INFXKSKT Feb 6 2007 9:16AM 490713145 973366 173 0.5091495 9.749025 YES

All Density Average Length Columns
--
1.652082E-06 9.749025 account

Statistics for INDEX 'INFXKSKU'.
--

Updated Rows Rows Sampled Steps Density Average Key Length
--
INFXKSKU Feb 12 2007 9:58AM 490713145 973303 184 0.1064844 3.811202 YES

All Density Average Length Columns
--
1.099505E-05 3.811202 checknum


Query:

select npages,ifnds,ifnid,checknum,amount,pdate,status from tower.checks --with (index(checks_idx_acct))
where
account = 'XXXXXXXXXX' and checknum = '4876' and bdate = '12-13-2006' and amount = '20.00' and status not in ('invalid acct', 'uncollected', 'pending')
--
Execution time(mi:ss): 00:18 BAD!

IO stats:
(1 row(s) affected)
Table 'checks'. Scan count 19, logical reads 32712, physical reads 36, read-ahead reads 32508, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

IO stats: (with the hint, execution time 00:01 --GOOD! )
(1 row(s) affected)
Table 'checks'. Scan count 19, logical reads 678, physical reads 68, read-ahead reads 4808, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Query plan (without hint):

select npages,ifnds,ifnid,checknum,amount,pdate,status from tower.checks --with (index(checks_idx_acct,checks_idx_bdate)) where account = 'XXXXXXXXXX' and checknum = '4876' and bdate = '12-13-2006' and amount = '20.00' and status not in ('invalid acct', 'uncollected', 'pending')
|--Filter(WHERE:([towerdb].[tower].[checks].[account]='XXXXXXXXXX' AND [towerdb].[tower].[checks].[bdate]='2006-12-13 00:00:00.000' AND [towerdb].[tower].[checks].[amount]='20.00' AND ([towerdb].[tower].[checks].[status]<'invalid acct' OR [towerdb].[tower].[checks].[status]>'invalid acct') AND ([towerdb].[tower].[checks].[status]<'pending' OR [towerdb].[tower].[checks].[status]>'pending') AND ([towerdb].[tower].[checks].[status]<'uncollected' OR [towerdb].[tower].[checks].[status]>'uncollected')))
|--Nested Loops(Inner Join, OUTER REFERENCES:([PtnIds1005], [Bmk1000], [Expr1010]) PARTITION ID:([PtnIds1005]) WITH UNORDERED PREFETCH)
|--Compute Scalar(DEFINE:([PtnIds1005]=RangePartitionNew([towerdb].[tower].[checks].[pdate],(1),'2002-10-01 00:00:00.000','2003-01-01 00:00:00.000','2003-04-01 00:00:00.000','2003-07-01 00:00:00.000','2003-10-01 00:00:00.000','2004-01-01 00:00:00.000','2004-04-01 00:00:00.000','2004-07-01 00:00:00.000','2004-10-01 00:00:00.000','2005-01-01 00:00:00.000','2005-04-01 00:00:00.000','2005-07-01 00:00:00.000','2005-10-01 00:00:00.000','2006-01-01 00:00:00.000','2006-04-01 00:00:00.000','2006-07-01 00:00:00.000','2006-10-01 00:00:00.000','2007-01-01 00:00:00.000')))
| |--Nested Loops(Inner Join, OUTER REFERENCES:([PtnIds1006]) PARTITION ID:([PtnIds1006]))
| |--Constant Scan(VALUES:(((1)),((2)),((3)),((4)),((5)),((6)),((7)),((8)),((9)),((10)),((11)),((12)),((13)),((14)),((15)),((16)),((17)),((18)),((19))))
| |--Index Seek(OBJECT:([towerdb].[tower].[checks].[checks_idx_chknum]), SEEK:([towerdb].[tower].[checks].[checknum]='4876') ORDERED FORWARD PARTITION ID:([PtnIds1006]))
|--RID Lookup(OBJECT:([towerdb].[tower].[checks]), SEEK:([Bmk1000]=[Bmk1000]) LOOKUP ORDERED FORWARD PARTITION ID:([PtnIds1005]))


Statistics for INDEX 'checks_idx_chknum'.
--

Updated Rows Rows Sampled Steps Density Average Key Length
--
checks_idx_chknum Feb 13 2007 1:06PM 490714960 490714960 195 0.03139412 11.80515 YES

All Density Average Length Columns
--
7.362998E-08 3.805163 checknum
2.338586E-09 11.80515 checknum, pdate

Statistics for INDEX 'checks_IDX_ACCT'.
--

Updated Rows Rows Sampled Steps Density Average Key Length
--
checks_IDX_ACCT Feb 13 2007 11:19AM 490714960 490714960 186 0.07741183 17.74232 YES

All Density Average Length Columns
--
2.701669E-08 9.742324 account
2.140619E-09 17.74232 account, pdate

Sample 2: an AD-HOC query against STMTS table, partitioned table performs much worse.

Here is the definition of the table, STMTS is the original non-partitioned table, STMTS_NEW is the new partitioned table, all have indexes on SSN,ACCOUNT and other columns.

CREATE TABLE [tower].[stmts_new] (

[npages] [int] NULL ,[ifnds] [int] NULL ,[ifnid] [int] NULL ,[dtype] [varchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

[account] [varchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

[instnum] [varchar] (3) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

[ssn] [varchar] (11) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

[stdate] [datetime] NULL ,[slevel] [smallint] NULL ,[towid] [int] IDENTITY (1, 1) NOT NULL

) ON stDateRangePS (stdate) ;

CREATE PARTITION FUNCTION stDateRangePF (datetime) AS RANGE RIGHT FOR VALUES (

convert(datetime,'20020701',112),convert(datetime,'20030101',112), convert(datetime,'20030701',112),

convert(datetime,'20040101',112), convert(datetime,'20040701',112),convert(datetime,'20050101',112), convert(datetime,'20050701',112),

convert(datetime,'20060101',112), convert(datetime,'20060701',112)

);

CREATE PARTITION SCHEME stDateRangePS AS PARTITION stDateRangePF TO (

STMTS200201, STMTS200207,STMTS200301, STMTS200307,STMTS200401, STMTS200407,STMTS200501, STMTS200507,

STMTS200601, STMTS200607

);

select max(ssn) from tower.stmts where ssn>'077777777'

Execution time (mi:ss) 00:00 -GOOD!
IO stats:
(1 row(s) affected)
Table 'stmts'. Scan count 1, logical reads 4, physical reads 4, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Query plan:
select max(ssn) from tower.stmts where ssn>'077777777'
|--Stream Aggregate(DEFINE:([Expr1004]=MAX([towerdb].[tower].[stmts].[ssn])))
|--Top(TOP EXPRESSION:((1)))
|--Index Seek(OBJECT:([towerdb].[tower].[stmts].[INEWEFQZ]), SEEK:([towerdb].[tower].[stmts].[ssn] > '077777777') ORDERED BACKWARD)

Statistics for INDEX 'INEWEFQZ':
--

Updated Rows Rows Sampled Steps Density Average Key Length
--
INEWEFQZ Feb 5 2007 5:01PM 153149497 579099 188 0.405048 9 YES

All Density Average Length Columns
--
3.794807E-06 9 ssn

Query:

select max(ssn) from tower.stmts_new where ssn>'077777777'

Execution time (mi:ss) 02:01 -- VERY BAD!

IO stats:
(1 row(s) affected)
Table 'stmts_new'. Scan count 10, logical reads 650415, physical reads 38, read-ahead reads 650367, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Query plan:
select max(ssn) from tower.stmts_new --with (index(stmts_idx_acct,stmts_idx_ssn)) where ssn>'077777777'
|--Stream Aggregate(DEFINE:([Expr1004]=MAX([partialagg1006])))
|--Parallelism(Gather Streams)
|--Stream Aggregate(DEFINE:([partialagg1006]=MAX([towerdb].[tower].[stmts_new].[ssn])))
|--Nested Loops(Inner Join, OUTER REFERENCES:([PtnIds1007]) PARTITION ID:([PtnIds1007]))
|--Parallelism(Distribute Streams, Demand Partitioning)
| |--Constant Scan(VALUES:(((1)),((2)),((3)),((4)),((5)),((6)),((7)),((8)),((9)),((10))))
|--Index Seek(OBJECT:([towerdb].[tower].[stmts_new].[stmts_idx_ssn]), SEEK:([towerdb].[tower].[stmts_new].[ssn] > '077777777') ORDERED FORWARD PARTITION ID:([PtnIds1007]))

Statistics for INDEX 'stmts_idx_ssn'.
--

Updated Rows Rows Sampled Steps Density Average Key Length
--
stmts_idx_ssn Feb 7 2007 2:50PM 153153079 153153079 198 0.07559369 16.99962 YES

All Density Average Length Columns
--
8.648554E-08 9 ssn
7.8114E-09 16.99962 ssn, stdate


|||

The last query looks like it might be an optimizer bug because the partitioned plan is scanning forwards instead of backwards. I'll have someone from the optimizer test team follow up on this.

|||

The reason the last query is so much worse is the way the index is organized: in the non-partitioned case, if we seek the index to get the first qualifying row (scanning backward) we would be guaranteed to get the maximum, so we generate the TOP-over-index-seek plan you see. In the partitioned case, since the index is partitioned on a column that is not the leading index key, this simple plan wouldn't work (we don't know which of the partitions has the maximum value). Hence we fall back on scanning the entire index which is of course much slower than the simple plan of the non-partitioned case.

Conceptually, it seems to me that we could make the partitioned case perform almost as well as the non-partitioned by getting the maximum from each partition and then getting the maximum of those. I'm not sure if this is something that is just not implemented in the server or if there is a deeper technical reason why this is so. I'll investigate more in this and your other bad performing queries and check if there are any workarounds you can use and get back to you.

We really do appreciate your input on this. We're working hard on improving the performance of partitioned tables in the future releases of SQL Server and hopefully cut down on performance regression incidents like the ones you're experiencing. As you can see by the above example it's not always straightforward to make all queries perform as well in non-partitioned as in partitioned tables so some performance tuning is unfortunately required after partitioning.

No comments:

Post a Comment