Query across a date range

From: David Jaquay <djaquay(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Query across a date range
Date: 2005-01-07 19:17:31
Message-ID: ad4aa5a805010711171a3c1696@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Summary: Doing a two or three table join for a date range performs
worse than doing the same query individually for each date in the
range.

What works: Doing a query just on a single date or a date range
(against just one table) runs quick; 'explain' says it uses an index
scan. Doing a query on a single date for one store or for one market
uses all index scans, and runs quick as well.

The problem: Doing a query for a date range on a particular store or
market, though, for a date range of more than a few days does a
sequential scan of sales_tickets, and performs worse than doing one
single date query for each date. My 'explain' for one such query is
below.

Background: I have two or three tables involved in a query. One table
is holds stores (7 rows at present), one holds sales tickets (about 5
million) and one holds line items (about 10 million). It's test data
that I've generated and loaded using '\copy from'. Each has a primary
key, and line items have two dates, written and delivered, that are
indexed individually. Store has a market id; a market contains
multiple stores (in my case, 2 or 3). Each sales ticket has 1-3 line
items.

Is there a way to tell postgres to use an index scan on sales_tickets?

Curiously, in response to recent postings in the "Low Performance for
big hospital server" thread, when I flatten the tables by putting
storeid into line_items, it runs somewhat faster in all cases, and
much faster in some; (I have times, if anyone is interested).

Thanks,
Dave

mydb=> explain select * from line_items t, sales_tickets s where
writtenDate >= '12/01/2002' and writtenDate <= '12/31/2002' and
t.ticketId = s.ticketId and s.storeId = 1;
QUERY PLAN
---------------------------------------------------------------------------------------------------
Hash Join (cost=93865.46..114054.74 rows=19898 width=28)
Hash Cond: ("outer".ticketId = "inner".ticketId)
-> Index Scan using line_items_written on line_items t
(cost=0.00..3823.11 rows=158757 width=16)
Index Cond: ((writtendate >= '2002-12-01'::date) AND
(writtendate <= '2002-12-31'::date))
-> Hash (cost=89543.50..89543.50 rows=626783 width=12)
-> Seq Scan on sales_tickets s (cost=0.00..89543.50
rows=626783 width=12)
Filter: (storeid = 1)
(7 rows)

mydb=> explain select * from line_items t, sales_tickets s where
writtenDate = '12/01/2002' and t.ticketId = s.ticketId and s.storeid =
1;
QUERY PLAN
---------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..16942.25 rows=697 width=28)
-> Index Scan using line_items_written on soldtrx t
(cost=0.00..121.97 rows=5554 width=16)
Index Cond: (writtendate = '2002-12-01'::date)
-> Index Scan using sales_tickets_pkey on sales_tickets s
(cost=0.00..3.02 rows=1 width=12)
Index Cond: ("outer".ticketId = s.ticketId)
Filter: (storeid = 1)
(6 rows)

The tables:

create table stores -- 7 rows
(
storeId integer not null,
marketId integer not null
);

create table sales_tickets -- 500,000 rows
(
ticketId integer primary key,
storeId integer not null,
custId integer not null
);

create table line_items -- 1,000,000 rows
(
lineItemId integer primary key,
ticketId integer references sales_tickets,
writtenDate date not null,
deliveredDate date not null
);

create index line_items_written on line_items (writtenDate);
create index line_items_delivered on line_items (deliveredDate);

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2005-01-07 19:35:04 Re: Query across a date range
Previous Message Steve Poe 2005-01-07 18:48:52 Re: Does "HYPERTHREADING" do any harm if we use with RH9.0