From: wsheldah(at)lexmark(dot)com
To: John Smith <john_smith_45678(at)yahoo(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: selects with large offset really slow
Date: 2003-02-07 22:10:20
Message-ID: OFBDA18151.F73A0FC3-ON85256CC6.007991B3@lexmark.com
Lists: pgsql-general
If PG uses the url index before it does the join, it may be fetching rows
that won't satisfy the join criteria; to be accurate, it really needs to do
the join first, before doing the limit and offset. Since the index is on
the whole column and not just on the join results, I don't think it can be
used the way you're thinking. Does this make sense at all?
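For instance, a quick (untested) check along these lines against your links and stats tables should show the problem (ignoring the referrer_id filter for brevity):

SELECT l.id, l.url
FROM links l
WHERE NOT EXISTS
    (SELECT 1 FROM stats s WHERE s.link_id = l.id);
-- any rows returned sit in the url index but would never appear in
-- the join result, so offsetting through the index alone would land
-- on the wrong row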
Wes Sheldahl
John Smith <john_smith_45678(at)yahoo(dot)com>@postgresql.org on 02/07/2003 03:54:21 PM
Sent by: pgsql-general-owner(at)postgresql(dot)org
To: pgsql-general(at)postgresql(dot)org
cc:
Subject: Re: [GENERAL] selects with large offset really slow
Thanks, I'll try those suggestions. But...
Why can't PG just use an index? Say, look at the index for 'url', go to
entry 90000, then get the next 100 entries? I was surprised that it
retrieves *all* records and then sorts them (when there's already a sorted
index). I'm trying to switch from MySQL - the exact same query there is
very fast with 100-500K+ rows, and a large offset doesn't seem to affect
the query's speed.
John
Richard Huxton <dev(at)archonet(dot)com> wrote:
On Friday 07 Feb 2003 5:01 am, John Smith wrote:
> There are 90K-100K records in each of two tables. This simple join is
> really slow and the larger the offset, the longer it takes. Anything I can
> do to speed it up (a lot)? I've double-checked and there are indexes on
> everything used for joins and ordering.
>
> QUERY PLAN
> --------------------------------------------------------------------------------------------------
>  Limit  (cost=19546.62..19546.87 rows=100 width=62) (actual time=20557.00..20558.00 rows=100 loops=1)
>    ->  Sort  (cost=19321.62..19571.32 rows=99881 width=62) (actual time=19775.00..20410.00 rows=90101 loops=1)
>          Sort Key: l.url
>          ->  Hash Join  (cost=2471.00..7662.54 rows=99881 width=62) (actual time=3013.00..12002.00 rows=100000 loops=1)
>                Hash Cond: ("outer".id = "inner".link_id)
It's the join and sort that's getting you. PG has to fetch and sort all the
rows so it can discard 90,000 of them. I can't think of a good way for it
to optimise this, though you might want to check your sort_mem is set high
enough.
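For example, to check it and then raise it for the current session (the 8192 figure is only illustrative, not a tuned recommendation):

SHOW sort_mem;        -- current setting, in kilobytes
SET sort_mem = 8192;  -- roughly 8 MB, for this session only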
> explain analyze select l.id, l.url
> from links l
> inner join stats s
> on l.id = s.link_id
> and s.referrer_id = 1
> order by l.url
> limit 100
> offset 90000;
There are three options you might want to look at (rough sketches of all
three follow below):
1. Use a temporary table, then select from that for each page.
2. Use a cursor, and just fetch 100 records at a time from it.
3. Cheat and fetch where l.url >= X, remembering X as the highest url from
the last set of results. This of course means pages of results will overlap.
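Untested sketches of all three, reusing your table names (the page_cache and page_cur names and the placeholder url are made up):

-- 1. Temporary table: materialise the sorted join once, page from it.
CREATE TEMP TABLE page_cache AS
    SELECT l.id, l.url
    FROM links l
    INNER JOIN stats s ON l.id = s.link_id AND s.referrer_id = 1
    ORDER BY l.url;
SELECT * FROM page_cache LIMIT 100 OFFSET 90000;

-- 2. Cursor: declare it inside a transaction, fetch a page at a time.
BEGIN;
DECLARE page_cur CURSOR FOR
    SELECT l.id, l.url
    FROM links l
    INNER JOIN stats s ON l.id = s.link_id AND s.referrer_id = 1
    ORDER BY l.url;
MOVE 90000 IN page_cur;
FETCH 100 FROM page_cur;
COMMIT;

-- 3. Cheat: restart from the highest url seen on the previous page
--    (placeholder shown); pages can overlap where urls repeat.
SELECT l.id, l.url
FROM links l
INNER JOIN stats s ON l.id = s.link_id AND s.referrer_id = 1
WHERE l.url >= 'http://last-url-from-previous-page'
ORDER BY l.url
LIMIT 100;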
--
Richard Huxton