Re: [HACKERS] What about LIMIT in SELECT ?

From: jwieck(at)debis(dot)com (Jan Wieck)
To: maillist(at)candle(dot)pha(dot)pa(dot)us (Bruce Momjian)
Cc: maillist(at)candle(dot)pha(dot)pa(dot)us, lockhart(at)alumni(dot)caltech(dot)edu, jwieck(at)debis(dot)com, eric(at)linux-hw(dot)com, jeff(at)remapcorp(dot)com, hackers(at)postgreSQL(dot)org
Subject: Re: [HACKERS] What about LIMIT in SELECT ?
Date: 1998-10-14 21:05:07
Message-ID: m0zTY6V-000EBRC@orion.SAPserv.Hamburg.dsh.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-docs pgsql-hackers

> I have had more time to think about this. Basically, for pre-sorted
> data, our psort code is very fast, because it does not need to sort
> anything. It just moves the rows in and out of the sort memory. Yes,
> it could be removed in some cases, and probably should be, but it is not
> going to produce great speedups.

And I got the time to hack around about this.

I hacked in a little check into the planner, that compares
the sortClause against the key field list of an index scan
and just suppresses the sort node if it exactly matchs and
all sort operators are "<".

I tested with a 10k row table where key is a text field. The
base query is a

SELECT ... WHERE key > 'val' ORDER BY key;

The used 'val' is always a key that is close to the first of
all keys in the table ('' on the first query and the last
selected value on subsequent ones).

Scenario 1 (S1) uses exactly the above query but processes
only the first 20 rows from the result buffer. Thus the
frontend receives nearly the whole table.

Scenario 2 (S2) uses a cursor and FETCH 20. But closes the
cursor and creates a new one for the next selection (only
with another 'val') as it would occur in a web application.

If there is no index on key, the backend will allways do a
Sort->SeqScan and due to the 'val' close to the lowest
existing key nearly all tuples get scanned and put into the
sort. S1 here runs about 10 seconds and S2 about 6 seconds.
The speedup in S2 results from the reduced overhead of
sending not wanted tuples into the frontend.

Now with a btree index on key and an unpatched backend.
Produced plan is always a Sort->IndexScan. S1 needs 16
seconds and S2 needs 12 seconds. Again nearly all data is put
into the sort but this time over the index scan and that is
slower.

Last with the btree index on key and the patched backend.
This time the plan is a plain IndexScan because the ORDER BY
clause exactly matches the sort order of the choosen index.
S1 needs 13 seconds and S2 less than 0.2! This dramatic
speedup comes from the fact, that this time the index scan is
the toplevel executor node and the executor run is stopped
after 20 tuples have been selected.

Analysis of the above timings:

If there is an ORDER BY clause, using an index scan is the
clever way if the indexqual dramatically reduces the the
amount of data selected and sorted. I think this is the
normal case (who really selects nearly all rows from a 5M row
table?). So choosing the index path is correct. This will
hurt if someone really selects most of the rows and the index
scan jumps over the disc. But here the programmer should use
an unqualified query to perform a seqscan and do the
qualification in the frontend application.

The speedup for the cursor/fetch scenario is so impressive
that I'll create a post 6.4 patch. I don't want it in 6.4
because there is absolutely no query in the whole regression
test, where it suppresses the sort node. So we have
absolutely no check that it doesn't break anything.

For a web application, that can use a unique key to select
the next amount of rows, it will be a big win.

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#======================================== jwieck(at)debis(dot)com (Jan Wieck) #

In response to

Responses

Browse pgsql-docs by date

  From Date Subject
Next Message Bruce Momjian 1998-10-15 05:52:06 Re: [HACKERS] What about LIMIT in SELECT ?
Previous Message Bruce Momjian 1998-10-14 18:27:05 Re: [HACKERS] What about LIMIT in SELECT ?

Browse pgsql-hackers by date

  From Date Subject
Next Message Jan Wieck 1998-10-14 21:06:19 Re: [HACKERS] PostgreSQL v6.4 BETA2 ...
Previous Message Marc G. Fournier 1998-10-14 19:29:03 Re: [HACKERS] PostgreSQL v6.4 BETA2 ...