Re: [PATCH] Incremental sort (was: PoC: Partial sort)

From: James Coleman <jtc331(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
Subject: Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Date: 2018-09-06 18:04:44
Message-ID: CAAaqYe_tFxFN-+ESfvw5iMEc9GhX1FuMHhr0xZ2tUNFxkvJ46w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I disagree because it's not ideal to basically have to append pk to every
index in the system just to get the ability to tie-break when there's
actually very low likelihood of ties anyway.

A similar use case is trying to batch through a result set, in which case
you need a stable sort as well.

The benefit is retaining the generality of indexes (and saving space in
them etc.) while still allowing using them for more specific sorts. Any
time you paginate or batch this way you benefit from this, which in many
applications applies to a very high percentage of indexes.

On Thu, Sep 6, 2018 at 10:39 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> James Coleman <jtc331(at)gmail(dot)com> writes:
> > A fairly common planning problem for us is what we call "most recent
> first" queries; i.e., "the 50 most recent <table> rows for a <foreign key>".
>
> > Here's a basic setup:
>
> > -- created_at has very high cardinality
> > create table foo(pk serial primary key, owner_fk integer, created_at
> timestamp);
> > create index idx_foo_on_owner_and_created_at on foo(owner_fk,
> created_at);
>
> > -- technically this data guarantees unique created_at values,
> > -- but there's no reason it couldn't be modified to have a few
> > -- random non-unique values to prove the point
> > insert into foo(owner_fk, created_at)
> > select i % 100, now() - (i::text || ' minutes')::interval
> > from generate_series(1, 1000000) t(i);
>
> > And here's the naive query to get the results we want:
>
> > select *
> > from foo
> > where owner_fk = 23
> > -- pk is only here to disambiguate/guarantee a stable sort
> > -- in the rare case that there are collisions in the other
> > -- sort field(s)
> > order by created_at desc, pk desc
> > limit 50;
>
> If you're concerned about the performance of this case, why don't you make
> an index that actually matches the query?
>
> regression=# create index on foo (owner_fk, created_at, pk);
> CREATE INDEX
> regression=# explain analyze select * from foo where owner_fk = 23 order
> by created_at desc, pk desc limit 50;
>
> QUERY PLAN
>
>
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------
> Limit (cost=0.42..110.92 rows=50 width=16) (actual time=0.151..0.280
> rows=50 loops=1)
> -> Index Only Scan Backward using foo_owner_fk_created_at_pk_idx on
> foo (cost=0.42..20110.94 rows=9100 width=16) (actual time=0.146..0.255
> rows=50 loops=1)
> Index Cond: (owner_fk = 23)
> Heap Fetches: 50
> Planning Time: 0.290 ms
> Execution Time: 0.361 ms
> (6 rows)
>
> There may be use-cases for Alexander's patch, but I don't find this
> one to be terribly convincing.
>
> regards, tom lane
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2018-09-06 18:56:27 Re: Prevent concurrent DROP SCHEMA when certain objects are being initially created in the namespace
Previous Message R, Siva 2018-09-06 18:02:20 Re: Bug in ginRedoRecompress that causes opaque data on page to be overrun