Re: [PATCH] Incremental sort (was: PoC: Partial sort)

From: James Coleman <jtc331(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Shaun Thomas <shaun(dot)thomas(at)2ndquadrant(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Subject: Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Date: 2020-03-13 18:23:10
Message-ID: CAAaqYe8-TCKPskcMytCMX2aM8QnnrgcJP6=tSnJOuQ2CcuQJfg@mail.gmail.com
Lists: pgsql-hackers

On Tue, Mar 10, 2020 at 10:44 PM Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> 3) Most of the execution plans look reasonable, except that some of the
> plans look like this:
>
>
>                        QUERY PLAN
> ---------------------------------------------------------
>  Limit
>    ->  GroupAggregate
>          Group Key: t.a, t.b, t.c, t.d
>          ->  Incremental Sort
>                Sort Key: t.a, t.b, t.c, t.d
>                Presorted Key: t.a, t.b, t.c
>                ->  Incremental Sort
>                      Sort Key: t.a, t.b, t.c
>                      Presorted Key: t.a, t.b
>                      ->  Index Scan using t_a_b_idx on t
> (10 rows)
>
> i.e. there are two incremental sorts on top of each other, with
> different prefixes. But this is not a new issue - it happens with
> queries like this:
>
>   SELECT a, b, c, d, count(*) FROM (
>     SELECT * FROM t ORDER BY a, b, c
>   ) foo GROUP BY a, b, c, d limit 1000;
>
> i.e. there's a subquery with a subset of pathkeys. Without incremental
> sort the plan looks like this:
>
>                  QUERY PLAN
> ---------------------------------------------
>  Limit
>    ->  GroupAggregate
>          Group Key: t.a, t.b, t.c, t.d
>          ->  Sort
>                Sort Key: t.a, t.b, t.c, t.d
>                ->  Sort
>                      Sort Key: t.a, t.b, t.c
>                      ->  Seq Scan on t
> (8 rows)
>
> so essentially the same plan shape. What bugs me though is that there
> seems to be some sort of memory leak, so that this query consumes
> gigabytes of RAM before it gets killed by OOM. But the memory seems not
> to be allocated in any memory context (at least MemoryContextStats doesn't
> show anything like that), so I'm not sure what's going on.
>
> Reproducing it is fairly simple:
>
>   CREATE TABLE t (a bigint, b bigint, c bigint, d bigint);
>   INSERT INTO t SELECT
>     1000*random(), 1000*random(), 1000*random(), 1000*random()
>   FROM generate_series(1,10000000) s(i);
>   CREATE INDEX idx ON t(a,b);
>   ANALYZE t;
>
>   EXPLAIN ANALYZE SELECT a, b, c, d, count(*)
>   FROM (SELECT * FROM t ORDER BY a, b, c) foo GROUP BY a, b, c, d
>   LIMIT 100;

While trying to reproduce this, I didn't see the heavy memory usage;
instead I got the attached assertion failure.

James

Attachment Content-Type Size
assertion_bt.txt text/plain 4.3 KB
