Re: [PATCH] Incremental sort (was: PoC: Partial sort)

From: James Coleman <jtc331(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Shaun Thomas <shaun(dot)thomas(at)2ndquadrant(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Subject: Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Date: 2020-03-14 18:41:09
Message-ID: CAAaqYe_NjXv9TZ3rjO=hMzL3KvEj-9Ado3qUKRS3F0DCcS3R4g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Mar 14, 2020 at 12:36 PM James Coleman <jtc331(at)gmail(dot)com> wrote:
>
> On Sat, Mar 14, 2020 at 12:24 PM James Coleman <jtc331(at)gmail(dot)com> wrote:
> >
> > On Sat, Mar 14, 2020 at 12:07 PM James Coleman <jtc331(at)gmail(dot)com> wrote:
> > >
> > > On Fri, Mar 13, 2020 at 8:23 PM James Coleman <jtc331(at)gmail(dot)com> wrote:
> > > >
> > > > On Friday, March 13, 2020, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> > > >>
> > > >> On Fri, Mar 13, 2020 at 04:31:16PM -0400, James Coleman wrote:
> > > >>>
> > > >>> On Fri, Mar 13, 2020 at 2:23 PM James Coleman <jtc331(at)gmail(dot)com> wrote:
> > > >>>>
> > > >>>>
> > > >>>> On Tue, Mar 10, 2020 at 10:44 PM Tomas Vondra
> > > >>>> <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> > > >>>> > 3) Most of the execution plans look reasonable, except that some of the
> > > >>>> > plans look like this:
> > > >>>> >
> > > >>>> >
> > > >>>> > QUERY PLAN
> > > >>>> > ---------------------------------------------------------
> > > >>>> > Limit
> > > >>>> > -> GroupAggregate
> > > >>>> > Group Key: t.a, t.b, t.c, t.d
> > > >>>> > -> Incremental Sort
> > > >>>> > Sort Key: t.a, t.b, t.c, t.d
> > > >>>> > Presorted Key: t.a, t.b, t.c
> > > >>>> > -> Incremental Sort
> > > >>>> > Sort Key: t.a, t.b, t.c
> > > >>>> > Presorted Key: t.a, t.b
> > > >>>> > -> Index Scan using t_a_b_idx on t
> > > >>>> > (10 rows)
> > > >>>> >
> > > >>>> > i.e. there are two incremental sorts on top of each other, with
> > > >>>> > different prefixes. But this this is not a new issue - it happens with
> > > >>>> > queries like this:
> > > >>>> >
> > > >>>> > SELECT a, b, c, d, count(*) FROM (
> > > >>>> > SELECT * FROM t ORDER BY a, b, c
> > > >>>> > ) foo GROUP BY a, b, c, d limit 1000;
> > > >>>> >
> > > >>>> > i.e. there's a subquery with a subset of pathkeys. Without incremental
> > > >>>> > sort the plan looks like this:
> > > >>>> >
> > > >>>> > QUERY PLAN
> > > >>>> > ---------------------------------------------
> > > >>>> > Limit
> > > >>>> > -> GroupAggregate
> > > >>>> > Group Key: t.a, t.b, t.c, t.d
> > > >>>> > -> Sort
> > > >>>> > Sort Key: t.a, t.b, t.c, t.d
> > > >>>> > -> Sort
> > > >>>> > Sort Key: t.a, t.b, t.c
> > > >>>> > -> Seq Scan on t
> > > >>>> > (8 rows)
> > > >>>> >
> > > >>>> > so essentially the same plan shape. What bugs me though is that there
> > > >>>> > seems to be some sort of memory leak, so that this query consumes
> > > >>>> > gigabytes os RAM before it gets killed by OOM. But the memory seems not
> > > >>>> > to be allocated in any memory context (at least MemoryContextStats don't
> > > >>>> > show anything like that), so I'm not sure what's going on.
> > > >>>> >
> > > >>>> > Reproducing it is fairly simple:
> > > >>>> >
> > > >>>> > CREATE TABLE t (a bigint, b bigint, c bigint, d bigint);
> > > >>>> > INSERT INTO t SELECT
> > > >>>> > 1000*random(), 1000*random(), 1000*random(), 1000*random()
> > > >>>> > FROM generate_series(1,10000000) s(i);
> > > >>>> > CREATE INDEX idx ON t(a,b);
> > > >>>> > ANALYZE t;
> > > >>>> >
> > > >>>> > EXPLAIN ANALYZE SELECT a, b, c, d, count(*)
> > > >>>> > FROM (SELECT * FROM t ORDER BY a, b, c) foo GROUP BY a, b, c, d
> > > >>>> > LIMIT 100;
> > > >>>>
> > > >>>> While trying to reproduce this, instead of lots of memory usage, I got
> > > >>>> the attached assertion failure instead.
> > > >>>
> > > >>>
> > > >>> And, without the EXPLAIN ANALYZE was able to get this one, which will
> > > >>> probably be a lot more helpful.
> > > >>>
> > > >>
> > > >> Hmmm, I'll try reproducing it, but can you investigate the values in the
> > > >> Assert? I mean, it fails on this:
> > > >>
> > > >> Assert(total_allocated == context->mem_allocated);
> > > >>
> > > >> so can you get a core or attach to the process using gdb, and see what's
> > > >> the expected / total value?
> > >
> > > I've reproduced this on multiple machines (though all are Ubuntu or
> > > Debian derivatives...I don't think that's likely to matter). A core
> > > dump is ~150MB, so I've uploaded to Dropbox [1].
> > >
> > > I didn't find an obvious first-level member of Tuplesortstate that was
> > > covered by either of the two blocks in the AllocSet (both are 8KB in
> > > size).
> > >
> > > James
> > >
> > > [1]: https://www.dropbox.com/s/jwndwp4634hzywk/aset_assertion_failure.core?dl=0
> >
> > And...I think I might have found out the issue (though haven't proved
> > it 100% yet or fixed it):
> >
> > The incremental sort node calls `tuplesort_puttupleslot`, which
> > switches the memory context to `sortcontext`. It then calls
> > `puttuple_common`. `puttuple_common` may then call `grow_memtuples`
> > which reallocs space for `sortstate->memtuples`, but `memtuples` is
> > elsewhere allocated in the memory context maincontext.
> >
> > I had earlier in this debugging process noticed that `sortcontext` was
> > allocated in `maincontext`, which seemed conceptually odd if our goal
> > is to reuse the sort state, and I also found a comment that needed to
> > be changed relative to cleaning up the per-sort context (that talks
> > about it freeing the sort state itself), but the `memtuples` array was
> > in fact freed additionally at reset, so it seemed safe.
> >
> > Given this issue though, I think I'm going to go ahead and rework so
> > that the `memtuples` array lies within the `sortcontext` instead.
>
> Perhaps I spoke too soon: I didn't realize repalloc(_huge) didn't need
> a memory context switch, so this likely isn't the issue.

It looks like the issue is actually into the `tuplecontext`, which is
currently a child context of `sortcontext`:

#3 0x0000558cd153b565 in AllocSetCheck
(context=context(at)entry=0x558cd28e0b70) at aset.c:1573
1573 Assert(total_allocated == context->mem_allocated);
(gdb) p total_allocated
$1 = 16384
(gdb) p context->mem_allocated
$2 = 8192
(gdb) p context->name
$3 = 0x558cd16c8ccd "Caller tuples"

I stuck in several more AllocSetCheck calls in aset.c and got the
attached backtrace.

James

Attachment Content-Type Size
updated_aset_bt.txt text/plain 4.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2020-03-14 18:45:35 Re: PATCH: add support for IN and @> in functional-dependency statistics use
Previous Message Paul A Jungwirth 2020-03-14 18:13:54 Re: range_agg