Re: Crash report for some ICU-52 (debian8) COLLATE and work_mem values

From: Andres Freund <andres(at)anarazel(dot)de>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Daniel Verite <daniel(at)manitou-mail(dot)org>, PostgreSQL mailing lists <pgsql-bugs(at)postgresql(dot)org>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Subject: Re: Crash report for some ICU-52 (debian8) COLLATE and work_mem values
Date: 2017-08-05 23:03:02
Message-ID: 20170805230302.yy3uf6emcwe5ytsw@alap3.anarazel.de
Lists: pgsql-bugs pgsql-hackers

On 2017-08-03 11:42:25 -0700, Peter Geoghegan wrote:
> On Thu, Aug 3, 2017 at 8:49 AM, Daniel Verite <daniel(at)manitou-mail(dot)org> wrote:
> > With query #2 it ends up crashing after ~5 hours and produces
> > the log in log-valgrind-2.txt.gz with some other entries than
> > case #1, but AFAICS still all about reading uninitialised values
> > in space allocated by datumCopy().
>
> Right. This part is really interesting to me:
>
> ==48827== Uninitialised value was created by a heap allocation
> ==48827== at 0x4C28C20: malloc (vg_replace_malloc.c:296)
> ==48827== by 0x80B597: AllocSetAlloc (aset.c:771)
> ==48827== by 0x810ADC: palloc (mcxt.c:862)
> ==48827== by 0x72BFEF: datumCopy (datum.c:171)
> ==48827== by 0x81A74C: tuplesort_putdatum (tuplesort.c:1515)
> ==48827== by 0x5E91EB: advance_aggregates (nodeAgg.c:1023)
>
> If you actually go to datum.c:171, you see that that's a codepath for
> pass-by-reference datatypes that lack a varlena header. Text is a
> datatype that has a varlena header, though, so that's clearly wrong. I
> don't know how this actually happened, but working back through the
> relevant tuplesort_begin_datum() caller, initialize_aggregate(), does
> suggest some things. (tuplesort_begin_datum() is where datumTypeLen is
> determined for the entire datum tuplesort.)
>
> I am once again only guessing, but I have to wonder if this is a
> problem in commit b8d7f053. It seems likely that the problem begins
> before tuplesort_begin_datum() is even called, which is the basis of
> this suspicion. If the problem is within tuplesort, then that could
> only be because get_typlenbyval() gives wrong answers, which seems
> very unlikely.

I'm not saying it isn't the fault of b8d7f053 et al, but I don't quite see
how. Whether something is a varlena datum or not isn't really something
expression evaluation has any influence over, unless it happens within its
own code. That's the responsibility of the calling code, not of anything
within the datum. So I don't quite understand how you got to b8d7f053?

- Andres
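
For readers following the datum.c:171 discussion above, here is a minimal
standalone sketch of the dispatch being described: a copy routine that picks
its path purely from the (typByVal, typLen) pair handed in by the caller.
This is a simplified model, not the PostgreSQL source. The names CopyPath,
classify_copy_path and model_datum_size are hypothetical, and the varlena
header is assumed to be a plain 4-byte native-endian total length, which
ignores the short and compressed header forms real varlenas can have.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical labels for the copy paths discussed in the thread. */
typedef enum
{
    COPY_BY_VALUE,      /* pass-by-value: the datum itself is the value */
    COPY_VARLENA,       /* typLen == -1: length is read from the header */
    COPY_CSTRING,       /* typLen == -2: length comes from strlen() */
    COPY_FIXED_BYREF    /* typLen > 0, by reference: no header at all */
} CopyPath;

/* Choose the path from caller-supplied type metadata only. */
static CopyPath
classify_copy_path(bool typ_by_val, int typ_len)
{
    if (typ_by_val)
        return COPY_BY_VALUE;
    if (typ_len == -1)
        return COPY_VARLENA;
    if (typ_len == -2)
        return COPY_CSTRING;
    return COPY_FIXED_BYREF;
}

/*
 * Number of bytes a copy would take for the chosen path.
 * ASSUMPTION: a varlena starts with a 4-byte native-endian total length.
 */
static size_t
model_datum_size(const void *ptr, bool typ_by_val, int typ_len)
{
    switch (classify_copy_path(typ_by_val, typ_len))
    {
        case COPY_VARLENA:
        {
            uint32_t total;

            memcpy(&total, ptr, sizeof total);
            return total;
        }
        case COPY_CSTRING:
            return strlen((const char *) ptr) + 1;
        case COPY_BY_VALUE:
        case COPY_FIXED_BYREF:
            assert(typ_len > 0);
            return (size_t) typ_len;
    }
    return 0;               /* not reached */
}
```

In this model, the same buffer yields a different copy size depending only on
the typLen the caller passes: with typLen = -1 the header is consulted, while
with a positive typLen the header is never looked at. That is why the metadata
chosen by the calling code decides which path runs.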
