Re: Consistent segfault in complex query

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: Kyle Samson <kysamson(at)tripadvisor(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Matthew Kelly <mkelly(at)tripadvisor(dot)com>
Subject: Re: Consistent segfault in complex query
Date: 2018-09-12 15:24:56
Message-ID: 4752.1536765896@sss.pgh.pa.us
Lists: pgsql-hackers

Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk> writes:
> "Kyle" == Kyle Samson <kysamson(at)tripadvisor(dot)com> writes:
> Kyle> We encountered a query that has been able to frequently segfault
> Kyle> one of our postgres instances under certain conditions which we
> Kyle> have not fully been able to isolate for reproduction. We were
> Kyle> able to get a core dump out of one of the crashes and have poked
> Kyle> at it, but we believe the answer is beyond our knowledge of
> Kyle> postgres internals. This is on a 9.3.19 server and we saw no
> Kyle> mention of a fix in the release notes since this version and we
> Kyle> do not know if it affects later major releases as well.

> There's a relevant commit from Feb this year (ea6d67cf8) specifically
> referring to the case of CTEs inside subplans inside EvalPlanQual, which
> is exactly the scenario you have in your query. So you need to try this
> in 9.3.22 or later (ideally 9.3.24, the latest) which contain this fix.

I'm not entirely convinced that that fix will cure this, but certainly
it seems related, and we should find out whether it has any effect.

The reason this seems possibly different is that we're apparently
returning wrong data out of the sub-select (a zero Datum value, but
not marked isnull --- if it were, arraycontains wouldn't be reached).
The previously fixed bug would have caused either multiple or missed
returns of a valid CTE tuple.
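
To illustrate the distinction above, here is a minimal sketch in C. It is a simplified model, not PostgreSQL's executor code: `SubplanResult`, `CallOutcome`, and `strict_call_outcome` are hypothetical names invented for this example, and it assumes the function being called is strict and takes a pass-by-reference argument (as an array argument would be). Only the pointer-sized `Datum` typedef mirrors the real thing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for PostgreSQL's Datum: a pointer-sized integer. */
typedef uintptr_t Datum;

/* Hypothetical model of a value handed back by a sub-select. */
typedef struct SubplanResult
{
    Datum value;   /* for pass-by-reference types, an address */
    bool  isnull;  /* true means "SQL NULL", value is ignored */
} SubplanResult;

typedef enum CallOutcome
{
    CALL_SKIPPED_NULL,  /* strict function never invoked */
    CALL_SAFE,          /* function invoked with a usable address */
    CALL_WOULD_CRASH    /* function invoked and would read address 0 */
} CallOutcome;

/*
 * Models what happens when a strict function is applied to the result.
 * A strict function is skipped whenever an argument is marked null, so
 * the dangerous case is value == 0 with isnull == false.
 */
CallOutcome
strict_call_outcome(SubplanResult arg)
{
    if (arg.isnull)
        return CALL_SKIPPED_NULL;
    if (arg.value == 0)
        return CALL_WOULD_CRASH;
    return CALL_SAFE;
}

/* Usage example: the three cases side by side. */
void
run_examples(void)
{
    /* Properly flagged NULL: the function is never reached. */
    assert(strict_call_outcome((SubplanResult){0, true}) == CALL_SKIPPED_NULL);
    /* Zero Datum but not flagged null: the crash scenario. */
    assert(strict_call_outcome((SubplanResult){0, false}) == CALL_WOULD_CRASH);
    /* Nonzero Datum, not null: treated as a valid address in this model. */
    assert(strict_call_outcome((SubplanResult){(Datum) 0x1000, false}) == CALL_SAFE);
}
```

In this model, a repeated or missed return of a valid tuple would always land in the first or third case; only the second case matches a crash inside the called function.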

> If this is indeed the problem, you may be able to narrow down the
> required conditions more tightly: the problem will occur only if the row
> to be updated was concurrently updated by another transaction.

Yeah, the presence of EvalPlanQual in the backtrace is sufficient
to confirm that. It should be pretty easy to make a reproducible
test case once you understand that prerequisite.

regards, tom lane
