From: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: select_parallel test failure: gather sometimes losing tuples (maybe during rescans)? |
Date: | 2018-03-04 03:07:19 |
Message-ID: | acdb289e-5ce7-c5fa-2d93-295ce073a99b@2ndquadrant.com |
Lists: | pgsql-hackers |
On 03/04/2018 03:40 AM, Andres Freund wrote:
>
>
> On March 3, 2018 6:36:51 PM PST, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>> On 03/04/2018 03:20 AM, Thomas Munro wrote:
>>> Hi,
>>>
>>> I saw a one-off failure like this:
>>>
>>>                                QUERY PLAN
>>> --------------------------------------------------------------------------
>>>   Aggregate (actual rows=1 loops=1)
>>> !   ->  Nested Loop (actual rows=98000 loops=1)
>>>           ->  Seq Scan on tenk2 (actual rows=10 loops=1)
>>>                 Filter: (thousand = 0)
>>>                 Rows Removed by Filter: 9990
>>> !         ->  Gather (actual rows=9800 loops=10)
>>>                 Workers Planned: 4
>>>                 Workers Launched: 4
>>>                 ->  Parallel Seq Scan on tenk1 (actual rows=1960 loops=50)
>>> --- 485,495 ----
>>>                                QUERY PLAN
>>> --------------------------------------------------------------------------
>>>   Aggregate (actual rows=1 loops=1)
>>> !   ->  Nested Loop (actual rows=97984 loops=1)
>>>           ->  Seq Scan on tenk2 (actual rows=10 loops=1)
>>>                 Filter: (thousand = 0)
>>>                 Rows Removed by Filter: 9990
>>> !         ->  Gather (actual rows=9798 loops=10)
>>>                 Workers Planned: 4
>>>                 Workers Launched: 4
>>>                 ->  Parallel Seq Scan on tenk1 (actual rows=1960 loops=50)
>>>
>>>
>>> Two tuples apparently went missing.
>>>
>>> Similar failures on the build farm:
>>>
>>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=okapi&dt=2018-03-03%2020%3A15%3A01
>>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=locust&dt=2018-03-03%2018%3A13%3A32
>>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prairiedog&dt=2018-03-03%2017%3A55%3A11
>>>
>>> Could this be related to commit
>>> 34db06ef9a1d7f36391c64293bf1e0ce44a33915 or commit
>>> 497171d3e2aaeea3b30d710b4e368645ad07ae43?
>>>
>>
>> I think the same failure (or at least very similar plan diff) was
>> already mentioned here:
>>
>> https://www.postgresql.org/message-id/17385.1520018934@sss.pgh.pa.us
>>
>> So I guess someone else already noticed, but I don't see the cause
>> identified in that thread.
>
> Robert and I started discussing it a bit over IM. No conclusion. Robert tried to reproduce locally, including disabling atomics, without luck.
>
> Can anybody reproduce locally?
>
I've started "make check" with parallel_schedule tweaked to contain many
select_parallel runs, and so far I've seen a couple of failures like
this (about 10 failures out of 1500 runs):

    select count(*) from tenk1, tenk2 where tenk1.hundred > 1 and
    tenk2.thousand=0;
    ! ERROR: lost connection to parallel worker

I have no idea why the worker fails (no segfaults in dmesg, nothing in
the postgres log), or whether it's related to the issue discussed here at all.
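
For anyone who wants to try the same kind of repetition, here is a rough sketch of generating a schedule that runs select_parallel many times in sequence. It is only an illustration, not the exact tweak used above: the function name `repeated_schedule` is made up, the two base lines are placeholders for whatever already precedes select_parallel in the real parallel_schedule, and it assumes the usual one-group-per-line "test: name" schedule format.

```python
def repeated_schedule(base_lines, test_name="select_parallel", repeats=100):
    """Return the base schedule followed by `repeats` serial runs of one test.

    Each repeat goes on its own "test:" line so the runs happen one after
    another rather than in the same parallel group.
    """
    extra = [f"test: {test_name}" for _ in range(repeats)]
    return list(base_lines) + extra


# Placeholder prerequisite lines (assumption): in practice, keep the lines
# from the real parallel_schedule that set up the tables the test needs.
base = ["test: create_table", "test: copy"]

schedule = repeated_schedule(base, repeats=3)
print("\n".join(schedule))
```

The printed lines could be written to a separate schedule file and used in place of the stock one. With about 10 failures in 1500 runs the failure rate is under 1%, so a large repeat count is needed to see it at all.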
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services