Re: select_parallel test failure: gather sometimes losing tuples (maybe during rescans)?

From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org,Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>,Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>,Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: select_parallel test failure: gather sometimes losing tuples (maybe during rescans)?
Date: 2018-03-04 02:40:08
Message-ID: 3A8FCC1A-E6E4-4279-9312-851FCBDDE08F@anarazel.de
Lists: pgsql-hackers

On March 3, 2018 6:36:51 PM PST, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>On 03/04/2018 03:20 AM, Thomas Munro wrote:
>> Hi,
>>
>> I saw a one-off failure like this:
>>
>>                                 QUERY PLAN
>> --------------------------------------------------------------------------
>>   Aggregate (actual rows=1 loops=1)
>> !   ->  Nested Loop (actual rows=98000 loops=1)
>>           ->  Seq Scan on tenk2 (actual rows=10 loops=1)
>>                 Filter: (thousand = 0)
>>                 Rows Removed by Filter: 9990
>> !         ->  Gather (actual rows=9800 loops=10)
>>                 Workers Planned: 4
>>                 Workers Launched: 4
>>                 ->  Parallel Seq Scan on tenk1 (actual rows=1960 loops=50)
>> --- 485,495 ----
>>                                 QUERY PLAN
>> --------------------------------------------------------------------------
>>   Aggregate (actual rows=1 loops=1)
>> !   ->  Nested Loop (actual rows=97984 loops=1)
>>           ->  Seq Scan on tenk2 (actual rows=10 loops=1)
>>                 Filter: (thousand = 0)
>>                 Rows Removed by Filter: 9990
>> !         ->  Gather (actual rows=9798 loops=10)
>>                 Workers Planned: 4
>>                 Workers Launched: 4
>>                 ->  Parallel Seq Scan on tenk1 (actual rows=1960 loops=50)
>>
>>
>> Two tuples apparently went missing.
>>
>> Similar failures on the build farm:
>>
>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=okapi&dt=2018-03-03%2020%3A15%3A01
>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=locust&dt=2018-03-03%2018%3A13%3A32
>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prairiedog&dt=2018-03-03%2017%3A55%3A11
>>
>> Could this be related to commit
>> 34db06ef9a1d7f36391c64293bf1e0ce44a33915 or commit
>> 497171d3e2aaeea3b30d710b4e368645ad07ae43?
>>
>
>I think the same failure (or at least very similar plan diff) was
>already mentioned here:
>
>https://www.postgresql.org/message-id/17385.1520018934@sss.pgh.pa.us
>
>So I guess someone else already noticed, but I don't see the cause
>identified in that thread.

Robert and I started discussing it a bit over IM. No conclusion. Robert tried to reproduce locally, including disabling atomics, without luck.

Can anybody reproduce locally?

Andres

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
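
A note on reading the numbers in the quoted plans: as far as I know, EXPLAIN ANALYZE in this era prints "actual rows" as a per-loop average, rounded to a whole number, for any node with loops > 1. Under that assumption, the Gather line (loops=10) only shows the averaged loss, while the Nested Loop line (loops=1) shows the total. The sketch below just redoes that arithmetic with the figures copied from the two plans; the variable and function names are made up for illustration and are not anything from PostgreSQL.

```python
# Row counts copied from the two quoted plans.
expected_total = 98000   # Nested Loop (loops=1) in the expected output
observed_total = 97984   # Nested Loop (loops=1) in the failing run
gather_loops = 10        # Gather was rescanned 10 times in both plans


def per_loop_average(total_rows, loops):
    """Average rows per loop; assumed to be what EXPLAIN ANALYZE rounds and prints."""
    return total_rows / loops


# Total shortfall as seen at the Nested Loop node.
missing_total = expected_total - observed_total

# Per-loop averages for the Gather node, before rounding.
expected_avg = per_loop_average(expected_total, gather_loops)
observed_avg = per_loop_average(observed_total, gather_loops)

print("missing in total:", missing_total)
print("Gather rows per loop, expected:", round(expected_avg))
print("Gather rows per loop, observed:", round(observed_avg))
```

If that reading is right, the drop from 9800 to 9798 on the Gather line corresponds to 16 rows across all ten rescans rather than exactly two, and the output alone does not say whether they were lost evenly or in a single rescan.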
