From: | "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com> |
---|---|
To: | 'Amit Langote' <amitlangote09(at)gmail(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Japin Li <japinli(at)hotmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | RE: Forget close an open relation in ReorderBufferProcessTXN() |
Date: | 2021-05-21 07:26:32 |
Message-ID: | OSBPR01MB48880D3B760074587D4D2424ED299@OSBPR01MB4888.jpnprd01.prod.outlook.com |
Lists: | pgsql-hackers |
On Friday, May 21, 2021 3:55 PM I wrote:
> On Thursday, May 20, 2021 9:59 PM Amit Langote
> <amitlangote09(at)gmail(dot)com> wrote:
> > Here are updated/divided patches.
> Thanks for your updates.
>
> However, I've detected segmentation faults caused by the patch, which can
> happen during 100_bugs.pl in src/test/subscription.
> This happens more than once in ten runs.
>
> This looks like a timing issue and was already introduced in v3.
> I also applied v5 to HEAD and reproduced the failure, whereas plain OSS HEAD
> does not reproduce it, even when I ran 100_bugs.pl 200 times in a tight loop.
> I used commit 4f586fe2 as the base for all checks. The logs below are from v3.
>
> * The message of the failure during TAP test.
>
> # Postmaster PID for node "twoways" is 5015
> Waiting for replication conn testsub's replay_lsn to pass pg_current_wal_lsn() on twoways
> # poll_query_until timed out executing this query:
> # SELECT pg_current_wal_lsn() <= replay_lsn AND state = 'streaming' FROM pg_catalog.pg_stat_replication WHERE application_name = 'testsub';
> # expecting this output:
> # t
> # last actual query output:
> #
> # with stderr:
> # psql: error: connection to server on socket "/tmp/cs8dhFOtZZ/.s.PGSQL.59345" failed: No such file or directory
> # Is the server running locally and accepting connections on that socket?
> timed out waiting for catchup at t/100_bugs.pl line 148.
>
>
> The failure produces a core file, and its backtrace is below.
> My first guess at the cause is that, between getting an entry from
> hash_search() in get_rel_sync_entry() and setting the map with
> convert_tuples_by_name() in maybe_send_schema(), an invalidation
> message arrived and tried to free unset descs in the entry?
Sorry, this guess was not accurate at all.
Please ignore it, because the descs are freed only when entry->map
is set. Sorry for the noise.
Best Regards,
Takamichi Osumi
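
The repeated-run check mentioned in the quoted message (running 100_bugs.pl 200 times in a tight loop) can be sketched as a small shell helper. This is only an illustration, not the script actually used in the thread: the helper name run_repeatedly is hypothetical, and the example invocation in the comment assumes a PostgreSQL source tree configured with --enable-tap-tests.

```shell
#!/bin/sh
# Hypothetical helper: run a command COUNT times and print how many runs failed.
# Usage: run_repeatedly COUNT COMMAND [ARGS...]
run_repeatedly() {
    total=$1
    shift
    failures=0
    i=1
    while [ "$i" -le "$total" ]; do
        # A non-zero exit status counts as one failed run.
        if ! "$@" >/dev/null 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures"
}

# Example invocation (assumption: run from the top of a configured source tree):
# run_repeatedly 200 make -C src/test/subscription check PROVE_TESTS='t/100_bugs.pl'
```

An intermittent failure such as the one reported above would show up as a non-zero count; the command output is discarded here, so the TAP logs and any core file would still need to be inspected separately.
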