From: | Michael Paquier <michael(dot)paquier(at)gmail(dot)com> |
---|---|
To: | David Steele <david(at)pgmasters(dot)net> |
Cc: | PostgreSQL mailing lists <pgsql-bugs(at)postgresql(dot)org> |
Subject: | Re: Backend crash on non-exclusive backup cancel |
Date: | 2017-03-01 02:08:55 |
Message-ID: | CAB7nPqR9xvq_wYKFjUYF1CECFLRbe6xGhGWaCv_yE8pZU71Ztw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs pgsql-hackers |
On Tue, Feb 28, 2017 at 10:21 PM, David Steele <david(at)pgmasters(dot)net> wrote:
> On 2/27/17 10:05 PM, Michael Paquier wrote:
>> That's not necessarily true, I can see a stop backup able to finish as
>> well by issuing a cancel request. It seems to me that we just need to
>> have the shmem information updated at the same time as the
>> session-level switches for consistency and we're good. The
>> inconsistency in places when updating the session-level flags and the
>> shmem-level flags is what is causing harm.
>
> I'm sure this could be done but it will require quite a bit of
> refactoring and I'm not sure that it's worth it. In my mind it would be
> enough to document that cancelled backups should be considered invalid
> whether the stop backup record was written or not. However, I'm willing
> to go with the majority opinion.
I would think that addressing the problem is the way to go, hitting an
assertion in this code path means that we are doing something wrong so
simply removing it does not sound like a correct answer to me.
>>> If that makes sense I'm happy to work up a patch. This is definitely an
>>> edge case and I seriously doubt it is causing any issues in the field.
>>
>> <...>
>>
>> David, are you willing to write a patch or should I? Changing
>> nonexclusive_backup_running/exclusive_backup_running to be an enum
>> would make the code more readable as well IMO.
>
> I would like to see if anyone else weighs in on this first, but yes I am
> planning to write a patch. Agreed on the enum.
This was itching me yesterday so I wrote a patch able to address this
problem... At least it can be registered to the CF on time and give it
more visibility. By the way, I have noticed a second bug in the
current logic while looking at the problem you have reported:
1) Start backup in session 1:
=# select pg_start_backup('popo');
pg_start_backup
-----------------
0/2000028
(1 row)
2) pg_stop_backup() in session 2
3) And now when trying to work on backups with session 1:
=# select pg_start_backup('popo');
ERROR: 55000: a backup is already in progress in this session
LOCATION: pg_start_backup, xlogfuncs.c:87
=# select pg_stop_backup();
ERROR: 55000: exclusive backup not in progress
LOCATION: do_pg_stop_backup, xlog.c:10642
So the handling around exclusive_backup_running is broken now as it is
possible to lock a session easily when handling backups. And the root
of the problem is that checks on exclusive_backup_running are not
necessary. I have fixed as well this problem as per the attached. Note
however that SESSION_BACKUP_EXCLUSIVE still makes sense in the patch
attached when doing checks with pg_stop_backup_v2() for a
non-exclusive backup run. It is also useful for debugging.
--
Michael
Attachment | Content-Type | Size |
---|---|---|
backup-session-locks-fixes.patch | application/octet-stream | 7.2 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | David Steele | 2017-03-01 03:44:53 | Re: Backend crash on non-exclusive backup cancel |
Previous Message | Stephen Frost | 2017-02-28 15:53:55 | Re: BUG #14543: libpq fails with group readable ssl keys |
From | Date | Subject | |
---|---|---|---|
Next Message | 钱新林 | 2017-03-01 02:13:17 | Re: help to identify the reason that extension's C function returns array get segmentation fault |
Previous Message | Haribabu Kommi | 2017-03-01 01:59:46 | Refactor handling of database attributes between pg_dump and pg_dumpall |