| From: | Michael Paquier <michael(dot)paquier(at)gmail(dot)com> | 
|---|---|
| To: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> | 
| Cc: | Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, "Wood, Dan" <hexpert(at)amazon(dot)com> | 
| Subject: | Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple | 
| Date: | 2017-09-29 06:04:12 | 
| Message-ID: | CAB7nPqSeFiDcWsrGNbeDx=AxCibRYfiOH4Ag+DQtmhQL=SVRog@mail.gmail.com | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-committers pgsql-hackers | 
On Fri, Sep 29, 2017 at 6:39 AM, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> Peter Geoghegan wrote:
>
>> We certainly do still see wrong answers to queries here:
>>
>> postgres=# select ctid, xmin, xmax, * from t;
>>  ctid  | xmin  | xmax | id | name | x
>> -------+-------+------+----+------+---
>>  (0,1) | 21171 |    0 |  1 | 111  | 0
>>  (0,7) | 21177 |    0 |  3 | 333  | 5
>> (2 rows)
>>
>> postgres=# select * from t where id = 3;
>>  id | name | x
>> ----+------+---
>>   3 | 333  | 5
>> (1 row)
>>
>> postgres=# set enable_seqscan = off;
>> SET
>> postgres=# select * from t where id = 3;
>>  id | name | x
>> ----+------+---
>> (0 rows)
>
> Yeah, oops.
This really looks like a problem at heap-level with the parent
redirection not getting defined (?). Please note that a subsequent
REINDEX fails as well:
=# reindex index t_pkey ;
ERROR:  XX000: failed to find parent tuple for heap-only tuple at
(0,7) in table "t"
LOCATION:  IndexBuildHeapRangeScan, index.c:2597
VACUUM FREEZE also is not getting things right, but a VACUUM FULL does.
Also, dropping the constraint and attempting to recreate it is failing:
=# alter table t drop constraint t_pkey;
ALTER TABLE
=# create index t_pkey on t(id);
ERROR:  XX000: failed to find parent tuple for heap-only tuple at
(0,7) in table "t"
LOCATION:  IndexBuildHeapRangeScan, index.c:2597
A corrupted page clearly indicates that there are no tuple redirections:
=# select lp, t_ctid, lp_off, t_infomask, t_infomask2 from
heap_page_items(get_raw_page('t', 0));
 lp | t_ctid | lp_off | t_infomask | t_infomask2
----+--------+--------+------------+-------------
  1 | (0,1)  |   8152 |       2818 |           3
  2 | null   |      0 |       null |        null
  3 | (0,4)  |   8112 |       9986 |       49155
  4 | (0,5)  |   8072 |       9986 |       49155
  5 | (0,6)  |   8032 |       9986 |       49155
  6 | (0,7)  |   7992 |       9986 |       49155
  7 | (0,7)  |   7952 |      11010 |       32771
(7 rows)
And a non-corrupted page clearly shows the redirection done with lp_off:
 lp | t_ctid | lp_off | t_infomask | t_infomask2
----+--------+--------+------------+-------------
  1 | (0,1)  |   8152 |       2818 |           3
  2 | null   |      7 |       null |        null
  3 | null   |      0 |       null |        null
  4 | null   |      0 |       null |        null
  5 | null   |      0 |       null |        null
  6 | null   |      0 |       null |        null
  7 | (0,7)  |   8112 |      11010 |       32771
(7 rows)
-- 
Michael
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Peter Eisentraut | 2017-09-29 12:58:15 | pgsql: Add PostgreSQL version to coverage output | 
| Previous Message | Peter Geoghegan | 2017-09-28 22:24:47 | Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple | 
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Michael Paquier | 2017-09-29 06:06:51 | Re: Bug with pg_basebackup and 'shared' tablespace | 
| Previous Message | Pavan Deolasee | 2017-09-29 05:39:00 | Re: pgbench stuck with 100% cpu usage |