Re: checkpoint clarifications needed

From: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
To: Tom DalPozzo <t(dot)dalpozzo(at)gmail(dot)com>, pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: checkpoint clarifications needed
Date: 2017-01-09 14:26:29
Message-ID: c22c511c-154e-67c9-7101-61e3a52ad2bc@aklaver.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 01/09/2017 06:14 AM, Tom DalPozzo wrote:
> Hi, I need some clarifications about checkpoints.
> Below here a log from my standby server when started and then some parts
> of the interested WAL in the master's cluster obtained by pg_xlogdump.
> Just to have an example to talk on.
>
> 1) I see: "LOG: redo starts at 1/F00A7448" . I was expecting a
> checkpoint record around 1/F00A7448 but the related checkpoint record is
> at lsn: 1/FCBD7510 instead.
> I expected that because I read in the doc :"Checkpoints are points in
> the sequence of transactions at which it is guaranteed that the heap and
> index data files have been updated with ALL information written BEFORE
> that checkpoint".
> And I interpreted that as "All information written before that
> checkpoint RECORD" but now I guess that one thing is a checkpoint point
> and one thing is a checkpoint record. Right?
>
> 2) I see that a checkpoint position can be right in the middle of a
> group of records related to a transaction (in the example, transaction
> id 10684). So a checkpoint position is NOT a consistency state point, right?
>
> 3) According to doc at 29.5, in pg_control the position of the last
> checkpoint record (not the checkpoint position itself) is saved right?
>
> 4) If I'm right at 2) then, between the checkpoint position (1/F00A7448
> ) and the checkpoint record position (1/FCBD7510) there must be a point
> where the DB is in a consistency state. If not, in case of crash just
> after writing the checkpoint record to the WAL and its position to
> pg_control, the system would replay from the checkpoint position (known
> by last checkpoint record) without finding a consistency state. Right?
>
> 5) How can we define, in terms of log records, a consistency state position?

https://www.postgresql.org/docs/9.5/static/wal-internals.html

"After a checkpoint has been made and the log flushed, the checkpoint's
position is saved in the file pg_control. Therefore, at the start of
recovery, the server first reads pg_control and then the checkpoint
record; then it performs the REDO operation by scanning forward from the
log position indicated in the checkpoint record. Because the entire
content of data pages is saved in the log on the first page modification
after a checkpoint (assuming full_page_writes is not disabled), all
pages changed since the checkpoint will be restored to a consistent state."

>
> Best regards
> Pupillo
>
> STANDBY SERVER LOG
> LOG: redo starts at 1/F00A7448
> ....
> LOG: consistent recovery state reached at 2/426DF28
> LOG: invalid record length at 2/426DF28: wanted 24, got 0
> LOG: database system is ready to accept read only connections
> LOG: started streaming WAL from primary at 2/4000000 on timeline 1
>
>
> FROM PG_XLOGDUMP OF MASTER
> ........
> rmgr: Heap len (rec/tot): 14/ 1186, tx: 10684, lsn:
> 1/F009EEE0, prev 1/F009EEA8, desc: UPDATE off 1 xmax 10684 ; new off 3
> xmax 0, blkref #0: rel 1663/16384/16422 blk 20054, blkref #1: rel
> 1663/16384/16422 blk 19774
> rmgr: Btree len (rec/tot): 2/ 64, tx: 10684, lsn:
> 1/F009F388, prev 1/F009EEE0, desc: INSERT_LEAF off 2, blkref #0: rel
> 1663/16384/16428 blk 711
> rmgr: XLOG len (rec/tot): 0/ 8193, tx: 0, lsn:
> 1/F009F3C8, prev 1/F009F388, desc: FPI_FOR_HINT , blkref #0: rel
> 1663/16384/16484 blk 9663 FPW
> rmgr: XLOG len (rec/tot): 0/ 8193, tx: 0, lsn:
> 1/F00A13E8, prev 1/F009F3C8, desc: FPI_FOR_HINT , blkref #0: rel
> 1663/16384/16484 blk 9664 FPW
> rmgr: XLOG len (rec/tot): 0/ 8193, tx: 0, lsn:
> 1/F00A3408, prev 1/F00A13E8, desc: FPI_FOR_HINT , blkref #0: rel
> 1663/16384/16484 blk 9665 FPW
> rmgr: XLOG len (rec/tot): 0/ 8193, tx: 0, lsn:
> 1/F00A5428, prev 1/F00A3408, desc: FPI_FOR_HINT , blkref #0: rel
> 1663/16384/16484 blk 9666 FPW
> rmgr: XLOG len (rec/tot): 0/ 8193, tx: 0, lsn:
> 1/F00A7448, prev 1/F00A5428, desc: FPI_FOR_HINT , blkref #0: rel
> 1663/16384/16484 blk 9667 FPW
> rmgr: XLOG len (rec/tot): 0/ 8193, tx: 0, lsn:
> 1/F00A9468, prev 1/F00A7448, desc: FPI_FOR_HINT , blkref #0: rel
> 1663/16384/16484 blk 9668 FPW
> rmgr: XLOG len (rec/tot): 0/ 8193, tx: 0, lsn:
> 1/F00AB488, prev 1/F00A9468, desc: FPI_FOR_HINT , blkref #0: rel
> 1663/16384/16484 blk 9669 FPW
> rmgr: Heap len (rec/tot): 8/ 8063, tx: 10682, lsn:
> 1/F00AD4A8, prev 1/F00AB488, desc: LOCK off 5: xid 10682: flags 0
> LOCK_ONLY EXCL_LOCK , blkref #0: rel 1663/16384/16422 blk 19787 FPW
> rmgr: Heap len (rec/tot): 8/ 8063, tx: 10684, lsn:
> 1/F00AF440, prev 1/F00AD4A8, desc: LOCK off 2: xid 10684: flags 0
> LOCK_ONLY EXCL_LOCK , blkref #0: rel 1663/16384/16422 blk 19774 FPW
> rmgr: Heap len (rec/tot): 14/ 4657, tx: 10682, lsn:
> 1/F00B13D8, prev 1/F00AF440, desc: UPDATE off 5 xmax 10682 ; new off 4
> xmax 0, blkref #0: rel 1663/16384/16422 blk 20075 FPW, blkref #1: rel
> 1663/16384/16422 blk 19787
> rmgr: Heap len (rec/tot): 14/ 4657, tx: 10684, lsn:
> 1/F00B2628, prev 1/F00B13D8, desc: UPDATE off 2 xmax 10684 ; new off 4
> xmax 0, blkref #0: rel 1663/16384/16422 blk 20054 FPW, blkref #1: rel
> 1663/16384/16422 blk 19774
> rmgr: Btr
> .......
> rmgr: Heap2 len (rec/tot): 8/ 68, tx: 0, lsn:
> 1/FCBD7448, prev 1/FCBD7400, desc: CLEAN remxid 10903, blkref #0: rel
> 1663/16384/16422 blk 2001
> rmgr: Heap2 len (rec/tot): 8/ 66, tx: 0, lsn:
> 1/FCBD7490, prev 1/FCBD7448, desc: CLEAN remxid 10903, blkref #0: rel
> 1663/16384/16422 blk 2003
> rmgr: Standby len (rec/tot): 24/ 50, tx: 0, lsn:
> 1/FCBD74D8, prev 1/FCBD7490, desc: RUNNING_XACTS nextXid 10907
> latestCompletedXid 10906 oldestRunningXid 10907
> rmgr: XLOG len (rec/tot): 80/ 106, tx: 0, lsn:
> 1/FCBD7510, prev 1/FCBD74D8, desc: CHECKPOINT_ONLINE redo 1/F00A7448;
> tli 1; prev tli 1; fpw true; xid 0:10685; oid 24665; multi 1; offset 0;
> oldest xid 584 in DB 1; oldest multi 1 in DB 1; oldest/newest commit
> timestamp xid: 0/0; oldest running xid 10682; online
> rmgr: Heap len (rec/tot): 3/ 164, tx: 10907, lsn:
> 1/FCBD7580, prev 1/FCBD7510, desc: INSERT off 25, blkref #0: rel
> 1663/16384/16484 blk 16398
> rmgr: Btree len (rec/tot): 2/ 64, tx: 10907, lsn:
> 1/FCBD7628, prev 1/FCBD7580, desc: INSERT_LEAF off 60, blkref #0: rel
> 1663/16384/16490 blk 2722
> ........
>

--
Adrian Klaver
adrian(dot)klaver(at)aklaver(dot)com

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom DalPozzo 2017-01-09 14:47:37 Re: checkpoint clarifications needed
Previous Message Albe Laurenz 2017-01-09 14:24:14 Re: Querying dead rows