| From: | Jeff Davis <pgsql(at)j-davis(dot)com> | 
|---|---|
| To: | pgsql-hackers(at)postgresql(dot)org | 
| Subject: | 9.3: load path to mitigate load penalty for checksums | 
| Date: | 2012-06-05 01:26:04 | 
| Message-ID: | 1338859564.28589.199.camel@sussancws0025 | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-hackers | 
Introduction:
=================================
A lot of progress has been made on the checksums issue, with Simon's
excellent work beginning to gain consensus:
http://archives.postgresql.org/message-id/CA
+U5nMKw_GBs6qQ_Y8-RjGL1V7MVW2HWBHartB8LoJhnPfxL8g(at)mail(dot)gmail(dot)com
For the purposes of this proposal, I'll assume that's the general
direction we'll be taking for CRCs.
The major drawback of that proposal is that it increases the amount of
work to be done after a large data load by requiring more WAL.
Proposal:
=================================
I propose a special user-initiated loading mode at the table
granularity. During this time, readers must ignore PD_ALL_VISIBLE,
HEAP_XMIN_COMMITTED, and the visibility map entirely. However, writers
may set all of those bits before the writing transaction commits,
obviating the need to rewrite (and WAL) the data again later. Ideally,
there would be no work for VACUUM to do after the data load (unless a
transaction aborted).
This would also help the non-CRC case of course, but I expect CRCs to
make this significantly more important.
Goals:
=================================
  * Table granularity (doesn't affect other tables at all)
  * Allows concurrent loaders
  * Allows loading into existing tables with existing data
  * Online (allow reads to proceed, even if degraded)
Rough user interaction:
=================================
INITIATE LOAD ON foo AS 'job name';
-- run concurrent loading sessions
FINALIZE LOAD 'job name';
High-level design:
=================================
By "hints" I mean the VM bit, PD_ALL_VISIBLE, and HEAP_XMIN_COMMITTED.
By "ignorehints" I mean a flag in pg_class indicating that readers
should ignore hints.
By "optimistichints" I mean a flag in pg_class indicating that writers
can optimistically set hints.
Obviously, readers and writers would need a mechanism to honor those
flags, but I haven't dug into the details yet (additional routines in
tqual.c?).
  States:
    0: normal
       * ignorehints = false
       * optimistichints = false
    1: trying to enter data load mode, waiting on existing
       lockers (who might be scanning) to finish
       * ignorehints = true
       * optimistichints = false
    2: data load mode
       * ignorehints = true
       * optimistichints = true
    3: trying to leave data load mode, waiting on old snapshots to
       be released and aborted transactions to be cleaned up
       * ignorehints = true
       * optimistichints = false
INITIATE LOAD would first transition from state 0 to 1 by acquiring a
ShareUpdateExclusiveLock on the table (to be sure no concurrent INITIATE
or FINALIZE LOAD is going on) and setting ignorehints = true.
Then it moves from state 1 to state 2 by waiting for all transactions
that hold a lock on that table. Any transactions that don't already have
a lock will see the new flag when they try to get it. Now we're sure
that all readers will see the "ignorehints" flag, so we can set the
"optimistichints" flag to indicate that writers can write hints
optimistically.
FINALIZE LOAD would first move from state 2 to state 3 by acquiring a
ShareUpdateExclusiveLock on the table setting optimistichints = false.
Then, it would move from state 3 to state 0 by first waiting for all
transactions that currently hold a lock on the table, to ensure they see
the optimistichints=false flag. Then, it would remember the current xid
as max_loader_xid, and wait until the global xmin is greater than
max_loader_xid. This should ensure that all snapshots regard all loading
transactions as complete. Also, it would need to run a lazy VACUUM to
remove any tuples from aborted transactions.
Details and optimizations
=================================
 * We probably want a graceful way to handle multiple data loads
happening on the same table. Rather than erroring out, we could treat it
more like a reference count, and only do the work to move in to data
load mode if not there already, and only move out of data load mode if
we're the last loading job on the table.
 * In general, there are some usability issues to sort out, to make sure
a table isn't put into data load mode and left that way. Right now, I'm
mostly concerned with getting a working design, but those will be
important, too.
 * We could optimize away the VACUUM going from 3 -> 0 if we are sure no
writing transactions aborted.
 * INITIATE and FINALIZE probably need to use PreventTransactionChain()
and multiple transactions, to avoid holding the ShareUpdateExclusiveLock
for too long. Also, we want to keep people from using it in the same
transaction as the loading xact, because they might not realize that
they would get a concurrency of 1 that way (because of the
ShareUpdateExclusiveLock).
Thoughts?
Regards,
	Jeff Davis
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Noah Misch | 2012-06-05 06:37:30 | Re: Unnecessary WAL archiving after failover | 
| Previous Message | Tom Lane | 2012-06-05 01:02:05 | Re: [RFC] Interface of Row Level Security |