From: | Simon Riggs <simon(at)2ndquadrant(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-patches(at)postgresql(dot)org, Alon Goldshuv <agoldshuv(at)greenplum(dot)com> |
Subject: | Re: WAL bypass for CTAS |
Date: | 2005-06-20 20:55:12 |
Message-ID: | 1119300912.3645.315.camel@localhost.localdomain |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers pgsql-patches |
On Mon, 2005-06-20 at 14:50 -0400, Tom Lane wrote:
> Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> > I enclose a complete patch for avoiding WAL usage for CREATE TABLE AS
> > SELECT, when not in archive mode (PITR). The main use case for this is
> > large BI environments that create summary tables or prejoined tables,
> > though there are many general applications.
>
> Applied
Thanks
> after heavy corrections --- there were a number of things wrong
> with this "simple" patch, starting with having gotten the tests
> backwards :-(
Sorry, I thought I had corrected that error before submission. I was
aware that I had made that error earlier.
> and extending to not having actually flushed the data
> before commit (smgrimmedsync isn't enough, you have to
> FlushRelationBuffers).
I followed the logic as seen in nbtsort.c as you suggested. That code
doesn't perform a FlushRelationBuffers and it looks like I fooled myself
into thinking the CTAS/SELECT INTO case was also in local.
Perhaps we should be building CTAS/SELECT INTO in local buffers anyway?
It looks like we could save time by avoiding shared_buffers completely
and build up a whole page before writing it anywhere. (But thats a story
for another day).
Perhaps this is also related to metapage errors, since the metapage is
always the last page to be written?
> A consideration we had all missed in the original discussions is that
> if the transaction doesn't emit any WAL records at all,
> RecordTransactionCommit will think that it need not WAL-log the
> transaction commit, leading to the possibility that the commit is lost
> even though all the data is preserved :-(
> This is not a hazard for CREATE TABLE AS, since it will certainly have
> emitted WAL records while creating the table's catalog entries. It will
> be a very real hazard for COPY however.
OK, but I haven't written that patch yet!
> The cleanest solution I can
> think of is that the COPY code should emit a WAL record for the first
> tuple copied in, but not for later ones. To this end, I separated the
> "use_wal" and "use_fsm" aspects of what the patch was doing.
Not very clean, but will do as you suggest.
> I didn't apply the freespace.c changes either; that struck me as a
> serious kluge with no real benefit. We can just omit updating the FSM's
> running average, if it even has one. (ISTM there's a reasonable
> argument to be made that the tuple sizes during CREATE/COPY might not be
> representative of later requests anyway.)
I was striving for completeness only. I was doubtful about that part of
the patch, but thought I'd add that rather than have you say I hadn't
thought about the FSM avg_request_size.
I put those changes in mainly for COPY. If you don't make any request at
all to FSM then a relation never gets to the MRU relation FSM list. I
agree that it is not strictly necessary, but leaving it off would be a
change in behaviour, since COPY did previously cause the relation to get
to the MRU. That could be a problem, since a relation might not then be
allocated any FSM pages following a vacuum.
Best Regards, Simon Riggs
From | Date | Subject | |
---|---|---|---|
Next Message | Alvaro Herrera | 2005-06-20 21:04:42 | Re: pg_locks view versus prepared transactions |
Previous Message | Tom Lane | 2005-06-20 20:54:37 | Re: Issue with plpython write_file and read_file tests |
From | Date | Subject | |
---|---|---|---|
Next Message | Alvaro Herrera | 2005-06-20 21:09:39 | Re: WAL bypass for CTAS |
Previous Message | Tom Lane | 2005-06-20 18:50:19 | Re: WAL bypass for CTAS |