Re: COPY FROM WHEN condition

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Surafel Temesgen <surafel3000(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Adam Berlin <berlin(dot)ab(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: COPY FROM WHEN condition
Date: 2019-04-03 05:44:32
Message-ID: CAKJS1f_SQso+=U5_hEps=z5PJ7=4Lchs_MFt4no0etyP53qukQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

(Fixed all of what you mentioned)

On Wed, 3 Apr 2019 at 06:57, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > +/*
> > + * CopyMultiInsertInfo_Flush
> > + * Write out all stored tuples in all buffers out to the tables.
> > + *
> > + * To save us from ending up with buffers for 1000s of partitions we remove
> > + * buffers belonging to partitions that we've seen no tuples for in this batch
>
> That seems a little naive (imagine you have like 5 partitions, and we
> constantly cycle through 2-3 of them per batch). It's probably OK for
> this version. I guess I'd only do that cleanup if we're actually
> flushing due to the number of partitions.

hmm good point. It seems like being smarter there would be a good thing.

I've ended up getting rid of the hash table in favour of the List that
you mentioned and storing the buffer in ResultRelInfo. I also changed
the logic that removes buffers once we reach the limit. Instead of
getting rid of buffers that were not used on this run, I've changed it
so it just gets rid of the buffers starting with the oldest one first,
but stops once the number of buffers is at the maximum again. This
can mean that we end up with MAX_BUFFERED_TUPLES buffers instead of
MAX_PARTITION_BUFFERS if there is only 1 tuple per buffer. My current
thinking is that this does not matter since only 1 slot will be
allocated per buffer. We'll remove all of the excess buffers during
the flush and keep just MAX_PARTITION_BUFFERS of the newest buffers.

Also, after changing CopyMultiInsertBuffer to use fixed sized arrays
instead of allocating them with another palloc the performance has
improved a bit more.

Using the perl files mentioned in [1]

Master + Patched:
# copy listp from program $$perl ~/bench_same.pl$$ delimiter '|';
COPY 35651564
Time: 9106.776 ms (00:09.107)
# truncate table listp;
TRUNCATE TABLE
# copy listp from program $$perl ~/bench.pl$$ delimiter '|';
COPY 35651564
Time: 10154.196 ms (00:10.154)

Master only:
# copy listp from program $$perl ~/bench_same.pl$$ delimiter '|';
COPY 35651564
Time: 22200.535 ms (00:22.201)
# truncate table listp;
TRUNCATE TABLE
# copy listp from program $$perl ~/bench.pl$$ delimiter '|';
COPY 35651564
Time: 18592.107 ms (00:18.592)

> > if (cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
> > {
> > ereport(ERROR,
> > - (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> > - errmsg("cannot perform FREEZE on a partitioned table")));
> > + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> > + errmsg("cannot perform FREEZE on a partitioned table")));
> > }
>
> accidental hunk?

It was left over from a pgindent run. Now removed.

[1] https://www.postgresql.org/message-id/CAKJS1f98Fa+QRTGKwqbtz0M=Cy1EHYR8Q-W08cpA78tOy4euKQ@mail.gmail.com

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
david_tableam_copy_v3.patch application/octet-stream 44.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Haribabu Kommi 2019-04-03 05:44:37 Re: pg_basebackup ignores the existing data directory permissions
Previous Message Masahiko Sawada 2019-04-03 05:27:44 Re: New vacuum option to do only freezing