| From: | Melanie Plageman <melanieplageman(at)gmail(dot)com> |
|---|---|
| To: | David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> |
| Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, khuddleston(at)pivotal(dot)io |
| Subject: | Re: Making "COPY partitioned_table FROM" faster |
| Date: | 2018-07-26 17:30:22 |
| Message-ID: | CAAKRu_ZvbATWE_22mMdT6ms2cEOQR7018Au12O5Q8C_xKBjigA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Wed, Jul 25, 2018 at 7:24 PM, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
wrote:
On patched code line 2564, there is a missing parenthesis. It might be
better to remove the parentheses entirely and, while you're at it, there is
a missing comma.
/*
* For partitioned tables we can't support multi-inserts when there
* are any statement level insert triggers. (It might be possible to
* allow partitioned tables with such triggers in the future, but for
* now CopyFromInsertBatch expects that any before row insert and
* statement level insert triggers are on the same relation.
*/
Should be
/*
* For partitioned tables we can't support multi-inserts when there
* are any statement level insert triggers. It might be possible to
* allow partitioned tables with such triggers in the future, but for
* now, CopyFromInsertBatch expects that any before row insert and
* statement level insert triggers are on the same relation.
*/
Regarding the performance improvement and code diff from v2 to v3:
The rearranging of the code in v3 makes sense and improves the flow of the
code, however, I stared at it for awhile and couldn't quite work out in my
head which part caused the significant improvement from v2.
I would have thought that a speedup from patch v2 to v3 would have come
from doing multi-inserts less often when the target partition is switching
a lot, however, looking at the v2 to v3 diff, I can't see how any of the
changes would have caused a decrease in the number of multi-inserts given
the same data and target table ddl.
So, I thinking I'm missing something. Which of the changes would cause the
performance improvement from patch v2 to v3? My understanding is:
a) Made the avgTuplesPerPartChange >= 1.3 the first condition in the
> test that sets leafpart_use_multi_insert.
>
This would short-circuit checking the other conditions when deciding how to
set leafpart_use_multi_insert
> b) Added an unlikely() when testing of the partition's resultRelInfo
> has set to be initialised. This is only true the first time a
> partition is inserted into during the COPY.
>
The addition of "unlikely" here seems like a good idea because there is a
function call (ExecInitPartitionInfo) inside the if statement. However,
would that cause such a difference in performance from v2 to v3? What would
be the reason? Cache misses? Some kind of pre-evaluation of the expression?
c) Added unlikely() around lastPartitionSampleLineNo test. This will
> only be true 1 in 1000, so pretty unlikely.
>
Even though this makes sense based on the frequency with which this
condition will evaluate to true, I don't understand what the performance
benefit would be for adding a compiler hint here around such a trivial
calculation
d) Moved the nBufferedTuples > 0 test under the insertMethod ==
> CIM_MULTI_CONDITIONAL test. It's never going to be > 0 when we're
> doing CIM_SINGLE.
>
This is a good catch. It makes this part of the code more clear, as well.
However, it doesn't seem like it would affect performance at all.
Let me know what I'm missing.
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Andres Freund | 2018-07-26 17:35:16 | Re: Online enabling of checksums |
| Previous Message | Robert Haas | 2018-07-26 17:03:39 | Re: Online enabling of checksums |