From: | Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> |
---|---|
To: | "Jamison, Kirk" <k(dot)jamison(at)jp(dot)fujitsu(dot)com> |
Cc: | David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Cache relation sizes? |
Date: | 2018-12-27 21:42:46 |
Message-ID: | CAEepm=3f9Ho1jKohAUF=ueDqN5LUfdLv5k8FK9DNYaCP=si1Cg@mail.gmail.com |
Lists: | pgsql-hackers |
On Thu, Dec 27, 2018 at 8:00 PM Jamison, Kirk <k(dot)jamison(at)jp(dot)fujitsu(dot)com> wrote:
> I also find this proposed feature to be beneficial for performance, especially when we want to extend or truncate large tables.
> As mentioned by David, there is currently a query latency spike when we build a generic plan for a partitioned table with many partitions.
> I tried to apply Thomas' patch for that use case. Aside from measuring the planning and execution time,
> I also monitored the lseek calls using simple strace, with and without the patch.
Thanks for looking into this and testing!
> Setup 8192 table partitions.
> (1) set plan_cache_mode = 'force_generic_plan';
> Planning Time: 1678.680 ms
> Planning Time: 1596.566 ms
> (2) plan_cache_mode = 'auto'
> Planning Time: 768.669 ms
> Planning Time: 181.690 ms
> (3) set plan_cache_mode = 'force_generic_plan';
> Planning Time: 14.294 ms
> Planning Time: 13.976 ms
> Assuming I did the test correctly, I am not sure why the patch did not affect the generic planning performance of a table with many partitions.
> However, the number of lseek calls was greatly reduced with Thomas’ patch.
> I also did not get a considerable speedup in terms of average latency using pgbench -S (read-only, unprepared).
> I am assuming this might be applicable to other use cases as well.
> (I just tested the patch, but haven’t dug into the patch details yet).
The result for (2) is nice, even though you had to use 8192
partitions to see it.
> Would you like to submit this to the commitfest to get more reviews for possible idea/patch improvement?
For now I think this is still in the experiment/hack phase and I have a
ton of other stuff percolating in this commitfest already (and a week
of family holiday in the middle of January). But if you have ideas
about the validity of the assumptions, the reason it breaks initdb, or
any other aspect of this approach (or alternatives), please don't let
me stop you, and of course please feel free to submit this, an
improved version or an alternative proposal yourself! Unfortunately I
wouldn't have time to nurture it this time around, beyond some
drive-by comments.
Assorted armchair speculation: I wonder how much this is affected by
the OS and KPTI, virtualisation technology, PCID support, etc. Back
in the good old days, Linux's lseek(SEEK_END) stopped acquiring the
inode mutex when reading the size, at least in the generic
implementation used by most filesystems (I wonder if our workloads
were indirectly responsible for that optimisation?), so maybe it became
about as fast as a syscall could possibly be. But now the baseline for
how fast syscalls can be has moved: it depends on your hardware, and
syscalls also have external costs that depend on what memory you touch
in between them. Also, other operating systems might still acquire a
per-underlying-file/vnode/whatever lock (<checks source code>... yes),
and the contention for that might depend on what else is happening. A
single standalone test wouldn't capture that, but a super busy DB with
a rapidly expanding and contracting table that many other sessions are
trying to observe with lseek(SEEK_END) could slow down more.
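For anyone who wants a rough feel for the baseline cost on their own
OS and hardware, here is a minimal sketch in Python (not part of the
patch; the helper names size_via_lseek, time_lseek and demo are made
up for illustration). It only measures a single uncontended process
on a small temporary file, so it is exactly the kind of standalone
test that would not capture lock contention from other sessions.

```python
import os
import tempfile
import time


def size_via_lseek(fd):
    """Return the file size by seeking to the end of the file."""
    return os.lseek(fd, 0, os.SEEK_END)


def time_lseek(fd, iterations=100000):
    """Return the average wall-clock seconds per lseek(SEEK_END) call."""
    start = time.perf_counter()
    for _ in range(iterations):
        os.lseek(fd, 0, os.SEEK_END)
    return (time.perf_counter() - start) / iterations


def demo():
    """Create a 4-block temporary file, then report its size and lseek cost."""
    with tempfile.TemporaryFile() as f:
        # Four 8192-byte blocks; the block size here is just an assumption
        # chosen to resemble a small relation segment.
        f.write(b"\0" * 8192 * 4)
        f.flush()
        fd = f.fileno()
        size = size_via_lseek(fd)
        per_call = time_lseek(fd, 10000)
    return size, per_call


if __name__ == "__main__":
    size, per_call = demo()
    print("size in bytes:", size)
    print("average seconds per lseek(SEEK_END):", per_call)
```

The per-call figure includes Python interpreter overhead, so it is
only useful for comparing machines or kernel settings against each
other, not as an absolute syscall cost.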
--
Thomas Munro
http://www.enterprisedb.com