From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: chenhj <chjischj(at)163(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [Proposal] Page Compression for OLTP
Date: 2020-05-21 08:04:55
Message-ID: alpine.DEB.2.22.394.2005210914440.2856263@pseudo
Lists: pgsql-hackers
Hello,
My 0.02€, some of which may just show some misunderstanding on my part:
- you have clearly given quite a lot of thought to the what and how,
which makes your message an interesting read.
- Could this be proposed as some kind of extension, provided that enough
hooks are available? ISTM that foreign tables and/or an alternative
storage engine (aka ACCESS METHOD) provide convenient APIs which could
fit this need? Or are they not appropriate? You seem to
suggest that they are not.
If not, what could be done to improve the API to allow what you are
seeking to do? Maybe you need a somewhat lower-level programmable API
which does not exist yet, or at least is not exported yet, but could be
specified and implemented with limited effort? Basically you would like
to read/write pg pages to somewhere, and then there is the syncing
issue to consider. Maybe such a "page storage" API could provide
benefits for some specialized hardware, eg persistent memory stores,
so there would be more reason to define it anyway? I think it might
be valuable to give it some thought.
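To make the suggestion concrete, here is a minimal sketch of what such a
"page storage" API could look like as a vtable of callbacks, with a trivial
in-memory backend. All names (PageStoreRoutine, MemStore, etc.) are
hypothetical; nothing of this shape exists in PostgreSQL today:

```c
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 8192

/* Hypothetical "page store" vtable: a compressed store, or a PMEM
 * store, would supply its own implementations of these callbacks. */
typedef struct PageStoreRoutine {
    int (*read_page)(void *store, unsigned blkno, char *buf);
    int (*write_page)(void *store, unsigned blkno, const char *buf);
    int (*sync)(void *store);   /* the syncing issue mentioned above */
} PageStoreRoutine;

/* Trivial in-memory backend, just to show the shape of the API. */
#define NBLOCKS 4
typedef struct MemStore {
    char pages[NBLOCKS][PAGE_SIZE];
} MemStore;

static int mem_read(void *store, unsigned blkno, char *buf)
{
    memcpy(buf, ((MemStore *) store)->pages[blkno], PAGE_SIZE);
    return 0;
}

static int mem_write(void *store, unsigned blkno, const char *buf)
{
    memcpy(((MemStore *) store)->pages[blkno], buf, PAGE_SIZE);
    return 0;
}

static int mem_sync(void *store)
{
    (void) store;               /* nothing to flush for memory */
    return 0;
}

static const PageStoreRoutine mem_routine = {
    mem_read, mem_write, mem_sync
};
```

The point being that the buffer manager would only ever call through the
vtable, so where and in what form pages land on disk becomes pluggable.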
- Could you maybe elaborate on how your plan differs from [4] and [5]?
- Have you considered keeping page headers and compressing tuple data
only?
- I'm not sure there is a point in going below the underlying file
system block size, quite often 4 KiB? Or maybe there is? Is there
a benefit to aiming at 1/4 even if most pages overflow?
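The reasoning behind this point can be shown with a little arithmetic:
on a file system with 4 KiB blocks, space is allocated in whole blocks,
so compression only pays off when the compressed size crosses a block
boundary. A small sketch (FS_BLOCK and blocks_on_disk are illustrative
names, not anything from the proposal):

```c
/* Blocks actually occupied on a file system with 4 KiB blocks.
 * Compressing an 8 KiB page to anything above 4 KiB still costs
 * two blocks, i.e. saves nothing; hence the natural targets are
 * 1/2 (one block) or below. */
#define FS_BLOCK 4096

static unsigned blocks_on_disk(unsigned compressed_bytes)
{
    return (compressed_bytes + FS_BLOCK - 1) / FS_BLOCK;
}
```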
- ISTM that your approach entails 3 "files". Could it be done with 2?
I'd suggest that the possible overflow pointers (coa) could be part of
the headers so that when reading the 3.1 page, then the header would
tell where to find the overflow 3.2, without requiring an additional
independent structure with very small data in it, most of it zeros.
Possibly this is not possible, because it would require some available
space in standard headers when the page is not compressible, and
there is not enough. Maybe creating a little room for that in
existing headers (4 bytes could be enough?) would be a good compromise.
Hmmm. Maybe the approach I suggest would only work for a 1/2 compression
target, but not for other target ratios; however, I think it could be
made to work if the pointer can refer to several blocks in the overflow
table.
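To illustrate the compromise, 4 header bytes are enough to carry both an
overflow location and a small block count, which covers targets below 1/2
as well. This is only a sketch of one possible encoding; the field split
(28 bits of block number, 4 bits of count) and all names are assumptions:

```c
#include <stdint.h>

/* Pack "first overflow block" and "number of overflow blocks"
 * into the 4 bytes reserved in the page header: 28 bits of block
 * number, 4 bits of count. A value of 0 means "no overflow", so
 * reading page 3.1 is enough to locate 3.2 (and beyond) without
 * a separate map structure that is mostly zeros. */
static uint32_t pc_pack(uint32_t first_blk, uint32_t nblocks)
{
    return (first_blk << 4) | (nblocks & 0xF);
}

static uint32_t pc_first(uint32_t packed)
{
    return packed >> 4;
}

static uint32_t pc_count(uint32_t packed)
{
    return packed & 0xF;
}
```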
- If one page is split in 3 parts, could it create problems with
syncing, if only 1 or 2 of the 3 parts get written? But maybe that is
manageable with WAL, as it would note that the page was not synced,
and that is enough for replay.
- I'm unclear how you would manage the 2 representations of a page in
memory. I'm afraid that a 8 KiB page compressed to 4 KiB would
basically take 12 KiB, i.e. reduce the available memory for caching
purposes. Hmmm. The current status is that a written page probably
takes 16 KiB, once in shared buffers and once in the system caches,
so it would be an improvement anyway.
- Maybe the compressed and overflow tables could become bloated somehow,
which would require some form of vacuuming and add to the
complexity of the implementation?
- External tools should be available to allow page inspection, eg for
debugging purposes.
--
Fabien.