From: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | davinder singh <davindersingh2692(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Optimize external TOAST storage |
Date: | 2022-03-16 18:36:56 |
Message-ID: | 20220316183656.GA1137410@nathanxps13 |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, Mar 16, 2022 at 10:08:45AM -0400, Robert Haas wrote:
> I would like to take a slightly contrary position. I think that a
> system here that involves multiple knobs is going to be too
> complicated to be of any real-world use, because almost nobody will
> understand it or tune it properly for their installation. And who is
> to say that a single setting would even be appropriate for a whole
> installation, as opposed to something that needs to be tuned for each
> individual table? A single tunable might be OK, but what I would
> really like here is a heuristic that, while perhaps not optimal in
> every environment, is good enough that a very high percentage of users
> will never need to worry about it. In a case like this, a small gain
> that happens automatically feels better than a large gain that
> requires manual tweaking in every install.
Agreed. If there is a tunable, it needs to be as simple as possible. And
ideally most users would never need to use it because the default would be
"good enough."
> Now that doesn't answer the question of what that heuristic should be.
> I initially thought that if compression didn't end up reducing the
> number of TOAST chunks then there was no point, but that's not true,
> because having the last chunk be smaller than the rest can allow us to
> fit more than 4 into a page rather than exactly 4. However, this
> effect isn't likely to be important if most chunks are full-size
> chunks. If we insert 100 full-size chunks and 1 partial chunk at the
> end, we can't save much even if that chunk ends up being 1 byte
> instead of a full-size chunk. 25 TOAST table pages out of 26 are going
> to be full of full-size chunks either way, so we can't save more than
> ~4% and we might easily save nothing at all. As you say, the potential
> savings are larger as the values get smaller, because a higher
> proportion of the TOAST table pages are not quite full. So I think the
> only cases where it's even interesting to think about suppressing this
> optimization are those where the value is quite small -- and maybe the
> thing to do is just pick a conservatively large threshold and call it
> good.
>
> For example, say that instead of applying this technique when there
> are at least 2 TOAST chunks, we apply it when there are at least 500
> (i.e. normally 1MB). It's hard for me to imagine that we can ever
> lose, and in fact I think we could come down quite a bit from there
> and still end up winning pretty much all the time. Setting the
> threshold to, say, 50 or 100 or 150 TOAST chunks instead of 2 may
> leave some money on the table, but if it means that we don't have
> meaningful downsides and we don't have tunables for users to fiddle
> with, it might be worth it.
As long as we can demonstrate some decent improvements, I think using a
conservative threshold is a good idea. I do wonder whether thiѕ could
weaken the read performance gains quite a bit, though.
Thinking further, is simply reducing the number of TOAST chunks the right
thing to look at? If I want to add a TOAST attribute that requires 100,000
chunks, and you told me that I could save 10% in the read path for an extra
250 chunks of disk space, I would probably choose read performance. If I
wanted to add 100,000 attributes that were each 3 chunks, and you told me
that I could save 10% in the read path for an extra 75,000 chunks of disk
space, I might choose the extra disk space. These are admittedly extreme
(and maybe even impossible) examples, but my point is that the amount of
disk space you are willing to give up may be related to the size of the
attribute. And maybe one way to extract additional read performance with
this optimization is to use a variable threshold so that we are more likely
to use it for large attributes.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
From | Date | Subject | |
---|---|---|---|
Next Message | Nathan Bossart | 2022-03-16 18:42:48 | Re: add checkpoint stats of snapshot and mapping files of pg_logical dir |
Previous Message | Jimmy Yih | 2022-03-16 18:20:10 | Re: Concurrent deadlock scenario with DROP INDEX on partitioned index |