From: | Simon Riggs <simon(at)2ndQuadrant(dot)com> |
---|---|
To: | Jeff Janes <jeff(dot)janes(at)gmail(dot)com> |
Cc: | Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: FSM versus GIN pending list bloat |
Date: | 2015-08-04 20:50:21 |
Message-ID: | CANP8+jJsjj8HOzVKbLB4+Bc+B1tkzymJf3O3K5BFS=zpXbTX1Q@mail.gmail.com |
Lists: | pgsql-hackers |
On 4 August 2015 at 21:04, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>> Couple of questions here...
>>
>> * the docs say "it's desirable to have pending-list cleanup occur in the
>> background", but there is no way to invoke that, except via VACUUM. I
>> think we need a separate function to be able to call this as a background
>> action. If we had that, we wouldn't need much else, would we?
>>
>
> I thought maybe the new bgworker framework would be a way to have a
> backend signal a bgworker to do the cleanup when it notices the pending
> list is getting large. But that wouldn't directly fix this issue, because
> the bgworker still wouldn't recycle that space (without further changes),
> only vacuum workers do that currently.
>
> But I don't think this could be implemented as an extension, because the
> signalling code has to be in core, so (not having studied the matter at
> all) I don't know if it is a good fit for bgworker.
>
We need to expose 2 functions:
1. a function to perform the recycling directly (BRIN has an equivalent
function)
2. a function to see how big the pending list is for a particular index,
i.e. do we need to run function 1?
We can then build a bgworker that polls the pending list and issues a
recycle if and when needed - which is how autovac started.
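A minimal sketch of that polling idea, in Python for brevity rather than as backend C. Neither function existed at the time of writing, so `pending_list_size` and `recycle` are hypothetical callables standing in for functions 2 and 1 above, and `threshold_bytes` is an assumed tuning knob:

```python
from typing import Callable, Iterable, List


def poll_once(
    indexes: Iterable[str],
    pending_list_size: Callable[[str], int],  # hypothetical: function 2 above
    recycle: Callable[[str], None],           # hypothetical: function 1 above
    threshold_bytes: int,                     # assumed knob, not a real GUC
) -> List[str]:
    """Run one polling pass and return the indexes that were recycled."""
    recycled = []
    for index in indexes:
        # Only pay for a cleanup when the pending list is actually large.
        if pending_list_size(index) >= threshold_bytes:
            recycle(index)
            recycled.append(index)
    return recycled
```

A worker would call `poll_once` in a loop with a sleep between passes. Passing the two functions in as arguments keeps the sketch independent of however they end up being exposed.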
>> * why do we have two parameters: gin_pending_list_limit and fastupdate?
>> What happens if we set gin_pending_list_limit but don't set fastupdate?
>>
>
> Fastupdate is on by default. If it were turned off, then
> gin_pending_list_limit would be mostly irrelevant for those tables.
> Fastupdate could have been implemented as a magic value (0 or -1) for
> gin_pending_list_limit but that would break backwards compatibility (and
> arguably would not be a better way of doing things, anyway).
>
>
>> * how do we know how to set that parameter? Is there a way of knowing
>> gin_pending_list_limit has been reached?
>>
>
> I don't think there is an easy answer to that. The trade offs are
> complex and depend on things like how well cached the parts of the index
> needing insertions are, how many lexemes/array elements are in an average
> document, and how many documents inserted near the same time as each other
> share lexemes in common. And of course it depends on what you need to
> optimize for, latency or throughput, and, if latency, whether search
> latency or insert latency.
>
So we also need a way to count the number of times the pending list is
flushed. Perhaps record that on the metapage, so we can see how often it
has happened, plus another function to view those stats.
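As a rough illustration of how such a counter could be used: sample it twice and turn the difference into a flush rate. This assumes a monotonically increasing flush count that can be read at two points in time. No such counter exists yet, so the inputs here are hypothetical:

```python
def flushes_per_hour(count_before: int, count_after: int,
                     seconds_elapsed: float) -> float:
    """Convert two samples of a (hypothetical) flush counter into a rate."""
    if seconds_elapsed <= 0:
        raise ValueError("seconds_elapsed must be positive")
    if count_after < count_before:
        # A counter kept with the index would presumably restart on rebuild.
        raise ValueError("counter went backwards between samples")
    return (count_after - count_before) * 3600.0 / seconds_elapsed
```

A rate near zero would suggest the limit is rarely reached, while a high rate would suggest that inserts are often paying for cleanup.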
>> This and the OP seem like 9.5 open items to me.
>>
>
> I don't think so. Freeing gin_pending_list_limit from being forcibly tied
> to work_mem is a good thing. Even if I don't know exactly how to set
> gin_pending_list_limit, I know I don't want it to be 4GB just because work_mem
> was set there for some temporary reason. I'm happy to leave it at its
> default and let its fine tuning be a topic for people who really care about
> every microsecond of performance.
>
OK, I accept this.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services