From: | Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Timothy Garnett <tgarnett(at)panjiva(dot)com>, PostgreSQL Bugs <pgsql-bugs(at)postgresql(dot)org>, Kevin Grittner <kgrittn(at)ymail(dot)com> |
Subject: | Re: Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated) |
Date: | 2015-05-04 06:47:06 |
Message-ID: | CAEepm=2eybKBd4G3VUQA1igaJ_nSRtSikfLAFpGcwYMq8nd4ug@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
On Mon, May 4, 2015 at 11:49 AM, Thomas Munro
<thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> On Sun, May 3, 2015 at 4:40 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>> Based on this data, it seems that truncation of member space
>> as well as advancement of oldest multixact id happens once
>> it reaches 50% usage and at that time segments drops down to almost
>> zero. This happens repeatedly after 1 hour and in-between there
>> is no progress which indicates that all the work happens at
>> one go rather than in spreaded way. Won't this choke the system
>> when it happens due to I/O, isn't it better if we design it in a way such
>> that it is spreaded over period of time rather than doing everything at
>> one go?
>
> At 50% usage it starts vacuuming tables that have the very oldest mxid
> in the system. That is, at 50% usage, we try to select the smallest
> non-zero fraction of tables that can be selected based on relminmxid
> alone, with any luck exactly one or close to it. As usage increases,
> we decrease the cutoff age slowly so we start selecting tables with
> slightly newer relminmxid values for vacuuming too. Only if it
> reaches 75% usage will it vacuum everything it can and eat all your
> IO. In a real system, I suppose there would be lots of big tables
> that need to be vacuumed, the vacuuming would take some time, and they
> would tend to have different relminmxids so that vacuuming would be
> effectively spread out by this selection algorithm. I think. Perhaps
> we should devise a test procedure to try to see if that happens. We'd
> need to create a bunch of big tables and modify monitor.sh to show the
> relminmxid for each of them so that you could see when they are being
> processed -- I will look into that.
After initdb, I did:
postgres=# create table foo as select 42 as id;
SELECT 1
postgres=# create table cat_a as select generate_series(1, 10000000);
SELECT 10000000
postgres=# create table cat_b as select generate_series(1, 10000000);
SELECT 10000000
postgres=# create table cat_c as select generate_series(1, 10000000);
SELECT 10000000
postgres=# create table cat_d as select generate_series(1, 10000000);
SELECT 10000000
Then I started monitor.sh (new version attached) and started
explode_mxact_members.c and recorded the attached file
monitor-output.txt. The result is not great: since relminmxid starts
at 1 for all tables created in a brand new database, 3
(autovacuum_max_workers) vacuums started around the same time, and
later they didn't really seem to be diverging, a bit under an hour
later the same three were triggered again, and so on.
I have also attached monitor-output-fail.txt, a case where the
database size is too large to be vacuumed fast enough for that rate of
member space consumption. It starts vacuuming with 3 workers at 50%,
and hasn't finished by 100%, so the new error is raised and no more
multixacts can be created. Eventually the vacuuming completes and
progress can be made. The database was set up as above, but the
tables have 10 times more rows.
> Restricting ourselves to selecting tables to vacuum using their
> relminmxid alone makes this patch small since autovacuum already works
> that way. We *could* introduce code that would be able to spread out
> the work of vacuuming tables that happen to have identical or very
> close relminmxid (say by introducing some non-determinism or doing
> something weird based on hashing table oids and the time to explicitly
> spread the start of processing over time, or <your idea here>), but I
> didn't want to propose anything too big/complicated/clever/stupid and
> I suspect that the relminmxid values will tend to diverge over time
> (but I could be wrong about that, if they all start at 1 and then move
> forward in lockstep over long periods of time then what I propose is
> not good enough... let's see if we can find out).
Maybe we do need to consider something more radical, since tables that
have the same relminmxid are fairly easy to produce (for example by
restoring a database dump into a new database). I will try to do
something that adds some noise to the signal to deal with this case
(of the top of my head, something along the lines of time_in_minutes %
16 == table_oid %16..., or adjust the multixact cutoff age with a
recipe including table_oid % something, or ... not sure yet). If
anyone has any better ideas, I am all ears.
--
Thomas Munro
http://www.enterprisedb.com
Attachment | Content-Type | Size |
---|---|---|
monitor-output-fail.txt | text/plain | 7.8 KB |
monitor.sh | application/x-sh | 1.4 KB |
monitor-output.txt | text/plain | 19.9 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Thomas Munro | 2015-05-04 07:12:39 | Re: Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated) |
Previous Message | Amit Kapila | 2015-05-04 06:25:46 | Re: Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated) |