From: Greg Stark <stark(at)mit(dot)edu>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Julien Rouhaud <julien(dot)rouhaud(at)dalibo(dot)com>
Subject: Re: Allow a per-tablespace effective_io_concurrency setting
Date: 2015-09-02 18:49:13
Message-ID: CAM-w4HNFSi9HAvRXrvPhYf7awhzZq1ojOtcbK064VxC5U5t1jA@mail.gmail.com
Lists: pgsql-hackers pgsql-performance
On 2 Sep 2015 14:54, "Andres Freund" <andres(at)anarazel(dot)de> wrote:
>
>
> > + /*----------
> > + * The user-visible GUC parameter is the number of drives (spindles),
> > + * which we need to translate to a number-of-pages-to-prefetch target.
> > + * The target value is stashed in *extra and then assigned to the actual
> > + * variable by assign_effective_io_concurrency.
> > + *
> > + * The expected number of prefetch pages needed to keep N drives busy is:
> > + *
> > + * drives | I/O requests
> > + * -------+----------------
> > + * 1 | 1
> > + * 2 | 2/1 + 2/2 = 3
> > + * 3 | 3/1 + 3/2 + 3/3 = 5 1/2
> > + * 4 | 4/1 + 4/2 + 4/3 + 4/4 = 8 1/3
> > + * n | n * H(n)
>
> I know you just moved this code. But: I don't buy this formula. Like at
> all. Doesn't queuing and reordering entirely invalidate the logic here?
I can take the blame for this formula.

It's called the "Coupon Collector Problem": if each draw gives you a
random coupon from a set of n possible coupons, how many draws do you
expect to need before you have at least one of each? The answer is
n * H(n), which is where the table above comes from.
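To make that concrete, here's a minimal sketch in C (my own
throwaway code, not anything from the patch, and the function name
is made up) of the spindles-to-prefetch-target translation the
comment describes. It just computes n * H(n) and reproduces the
table above:

#include <stdio.h>

/*
 * Sketch only: expected number of outstanding prefetch requests
 * needed to keep n independent spindles busy, by the coupon
 * collector argument: n * H(n), where H(n) = 1 + 1/2 + ... + 1/n
 * is the n-th harmonic number.
 */
static double
prefetch_target_for_drives(int n)
{
	double		harmonic = 0.0;
	int			i;

	for (i = 1; i <= n; i++)
		harmonic += 1.0 / i;

	return n * harmonic;
}

int
main(void)
{
	int			n;

	/* Prints 1.00, 3.00, 5.50, 8.33, matching the table above */
	for (n = 1; n <= 4; n++)
		printf("%d drives -> %.2f prefetch requests\n",
			   n, prefetch_target_for_drives(n));
	return 0;
}

The real patch does the equivalent arithmetic when the GUC is set and
stashes the result for assign_effective_io_concurrency, as the quoted
comment says.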
This computational model assumes we have no information about which
spindle each block will land on. That's basically true for bitmap
heap scans in most cases, because the point of a bitmap heap scan is
to pick out a sparse set of blocks, and there's no reason the blocks
being read should show any regularity that makes them all fall on the
same spindle. If in fact you're reading a fairly dense set, then a
bitmap heap scan is probably a waste of time and simply reading
sequentially would be exactly as fast or even faster.
We talked about this quite a bit back then, and there was no dispute
that the aim is to provide GUCs that mean something the DBA can
actually measure. They know how many spindles they have. They do not
know what the optimal prefetch depth is, and the only way to
determine it would be to experiment with Postgres. Worse, I think the
above formula works for essentially random I/O, but for more
predictable I/O a different formula might be appropriate. If we made
the GUC something low-level like "how many blocks to prefetch", we
would be left in the dark about how to handle that different access
pattern.
I did speak to a dm developer and he suggested that the kernel could
help out with an API. He suggested something of the form "how many
blocks do I have to read before the end of the current device". I
wasn't sure exactly what we would do with something like that but it
would be better than just guessing how many I/O operations we need to
issue to keep all the spindles busy.
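For illustration only, the hint could look something like the
following. The struct and everything about it is hypothetical;
nothing of the sort exists in the kernel today:

#include <sys/types.h>

/*
 * HYPOTHETICAL: one possible shape for the API the dm developer
 * suggested. Given a position in a file, report how many contiguous
 * blocks remain on the current underlying device.
 */
struct device_run
{
	off_t		offset;			/* in: current read position */
	off_t		blocks_left;	/* out: blocks remaining on the
								 * device backing this offset */
};

With an answer like blocks_left in hand, the prefetcher could cap its
lookahead at the device boundary instead of deriving a target purely
from the spindle-count heuristic above.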