From: Andres Freund <andres(at)anarazel(dot)de>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: effective_io_concurrency's steampunk spindle maths
Date: 2020-03-06 18:05:13
Message-ID: 20200306180513.brdgyzxqxkvbphv5@alap3.anarazel.de
Lists: pgsql-hackers
Hi,
On 2020-03-02 18:28:41 +1300, Thomas Munro wrote:
> I was reading through some old threads[1][2][3] while trying to figure
> out how to add a new GUC to control I/O prefetching for new kinds of
> things[4][5], and enjoyed Simon Riggs' reference to Jules Verne in the
> context of RAID spindles.
>
> On 2 Sep 2015 14:54, "Andres Freund" <andres(at)anarazel(dot)de> wrote:
> > On 2015-09-02 18:06:54 +0200, Tomas Vondra wrote:
> > > Maybe the best thing we can do is just completely abandon the "number of
> > > spindles" idea, and just say "number of I/O requests to prefetch". Possibly
> > > with an explanation of how to estimate it (devices * queue length).
> >
> > I think that'd be a lot better.
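As a purely illustrative back-of-the-envelope version of that estimate:
four devices, each comfortably sustaining a queue depth of 32, would
suggest on the order of 4 * 32 = 128 requests in flight as a starting
point, to be refined by benchmarking.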
>
> +many, though I doubt I could describe how to estimate it myself,
> considering cloud storage, SANs, multi-lane NVMe etc. You basically
> have to experiment, and like most of our resource consumption limits,
> it's a per-backend limit anyway, so it's pretty complicated, but I
> don't see how the harmonic series helps anyone.
>
> Should we rename it? Here are my first suggestions:
Why rename? It's not as if anybody knew how to infer a useful value for
effective_io_concurrency, given the math that turns it into the actually
effective prefetch distance... I feel like we'd just cause people
unnecessary difficulty by renaming it.
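For anyone who hasn't looked: paraphrasing from memory, the translation
(ComputeIoConcurrency() in bufmgr.c, up to v12) amounts to roughly the
following standalone sketch. Function and variable names here are mine,
not the server's:

#include <stdio.h>

/*
 * Rough paraphrase of the old translation: treat
 * effective_io_concurrency = n as "n spindles", and turn that into a
 * prefetch distance of n * H(n), H(n) being the n-th harmonic number.
 */
static double
compute_prefetch_distance(int io_concurrency)
{
    double      prefetch_pages = 0.0;
    int         i;

    for (i = 1; i <= io_concurrency; i++)
        prefetch_pages += (double) io_concurrency / (double) i;

    return prefetch_pages;
}

int
main(void)
{
    int         n;

    /* e.g. n = 1 yields 1.0 pages, n = 10 yields ~29.3 pages */
    for (n = 1; n <= 10; n++)
        printf("effective_io_concurrency=%d -> prefetch distance %.1f\n",
               n, compute_prefetch_distance(n));
    return 0;
}

I.e. the value users set and the prefetch distance actually used diverge
quickly, which is exactly why inferring a good setting was so hard.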
> random_page_prefetch_degree
> maintenance_random_page_prefetch_degree
I don't like these names.
> Rationale for this naming pattern:
> * "random_page" from "random_page_cost"
I don't think we want to corner ourselves into only ever using these for
random I/O.
> * leaves room for a different setting for sequential prefetching
I think if we want to split those at some point, we ought to do so when
we have a good reason, not before. It's not at all clear to me why you'd
want substantially different queue depths for the two.
> * "degree" conveys the idea without using loaded words like "queue"
> that might imply we know something about the I/O subsystem or that
> it's system-wide like kernel and device queues
Why is that good? Queue depth is a pretty well established term. You can
search for benchmarks of devices with it, you can correlate with OS
config, etc.
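(For instance, on Linux the block layer's per-device queue size is
visible and tunable as /sys/block/<device>/queue/nr_requests.)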
> * "maintenance_" prefix is like other GUCs that establish (presumably
> larger) limits for processes working on behalf of many user sessions
That part makes sense to me.
> Whatever we call it, I don't think it makes sense to try to model the
> details of any particular storage system. Let's use a simple counter
> of I/Os initiated but not yet known to have completed (for now: it has
> definitely completed when the associated pread() completes; perhaps
> something involving real async I/O completion notification in later
> releases).
+1
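To make sure we're talking about the same thing, here's a minimal sketch
of the accounting I'd expect. All names are hypothetical, not actual
PostgreSQL code:

#include <stdbool.h>

/*
 * One counter of I/Os initiated but not yet known to have completed,
 * capped by the GUC (whatever we end up calling it).
 */
typedef struct PrefetchState
{
    int         inflight;   /* issued, completion not yet observed */
    int         limit;      /* the GUC value */
} PrefetchState;

static bool
can_start_prefetch(PrefetchState *ps)
{
    return ps->inflight < ps->limit;
}

static void
prefetch_started(PrefetchState *ps)
{
    ps->inflight++;         /* e.g. posix_fadvise(WILLNEED) issued */
}

static void
prefetch_completed(PrefetchState *ps)
{
    /* for now: the associated pread() completing counts as completion */
    ps->inflight--;
}

Swapping the completion criterion out for real async completion
notifications later wouldn't change the user-visible meaning of the
limit.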
Greetings,
Andres Freund