From: Andres Freund <andres(at)anarazel(dot)de>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Melanie Plageman <melanieplageman(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, David Rowley <dgrowleyml(at)gmail(dot)com>
Subject: Re: Streaming read-ready sequential scan code
Date: 2024-04-03 22:12:57
Message-ID: 20240403221257.md4gfki3z75cdyf6@awork3.anarazel.de
Lists: pgsql-hackers

Hi,
On 2024-04-04 09:27:35 +1300, Thomas Munro wrote:
> Anecdotally by all reports I've seen so far, all-in-cache seems to be
> consistently a touch faster than master if anything, for streaming seq
> scan and streaming ANALYZE. That's great, but it doesn't seem to be
> due to intentional changes. No efficiency is coming from batching
> anything. Perhaps it has to do with CPU pipelining effects: though
> it's doing the same work as ReadBuffer()-when-it-gets-a-hit, the work
> itself is cut up into stages in a kind of pipeline:
> read_stream_next_buffer() chooses page n + 2, pins page n + 1 and
> returns page n each time you call it, so maybe we get more CPU
> parallelism due to spreading the data dependencies out?
Another theory is that it's due to the plain ReadBuffer() path needing to do
RelationGetSmgr(reln) on every call, whereas the streaming read path doesn't
need to.
> > On the topic of BAS_BULKREAD buffer access strategy, I think the least
> > we could do is add an assert like this to read_stream_begin_relation()
> > after calculating max_pinned_buffers.
> >
> > Assert(GetAccessStrategyBufferCount(strategy) > max_pinned_buffers);
> >
> > Perhaps we should do more? I think with a ring size of 16 MB, large
> > SELECTs are safe for now. But I know future developers will make
> > changes and it would be good not to assume they will understand that
> > pinning more buffers than the size of the ring effectively invalidates
> > the ring.
>
> Yeah I think we should clamp max_pinned_buffers if we see a strategy.
> What do you think about:
>
> if (strategy)
> {
>     int strategy_buffers = GetAccessStrategyBufferCount(strategy);
>
>     max_pinned_buffers = Min(strategy_buffers / 2, max_pinned_buffers);
> }
> I just don't know where to get that '2'. The idea would be to
> hopefully never actually be constrained by it in practice, except in
> tiny/toy setups, so we can't go too wrong with our number '2' there.
The / 2 is to avoid causing unnecessarily frequent WAL flushes, right? If so,
should we apply that only if the strategy doesn't use the
StrategyRejectBuffer() logic?
I think it's fine to add a handwavy justification for the 2, saying that we
want to balance readahead distance and reducing WAL write frequency, and that
at some point more sophisticated logic will be needed.
> Then we should increase the default ring sizes for BAS_BULKREAD and
> BAS_VACUUM to make the previous statement true.
I'm not sure it's right tying them together. The concerns for both are fairly
different.
> The size of main memory and L2 cache have increased dramatically since those
> strategies were invented. I think we should at least double them, and more
> likely quadruple them. I realise you already made them configurable per
> command in commit 1cbbee033, but I mean the hard coded default 256 in
> freelist.c. It's not only to get more space for our prefetching plans, it's
> also to give the system more chance of flushing WAL asynchronously/in some
> other backend before you crash into dirty data; as you discovered,
> prefetching accidentally makes that effect worse in a BAS_VACUUM strategy,
> by taking away space that is effectively deferring WAL flushes, so I'd at
> least like to double the size if we use "/ 2" above.
I think for VACUUM we should probably go a bit further. There's no comparable
L1/L2 issue, because the per-buffer processing + WAL insertion is a lot more
expensive, compared to a seqscan. I'd go for at least 4x-8x.
Greetings,
Andres Freund