Quick Links

Re: [Patch][WiP] Tweaked LRU for shared buffers

From:	Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To:	Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, garsthe1st(at)gmail(dot)com, dxahtepb(at)gmail(dot)com, geymer_98(at)mail(dot)ru, dafi913(at)yandex(dot)ru, edigaryev(dot)ig(at)phystech(dot)edu
Cc:	Benjamin Manes <ben(dot)manes(at)gmail(dot)com>
Subject:	Re: [Patch][WiP] Tweaked LRU for shared buffers
Date:	2019-02-15 23:30:26
Message-ID:	ea41857c-de94-910b-4412-563c1fb61da0@2ndquadrant.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hi,

On 2/13/19 3:37 PM, Andrey Borodin wrote:
> Hi, hackers!
>
> We have held education project at Sirius edu center (Sochi, Russia)
> with mentors from Yandex. The group of 5 students was working on
> improving the shared buffers eviction algorithm: Andrey Chausov, Yuriy
> Skakovskiy, Ivan Edigaryev, Arslan Gumerov, Daria Filipetskaya. I’ve
> been a mentor for the group. For two weeks we have been looking into
> known caching algorithms and tried to adapt some of them for PostgreSQL
> codebase.
>
> While a lot of algorithms appeared to be too complex to be hacked in
> 2 weeks, we decided to implement and test the working version of
> tweaked LRU eviction algorithm.
>
> ===How it works===
> Most of the buffers are organized into the linked list. Firstly
> admitted pages jump into 5/8th of the queue. The last ⅛ of the queue is
> governed by clock sweep algorithm to improve concurrency.
>

Interesting. Where do these numbers (5/8 and 1/8) come from?

> ===So how we tested the patch===
> We used sampling on 4 Yandex.Cloud compute instances with 16 vCPU
> cores, 8GB of RAM, 200GB database in 30-minute YCSB-like runs with Zipf
> distribution. We found that on read-only workload our algorithm is
> showing consistent improvement over the current master branch. On
> read-write workloads we haven’t found performance improvements yet,
> there was too much noise from checkpoints and bgwriter (more on it in
> TODO section).
> Charts are here: [0,1]

That TPS chart looks a bit ... wild. How come the master jumps so much
up and down? That's a bit suspicious, IMHO.

How do I reproduce this benchmark? I'm aware of pg_ycsb, but maybe
you've used some other tools?

Also, have you tried some other benchmarks (like, regular TPC-B as
implemented by pgbench, or read-only pgbench)? We need such benchmarks
with a range of access patterns to check for regressions.

BTW what do you mean by "sampling"?

> We used this config: [2]
>

That's only half the information - it doesn't say how many clients were
running the benchmark etc.

> ===TODO===
> We have taken some ideas expressed by Ben Manes in the pgsql-hackers
> list. But we could not implement all of them during the time of the
> program. For example, we tried to make LRU bumps less write-contentious
> by storing them in a circular buffer. But this feature was not stable
> enough.

Can you point us to the thread/email discussing those ideas? I have
tried searching through archives, but I haven't found anything :-(

This message does not really explain the algorithms, and combined with
the absolute lack of comments in the linked commit, it'd somewhat
difficult to form an opinion.

> The patch in its current form also requires improvements. So, we
> shall reduce the number of locks at all (in this way we have tried
> bufferization, but not only it). “Clock sweep” algorithm at the last
> ⅛ part of the logical sequence should be improved too (see
> ClockSweepTickAtTheAnd() and places of its usage).
OK

> Unfortunately, we didn’t have more time to bring CAR and W-TinyLFU
> to testing-ready state.

What is CAR? Did you mean ARC, perhaps?

> We have a working implementation of frequency sketch [3] to make a
> transition between the admission cycle and LRU more concise with TinyLFU
> filter. Most probably, work in this direction will be continued.

> Also, the current patch does not change bgwriter behavior: with a
> piece of knowledge from LRU, we can predict that some pages will not be
> changed in the nearest future. This information should be used to
> schedule the background writes better.

Sounds interesting.

> We also think that with proper eviction algorithm shared buffers
> should be used instead of specialized buffer rings.
>

Are you suggesting to get rid of the buffer rings we use for sequential
scans, for example? IMHO that's going to be tricky, e.g. because of
synchronized sequential scans.

> We will be happy to hear your feedback on our work! Thank you :)
>
> [0] LRU TPS https://yadi.sk/i/wNNmNw3id22nMQ
> [1] LRU hitrate https://yadi.sk/i/l1o710C3IX6BiA
> [2] Benchmark config https://yadi.sk/d/L-PIVq--ujw6NA
> [3] Frequency sketch code https://github.com/heyfaraday/postgres/commit/a684a139a72cd50dd0f9d031a8aa77f998607cf1
>
> With best regards, almost serious cache group.
>

cheers

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

[Patch][WiP] Tweaked LRU for shared buffers at 2019-02-13 14:37:26 from Andrey Borodin

Responses

Re: [Patch][WiP] Tweaked LRU for shared buffers at 2019-02-15 23:51:19 from Peter Geoghegan
Re: [Patch][WiP] Tweaked LRU for shared buffers at 2019-02-17 13:14:30 from

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Peter Geoghegan	2019-02-15 23:51:19	Re: [Patch][WiP] Tweaked LRU for shared buffers
Previous Message	Tomas Vondra	2019-02-15 22:36:37	Re: Early WIP/PoC for inlining CTEs