| From: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> | 
|---|---|
| To: | Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, garsthe1st(at)gmail(dot)com, dxahtepb(at)gmail(dot)com, geymer_98(at)mail(dot)ru, dafi913(at)yandex(dot)ru, edigaryev(dot)ig(at)phystech(dot)edu | 
| Cc: | Benjamin Manes <ben(dot)manes(at)gmail(dot)com> | 
| Subject: | Re: [Patch][WiP] Tweaked LRU for shared buffers | 
| Date: | 2019-02-15 23:30:26 | 
| Message-ID: | ea41857c-de94-910b-4412-563c1fb61da0@2ndquadrant.com | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-hackers | 
Hi,
On 2/13/19 3:37 PM, Andrey Borodin wrote:
> Hi, hackers!
> 
> We have held education project at Sirius edu center (Sochi, Russia)
> with mentors from Yandex. The group of 5 students was working on
> improving the shared buffers eviction algorithm: Andrey Chausov, Yuriy
> Skakovskiy, Ivan Edigaryev, Arslan Gumerov, Daria Filipetskaya. I’ve
> been a mentor for the group. For two weeks we have been looking into
> known caching algorithms and tried to adapt some of them for PostgreSQL
> codebase.
>
> While a lot of algorithms appeared to be too complex to be hacked in 
> 2 weeks, we decided to implement and test the working version of
> tweaked LRU eviction algorithm.
> 
> ===How it works===
> Most of the buffers are organized into the linked list. Firstly
> admitted pages jump into 5/8th of the queue. The last ⅛ of the queue is
> governed by clock sweep algorithm to improve concurrency.
> 
Interesting. Where do these numbers (5/8 and 1/8) come from?
> ===So how we tested the patch===
> We used sampling on 4 Yandex.Cloud compute instances with 16 vCPU
> cores, 8GB of RAM, 200GB database in 30-minute YCSB-like runs with Zipf
> distribution. We found that on read-only workload our algorithm is
> showing consistent improvement over the current master branch. On
> read-write workloads we haven’t found performance improvements yet,
> there was too much noise from checkpoints and bgwriter (more on it in
> TODO section).
> Charts are here: [0,1]
That TPS chart looks a bit ... wild. How come the master jumps so much
up and down? That's a bit suspicious, IMHO.
How do I reproduce this benchmark? I'm aware of pg_ycsb, but maybe
you've used some other tools?
Also, have you tried some other benchmarks (like, regular TPC-B as
implemented by pgbench, or read-only pgbench)? We need such benchmarks
with a range of access patterns to check for regressions.
BTW what do you mean by "sampling"?
> We used this config: [2]
> 
That's only half the information - it doesn't say how many clients were
running the benchmark etc.
> ===TODO===
> We have taken some ideas expressed by Ben Manes in the pgsql-hackers
> list. But we could not implement all of them during the time of the
> program. For example, we tried to make LRU bumps less write-contentious
> by storing them in a circular buffer. But this feature was not stable
> enough.
Can you point us to the thread/email discussing those ideas? I have
tried searching through archives, but I haven't found anything :-(
This message does not really explain the algorithms, and combined with
the absolute lack of comments in the linked commit, it'd somewhat
difficult to form an opinion.
> The patch in its current form also requires improvements. So, we 
> shall reduce the number of locks at all (in this way we have tried 
> bufferization, but not only it). “Clock sweep” algorithm at the last
> ⅛ part of the logical sequence should be improved too (see 
> ClockSweepTickAtTheAnd() and places of its usage).
OK
> Unfortunately, we didn’t have more time to bring CAR and W-TinyLFU
> to testing-ready state.
What is CAR? Did you mean ARC, perhaps?
> We have a working implementation of frequency sketch [3] to make a
> transition between the admission cycle and LRU more concise with TinyLFU
> filter. Most probably, work in this direction will be continued.
OK
> Also, the current patch does not change bgwriter behavior: with a
> piece of knowledge from LRU, we can predict that some pages will not be
> changed in the nearest future. This information should be used to
> schedule the background writes better.
Sounds interesting.
> We also think that with proper eviction algorithm shared buffers
> should be used instead of specialized buffer rings.
> 
Are you suggesting to get rid of the buffer rings we use for sequential
scans, for example? IMHO that's going to be tricky, e.g. because of
synchronized sequential scans.
> We will be happy to hear your feedback on our work! Thank you :)
> 
> [0] LRU TPS https://yadi.sk/i/wNNmNw3id22nMQ
> [1] LRU hitrate https://yadi.sk/i/l1o710C3IX6BiA
> [2] Benchmark config https://yadi.sk/d/L-PIVq--ujw6NA
> [3] Frequency sketch code https://github.com/heyfaraday/postgres/commit/a684a139a72cd50dd0f9d031a8aa77f998607cf1
> 
> With best regards, almost serious cache group.
> 
cheers
-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Peter Geoghegan | 2019-02-15 23:51:19 | Re: [Patch][WiP] Tweaked LRU for shared buffers | 
| Previous Message | Tomas Vondra | 2019-02-15 22:36:37 | Re: Early WIP/PoC for inlining CTEs |