From: | Stefan Keller <sfkeller(at)gmail(dot)com> |
---|---|
To: | Peter Geoghegan <pg(at)bowt(dot)ie> |
Cc: | "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, Chapman Flack <chap(at)anastigmatix(dot)net>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Oleg Ivanov <o(dot)ivanov(at)postgrespro(dot)ru>, Oleg Bartunov <obartunov(at)gmail(dot)com>, Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: ML-based indexing ("The Case for Learned Index Structures", a paper from Google) |
Date: | 2021-04-20 21:29:43 |
Message-ID: | CAFcOn29U7X14d9ZcZDcsNJe6GadRtFTmeh7JkJJSG+rLTMuG7g@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Just for the records: A learned index as no more foreknowledge about
the dataset as other indices.
I'd give learned indexes at least a change to provide a
proof-of-concept. And I want to learn more about the requirements to
be accepted as a new index (before undergoing month's of code
sprints).
As you may have seen, the "Stonebraker paper" I cited [1] is also
sceptic requiring full parity on features (like "concurrency control,
recovery, non main memory,and multi-user settings")! Non main memory
code I understand.
=> But index read/write operations and multi-user settings are part of
a separate software (transaction manager), aren't they?
~Stefan
[1] https://www.cs.brandeis.edu/~olga/publications/ieee2021.pdf
Am Di., 20. Apr. 2021 um 22:22 Uhr schrieb Peter Geoghegan <pg(at)bowt(dot)ie>:
>
> On Tue, Apr 20, 2021 at 12:51 PM Jonah H. Harris <jonah(dot)harris(at)gmail(dot)com> wrote:
> >> Maybe I'll be wrong about learned indexes - who knows? But the burden
> >> of proof is not mine. I prefer to spend my time on things that I am
> >> reasonably confident will work out well ahead of time.
> >
> >
> > Agreed on all of your takes, Peter. In time, they will probably be more realistic.
>
> A big problem when critically evaluating any complicated top-down
> model in the abstract is that it's too easy for the designer to hide
> *risk* (perhaps inadvertently). If you are allowed to make what
> amounts to an assumption that you have perfect foreknowledge of the
> dataset, then sure, you can do a lot with that certainty. You can
> easily find a way to make things faster or more space efficient by
> some ridiculous multiple that way (like 10x, 100x, whatever).
>
> None of these papers ever get around to explaining why what they've
> come up with is not simply fool's gold. The assumption that you can
> have robust foreknowledge of the dataset seems incredibly fragile,
> even if your model is *almost* miraculously good. I have no idea how
> fair that is. But my job is to make Postgres better, not to judge
> papers. My mindset is very matter of fact and practical.
>
> --
> Peter Geoghegan
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2021-04-20 21:49:48 | Re: ML-based indexing ("The Case for Learned Index Structures", a paper from Google) |
Previous Message | Peter Geoghegan | 2021-04-20 21:25:57 | Re: RFE: Make statistics robust for unplanned events |