Quick Links

Re: ML-based indexing ("The Case for Learned Index Structures", a paper from Google)

From:	Stefan Keller <sfkeller(at)gmail(dot)com>
To:	Peter Geoghegan <pg(at)bowt(dot)ie>
Cc:	"Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, Chapman Flack <chap(at)anastigmatix(dot)net>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Oleg Ivanov <o(dot)ivanov(at)postgrespro(dot)ru>, Oleg Bartunov <obartunov(at)gmail(dot)com>, Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ML-based indexing ("The Case for Learned Index Structures", a paper from Google)
Date:	2021-04-20 21:29:43
Message-ID:	CAFcOn29U7X14d9ZcZDcsNJe6GadRtFTmeh7JkJJSG+rLTMuG7g@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Just for the records: A learned index as no more foreknowledge about
the dataset as other indices.

I'd give learned indexes at least a change to provide a
proof-of-concept. And I want to learn more about the requirements to
be accepted as a new index (before undergoing month's of code
sprints).

As you may have seen, the "Stonebraker paper" I cited [1] is also
sceptic requiring full parity on features (like "concurrency control,
recovery, non main memory,and multi-user settings")! Non main memory
code I understand.
=> But index read/write operations and multi-user settings are part of
a separate software (transaction manager), aren't they?

~Stefan

[1] https://www.cs.brandeis.edu/~olga/publications/ieee2021.pdf

Am Di., 20. Apr. 2021 um 22:22 Uhr schrieb Peter Geoghegan <pg(at)bowt(dot)ie>:
>
> On Tue, Apr 20, 2021 at 12:51 PM Jonah H. Harris <jonah(dot)harris(at)gmail(dot)com> wrote:
> >> Maybe I'll be wrong about learned indexes - who knows? But the burden
> >> of proof is not mine. I prefer to spend my time on things that I am
> >> reasonably confident will work out well ahead of time.
> >
> >
> > Agreed on all of your takes, Peter. In time, they will probably be more realistic.
>
> A big problem when critically evaluating any complicated top-down
> model in the abstract is that it's too easy for the designer to hide
> *risk* (perhaps inadvertently). If you are allowed to make what
> amounts to an assumption that you have perfect foreknowledge of the
> dataset, then sure, you can do a lot with that certainty. You can
> easily find a way to make things faster or more space efficient by
> some ridiculous multiple that way (like 10x, 100x, whatever).
>
> None of these papers ever get around to explaining why what they've
> come up with is not simply fool's gold. The assumption that you can
> have robust foreknowledge of the dataset seems incredibly fragile,
> even if your model is *almost* miraculously good. I have no idea how
> fair that is. But my job is to make Postgres better, not to judge
> papers. My mindset is very matter of fact and practical.
>
> --
> Peter Geoghegan

In response to

Re: ML-based indexing ("The Case for Learned Index Structures", a paper from Google) at 2021-04-20 20:22:27 from Peter Geoghegan

Responses

Re: ML-based indexing ("The Case for Learned Index Structures", a paper from Google) at 2021-04-20 21:49:48 from Tom Lane
Re: ML-based indexing ("The Case for Learned Index Structures", a paper from Google) at 2021-04-20 21:51:32 from Peter Geoghegan

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Tom Lane	2021-04-20 21:49:48	Re: ML-based indexing ("The Case for Learned Index Structures", a paper from Google)
Previous Message	Peter Geoghegan	2021-04-20 21:25:57	Re: RFE: Make statistics robust for unplanned events