Re: ML-based indexing ("The Case for Learned Index Structures", a paper from Google)

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Stefan Keller <sfkeller(at)gmail(dot)com>
Cc: "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, Chapman Flack <chap(at)anastigmatix(dot)net>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Oleg Ivanov <o(dot)ivanov(at)postgrespro(dot)ru>, Oleg Bartunov <obartunov(at)gmail(dot)com>, Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ML-based indexing ("The Case for Learned Index Structures", a paper from Google)
Date: 2021-04-20 21:51:32
Message-ID: CAH2-Wzm8-QfCH=74JrHSZQVzUv62cNhjM8Vvr8zWRDvigcJbSg@mail.gmail.com
Lists: pgsql-hackers

On Tue, Apr 20, 2021 at 2:29 PM Stefan Keller <sfkeller(at)gmail(dot)com> wrote:
> Just for the record: A learned index has no more foreknowledge about
> the dataset than other indices.

Maybe. ML models are famously prone to over-interpreting training
data. In any case I am simply not competent to assess how true this
is.

> I'd give learned indexes at least a chance to provide a
> proof-of-concept. And I want to learn more about the requirements to
> be accepted as a new index (before undergoing months of code
> sprints).

I have everything to gain and nothing to lose by giving them a chance
-- I'm not required to do anything to give them a chance, after all. I
just want to be clear that I'm a skeptic now rather than later. I'm
not the one making a big investment of my time here.

> As you may have seen, the "Stonebraker paper" I cited [1] is also
> skeptical, requiring full parity on features (like "concurrency control,
> recovery, non main memory, and multi-user settings")! Non main memory
> code I understand.
> => But index read/write operations and multi-user settings are part of
> a separate software component (transaction manager), aren't they?

It's easy for me to be a skeptic -- again, what do I have to lose by
freely expressing my opinion? Mostly I'm just saying that I wouldn't
work on this because ISTM that there is significant uncertainty about
the outcome, but much less uncertainty about the outcome of
alternative projects of comparable difficulty. That's fundamentally
how I assess what to work on. There is plenty of uncertainty on my end
-- but that's beside the point.

--
Peter Geoghegan
