Re: WIP: BRIN multi-range indexes

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: WIP: BRIN multi-range indexes
Date: 2021-01-26 22:59:05
Message-ID: d4aa7fa0-d06d-6584-9234-8c1696924dde@enterprisedb.com
Lists: pgsql-hackers

On 1/26/21 7:52 PM, John Naylor wrote:
> On Fri, Jan 22, 2021 at 10:59 PM Tomas Vondra
> <tomas(dot)vondra(at)enterprisedb(dot)com>
> wrote:
> >
> >
> > On 1/23/21 12:27 AM, John Naylor wrote:
>
> > > Still, it would be great if multi-minmax can be a drop in
> > > replacement. I know there was a sticking point of a distance
> > > function not being available on all types, but I wonder if that
> > > can be remedied or worked around somehow.
> > >
> >
> > Hmm. I think Alvaro also mentioned he'd like to use this as a drop-in
> > replacement for minmax (essentially, using these opclasses as the
> > default ones, with the option to switch back to plain minmax). I'm not
> > convinced we should do that, though. Imagine you have minmax indexes in
> > your existing DB, it's working perfectly fine, and then we come and just
> > silently change that during dump/restore. Is there some past example
> > when we did something similar and it turned out to be OK?
>
> I was assuming pg_dump can be taught to insert explicit opclasses for
> minmax indexes, so that upgrade would not cause surprises. If that's
> true, only new indexes would have the different default opclass.
>

Maybe, I suppose we could do that. But I've always found such changes
happening silently in the background a bit suspicious, because they can
be quite confusing. I certainly wouldn't expect such a difference between
creating a new index and an index created by dump/restore. Have we made
such changes in the past? That might be a precedent, but I don't recall
any example ...

> > As for the distance functions, I'm pretty sure there are data types
> > without "natural" distance - like most strings, for example. We could
> > probably invent something, but the question is how much we can rely on
> > it working well enough in practice.
> >
> > Of course, is minmax even the right index type for such data types?
> > Strings are usually "labels" and not queried using range queries,
> > although sometimes people encode stuff as strings (but then it's very
> > unlikely we'll define the distance well). So maybe for those
> > types a hash / bloom would be a better fit anyway.
>
> Right.
>
> > But I do have an idea - maybe we can do without distances, in those
> > cases. Essentially, the primary issue with minmax indexes is outliers, so
> > what if we simply sort the values, keep one range in the middle and as
> > many single points as possible on each tail?
>
> That's an interesting idea. I think it would be a nice bonus to try to
> do something along these lines. On the other hand, I'm not the one
> volunteering to do the work, and the patch is useful as is.
>

IMO it's a fairly small amount of code, so I'll take a stab at it in the
next version of the patch.
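
For illustration, the distance-free idea quoted above (sort the values,
keep one range in the middle, keep single points on each tail) could look
roughly like the sketch below. This is Python, not code from the patch.
The names summarize_without_distance and might_contain and the max_values
budget are hypothetical, and the sketch assumes the middle range costs two
stored values (its min and max). It needs only an ordering on the values,
not a distance function.

```python
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")

# (low tail points, middle range or None, high tail points)
Summary = Tuple[List[T], Optional[Tuple[T, T]], List[T]]


def summarize_without_distance(values: Sequence[T], max_values: int) -> Summary:
    """Hypothetical sketch: summarize values using ordering only.

    Assumes the summary may store at most max_values values, and that
    the single middle range uses two of them.
    """
    if max_values < 2:
        raise ValueError("max_values must be at least 2")

    ordered = sorted(set(values))

    # Everything fits as exact points, so no range is needed.
    if len(ordered) <= max_values:
        return ordered, None, []

    # Reserve two slots for the middle range, split the rest between tails.
    budget = max_values - 2
    low_n = budget // 2
    high_n = budget - low_n

    split = len(ordered) - high_n
    low_points = ordered[:low_n]
    high_points = ordered[split:]
    middle = ordered[low_n:split]

    return low_points, (middle[0], middle[-1]), high_points


def might_contain(summary: Summary, value: T) -> bool:
    """Return True if the summary cannot rule out the value."""
    low_points, middle_range, high_points = summary
    if value in low_points or value in high_points:
        return True
    if middle_range is None:
        return False
    lo, hi = middle_range
    return lo <= value <= hi


if __name__ == "__main__":
    # Two outliers (1 and 1_000_000) around a tight cluster.
    data = [1, 1000, 1001, 1002, 1003, 1004, 1_000_000]
    summary = summarize_without_distance(data, max_values=4)
    print(summary)
    print(might_contain(summary, 500))
```

With the sample data, the outliers end up as single points and the middle
range covers only the cluster, so a lookup for 500 is ruled out, whereas a
plain minmax summary of (1, 1000000) would have to match it. The same code
works for strings, since it relies on comparison alone.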

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
