Re: BRIN index worse than sequential scan for large search set

From: Mickael van der Beek <mickael(dot)van(dot)der(dot)beek(at)gmail(dot)com>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: BRIN index worse than sequential scan for large search set
Date: 2023-02-24 17:51:00
Message-ID: CACM-Oyc-rOkzP6kV1POY+fiZcRQ+cqd5iWqR61zN6nUVRVt3sw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hello Justin,

Thanks for the quick response!

The table may be dense, but the tuples aren't. You're asking to return
> 1/1000th of the tuples, across the entire table. Suppose there are ~100
> tuples per page, and you need to read about every 10th page. It makes
> sense that it's slow to read a large amount of data nonsequentially.

Ah, of course, you're right!
I forgot that the BRIN indexes store ranges that are not fully covered by
the row values and that PostgreSQL has to double-check (bitmap heap scan)
...
Would you thus advise to only use BRIN indexes for columns who's values are
(1) monotonically increasing but also (2) close to each other?

I guess that in my use case, something like a roaring bitmap would have
been perfect but I do not believe that those exist in PostgreSQL by default.
Btrees work well performance wise but are simply too large (> 400MB for 20M
rows) even when the index fill factor is 100% (+/- 380 MB) for my use case
as I need to index around 6B rows partitioned in roughly 3K buckets which
would result in ~120GB of Btree indexes which seems a bit much for simple
existence checks.

Because it's necessary to check if the tuple is visible to the current
> transaction. It might be from an uncommited/aborted transaction.

Interesting, that's something I did not consider indeed.
I'm guessing that this is a cost brought by MVCC that you can't get around
no matter the isolation level?

Mickael

On Fri, Feb 24, 2023 at 6:19 PM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:

> On Fri, Feb 24, 2023 at 05:40:55PM +0100, Mickael van der Beek wrote:
> > Hello everyone,
> >
> > I'm playing around with BRIN indexes so as to get a feel for the feature.
> > During my tests, I was unable to make BRIN indexes perform better than a
> > sequential scan for queries searching for large value sets (20K values in
> > the example down below).
>
> > And now let's query 20K random rows from our 20M total rows:
>
> I didn't try your test, but I think *random* is the problem/explanation.
>
> > By default, this query will not use the BRIN index and simply run a 1.5s
> > long sequential scan (hitting 700 MB) and a 2.47s hash join for a total
> > 8.7s query time:
> > https://explain.dalibo.com/plan/46c3191g8a6c1bc7
>
> > If we force the use of the BRIN index using (`SET LOCAL enable_seqscan =
> > OFF;`) the same query will now take 50s with 2.5s spent on the bitmap
> index
> > scan (hitting 470 MB of data) and a whopping 42s on the bitmap heap scan
> > (hitting 20 GB of data!):
> > https://explain.dalibo.com/plan/7f73bg9172a8b226
>
> That means the planner's cost model correctly preferred a seq scan.
>
> > So I had the following two questions:
> >
> > 1. Why is the BRIN index systematically worse than a sequential scan,
> > even when the table is x1000 larger than the search set, physically
> > pre-sorted, dense (fillfactor at 100%) and the search rows are
> themselves
> > sorted?
>
> The table may be dense, but the tuples aren't. You're asking to return
> 1/1000th of the tuples, across the entire table. Suppose there are ~100
> tuples per page, and you need to read about every 10th page. It makes
> sense that it's slow to read a large amount of data nonsequentially.
> That's why random_page_cost is several times higher than seq_page_cost.
>
> I would expect brin to win if the pages to be accessed were dense rather
> than distributed across the whole table.
>
> > 2. Since we only select the "idx" column, why does the BRIN index not
> > simply return the searched value if included in one of it's ranges?
> > Hitting the actual row data stored in the table seems to be unnessary
> no?
>
> Because it's necessary to check if the tuple is visible to the current
> transaction. It might be from an uncommited/aborted transaction.
>
> --
> Justin
>

--
Mickael van der BeekWeb developer & Security analyst

mickael(dot)van(dot)der(dot)beek(at)gmail(dot)com

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Justin Pryzby 2023-02-24 20:37:15 Re: BRIN index worse than sequential scan for large search set
Previous Message Justin Pryzby 2023-02-24 17:19:44 Re: BRIN index worse than sequential scan for large search set