Re: EXPLAIN ANALYZE of BRIN bitmap index scan with disjunction

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: Chris Wilson <chris(dot)wilson(at)cantabcapital(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: EXPLAIN ANALYZE of BRIN bitmap index scan with disjunction
Date: 2019-06-20 16:51:26
Message-ID: CANP8+jKeswEqRUOdRJAhubL=FFxVRq9zmhwt3gPg2_7EFAP3xw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Thu, 20 Jun 2019 at 17:30, Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:

> On Thu, Jun 20, 2019 at 05:18:33PM +0100, Simon Riggs wrote:
> > On Thu, 20 Jun 2019 at 17:01, Chris Wilson <
> chris(dot)wilson(at)cantabcapital(dot)com>
> > wrote:
> >
> >
> > > I deliberately included r in the index, to demonstrate the issue that
> I’m
> > > seeing. I know that there is very little locality in this particular,
> > > dummy, arbitrary test case. I can try to produce a test case that has
> some
> > > locality, but I expect it to show exactly the same results, i.e. that
> the
> > > BRIN index performs much worse when we try to query on this column as
> well.
> > >
> >
> > I'm suggesting that adding the second column to the index is the source
> of
> > your problem, not adding the column to the query.
>
> But it *is* odd that the index returns more rows with a strictly tighter
> conditions, right ?
>

Oh, very. I was seeing this as an optimization issue rather than a bug
report.

> Note, it's not an issue of rowcount estimate being confused by redundant
> conditions, but real rowcount, and it returns more rows even when the
> conditions are duplicative. Compare:
>
> postgres=# explain analyze select * from brin_test where id >= 90000 and r
> in (1);
> ...
> -> Bitmap Index Scan on brin_test_id_r_idx (cost=0.00..12.03
> rows=28125 width=0) (actual time=0.136..0.137 rows=37120 loops=1)
> Index Cond: ((id >= 90000) AND (r = 1))
>
> postgres=# explain analyze select * from brin_test where id >= 90000 and r
> in (1,1);
> ...
> -> Bitmap Index Scan on brin_test_id_r_idx (cost=0.00..12.03
> rows=28125 width=0) (actual time=0.263..0.263 rows=74240 loops=1)
> Index Cond: ((id >= 90000) AND (r = ANY ('{1,1}'::integer[])))
>
> postgres=# explain analyze select * from brin_test where id >= 90000 and r
> in (1,1,1);
> ...
> -> Bitmap Index Scan on brin_test_id_r_idx (cost=0.00..12.03
> rows=28125 width=0) (actual time=0.387..0.387 rows=111360 loops=1)
> Index Cond: ((id >= 90000) AND (r = ANY ('{1,1,1}'::integer[])))
>
> Note, the docs say:
> https://www.postgresql.org/docs/devel/indexes-multicolumn.html
> |A multicolumn BRIN index can be used with query conditions that involve
> any
> |subset of the index's columns. Like GIN and unlike B-tree or GiST, index
> search
> |effectiveness is the same regardless of which index column(s) the query
> |conditions use. The only reason to have multiple BRIN indexes instead of
> one
> |multicolumn BRIN index on a single table is to have a different
> pages_per_range
> |storage parameter.
>

The min/max values of each column are held for each block range.

If it scans using the "r" column it will identify more block ranges to scan
than if it used the id column and hence would scan more real rows, so that
part is understandable.

The only question is why it chooses to scan on "r" and not "id", which
needs some investigation.

--
Simon Riggs http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>
PostgreSQL Solutions for the Enterprise

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Michael Lewis 2019-06-20 18:09:13 Re: EXPLAIN ANALYZE of BRIN bitmap index scan with disjunction
Previous Message Justin Pryzby 2019-06-20 16:30:44 Re: EXPLAIN ANALYZE of BRIN bitmap index scan with disjunction