From: | Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec> |
---|---|
To: | Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: maximum columns for brin bloom indexes |
Date: | 2021-04-08 14:49:18 |
Message-ID: | 20210408144918.GA3086@ahch-to |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, Apr 08, 2021 at 12:18:36PM +0200, Tomas Vondra wrote:
> On 4/8/21 9:08 AM, Jaime Casanova wrote:
> > Hi everyone,
> >
> > When testing brin bloom indexes I noted that we need to reduce the
> > PAGES_PER_RANGE parameter of the index to allow more columns on it.
> >
> > Sadly, this could be a problem if you create the index before the table
> > grows, once it reaches some number of rows (i see the error as early as
> > 1000 rows) it starts error out.
> >
> > create table t1(i int, j int);
> >
> > -- uses default PAGES_PER_RANGE=128
> > create index on t1 using brin(i int4_bloom_ops, j int4_bloom_ops ) ;
> >
> > insert into t1
> > select random()*1000, random()*1000 from generate_series(1, 1000);
> > ERROR: index row size 8968 exceeds maximum 8152 for index "t1_i_j_idx"
> >
> > if instead you create the index with a minor PAGES_PER_RANGE it goes
> > fine, in this case it works once you reduce it to at least 116
> >
> > create index on t1 using brin(i int4_bloom_ops, j int4_bloom_ops )
> > with (pages_per_range=116);
> >
> >
> > so, for having:
> > two int columns - PAGES_PER_RANGE should be max 116
> > three int columns - PAGES_PER_RANGE should be max 77
> > one int and one timestamp - PAGES_PER_RANGE should be max 121
> >
> > and so on
> >
>
> No, because this very much depends on the number if distinct values in
> the page page range, which determines how well the bloom filter
> compresses. You used 1000, but that's just an arbitrary value and the
> actual data might have any other value. And it's unlikely that all three
> columns will have the same number of distinct values.
>
Ok, that makes sense. Still I see a few odd things:
"""
drop table if exists t1;
create table t1(i int, j int);
create index on t1 using brin(i int4_bloom_ops, j int4_bloom_ops ) ;
-- This one will succeed, I guess because it has less different
-- values
insert into t1
select random()*20, random()*100 from generate_series(1, 1000);
-- succeed
insert into t1
select random()*20, random()*100 from generate_series(1, 100000);
-- succeed
insert into t1
select random()*200, random()*1000 from generate_series(1, 1000);
-- succeed
insert into t1
select random()*200, random()*1000 from generate_series(1, 1000);
-- succeed? This is the case it has been causing problems before
insert into t1
select random()*1000, random()*1000 from generate_series(1, 1000);
"""
Maybe this makes sense, but it looks random to me. If it makes sense
this is something we should document better.
Let's try another combination:
"""
drop table if exists t1;
create table t1(i int, j int);
create index on t1 using brin(i int4_bloom_ops, j int4_bloom_ops ) ;
-- this fails again
insert into t1
select random()*1000, random()*1000 from generate_series(1, 1000);
-- and this starts to fail now, but this worked before
insert into t1
select random()*20, random()*100 from generate_series(1, 1000);
"""
> Of course, this also depends on the false positive rate.
>
How the false positive rate work?
> FWIW I doubt people are using multi-column BRIN indexes very often.
>
true.
Another question, should we allow to create a brin multi column index
that uses different opclasses?
CREATE INDEX ON t1 USING brin (i int4_bloom_ops, j int4_minmax_ops);
--
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL
From | Date | Subject | |
---|---|---|---|
Next Message | Julien Rouhaud | 2021-04-08 14:54:09 | Re: pg_stat_statements oddity with track = all |
Previous Message | David Steele | 2021-04-08 14:45:58 | Re: allow partial union-all and improve parallel subquery costing |