From: | Joel Perren <joel(dot)perren(at)gmail(dot)com> |
---|---|
To: | pgsql-general(at)lists(dot)postgresql(dot)org |
Subject: | Selectivity and row count estimates for JSONB columns |
Date: | 2021-10-20 14:56:43 |
Message-ID: | CACFz3n13sazt00u+_AbGL5Ur9danB56sHPhP61ymnqUHQRZ1uQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Hi all
I have a Postgres/PostGIS database with two separate database tables each
of which has the following column structure:
- identifier (text)
- geometry (geometryz,27700)
- properties (jsonb)
I have created a GIN index (jsonb_path_ops) over the properties column on
both tables.
As I understand it, Postgres is unable to calculate statistics on the
contents of JSONB columns and so should use hard-coded estimates of
selectivity when planning queries. However, despite both tables having
identical structures, similar row counts, identical indexes, and both
having been recently ANALYZEd, I am finding that the query planner is
producing different row count estimates between the tables.
To expand: both tables have a 'description' field within their JSONB
columns. I am executing the following query:
EXPLAIN
SELECT identifier, geometry, properties FROM table
WHERE properties @@ '$.description == "test"'
ORDER BY identifier;
I am expecting that the selectivity value used for such queries should be
0.010 which is the default result of the matchingsel selectivity function
which the @@ operator uses when operating on jsonb.
For both tables, the planner opts for a Bitmap Index Scan -> Sort -> Gather
Merge. However, the estimated number of rows returned from the Index Scan
node differs substantially.
- Table A (1,611,752 rows): 159 estimated rows (this is roughly 0.01% and
makes sense);
- Table B (1,656,110 rows): 16566 estimated rows (roughly 1% - why??)
This difference is causing the planner to come up with some strange plans
for queries on Table B which it does not do for Table A. I know that both
estimates are just that and that neither is 'better', but I would really
prefer it to be consistent and use a selectivity of 0.01 rather than 0.1 as
it should be doing.
I know this is getting long now, so apologies. But one thing I did notice
and wondered if it could be related is that the pg_stats table for Table A
seems to have values for histogram_bounds and correlation, but nulls for
these values for Table B.
Any ideas?
Thanks
Joel
From | Date | Subject | |
---|---|---|---|
Next Message | Toomas | 2021-10-20 15:07:49 | Re: Can db user change own password? |
Previous Message | Adrian Klaver | 2021-10-20 14:45:46 | Re: Can db user change own password? |