strange nested loop row count estimates

From: Sergey Koposov <skoposov(at)cmu(dot)edu>
To: "pgsql-general(at)lists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: strange nested loop row count estimates
Date: 2019-05-02 00:59:52
Message-ID: e5cc7ba42336483fa7d072d92718573fc250bcb1.camel@cmu.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi,

I'm currently trying to understand the expected row counts for a query involving a nested loop join and bitmap index scan
on the functional index and a custom operator. And the numbers that I see don't make sense to me currently. Hopefully
somebody here can shed some light on it, or confirm this is some kind of issue.

Here is the query and explain analyze

explain analyze select * from twomass.psc as t , gaia_dr2.gaia_source as g where
(
(q3c_ang2ipix(g.ra,g.dec) between q3c_nearby_it(t.ra, t.decl, 0.0003, 0) and
q3c_nearby_it(t.ra, t.decl, 0.0003, 1))
or
(q3c_ang2ipix(g.ra,g.dec) between q3c_nearby_it(t.ra, t.decl, 0.0003, 1) and
q3c_nearby_it(t.ra, t.decl, 0.0003, 3))
)
and
0.0003 ==<<>>== (g.ra,g.dec,t.ra,t.decl)::q3c_type limit 10;

https://explain.depesz.com/s/vcNd

What I can't understand at all is how the estimate of 3E15 rows!!!! is obtained by the nested loop
given that the bitmap heap scan is expected to return *one* single row for each row of the 'left' table.
So in my mind the estimate of the total number of rows
should be ~ 1e9 rows after the nested loop. Because of this crazy overestimate, I actually have to force the nested loop
in this query by disabling seqscan.
(if I don't disable the seqscan -- this is the plan I get which ignores the indices:
https://explain.depesz.com/s/EIiG

Some more details about the query:
q3c_ang2ipix(ra,dec) is the function mapping (double,double) -> bigint and the tables have a functional index on that.
Like this:
Table "gaia_dr2.gaia_source"
Column | Type | Modifiers
----------------------------------+-------------------+-----------
ra | double precision |
dec | double precision |
.......
Indexes:
"gaia_source2_q3c_ang2ipix_idx" btree (q3c_ang2ipix(ra, "dec"))

q3c_nearby_() function just returns bigint.

The ==<<>== is the custom operator with custom low selectivity (1e-12 in this case)

The tables in the join in question have 450 mill and 1.5 billion rows.

I hope somebody can help me understand what's going on.

Thank you in advance.

Sergey

PS the kind of query that I show comes from the q3c module ( https://github.com/segasai/q3c )
that is used for spatial queries of large astronomical catalogues.

Responses

Browse pgsql-general by date

  From Date Subject
Next Message David Rowley 2019-05-02 01:02:12 Re: Starting Postgres when there is no disk space
Previous Message Igal Sapir 2019-05-02 00:07:01 Starting Postgres when there is no disk space