Quick Links

Re: Statistics on array values

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Marco Colli <collimarco91(at)gmail(dot)com>
Cc:	pgsql-performance(at)lists(dot)postgresql(dot)org
Subject:	Re: Statistics on array values
Date:	2020-02-02 17:11:05
Message-ID:	2661.1580663465@sss.pgh.pa.us
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

Marco Colli <collimarco91(at)gmail(dot)com> writes:
> Let's say that you have a simple query like the following on a large table
> (for a multi-tenant application):
> SELECT "subscribers".* FROM "subscribers" WHERE "subscribers"."project_id"
> = 123 AND (tags @> ARRAY['de']::varchar[]);

> If you run EXPLAIN ANALYZE you can see that stats are completely wrong.
> For example I get an expected count of 3,500 rows whereas the actual
> result is 20 rows. This also results in bad query plans...

> In a previous discussion someone said that this wrong estimate is because
> @> uses a fixed selectivity of 0.001, **regardless of actual data**!!
> Is that true?

Hasn't been true since 9.2.

You might get some insight from looking into the most_common_elems,
most_common_elem_freqs, and elem_count_histogram fields of the pg_stats
view.

It seems likely to me that increasing the statistics target for this array
column would help. IIRC, estimates for values that don't show up in
most_common_elems are going to depend on the lowest frequency that *does*
show up there ... so if you want better resolution for non-common values,
you need more entries.

regards, tom lane

In response to

Statistics on array values at 2020-02-02 14:18:19 from Marco Colli

Responses

Re: Statistics on array values at 2020-02-02 18:26:19 from Marco Colli

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Marco Colli	2020-02-02 18:26:19	Re: Statistics on array values
Previous Message	Marco Colli	2020-02-02 14:38:57	Re: Statistics on array values