From: | jian he <jian(dot)universality(at)gmail(dot)com> |
---|---|
To: | pgsql-general list <pgsql-general(at)lists(dot)postgresql(dot)org> |
Subject: | understand pg_ndistinct type && Why with(autovacuum_enabled=off) some query estimate 100, some is 200. |
Date: | 2022-09-14 16:16:59 |
Message-ID: | CACJufxFUMoAyu+gtrB6WnJ9B0sk72qOH49a3dUcVF9+NLU-ncw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
-- n-distinct tests
CREATE TABLE ndistinct (
filler1 TEXT,
filler2 NUMERIC,
a INT,
b INT,
filler3 DATE,
c INT,
d INT
)
WITH (autovacuum_enabled = off);
-- over-estimates when using only per-column statistics
INSERT INTO ndistinct (a, b, c, filler1)
SELECT i/100, i/100, i/100, cash_words((i/100)::money)
FROM generate_series(1,1000) s(i);
ANALYZE ndistinct;
-- Group Aggregate, due to over-estimate of the number of groups
SELECT * FROM check_estimated_rows('SELECT COUNT(*) FROM ndistinct
GROUP BY a, b');
estimated | actual
-----------+--------
100 | 11
(1 row)
SELECT * FROM check_estimated_rows('SELECT COUNT(*) FROM ndistinct
GROUP BY b, c');
estimated | actual
-----------+--------
100 | 11
(1 row)
SELECT * FROM check_estimated_rows('SELECT COUNT(*) FROM ndistinct
GROUP BY a, b, c');
estimated | actual
-----------+--------
100 | 11
(1 row)
SELECT * FROM check_estimated_rows('SELECT COUNT(*) FROM ndistinct
GROUP BY a, b, c, d');
estimated | actual
-----------+--------
200 | 11
(1 row)
SELECT * FROM check_estimated_rows('SELECT COUNT(*) FROM ndistinct
GROUP BY b, c, d');
estimated | actual
-----------+--------
200 | 11
(1 row)
-- correct command
CREATE STATISTICS s10 ON a, b, c FROM ndistinct;
ANALYZE ndistinct;
SELECT s.stxkind, d.stxdndistinct
FROM pg_statistic_ext s, pg_statistic_ext_data d
WHERE s.stxrelid = 'ndistinct'::regclass
AND d.stxoid = s.oid;
stxkind | stxdndistinct
---------+-----------------------------------------------------
{d,f,m} | {"3, 4": 11, "3, 6": 11, "4, 6": 11, "3, 4, 6": 11}
(1 row)
---------------------------------------------------------------
I don't understand the query GROUP BY b, c, d estimate is 200, while GROUP
BY a, b, c is 100.
I also don't understand the last query stxdndistinct result.
I know what d,f,m refer to.
I may found the pg_ndistinct type source:
https://doxygen.postgresql.org/mvdistinct_8c.html#a03c06f5f0db3fc22cd5323ea04906a7c
But my C knowledge is limited.
Is there any way in sql level to query more info (like base type)
about pg_ndistinct
?
--
I recommend David Deutsch's <<The Beginning of Infinity>>
Jian
From | Date | Subject | |
---|---|---|---|
Next Message | Laurenz Albe | 2022-09-14 16:58:30 | Re: CVE-2022-2625 |
Previous Message | Tom Lane | 2022-09-14 14:23:48 | Re: Mysterious performance degradation in exceptional cases |