From: | "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com> |
---|---|
To: | Bruce Momjian <bruce(at)momjian(dot)us> |
Cc: | ivanmulhin(at)gmail(dot)com, Pg Docs <pgsql-docs(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Subject: | Re: incorrect information in documentation |
Date: | 2022-04-12 21:36:47 |
Message-ID: | CAKFQuwax7V5R_rw=EOWmy=TBON6v3sveBx_WvwsENskCL5CLQQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-docs |
On Mon, Aug 9, 2021 at 8:40 PM David G. Johnston <david(dot)g(dot)johnston(at)gmail(dot)com>
wrote:
> On Mon, Aug 9, 2021 at 11:05 AM Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>
>>
>> > selectivity = (1 - null_frac1) * (1 - null_frac2) * min(1/
>> > num_distinct1,
>> > 1/num_distinct2)
>> > = (1 - 0) * (1 - 0) / max(10000, 10000)
>> > = 0.0001
>>
>> Nice, can you provide a patch please?
>>
>>
> Change the line:
>
>
Concretely, as attached and inline.
David J.
commit 73fa486a855d75d74a1a695bb350bfbfe27c7751
Author: David G. Johnston <david(dot)g(dot)johnston(at)gmail(dot)com>
Date: Tue Apr 12 21:23:53 2022 +0000
doc: make unique non-null join selectivity example match the prose
The description of the computation for the unique, non-null,
join selectivity describes a division by the maximum of two values,
while the example shows a multiplication by their reciprocal. While
equivalent the max phrasing is easier to understand; which seems
more important here than precisely adhering to the formula use
in the code (for which either variant is still an approximation).
While both num_distinct and num_rows are equal for a unique column
both the concept and formula use row count (10,000) and the
field num_distinct has already been set to mean the specific value
present in the pg_stats table (i.e, -1), so use num_rows here.
diff --git a/doc/src/sgml/planstats.sgml b/doc/src/sgml/planstats.sgml
index 78053d7c49..f72bc4b274 100644
--- a/doc/src/sgml/planstats.sgml
+++ b/doc/src/sgml/planstats.sgml
@@ -391,18 +391,20 @@ tablename | null_frac | n_distinct | most_common_vals
</programlisting>
In this case there is no <acronym>MCV</acronym> information for
- <structfield>unique2</structfield> because all the values appear to be
- unique, so we use an algorithm that relies only on the number of
- distinct values for both relations together with their null fractions:
+ <structname>unique2</structname> and all the values appear to be
+ unique (n_distinct = -1), so we use an algorithm that relies on the row
+ count estimates for both relations (num_rows, not shown, but "tenk")
+ together with the column null fractions (zero for both):
<programlisting>
-selectivity = (1 - null_frac1) * (1 - null_frac2) * min(1/num_distinct1,
1/num_distinct2)
+selectivity = (1 - null_frac1) * (1 - null_frac2) / max(num_rows1,
num_rows2)
= (1 - 0) * (1 - 0) / max(10000, 10000)
= 0.0001
</programlisting>
This is, subtract the null fraction from one for each of the relations,
- and divide by the maximum of the numbers of distinct values.
+ and divide by the row count of the larger relation (this value does get
+ scaled in the non-unique case).
The number of rows
that the join is likely to emit is calculated as the cardinality of the
Cartesian product of the two inputs, multiplied by the
Attachment | Content-Type | Size |
---|---|---|
v0001-doc-make-row-estimation-example-match-prose.patch | application/octet-stream | 2.4 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2022-04-12 22:22:53 | Re: "GIN and GiST Index Types" page is about usage in full text search, but looks general purpose |
Previous Message | Peter Geoghegan | 2022-04-12 21:34:01 | Re: "GIN and GiST Index Types" page is about usage in full text search, but looks general purpose |