From: | jian he <jian(dot)universality(at)gmail(dot)com> |
---|---|
To: | vignesh C <vignesh21(at)gmail(dot)com> |
Cc: | Schoemans Maxime <maxime(dot)schoemans(at)ulb(dot)be>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Damir Belyalov <dam(dot)bel07(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, SAKR Mahmoud <mahmoud(dot)sakr(at)ulb(dot)be>, Diogo Repas <diogo(dot)repas(at)gmail(dot)com>, LUO Zhicheng <zhicheng(dot)luo(at)ulb(dot)be>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru> |
Subject: | Re: Implement missing join selectivity estimation for range types |
Date: | 2024-01-22 08:10:37 |
Message-ID: | CACJufxFhES7untAPmxJDrB-y_pNgah_1HNbXjcOj6g=yxh3kdA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
I cannot figure out why it aborts.
as Tom mentioned in upthread about the test cases.
similar to src/test/regress/sql/stats_ext.sql check_estimated_rows function.
we can test it by something:
create or replace function check_estimated_rows(text) returns table (ok bool)
language plpgsql as
$$
declare
ln text;
tmp text[];
first_row bool := true;
begin
for ln in
execute format('explain analyze %s', $1)
loop
if first_row then
first_row := false;
tmp := regexp_match(ln, 'rows=(\d*) .* rows=(\d*)');
return query select 0.2 < tmp[1]::float8 / tmp[2]::float8
and tmp[1]::float8 / tmp[2]::float8 < 5;
end if;
end loop;
end;
$$;
select * from check_estimated_rows($$select * from test_range_join_1,
test_range_join_2 where ir1 && ir2$$);
select * from check_estimated_rows($$select * from test_range_join_1,
test_range_join_2 where ir1 << ir2$$);
select * from check_estimated_rows($$select * from test_range_join_1,
test_range_join_2 where ir1 >> ir2$$);
Do you need 3 tables to do the test? because we need to actually run
the query then compare the estimated row
and actually returned rows.
If you really execute the query with 3 table joins, it will take a lot of time.
So two tables join with where quql should be fine?
/* Fast-forwards i and j to start of iteration */
+ for (i = 0; range_cmp_bound_values(typcache, &hist1[i], &hist2[0]) < 0; i++);
+ for (j = 0; range_cmp_bound_values(typcache, &hist2[j], &hist1[0]) < 0; j++);
+
+ /* Do the estimation on overlapping regions */
+ while (i < nhist1 && j < nhist2)
+ {
+ double cur_sel1,
+ cur_sel2;
+ RangeBound cur_sync;
+
+ if (range_cmp_bound_values(typcache, &hist1[i], &hist2[j]) < 0)
+ cur_sync = hist1[i++];
+ else if (range_cmp_bound_values(typcache, &hist1[i], &hist2[j]) > 0)
+ cur_sync = hist2[j++];
+ else
+ {
+ /* If equal, skip one */
+ cur_sync = hist1[i];
+
this part range_cmp_bound_values "if else if" part computed twice, you
can just do
`
int cmp;
cmp = range_cmp_bound_values(typcache, &hist1[i], &hist2[j]);
if cmp <0 then
else if cmp > 0 then
else then
`
also. I think you can put the following into main while loop.
+ for (i = 0; range_cmp_bound_values(typcache, &hist1[i], &hist2[0]) < 0; i++);
+ for (j = 0; range_cmp_bound_values(typcache, &hist2[j], &hist1[0]) < 0; j++);
split range and multirange into 2 patches might be a good idea.
seems: same function (calc_hist_join_selectivity) with same function
signature in src/backend/utils/adt/multirangetypes_selfuncs.c
and src/backend/utils/adt/rangetypes_selfuncs.c,
previously mail complaints not resolved.
From | Date | Subject | |
---|---|---|---|
Next Message | John Naylor | 2024-01-22 08:18:03 | Re: [PoC] Improve dead tuple storage for lazy vacuum |
Previous Message | Michael Paquier | 2024-01-22 07:52:52 | Re: introduce dynamic shared memory registry |