From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Making clausesel.c Smarter
Date: 2017-02-24 10:00:07
Message-ID: CAKJS1f82CCOwVh5dkmnvadYpfpKOFFKXgpczJtbXWigJpEOArw@mail.gmail.com
Lists: pgsql-hackers
I've stumbled over an interesting query out in the wild, where the
query was testing whether a nullable timestamptz column was earlier
than some point in time, and also checking that the column IS NOT NULL.
The usage pattern here was that records which required processing had
this timestamp set to something other than NULL; a worker would come
along, process those, and UPDATE the record to NULL to mark the
fact that it was now processed. So what we were left with was a table
with a small number of rows with a value in this timestamp column, and
an ever-growing number of rows with a NULL value.
A highly simplified version of the query was checking for records that
required processing before some date, say:
SELECT * FROM a WHERE ts < '2017-02-24' AND ts IS NOT NULL;
Of course, the ts IS NOT NULL is not required here, but I can
understand how someone might make the mistake of adding it. The
simple solution to the problem was to have that null test removed, but
that seemingly innocent little mistake caused some pain in the form of
very slow queries, which clung to a nested loop plan and ran 33 times
longer than they should have. Basically, what was happening here is
that clauselist_selectivity() was applying the selectivity with the
~0.97 null_frac from pg_statistic over the top of the estimate already
applied for the number of rows below the constant timestamp.
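To make the double-counting concrete, here is a rough back-of-envelope
sketch in Python. The numbers are illustrative, taken from the test
case further down (null_frac = 0.999, with every non-NULL value
passing the range test); this approximates the planner's arithmetic,
it is not the planner's actual code:

```python
# Illustrative numbers: 999 of every 1000 rows are NULL, as in the
# test case below.
null_frac = 0.999
nrows = 1_000_000

# A scalarltsel-style estimate for "ts <= const": the fraction of
# non-NULL values passing (assumed 1.0 here), scaled down by the
# non-NULL fraction -- so NULLs are already accounted for.
range_sel = 1.0 * (1 - null_frac)        # ~0.001

# The nulltestsel estimate for "ts IS NOT NULL".
notnull_sel = 1 - null_frac              # ~0.001

# Master multiplies the two, counting the non-NULL fraction twice,
# which collapses the row estimate to about one row.
master_rows = nrows * range_sel * notnull_sel

# If the redundant IS NOT NULL is ignored, the estimate stays near
# the true figure of 1000 matching rows.
patched_rows = nrows * range_sel
```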
Since the diagnosis of this problem was not instant, and some amount
of pain was suffered before the solution was found, I wondered if it
might be worth smartening up the planner a bit in this area.
We're already pretty good at handling conditions like SELECT * FROM a
WHERE x < 10 AND x < 1, where we'll effectively ignore the x < 10
estimate since x < 1 is more restrictive, so if we just build on that
ability a bit we can get enough code to cover the above case.
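As a rough sketch of that existing idea (illustrative names only, not
the patch's actual API): for several same-direction inequalities on
the same column, keep only the most restrictive selectivity per
column rather than multiplying them all together.

```python
def combine_range_sels(clauses):
    """clauses: list of (column_name, selectivity) pairs for
    same-direction inequalities, e.g. x < 10 AND x < 1.  Hypothetical
    helper, not PostgreSQL code."""
    best = {}
    for var, sel in clauses:
        # Keep only the tightest (smallest) selectivity per column.
        best[var] = min(sel, best.get(var, 1.0))
    result = 1.0
    for sel in best.values():
        result *= sel
    return result

# x < 10 (sel 0.5) and x < 1 (sel 0.05): only the tighter bound counts.
combine_range_sels([("x", 0.5), ("x", 0.05)])  # → 0.05
```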
I've attached a draft patch which improves the situation here.
Given the test case:
create table ts (ts timestamptz);
insert into ts
  select case when x % 1000 = 0
              then '2017-01-01'::date + (x::text || ' sec')::interval
              else null end
  from generate_series(1, 1000000) x(x);
analyze ts;
explain analyze select count(*) from ts
  where ts <= '2017-02-01'::timestamptz and ts is not null;
With the patch we get:
                                                  QUERY PLAN
-------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=15938.83..15938.84 rows=1 width=8) (actual time=101.003..101.003 rows=1 loops=1)
   ->  Seq Scan on ts  (cost=0.00..15937.00 rows=733 width=0) (actual time=0.184..100.868 rows=1000 loops=1)
         Filter: ((ts IS NOT NULL) AND (ts <= '2017-02-01 00:00:00+13'::timestamp with time zone))
         Rows Removed by Filter: 999000
 Planning time: 0.153 ms
 Execution time: 101.063 ms
Whereas master gives us:
                                                 QUERY PLAN
-----------------------------------------------------------------------------------------------------------
 Aggregate  (cost=15937.00..15937.01 rows=1 width=8) (actual time=119.256..119.256 rows=1 loops=1)
   ->  Seq Scan on ts  (cost=0.00..15937.00 rows=1 width=0) (actual time=0.172..119.062 rows=1000 loops=1)
         Filter: ((ts IS NOT NULL) AND (ts <= '2017-02-01 00:00:00+13'::timestamp with time zone))
         Rows Removed by Filter: 999000
 Planning time: 0.851 ms
 Execution time: 119.401 ms
A side effect of this is that we're now able to better detect
impossible cases such as:
postgres=# explain analyze select count(*) from ts
           where ts <= '2017-02-01'::timestamptz and ts is null;
                                                 QUERY PLAN
----------------------------------------------------------------------------------------------------------
 Aggregate  (cost=15937.00..15937.01 rows=1 width=8) (actual time=135.012..135.012 rows=1 loops=1)
   ->  Seq Scan on ts  (cost=0.00..15937.00 rows=1 width=0) (actual time=135.007..135.007 rows=0 loops=1)
         Filter: ((ts IS NULL) AND (ts <= '2017-02-01 00:00:00+13'::timestamp with time zone))
         Rows Removed by Filter: 1000000
 Planning time: 0.067 ms
 Execution time: 135.050 ms
(6 rows)
Master is not able to see the impossibility of this case:
postgres=# explain analyze select count(*) from ts
           where ts <= '2017-02-01'::timestamptz and ts is null;
                                                  QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=15938.83..15938.84 rows=1 width=8) (actual time=131.681..131.681 rows=1 loops=1)
   ->  Seq Scan on ts  (cost=0.00..15937.00 rows=733 width=0) (actual time=131.676..131.676 rows=0 loops=1)
         Filter: ((ts IS NULL) AND (ts <= '2017-02-01 00:00:00+13'::timestamp with time zone))
         Rows Removed by Filter: 1000000
 Planning time: 0.090 ms
 Execution time: 131.719 ms
(6 rows)
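The impossibility detection can be sketched the same way (again with
made-up names, not the patch's code): operators such as <= are strict,
so they can never pass for a NULL input, which contradicts an IS NULL
test on the same column and, conversely, makes an IS NOT NULL test on
it redundant.

```python
def combined_sel(has_isnull, strict_op_sel, null_frac):
    """Illustrative selectivity combiner for one column's clauses.

    strict_op_sel: estimate for a strict OpExpr like "ts <= const"
    (already scaled by the non-NULL fraction), or None if absent.
    """
    if strict_op_sel is not None:
        if has_isnull:
            return 0.0            # IS NULL contradicts a strict operator
        return strict_op_sel      # any IS NOT NULL test is redundant
    # No strict operator: fall back to the plain null-test estimate.
    return null_frac if has_isnull else 1.0

combined_sel(True, 0.001, 0.999)   # ts IS NULL AND ts <= const → 0.0
combined_sel(False, 0.001, 0.999)  # ts IS NOT NULL AND ts <= const → 0.001
```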
Now one thing I was unsure of in the patch was this "Other clauses"
concept that I've come up with. In the patch we have:
default:
addCachedSelectivityOtherClause(&cslist, var, s2);
break;
I was unsure if this should only apply to F_EQSEL and F_NEQSEL. My
imagination here won't stretch far enough to imagine an OpExpr which
passes with a NULL operand. If one existed, my logic would be broken,
and in that case I think limiting "Others" to mean F_EQSEL and
F_NEQSEL would be the workaround.
Before I do any more work on this, I'd like to know: is this something
that we might want to fix?
It would be good to improve the situation here in the back branches
too, but I'm thinking that the attached might be a little invasive for
that?
Thoughts?
Regards
David Rowley
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment: smarter_clausesel.patch (application/octet-stream, 18.3 KB)