From: | Julien Rouhaud <rjuju123(at)gmail(dot)com> |
---|---|
To: | Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com> |
Cc: | Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: [PATCH] Erase the distinctClause if the result is unique by definition |
Date: | 2020-02-11 07:57:51 |
Message-ID: | 20200211075751.GA2139@nol |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, Feb 11, 2020 at 10:57:26AM +0800, Andy Fan wrote:
> On Tue, Feb 11, 2020 at 12:22 AM Ashutosh Bapat <
> ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> wrote:
>
> > I forgot to mention this in the last round of comments. Your patch was
> > actually removing distictClause from the Query structure. Please avoid
> > doing that. If you remove it, you are also removing the evidence that this
> > Query had a DISTINCT clause in it.
> >
>
> Yes, I removed it because it is the easiest way to do it. what is the
> purpose of keeping the evidence?
>
> >> However the patch as presented has some problems
> >> 1. What happens if the primary key constraint or NOT NULL constraint gets
> >> dropped between a prepare and execute? The plan will no more be valid and
> >> thus execution may produce non-distinct results.
>
> > But that doesn't matter since a query can be prepared outside a
> > transaction and executed within one or more subsequent transactions.
> >
>
> Suppose after a DDL, the prepared statement need to be re-parsed/planned
> if it is not executed or it will prevent the DDL to happen.
>
> The following is my test.
>
> postgres=# create table t (a int primary key, b int not null, c int);
> CREATE TABLE
> postgres=# insert into t values(1, 1, 1), (2, 2, 2);
> INSERT 0 2
> postgres=# create unique index t_idx1 on t(b);
> CREATE INDEX
>
> postgres=# prepare st as select distinct b from t where c = $1;
> PREPARE
> postgres=# explain execute st(1);
> QUERY PLAN
> -------------------------------------------------
> Seq Scan on t (cost=0.00..1.02 rows=1 width=4)
> Filter: (c = 1)
> (2 rows)
> ...
> postgres=# explain execute st(1);
> QUERY PLAN
> -------------------------------------------------
> Seq Scan on t (cost=0.00..1.02 rows=1 width=4)
> Filter: (c = $1)
> (2 rows)
>
> -- session 2
> postgres=# alter table t alter column b drop not null;
> ALTER TABLE
>
> -- session 1:
> postgres=# explain execute st(1);
> QUERY PLAN
> -------------------------------------------------------------
> Unique (cost=1.03..1.04 rows=1 width=4)
> -> Sort (cost=1.03..1.04 rows=1 width=4)
> Sort Key: b
> -> Seq Scan on t (cost=0.00..1.02 rows=1 width=4)
> Filter: (c = $1)
> (5 rows)
>
> -- session 2
> postgres=# insert into t values (3, null, 3), (4, null, 3);
> INSERT 0 2
>
> -- session 1
> postgres=# execute st(3);
> b
> ---
>
> (1 row)
>
> and if we prepare sql outside a transaction, and execute it in the
> transaction, the other session can't drop the constraint until the
> transaction is ended.
And what if you create a view on top of a query containing a distinct clause
rather than using prepared statements? FWIW your patch doesn't handle such
case at all, without even needing to drop constraints:
CREATE TABLE t (a int primary key, b int not null, c int);
INSERT INTO t VALUEs(1, 1, 1), (2, 2, 2);
CREATE UNIQUE INDEX t_idx1 on t(b);
CREATE VIEW v1 AS SELECT DISTINCT b FROM t;
EXPLAIN SELECT * FROM v1;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
I also think this is not the right way to handle this optimization.
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Eisentraut | 2020-02-11 08:22:54 | client-side fsync() error handling |
Previous Message | Peter Eisentraut | 2020-02-11 07:41:53 | Add PostgreSQL home page to --help output |