Quick Links

Re: Killing off removed rels properly

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	David Rowley <dgrowleyml(at)gmail(dot)com>
Cc:	Alexander Lakhin <exclusion(at)gmail(dot)com>, Richard Guo <guofenglinux(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject:	Re: Killing off removed rels properly
Date:	2023-02-20 20:48:04
Message-ID:	3724863.1676926084@sss.pgh.pa.us
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

I wrote:
> But while I'm looking at this, 3c569049b seems kind of broken on
> its own terms. Why is it populating so little of the IndexOptInfo
> for a partitioned index? I realize that we're not going to directly
> plan anything using such an index, but not populating fields like
> sortopfamily seems like it's at least leaving stuff on the table,
> and at worst making faulty decisions.

I fixed the other issues discussed in this thread, but along the
way I grew even more concerned about 3c569049b, because I discovered
that it's changed plans in more ways than what its commit message
suggests. For example, given the setup

CREATE TABLE pa_target (tid integer PRIMARY KEY) PARTITION BY LIST (tid);
CREATE TABLE pa_targetp PARTITION OF pa_target DEFAULT;
CREATE TABLE pa_source (sid integer);

then I get this as of commit 3c569049b7^:

# explain select * from pa_source s left join pa_target t on s.sid = t.tid;
QUERY PLAN
-----------------------------------------------------------------------------------------------
Nested Loop Left Join (cost=0.15..544.88 rows=32512 width=8)
-> Seq Scan on pa_source s (cost=0.00..35.50 rows=2550 width=4)
-> Index Only Scan using pa_targetp_pkey on pa_targetp t (cost=0.15..0.19 rows=1 width=4)
Index Cond: (tid = s.sid)
(4 rows)

and this as of 3c569049b7 and later:

# explain select * from pa_source s left join pa_target t on s.sid = t.tid;
QUERY PLAN
----------------------------------------------------------------------------
Hash Left Join (cost=67.38..109.58 rows=2550 width=8)
Hash Cond: (s.sid = t.tid)
-> Seq Scan on pa_source s (cost=0.00..35.50 rows=2550 width=4)
-> Hash (cost=35.50..35.50 rows=2550 width=4)
-> Seq Scan on pa_targetp t (cost=0.00..35.50 rows=2550 width=4)
(5 rows)

Now, I'm not unhappy about that change: it's clearly a win that we now
realize we'll get at most one matching t row for each s row. What
I'm unhappy about is that this means a partially-populated IndexOptInfo
is being used for rowcount estimation and perhaps other things.
That seems like sheer folly. Even if it manages to not dump core
from trying to access a missing field, there's a significant risk of
wrong answers, now or in the future. Why was it done like that?

regards, tom lane

In response to

Re: Killing off removed rels properly at 2023-02-20 18:11:43 from Tom Lane

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Peter Smith	2023-02-20 22:01:10	Re: Time delayed LR (WAS Re: logical replication restrictions)
Previous Message	Alvaro Herrera	2023-02-20 20:17:47	Re: pgbench: using prepared BEGIN statement in a pipeline could cause an error