From: | Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> |
---|---|
To: | Jeff Janes <jeff(dot)janes(at)gmail(dot)com> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: neqjoinsel versus "refresh materialized view concurrently" |
Date: | 2018-03-13 22:56:38 |
Message-ID: | CAEepm=0H9+XviC5bZ=DmH3O6vg2BDOouZx2RRcYCu0-WXaLBGw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, Mar 14, 2018 at 11:34 AM, Thomas Munro
<thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> LOG: duration: 26101.612 ms plan:
> Query Text: SELECT newdata FROM pg_temp_3.pg_temp_16452 newdata WHERE
> newdata IS NOT NULL AND EXISTS (SELECT * FROM pg_temp_3.pg_temp_16452
> newdata2 WHERE newdata2 IS NOT NULL AND newdata2
> OPERATOR(pg_catalog.*=) newdata AND newdata2.ctid
> OPERATOR(pg_catalog.<>) newdata.ctid) LIMIT 1
> Limit (cost=0.00..90.52 rows=1 width=28) (actual
> time=26101.608..26101.608 rows=0 loops=1)
> -> Nested Loop Semi Join (cost=0.00..225220.96 rows=2488 width=28)
> (actual time=26101.606..26101.606 rows=0 loops=1)
> Join Filter: ((newdata2.ctid <> newdata.ctid) AND (newdata.* *=
> newdata2.*))
> Rows Removed by Join Filter: 25000000
> -> Seq Scan on pg_temp_16452 newdata (cost=0.00..73.00
> rows=4975 width=34) (actual time=0.022..15.448 rows=5000 loops=1)
> Filter: (newdata.* IS NOT NULL)
> -> Materialize (cost=0.00..97.88 rows=4975 width=34) (actual
> time=0.000..0.500 rows=5000 loops=5000)
> -> Seq Scan on pg_temp_16452 newdata2 (cost=0.00..73.00
> rows=4975 width=34) (actual time=0.010..4.033 rows=5000 loops=1)
> Filter: (newdata2.* IS NOT NULL)
This plan is chosen because we're looking for just one row (LIMIT 1)
that has equal data but a different ctid. In this case we're not
going to find one, so we'll pay the full enormous cost of the nested
loop, but the startup cost is estimated as 0 and we think we are going
to find a row straight away. That's because we don't know that it's
unlikely for there to be a row with the same columns but a different
ctid.
There is a fundamental and complicated estimation problem lurking here
of course and I'm not sure what to think about that yet. Maybe there
is a very simple fix for this particular problem:
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -660,7 +660,7 @@ refresh_by_match_merge(Oid matviewOid, Oid
tempOid, Oid relowner,
"(SELECT * FROM %s newdata2
WHERE newdata2 IS NOT NULL "
"AND newdata2
OPERATOR(pg_catalog.*=) newdata "
"AND newdata2.ctid
OPERATOR(pg_catalog.<>) "
- "newdata.ctid) LIMIT 1",
+ "newdata.ctid)",
tempname, tempname);
if (SPI_execute(querybuf.data, false, 1) != SPI_OK_SELECT)
elog(ERROR, "SPI_exec failed: %s", querybuf.data);
That gets me back to the sort-merge plan, but maybe it's too superficial.
--
Thomas Munro
http://www.enterprisedb.com
From | Date | Subject | |
---|---|---|---|
Next Message | Andres Freund | 2018-03-13 23:10:58 | Re: SQL/JSON: functions |
Previous Message | Michael Paquier | 2018-03-13 22:54:35 | Re: SQL/JSON: functions |