From: | Peter Geoghegan <pg(at)bowt(dot)ie> |
---|---|
To: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Pathological performance when inserting many NULLs into a unique index |
Date: | 2019-04-18 02:37:17 |
Message-ID: | CAH2-Wzm08nr+JPx4jMOa9CGqxWYDQ-_D4wtPBiKghXAUiUy-nQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
I thought of a case that results in pathological performance due to a
behavior of my nbtree patch series:
regression=# create table uniquenulls(nully int4, constraint pp unique(nully));
CREATE TABLE
Time: 10.694 ms
regression=# insert into uniquenulls select i from generate_series(1, 1e6) i;
INSERT 0 1000000
Time: 1356.025 ms (00:01.356)
regression=# insert into uniquenulls select null from generate_series(1, 1e6) i;
INSERT 0 1000000
Time: 270834.196 ms (04:30.834)
The issue here is that the duration of the second INSERT statement is
wildly excessive, because _bt_stepright() needs to step right many
many times for each tuple inserted. I would expect the second insert
to take approximately as long as the first, but it takes ~200x longer
here. It could take much longer still if we pushed this example
further. What we see here is a limited case in which the O(n ^ 2)
performance that "getting tired" used to prevent can occur. Clearly
that's totally unacceptable. Mea culpa.
Sure enough, the problem goes away when the index isn't a unique index
(i.e. in the UNIQUE_CHECK_NO case):
regression=# alter table uniquenulls drop constraint pp;
ALTER TABLE
Time: 28.968 ms
regression=# create index on uniquenulls (nully);
CREATE INDEX
Time: 1159.958 ms (00:01.160)
regression=# insert into uniquenulls select null from generate_series(1, 1e6) i;
INSERT 0 1000000
Time: 1155.708 ms (00:01.156)
Tentatively, I think that the fix here is to not "itup_key->scantid =
NULL" when a NULL value is involved (i.e. don't "itup_key->scantid =
NULL" when we see IndexTupleHasNulls(itup) within _bt_doinsert()). We
may also want to avoid calling _bt_check_unique() entirely in this
situation. That way, the performance should be the same as the
UNIQUE_CHECK_NO case: we descend to the appropriate leaf page
directly, and then we're done. We don't have to find the appropriate
leaf page by groveling through indefinitely-many existing leaf pages
that are full of NULL values, because we know that there cannot ever
be a unique violation for us to detect.
I'll create an open item for this, and begin work on a patch tomorrow.
--
Peter Geoghegan
From | Date | Subject | |
---|---|---|---|
Next Message | Amit Langote | 2019-04-18 02:42:49 | Re: pgsql: Fix plan created for inherited UPDATE/DELETE with all tables exc |
Previous Message | Kato, Sho | 2019-04-18 02:09:52 | RE: Speeding up creating UPDATE/DELETE generic plan for partitioned table into a lot |