From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com> |
Cc: | Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: OOM in spgist insert |
Date: | 2021-05-13 15:25:23 |
Message-ID: | 1517578.1620919523@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com> writes:
> [ v4-0001-When-there-are-INCLUDEd-columns-in-SpGist-check-t.patch ]
I don't like this patch one bit --- it's basically disabling a fairly
important SPGiST feature as soon as there are included columns.
What's more, it's not really giving us any better defense against
the infinite-picksplit-loop problem than we had before.
I wonder if we should give up on the theory posited circa
spgdoinsert.c:2213:
* Note: if the opclass sets longValuesOK, we rely on the
* choose function to eventually shorten the leafDatum
* enough to fit on a page. We could add a test here to
* complain if the datum doesn't get visibly shorter each
* time, but that could get in the way of opclasses that
* "simplify" datums in a way that doesn't necessarily
* lead to physical shortening on every cycle.
The argument that there might be a longValuesOK opclass that *doesn't*
shorten the datum each time seems fairly hypothetical to me.
An alternative way of looking at things is that this is the opclass's
fault. It should have realized that it's not making progress, and
thrown an error. However, I'm not sure if the opclass picksplit
function has enough info to throw a meaningful error. It looks to
me like the trouble case from spg_text_picksplit's point of view
is that it's given a single tuple containing a zero-length string,
so that there is no prefix it can strip. However, it seems possible
that that could be a legitimate case. Even if it's not, the opclass
function doesn't have (for instance) the name of the index to cite
in an error message. Nor did we give it a way to return a failure
indication, which is seeming like a mistake right now.
BTW, another nasty thing I discovered while testing this is that
the CHECK_FOR_INTERRUPTS() at line 2146 is useless, because
we're holding a buffer lock there so InterruptHoldoffCount > 0.
So once you get into this loop you can't even cancel the query.
Seems like that needs a fix, too.
regards, tom lane
From | Date | Subject | |
---|---|---|---|
Next Message | Bharath Rupireddy | 2021-05-13 15:30:26 | Re: parallel vacuum - few questions on docs, comments and code |
Previous Message | Bruce Momjian | 2021-05-13 14:59:48 | Re: PG 14 release notes, first draft |