Re: BUG #18080: to_tsvector fails for long text input

From: Tom Lane <tgl@sss.pgh.pa.us>
To: Alvaro Herrera <alvherre@alvh.no-ip.org>
Cc: uwe.binder@pass-consulting.com, pgsql-bugs@lists.postgresql.org
Subject: Re: BUG #18080: to_tsvector fails for long text input
Date: 2023-09-22 17:48:59
Message-ID: 1056584.1695404939@sss.pgh.pa.us
Lists: pgsql-bugs

I wrote:
> Yeah. My thought about blocking the error had been to limit
> prs.lenwords to MaxAllocSize/sizeof(ParsedWord) in this code.

Concretely, as attached. This allows the given test case to
complete, since it doesn't actually create very many distinct
words. In other cases we could expect to fail when the array
has to get enlarged, but that's just a normal implementation
limitation.
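
Roughly, the clamp amounts to the following in to_tsvector_byid's
initial sizing of prs.words (a minimal sketch of the idea, assuming
the existing estimation code there; the attached patch is the
authoritative change):

    /* estimate word count from input size, then bound it sanely */
    prs.lenwords = VARSIZE_ANY_EXHDR(in) / 6;
    if (prs.lenwords < 2)
        prs.lenwords = 2;
    else if (prs.lenwords > MaxAllocSize / sizeof(ParsedWord))
        prs.lenwords = MaxAllocSize / sizeof(ParsedWord);
    prs.words = (ParsedWord *) palloc(sizeof(ParsedWord) * prs.lenwords);

This keeps the initial request under palloc's limit; enlarging the
array later can still fail, per the above.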

I looked for other places that might initialize lenwords
to not-sane values, and didn't find any.

BTW, the field order in ParsedWord is such that there's a fair
amount of wasted pad space on 64-bit builds. I doubt we can
get away with rearranging it in released branches; but maybe
it's worth doing something about that in HEAD, to push out
the point at which you hit the 1GB limit.
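
For illustration only (a hypothetical layout, not the actual
ParsedWord definition): on a typical LP64 build, interleaving
uint16 fields with pointers forces alignment padding that a
reordering would avoid:

    #include <stdio.h>
    #include <stdint.h>

    /* hypothetical field orders, just to show the padding effect */
    typedef struct
    {
        uint16_t len;       /* 2 bytes */
        uint16_t nvariant;  /* 2 bytes, then 4 bytes of padding */
        char    *word;      /* 8 bytes, needs 8-byte alignment */
        uint16_t flags;     /* 2 bytes, then 6 bytes of tail padding */
    } Padded;               /* 24 bytes on LP64 */

    typedef struct
    {
        char    *word;      /* pointer first: no interior padding */
        uint16_t len;
        uint16_t nvariant;
        uint16_t flags;     /* only 2 bytes of tail padding remain */
    } Packed;               /* 16 bytes on LP64 */

    int main(void)
    {
        printf("padded %zu, packed %zu\n", sizeof(Padded), sizeof(Packed));
        return 0;
    }

A smaller sizeof(ParsedWord) pushes out the input size at which
lenwords reaches MaxAllocSize/sizeof(ParsedWord).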

regards, tom lane

Attachment: bound-lenwords-in-to_tsvector_byid.patch (text/x-diff, 534 bytes)
