Re: BUG #18080: to_tsvector fails for long text input

From: Tom Lane <tgl@sss.pgh.pa.us>
To: Alvaro Herrera <alvherre@alvh.no-ip.org>
Cc: uwe.binder@pass-consulting.com, pgsql-bugs@lists.postgresql.org
Subject: Re: BUG #18080: to_tsvector fails for long text input
Date: 2023-09-22 17:48:59
Message-ID: 1056584.1695404939@sss.pgh.pa.us
Lists: pgsql-bugs

I wrote:
> Yeah. My thought about blocking the error had been to limit
> prs.lenwords to MaxAllocSize/sizeof(ParsedWord) in this code.

Concretely, as attached. This allows the given test case to
complete, since it doesn't actually create very many distinct
words. In other cases we could expect to fail when the array
has to get enlarged, but that's just a normal implementation
limitation.
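
Roughly, the clamp amounts to the following in to_tsvector_byid's
initial sizing of prs.words (a minimal sketch of the idea, assuming
the existing estimation code there; the attached patch is the
authoritative change):

    /* estimate word count from input size, then bound it sanely */
    prs.lenwords = VARSIZE_ANY_EXHDR(in) / 6;
    if (prs.lenwords < 2)
        prs.lenwords = 2;
    else if (prs.lenwords > MaxAllocSize / sizeof(ParsedWord))
        prs.lenwords = MaxAllocSize / sizeof(ParsedWord);
    prs.words = (ParsedWord *) palloc(sizeof(ParsedWord) * prs.lenwords);

This keeps the initial request under palloc's limit; enlarging the
array later can still fail, per the above.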

I looked for other places that might initialize lenwords
to not-sane values, and didn't find any.

BTW, the field order in ParsedWord is such that there's a fair
amount of wasted pad space on 64-bit builds. I doubt we can
get away with rearranging it in released branches; but maybe
it's worth doing something about that in HEAD, to push out
the point at which you hit the 1GB limit.
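
For illustration only (a hypothetical layout, not the actual
ParsedWord definition): on a typical LP64 build, interleaving
uint16 fields with pointers forces alignment padding that a
reordering would avoid:

    #include <stdio.h>
    #include <stdint.h>

    /* hypothetical field orders, just to show the padding effect */
    typedef struct
    {
        uint16_t len;       /* 2 bytes */
        uint16_t nvariant;  /* 2 bytes, then 4 bytes of padding */
        char    *word;      /* 8 bytes, needs 8-byte alignment */
        uint16_t flags;     /* 2 bytes, then 6 bytes of tail padding */
    } Padded;               /* 24 bytes on LP64 */

    typedef struct
    {
        char    *word;      /* pointer first: no interior padding */
        uint16_t len;
        uint16_t nvariant;
        uint16_t flags;     /* only 2 bytes of tail padding remain */
    } Packed;               /* 16 bytes on LP64 */

    int main(void)
    {
        printf("padded %zu, packed %zu\n", sizeof(Padded), sizeof(Packed));
        return 0;
    }

A smaller sizeof(ParsedWord) pushes out the input size at which
lenwords reaches MaxAllocSize/sizeof(ParsedWord).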

regards, tom lane

Attachment: bound-lenwords-in-to_tsvector_byid.patch (text/x-diff, 534 bytes)
