Re: BUG #18080: to_tsvector fails for long text input

From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: uwe(dot)binder(at)pass-consulting(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #18080: to_tsvector fails for long text input
Date: 2023-09-15 11:41:56
Message-ID: 202309151141.pq2zpi5kxdvn@alvherre.pgsql
Lists: pgsql-bugs

On 2023-Sep-04, PG Bug reporting form wrote:

> SELECT to_tsvector('english'::regconfig, (REPEAT('<Long123456789/>'::text,
> 20000000)));
> results in
> ERROR: invalid memory alloc request size 2133333320

This is because to_tsvector_byid does this:

    prs.lenwords = VARSIZE_ANY_EXHDR(in) / 6;   /* just estimation of word's
                                                 * number */
    if (prs.lenwords < 2)
        prs.lenwords = 2;
    prs.curwords = 0;
    prs.pos = 0;
    prs.words = (ParsedWord *) palloc(sizeof(ParsedWord) * prs.lenwords);

where sizeof(ParsedWord) is 40 (on my laptop). So this tries to
allocate more memory than palloc() is willing to give it. The attached
patch fixes just the query you supplied and nothing else.
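
To make the arithmetic concrete, here is a minimal standalone sketch (not the attached patch, and not PostgreSQL source) that reproduces the request size from the error message. It assumes the 40-byte sizeof(ParsedWord) quoted above and the usual palloc() ceiling of 1 GB minus one byte (MaxAllocSize, 0x3fffffff). The SKETCH_* macros and all three function names are hypothetical, and clamped_lenwords() only illustrates one possible way to bound the initial estimate.

```c
#include <stdint.h>

/* Assumption: mirrors PostgreSQL's MaxAllocSize (1 GB - 1), the palloc() cap. */
#define SKETCH_MAX_ALLOC_SIZE   ((uint64_t) 0x3fffffff)

/* Assumption: sizeof(ParsedWord) as reported in this message; it may differ
 * on other platforms. */
#define SKETCH_PARSEDWORD_SIZE  ((uint64_t) 40)

/* Same estimate as the quoted code: one word per 6 input bytes, minimum 2. */
static uint64_t
estimate_lenwords(uint64_t input_bytes)
{
    uint64_t    lenwords = input_bytes / 6;

    if (lenwords < 2)
        lenwords = 2;
    return lenwords;
}

/* Number of bytes the initial palloc() call would request. */
static uint64_t
initial_request_bytes(uint64_t input_bytes)
{
    return SKETCH_PARSEDWORD_SIZE * estimate_lenwords(input_bytes);
}

/* Hypothetical clamp: cap the estimate so the first allocation fits under
 * the limit. This only illustrates the idea of bounding the estimate. */
static uint64_t
clamped_lenwords(uint64_t input_bytes)
{
    uint64_t    lenwords = estimate_lenwords(input_bytes);
    uint64_t    max_words = SKETCH_MAX_ALLOC_SIZE / SKETCH_PARSEDWORD_SIZE;

    return (lenwords > max_words) ? max_words : lenwords;
}
```

The repeated string '<Long123456789/>' is 16 bytes, so the input is 16 * 20000000 = 320000000 bytes. initial_request_bytes(320000000) gives 53333333 * 40 = 2133333320, the figure in the error message and roughly twice the 1073741823-byte limit.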

I wonder if we want to support this kind of thing; I suspect we don't.
Other parts of text-search would fail in the same way and would also
need to receive similar fixes. However, the real problem comes when we
try to store such huge tsvectors, because that means we end up with
"huge" tuples on disk that need I/O support. Eventually, AFAIR, you run
into the size limit in the FE/BE protocol and everything crashes and burns,
because that limit cannot be changed without bumping the protocol version.

So I don't think this patch actually does you any good.

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/

Attachment           Content-Type  Size
huge_tsvector.patch  text/x-diff   1.4 KB
