Re: BUG #18080: to_tsvector fails for long text input

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: uwe(dot)binder(at)pass-consulting(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #18080: to_tsvector fails for long text input
Date: 2023-09-15 13:53:49
Message-ID: 3300287.1694786029@sss.pgh.pa.us
Lists: pgsql-bugs

Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> writes:
> On 2023-Sep-04, PG Bug reporting form wrote:
>> SELECT to_tsvector('english'::regconfig, (REPEAT('<Long123456789/>'::text,
>> 20000000)));
>> results in
>> ERROR: invalid memory alloc request size 2133333320

> This is because to_tsvector_byid does this:
> prs.lenwords = VARSIZE_ANY_EXHDR(in) / 6; /* just estimation of word's
> * number */
> if (prs.lenwords < 2)
> prs.lenwords = 2;

Yeah. My thought about blocking the error had been to limit
prs.lenwords to MaxAllocSize/sizeof(ParsedWord) in this code.
I doubt that switching over to MCXT_ALLOC_HUGE is a good idea.
(Would we not also have to touch the places that repalloc that
array?)
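
A minimal sketch of the clamp described above, written as a standalone
function rather than as a patch to to_tsvector_byid. The names
MAX_ALLOC_SIZE, PARSED_WORD_SIZE and estimate_lenwords are hypothetical
stand-ins: the first mirrors PostgreSQL's MaxAllocSize (0x3fffffff), and
the 40-byte word size is an assumption inferred from the report, since
320000000 input bytes / 6 = 53333333 words, and 53333333 * 40 = 2133333320,
the size in the error message.

```c
#include <stddef.h>

/* Stand-in for PostgreSQL's MaxAllocSize (1 GB minus 1 byte). */
#define MAX_ALLOC_SIZE ((size_t) 0x3fffffff)

/* Assumed sizeof(ParsedWord); 40 bytes matches the reported request size. */
#define PARSED_WORD_SIZE ((size_t) 40)

/*
 * Hypothetical helper: estimate the initial word-array length from the
 * input length, then clamp it so that lenwords * PARSED_WORD_SIZE can
 * never exceed MAX_ALLOC_SIZE.
 */
static size_t
estimate_lenwords(size_t input_bytes)
{
    size_t lenwords = input_bytes / 6;      /* same rough estimate as the original */
    size_t max_words = MAX_ALLOC_SIZE / PARSED_WORD_SIZE;

    if (lenwords < 2)
        lenwords = 2;
    if (lenwords > max_words)
        lenwords = max_words;               /* keep the first allocation legal */

    return lenwords;
}
```

The clamp only bounds the initial allocation. If the parser really does
produce more words than fit, the later repalloc of the array would still
hit the same limit, which is the concern raised in the parenthetical above.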

regards, tom lane
