Re: BUG #17584: SQL crashes PostgreSQL when using ICU collation

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: James Inform <james(dot)inform(at)pharmapp(dot)de>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>, Euler Taveira <euler(at)eulerto(dot)com>
Subject: Re: BUG #17584: SQL crashes PostgreSQL when using ICU collation
Date: 2022-08-14 16:17:12
Message-ID: CAH2-WznXGos_OJ03wJANo_2S_9oVLiJE9E5p7ii6Y_jSpGnkfA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Sun, Aug 14, 2022 at 7:25 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> When I build with --with-icu and run coverage testing on the core
> regression tests under LANG=en_US.utf8, I see that most of
> varstr_abbrev_convert() is reached, but *not* the two buggy
> buffer-enlargement stanzas. So that explains why we've not seen this
> in testing. I wonder whether there is a reasonably cheap way to test
> that. The submitted test case is clearly out of the question to add
> as a regression test...

I don't think that it would be terribly difficult. I notice that the
similar "TEXTBUFLEN isn't big enough" code path in varstr_cmp()
already has test coverage. As does the corresponding point in the
equivalent SortSupport-based comparator, varstrfastcmp_locale().

Oh, hang on -- I see why it's tricky to test, now that I took a fresh look
the code. I see that ucol_nextSortKeyPart()'s interface (used only
when the DB encoding is UTF-8) allows us to only request however many
bytes of the conditioned binary key that we need, which for us is
always just the first sizeof(Datum) bytes. That's probably another
factor that made this bug hard to reach; I suspect that
ucol_nextSortKeyPart() has a tendency to avoid needing very much
scratch space to produce a usable abbreviated key.

I also suspect that the test case happens to tickle some obscure UCA
implementation detail in just the right/wrong way, where (for whatever
reason) it is necessary for the implementation to use a fairly large
buffer, despite the fact that it knows that its varlena.c caller will
only require enough conditioned binary key bytes to build an 8 byte
abbreviated key. It might be very rare and hard to hit (and/or depend
on the ICU version), which would explain why it took this long to hear
any complaints. So...I think that it might be quite difficult to test
this.

BTW, is the plan to get rid of the questionable coding pattern in both
varstr_abbrev_convert() and in varstrfastcmp_locale()? I am in favor
of just using repalloc() across the board.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2022-08-14 16:42:50 Re: BUG #17584: SQL crashes PostgreSQL when using ICU collation
Previous Message Devrim Gündüz 2022-08-14 14:27:05 Re: BUG #17582: Error: Package: gdal-libs-1.11.4-12.rhel7.x86_64 (@pgdg11)