From: | Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com> |
---|---|
To: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | BUG #15285: Query used index over field with ICU collation in some cases wrongly return 0 rows |
Date: | 2020-07-15 13:52:20 |
Message-ID: | 20200715155220.4bb89f56@firost |
Lists: | pgsql-hackers |
Hi,
I'm bumping this thread on pgsql-hackers in the hope that it will draw more
opinions and discussion.
Should we try to fix this issue or not? This is clearly an upstream bug. It has
been reported upstream, along with regression tests, but there has been no
progress on it for two years now.
If we choose not to fix it on our side with a workaround (see patch), I
suppose this small bug should at least be documented somewhere, so that people
who hit it are not left on their own.
Opinions?
Regards,
Begin forwarded message:
Date: Sat, 13 Jun 2020 00:43:22 +0200
From: Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Роман Литовченко <roman(dot)lytovchenko(at)gmail(dot)com>, PostgreSQL mailing lists
<pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #15285: Query used index over field with ICU collation in some
cases wrongly return 0 rows
On Fri, 12 Jun 2020 18:40:55 +0200
Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com> wrote:
> On Wed, 10 Jun 2020 00:29:33 +0200
> Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com> wrote:
> [...]
> > After playing with the ICU regression tests, I found that the functions
> > ucol_strcollIter and ucol_nextSortKeyPart are safe. I'll do some
> > performance tests and report here.
>
> I did some benchmarks. See the attached script and its header for how to
> reproduce them.
>
> It sorts 935895 French phrases, from 0 to 122 characters long with an average
> of 49. Performance tests were done on current master HEAD (buggy) and with the
> attached patch, which relies on ucol_strcollIter.
>
> My preliminary test with ucol_getSortKey was catastrophic, as we might
> expect: 15-17x slower than current HEAD. So I removed it from the actual
> tests. I didn't try ucol_nextSortKeyPart, though.
>
> Using ucol_strcollIter performs ~20% slower than HEAD on UTF8 databases, but
> this might be acceptable. Here are the numbers:
>
> DB Encoding   HEAD   strcollIter   ratio
> UTF8          2.74   3.27          1.19x
> LATIN1        5.34   5.40          1.01x
>
> I plan to add a regression test soon.
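
As a quick sanity check of the arithmetic in the table above, the ratios can be recomputed from the reported timings. This is only a sketch: the message does not state the unit of the timings, and the names `timings`, `slowdown` and `ratios` are illustrative, not taken from the benchmark script.

```python
# Timings copied from the benchmark table; the unit is not stated in the message.
timings = {
    "UTF8": {"head": 2.74, "strcollIter": 3.27},
    "LATIN1": {"head": 5.34, "strcollIter": 5.40},
}


def slowdown(head: float, patched: float) -> float:
    """Return patched / head, i.e. how many times slower the patched build is."""
    return patched / head


# One ratio per database encoding.
ratios = {enc: slowdown(t["head"], t["strcollIter"]) for enc, t in timings.items()}

for enc, ratio in ratios.items():
    print(f"{enc}: {ratio:.2f}x ({(ratio - 1) * 100:.0f}% slower)")
```

The recomputed values agree with the table: roughly 19% slower on UTF8 (the "~20%" quoted above) and about 1% slower on LATIN1.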
Please find attached the second version of the patch, now including a
regression test.
Regards,
--
Jehan-Guillaume de Rorthais
Dalibo
Attachment | Content-Type | Size |
---|---|---|
v2-0001-Replace-buggy-ucol_strcoll-funcs-with-ucol_strcollIt.patch | text/x-patch | 6.1 KB |