From: | Adé <ade(dot)hey(at)gmail(dot)com> |
---|---|
To: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | [PATCH] Fix for slow GIN index queries when "gin_fuzzy_search_limit" setting is relatively small for large tables |
Date: | 2020-02-02 18:06:01 |
Message-ID: | CAEknJCdS-dE1Heddptm7ay2xTbSeADbkaQ8bU2AXRCVC2LdtKQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hello all,
Like the title says, using "gin_fuzzy_search_limit" degrades speed when it
has a relatively low setting.
What do I mean by "relatively low"? An example would be when a table with a
GIN index has many millions of rows and a particular keyword search has
1,000,000 possible results because the keyword is very common (or it's just
that the table is so supremely large that even a somewhat common keyword
appears enough to return one million results). However, you only want to
return around 100 random results from that one million, so you set
gin_fuzzy_search_limit to 100. That limit is relatively low when you look
at the ratio of the limit value to the possible results: 100 / 1,000,000 =
0.0001. You'll find the query is very slow for such a low ratio. It isn't
so slow if gin_fuzzy_search_limit is 100 but the keyword search has only a
total of 10,000 possible results (resulting in a higher ratio of 0.1).
This would explain why in the documentation it is said that "From
experience, values in the thousands (e.g., 5000 — 20000) work well". It's
not so common to have queries that return large enough result sets such
that gin_fuzzy_search_limit values between 5,000 and 20,000 would result in
low ratios and so result in the performance issue I've observed (these
gin_fuzzy_search_limit values have relatively high ratios between 0.005 and
0.02 if you have 1,000,000 results for a keyword search). However, if you
desire a lower gin_fuzzy_search_limit such as 100, while also having a
relatively larger table, you'll find this slowness issue.
I discussed this issue more and the reason for it in my original bug
report:
https://www.postgresql.org/message-id/16220-1a0a4f0cb67cafdc@postgresql.org
Attached is SQL to test and observe this issue and also attached is a patch
I want to eventually submit to a commitfest.
Best regards,
Adé
Attachment | Content-Type | Size |
---|---|---|
gin_fuzzy_search_limit_test.sql | application/octet-stream | 1.9 KB |
ginget.patch | application/octet-stream | 1.0 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Tomas Vondra | 2020-02-02 18:41:34 | Re: PATCH: add support for IN and @> in functional-dependency statistics use |
Previous Message | Daniel Gustafsson | 2020-02-02 17:19:04 | Re: TestLib condition for deleting temporary directories |