Re: insert and query performance on big string table with pg_trgm

From: Gábor SZŰCS <surrano(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Matthew Hall <mhall(at)mhcomputing(dot)net>, pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: insert and query performance on big string table with pg_trgm
Date: 2017-11-25 09:19:59
Message-ID: CAHEufv1Y+-G5HJteunb45DvXUS+Xqte+TKAqtPFHdHBQUVkxrQ@mail.gmail.com
Lists: pgsql-performance

Don't know if it would make PostgreSQL happier, but how about adding a hash
value column and creating the unique index on that one instead? Hash
collisions could reject a few distinct strings as false duplicates, but the
unique index would be way smaller, speeding up inserts.
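For what it's worth, a rough sketch of the idea (table and column names are made up for illustration; I'm assuming a table `big_strings` with a text column `value`):

```sql
-- Add a fixed-width hash of the string instead of indexing the string itself.
-- md5() is built in, and casting the 32-char hex digest to uuid stores it
-- in 16 bytes rather than as variable-length text.
ALTER TABLE big_strings ADD COLUMN value_md5 uuid;

UPDATE big_strings SET value_md5 = md5(value)::uuid;

-- The unique index now covers small fixed-width values only.
CREATE UNIQUE INDEX big_strings_value_md5_key ON big_strings (value_md5);

-- De-duplicate on insert via the hash:
INSERT INTO big_strings (value, value_md5)
VALUES ('some long string', md5('some long string')::uuid)
ON CONFLICT (value_md5) DO NOTHING;
```

ON CONFLICT needs 9.5 or later, of course.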

On Nov 25, 2017 at 7:35, "Jeff Janes" <jeff(dot)janes(at)gmail(dot)com> wrote:

>
> On Nov 21, 2017 00:05, "Matthew Hall" <mhall(at)mhcomputing(dot)net> wrote:
>
>
> > Are all indexes present at the time you insert? It will probably be
> much faster to insert without the gin index (at least) and build it after
> the load.
>
> There is some flexibility on the initial load, but the updates in the
> future will require the de-duplication capability. I'm willing to accept
> that might be somewhat slower on the load process, to get the accurate
> updates, provided we could try meeting the read-side goal I wrote about, or
> at least figure out why it's impossible, so I can understand what I need to
> fix to make it possible.
>
>
> As long as you don't let anyone use the table between the initial load and
> when the index build finishes, you don't have to compromise on
> correctness. But yeah, makes sense to worry about query speed first.
>
> > If you repeat the same query, is it then faster, or is it still slow?
>
> If you keep the expression exactly the same, it still takes a few seconds
> as could be expected for such a torture test query, but it's still WAY
> faster than the first such query. If you change it out to a different
> expression, it's longer again of course. There does seem to be a
> low-to-medium correlation between the number of rows found and the query
> completion time.
>
>
> To make this quick, you will need to get most of the table and most of the
> index cached into RAM. A good way to do that is with pg_prewarm. Of
> course that only works if you have enough RAM in the first place.
>
> What is the size of the table and the gin index?
>
>
> Cheers,
>
> Jeff
>
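For the record, Jeff's two suggestions above would look roughly like this (the table name `big_strings` and index name `big_strings_value_trgm_idx` are placeholders, not the actual names from this thread):

```sql
-- How big are the table and the GIN index?
SELECT pg_size_pretty(pg_table_size('big_strings'))                   AS table_size,
       pg_size_pretty(pg_relation_size('big_strings_value_trgm_idx')) AS index_size;

-- Pull both into shared buffers ahead of time; pg_prewarm ships in contrib.
CREATE EXTENSION IF NOT EXISTS pg_prewarm;
SELECT pg_prewarm('big_strings');
SELECT pg_prewarm('big_strings_value_trgm_idx');
```

This only helps if shared_buffers (or at least the OS page cache) can actually hold both, as Jeff notes.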
