Re: full text search on hstore or json with materialized view?

From: Rj Ewing <ewing(dot)rj(at)gmail(dot)com>
To: George Neuner <gneuner2(at)comcast(dot)net>
Cc: PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: full text search on hstore or json with materialized view?
Date: 2017-04-20 14:56:18
Message-ID: CAOSSsV0HXviU8aKUuBFa1yautVfsXZzL2NhihPeUF9+mR5Mp2w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Wed, Apr 19, 2017 at 6:44 PM, George Neuner <gneuner2(at)comcast(dot)net> wrote:
>
> On Wed, 19 Apr 2017 11:57:26 -0700, Rj Ewing <ewing(dot)rj(at)gmail(dot)com>
> wrote:
>
> >I did some testing using a secondary table with the key, value column.
> >However I don't think this will provide the performance that we need.
> >Queries we taking 60+ seconds just for a count.
>
> SELECT count(*) or filtered?
>
> Either way, your statistics may be way off. Did you vacuum analyze
> the table after the inserts (and the index creation if it was done
> separately)?

​I think my statistics were off. I never ran vacuum analyze.

>
> Without more detail re: your hardware, Postgresql version, what
> indexes are/will be available, the types of queries you want to run,
> etc., it's very hard to give really meaningful suggestions.
>
> ​postgresql 9.6, currently testing on 8gb ram, but have upto 64gb for
production. 7-core 2.10GHz​.

mostly want to run ad-hoc queries, returning entire row, matching 1 - 3 k:v
conditions.

The kind of query you have alluded to is pretty easily parallelized:
> it can be spread over multiple sessions with result aggregation done
> on the client side.
>
> Or, if you you have 9.6, you might try using backend parallelism:
> https://www.postgresql.org/docs/9.6/static/parallel-query.html
> [I've not used this, but some people have done it successfully.]​

>

> If you can restrict the FTS query to certain keys:
>
> SELECT id FROM mytable
> WHERE tsquery( ... ) @@ to_tsvector(v)
> AND k IN ( ... )
> GROUP BY id
>
> [note: according to David Rowley, GROUP BY may be parallelized
> whereas DISTINCT currently cannot be.]
>
> then given an index on 'k' it may be much faster than just the FTS
> query alone. Subject to key variability, it also may be improved by
> table partitioning to reduce the search space.
>
> If the FTS query is key restricted, you can parallelize either on the
> client or on the server. If the FTS query is not key restricted, you
> pretty much are limited to server side (and 9.6 or later).
>
> ​I'll look into parallelism if we can't get the performance we need.

​What do you mean if I can restrict the FTS query to certain keys? I'm not
a sql expert, but it seems like the above query would match multiple keys
to 1 tsquery value?

We need to be able to do AND conditions with separate k:v pairs. Our keys
are know ahead of time, and would vary for a given table, but would be in
the range of 30-60 keys per table.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Scott Marlowe 2017-04-20 15:01:12 Re: Large data and slow queries
Previous Message Adrian Klaver 2017-04-20 14:49:09 Re: Unable to upload backups