Re: Large data and slow queries

From: Samuel Williams <space(dot)ship(dot)traveller(at)gmail(dot)com>
To: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Large data and slow queries
Date: 2017-04-19 04:05:16
Message-ID: CAHkN8V-WbTtRVu4H=F-L4KpqN5g26QJ3QQf90pDW4E9TU0i=Vw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Oh, I've also tried earth distance and ll_to_earth in a GIST index...
it was slower that the BTREE index on a small subset of data in my
tests.

On 19 April 2017 at 16:01, Samuel Williams
<space(dot)ship(dot)traveller(at)gmail(dot)com> wrote:
> Hi.
>
> We have 400,000,000 records in a table (soon to be 800,000,000), here
> is the schema (\d+)
>
> https://gist.github.com/ioquatix/bddda36d9e4ffaceb7a62d7b62259121
>
> We want the following kinds of query to be fast:
>
> SELECT DISTINCT "user_event"."user_id" FROM "user_event" WHERE
> "user_event"."what" IN ('poll', 'location_change',
> 'midnight_location') AND ("user_event"."created_at" >= '2016-04-19
> 01:23:55') AND (latitude > -37.03079375089291 AND latitude <
> -36.67086424910709 AND longitude > 174.6307139779924 AND longitude <
> 175.0805140220076);
>
> We have a btree index and it appears to be working. However, it's
> still pretty slow.
>
> EXPLAIN ANALYZE gives the following:
> https://gist.github.com/ioquatix/bddda36d9e4ffaceb7a62d7b62259121#gistcomment-2065314
>
> I'm thinking that I need to do the following to help:
>
> CLUSTER user_event ON index_user_event_for_visits_3 followed by
> analyze... Our data is mostly time series but sometimes we get some
> dumps with historical records.
>
> Perhaps add a BRIN index on created_at
>
> I'm wondering if... we can use an index to cache, all user_ids seen on
> a given day. If we change our index to be more discrete, e.g.
> created_at::date, would this help? The set union of user_ids for 365
> days should be pretty fast?
>
> I'm open to any ideas or suggestions, ideally we can keep
> optimisations within the database, rather than adding a layer of
> caching on top.
>
> Kind regards,
> Samuel

In response to

Browse pgsql-general by date

  From Date Subject
Next Message John R Pierce 2017-04-19 04:42:03 Re: Large data and slow queries
Previous Message Samuel Williams 2017-04-19 04:01:40 Large data and slow queries