Large data and slow queries

From: Samuel Williams <space(dot)ship(dot)traveller(at)gmail(dot)com>
To: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Large data and slow queries
Date: 2017-04-19 04:01:40
Message-ID: CAHkN8V-09bXEMh9fMA_g9-73rv03CaCdkFQ99ZwebtGDyGhoYw@mail.gmail.com

Hi.

We have 400,000,000 records in a table (soon to be 800,000,000). Here
is the schema (\d+):

https://gist.github.com/ioquatix/bddda36d9e4ffaceb7a62d7b62259121

We want the following kinds of query to be fast:

SELECT DISTINCT "user_event"."user_id"
FROM "user_event"
WHERE "user_event"."what" IN ('poll', 'location_change', 'midnight_location')
  AND "user_event"."created_at" >= '2016-04-19 01:23:55'
  AND (latitude > -37.03079375089291 AND latitude < -36.67086424910709)
  AND (longitude > 174.6307139779924 AND longitude < 175.0805140220076);

We have a btree index and it appears to be used. However, the query
is still pretty slow.
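
For reference, a multicolumn btree matching that query's predicates
would look something like the following (the column order here is only
a guess for illustration; the actual definition is in the gist):

-- Illustrative only: the real index definition is in the schema gist.
CREATE INDEX index_user_event_for_visits_3
  ON user_event (what, created_at, latitude, longitude);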

EXPLAIN ANALYZE gives the following:
https://gist.github.com/ioquatix/bddda36d9e4ffaceb7a62d7b62259121#gistcomment-2065314

I'm thinking that I need to do the following to help:

CLUSTER user_event USING index_user_event_for_visits_3, followed by an
ANALYZE. Our data is mostly time series, but sometimes we get dumps
with historical records.
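
Concretely, something like this (note CLUSTER rewrites the whole table
under an exclusive lock, so at this size it would need a maintenance
window):

-- Rewrite the table in index order, then refresh planner statistics.
CLUSTER user_event USING index_user_event_for_visits_3;
ANALYZE user_event;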

Perhaps add a BRIN index on created_at.
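
A minimal sketch of that (the index name is illustrative; BRIN works
best when created_at correlates with the physical row order, which the
CLUSTER above would help ensure):

-- Tiny index summarising min/max created_at per block range.
CREATE INDEX index_user_event_on_created_at_brin
  ON user_event USING brin (created_at);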

I'm wondering if we can use an index to cache all the user_ids seen on
a given day. If we change our index to be more discrete, e.g. on
created_at::date, would this help? The set union of user_ids for 365
days should be pretty fast.
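
For example, an expression index along these lines (a sketch, assuming
created_at is timestamp without time zone; the cast to date is not
immutable for timestamptz, so it couldn't be indexed directly there):

-- One btree entry per (day, user_id); DISTINCT user_id over a range
-- of days can then be answered from the index alone.
CREATE INDEX index_user_event_on_day_user_id
  ON user_event ((created_at::date), user_id);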

I'm open to any ideas or suggestions; ideally we can keep
optimisations within the database rather than adding a layer of
caching on top.

Kind regards,
Samuel
