
From: Andrew Staller <andrew(at)timescale(dot)com>
To: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Large data and slow queries
Date: 2017-04-20 14:44:07
Message-ID: CAEsM1FvhnrnQDbkYDb2WQwayRF8Ft6c9OeQFRcs+jK3ZkPX74g@mail.gmail.com
Lists: pgsql-general

Awesome thread.

Samuel,

Just wanted you to be aware of the work we're doing at TimescaleDB (
http://www.timescale.com/), a time-series database extension for PostgreSQL.

Some of the ways we might help you:
- automatic partitioning by space (a partitioning key such as country_id)
and time. This creates "chunks" of your data, right-sized by volume and
time constraints, which keeps inserts fast at much greater scale than
vanilla PostgreSQL (not sure whether inserts are a problem for you). A
rough SQL sketch follows this list.
- this also helps if your queries are selective on time and country_id
(as in your example)
- partitioning by time lets you DROP old chunks without needing vacuums
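
To make that concrete, here is a minimal sketch, assuming a hypothetical
"events" table with a timestamptz "time" column, a country_id column and
the lat/long fields from this thread. The exact create_hypertable() /
drop_chunks() signatures vary between TimescaleDB versions, so check the
docs for the release you install:

  CREATE EXTENSION IF NOT EXISTS timescaledb;

  CREATE TABLE events (
      time        timestamptz NOT NULL,
      country_id  integer     NOT NULL,
      what        text,
      latitude    float4,
      longitude   float4
  );

  -- Partition by time and, additionally, by country_id across 4 space partitions.
  SELECT create_hypertable('events', 'time', 'country_id', 4);

  -- Retention: drop whole chunks older than 12 months instead of DELETE + VACUUM.
  SELECT drop_chunks(interval '12 months', 'events');

Dropping a chunk removes a whole child table at once, so it avoids the
bloat and vacuum cost of a large DELETE.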

On Thu, Apr 20, 2017 at 8:30 AM, Vick Khera <vivek(at)khera(dot)org> wrote:

> I'm curious why you have so many partial indexes. Are you trying to make
> custom indexes per query? It seems to me you might want to consider making
> the indexes general and removing the redundant ones (those that share the
> same prefix of indexed fields).
>
> Secondly, your table is 102 GB. Clearly there's a lot of data here. How
> many rows does that take? I would further suggest that you partition this
> table such that there are no more than about 10 million rows per partition
> (I've done this using an id % 100 computation). Maybe in your case it makes
> sense to partition it based on the "what" field, because it appears you are
> trying to do that with your partial indexes already.
>
> On Wed, Apr 19, 2017 at 10:11 PM, Samuel Williams <
> space(dot)ship(dot)traveller(at)gmail(dot)com> wrote:
>
>> Okay, so after changing longitude/latitude to float4, and
>> re-organizing the table a bit, I got the query down from about 8
>> minutes to 40 seconds.
>>
>> The details are in the gist comments:
>> https://gist.github.com/ioquatix/bddda36d9e4ffaceb7a62d7b62259121
>>
>> Now I just need to get performance another two orders of magnitude
>> better. Can we make the index a bit more coarse-grained, perhaps
>> convert long/lat to integers or something, use a Hilbert index, use a
>> PostGIS index... ideas?
>>
>>
>> --
>> Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-general
>>
>
>
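
For the partial-index and partitioning suggestions quoted above, a rough
sketch; the table and column names (events, id, what, time) are
hypothetical, it assumes events is kept as a plain PostgreSQL table rather
than the hypertable sketched earlier, and the inheritance approach reflects
what stock PostgreSQL offered before declarative partitioning:

  -- One general index whose leading columns cover the redundant partial
  -- indexes that share the same prefix (hypothetical columns):
  CREATE INDEX events_what_time_idx ON events (what, time);

  -- Inheritance-based partitioning keyed on id % 100. Rows are routed by
  -- the application or an INSERT trigger; constraint exclusion only helps
  -- queries that also filter on id % 100.
  CREATE TABLE events_p00 (CHECK (id % 100 = 0)) INHERITS (events);
  CREATE TABLE events_p01 (CHECK (id % 100 = 1)) INHERITS (events);
  -- ... through events_p99, each with its own copies of the indexes.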
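
And for the coarser / PostGIS-style index Samuel asks about above, a hedged
sketch of one common route: store the point once as a geometry and index it
with GiST (again, hypothetical table and column names):

  CREATE EXTENSION IF NOT EXISTS postgis;

  ALTER TABLE events ADD COLUMN geom geometry(Point, 4326);
  UPDATE events SET geom = ST_SetSRID(ST_MakePoint(longitude, latitude), 4326);

  CREATE INDEX events_geom_gist ON events USING gist (geom);

  -- Bounding-box queries can then use the GiST index, e.g.:
  -- SELECT count(*) FROM events
  --  WHERE geom && ST_MakeEnvelope(min_lon, min_lat, max_lon, max_lat, 4326);

Whether this beats a plain btree on (what, time, latitude, longitude)
depends on how selective the spatial filter is, so it is worth
benchmarking both.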

--
TimescaleDB | Growth & Developer Evangelism
c: 908.581.9509

335 Madison Ave.
New York, NY 10017
www.timescale.com
https://github.com/timescale/timescaledb
