Re: Large data and slow queries

From: vinny <vinny(at)xs4all(dot)nl>
To: Samuel Williams <space(dot)ship(dot)traveller(at)gmail(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Large data and slow queries
Date: 2017-04-19 07:31:41
Message-ID: 2d52de6778a7e907db41267d8e96d373@xs4all.nl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 2017-04-19 07:04, Samuel Williams wrote:
> Thanks John. Yes, you are absolutely right, you want the index to be
> bottom heavy so you can cull as much as possible at the top. I'm
> familiar with that, once implementing a brute-force sudoku solver, it
> has the same principle.
>
> I've been working on this all afternoon. By reducing the longitude,
> latitude columns to float4, in my test cases, I found about 50%
> improvement in performance. It may also use less space. So part of the
> problem was my choice of data type. We've computed that float4 has a
> worst case precision of about 1.6m which we are okay with for
> analytics data.
>
> Another option we may consider is using a (signed) integer - e.g.
> longitude = 180*(v/2^31) and latitude = 180*(v/2^31) as this has a
> uniform error across all points, but it's a bit more cumbersome to
> handle. Is there a rational datatype in postgres which works like
> this?
>
>
>
> On 19 April 2017 at 16:42, John R Pierce <pierce(at)hogranch(dot)com> wrote:
>> On 4/18/2017 9:01 PM, Samuel Williams wrote:
>>>
>>> We want the following kinds of query to be fast:
>>>
>>> SELECT ... AND (latitude > -37.03079375089291 AND latitude <
>>> -36.67086424910709 AND longitude > 174.6307139779924 AND longitude <
>>> 175.0805140220076);
>>
>>
>>
>> I wonder if GIST would work better if you use the native POINT type,
>> and
>> compared it like
>>
>> mypoint <@ BOX
>> '((174.6307139779924,-37.03079375089291),(175.0805140220076,-36.67086424910709
>> ))'
>>
>> with a gist index on mypoint...
>>
>> but, it all hinges on which clauses in your query are most selective,
>> thats
>> where you want an index.
>>
>> --
>> john r pierce, recycling bits in santa cruz
>>
>>
>>
>> --
>> Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-general

Did that 50% performance gain come from just the datatype, or that fact
that the index became smaller?

Given the number of records, my first thought was either partitioning or
partial-indexes.
The fewer rows are in the index, the quicker it will be to check,
and it's not a lot of work to create separate indexes for lat/long
ranges or dates.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom DalPozzo 2017-04-19 07:34:45 Re: tuple statistics update
Previous Message Tom DalPozzo 2017-04-19 07:28:43 Re: tuple statistics update