Re: Result set stability in queries sorted by column with non-unique values

From: Michele Bosi <michele(dot)bosi(at)gmail(dot)com>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: Niranjan <niranjan81(at)gmail(dot)com>, "pgsql-in-general(at)postgresql(dot)org" <pgsql-in-general(at)postgresql(dot)org>, aronp(at)guaranteedplus(dot)com
Subject: Re: Result set stability in queries sorted by column with non-unique values
Date: 2015-04-16 10:12:29
Message-ID: CACmQ0_+1pPC6eKK8Hw+RVuekhkQZMOZ1Sesny3uw0WRbAty3=g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-in-general

Thank you guys, I think it's clear now that in order to ensure a
deterministically 100% consistent result across queries I should add a
secondary sorting key with unique values. Since we have an "id"
column, this seems the perfect candidate.
Best regards,
Mic

On 16 April 2015 at 06:23, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com> wrote:
>
>
> On Wed, Apr 15, 2015 at 9:24 PM, Michele Bosi <michele(dot)bosi(at)gmail(dot)com>
> wrote:
>>
>> Thanks Niranjan,
>>
>> In our case we require the rows returned to be in the same consistent
>> order across several queries as we need to paginate through a large
>> set of results by doing a combination of ORDER BY x, LIMIT, OFFSET.
>> However the sorting key "x" can contain non-unique values, thus we
>> need to be sure that rows with the same values end up in the same
>> position across the different queries we do per each result "page".
>>
>> As I understand, unless we add a secondary sorting key (like the "id"
>> of the row) which is guaranteed to be unique, we might end up with
>> rows that can potentially "fluctuate" across the query result, is it
>> correct?
>>
>
> There are two reasons why the ordering may differ between queries when ORDER
> BY is on a non-unique column:
>
> 1. Different query plans being used for different queries
> 2. Physical ordering of rows has varied because of UPDATE/DELETES and either
> VACUUM or HOT recycling dead space in the table.
>
> If you control both these factors or somehow know that they are going to be
> consistent across all queries, which pretty much means that all queries use
> the same scan method for the table and the table itself is not undergoing
> any DELETE/UPDATE, you could get the same row ordering. But clearly you
> can't build your application under those assumptions and it will be far
> better to add another column to the ORDER BY to guarantee same ordering.
>
> Thanks,
> Pavan
>
> --
> Pavan Deolasee
> http://www.linkedin.com/in/pavandeolasee

In response to

Browse pgsql-in-general by date

  From Date Subject
Next Message Bjørn-Willy Arntzen 2015-08-06 14:49:20 Database locale is incompatible with operation system
Previous Message Pavan Deolasee 2015-04-16 05:23:29 Re: Result set stability in queries sorted by column with non-unique values