Re: sorting problem with distinct on()

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Sky Lendar <skylendar(at)gmail(dot)com>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: sorting problem with distinct on()
Date: 2023-05-05 14:16:22
Message-ID: CAKFQuwaa8z-=2S7A1KBez4ga=k02SdYffVtkca3YXrYQFQFD+w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Fri, May 5, 2023 at 6:55 AM Sky Lendar <skylendar(at)gmail(dot)com> wrote:

> Hi there ans thx for reading and answering this post if you can:
>
> Let's regard an example of a table (stars) containing a code for a star
> (symb)
> and its index (nb) in a file.
>
>
> nb | symb
> ----+------------
> 0 | alTau
> 1 | alTau
> 2 | bePer
> 3 | alSco
> 4 | alLeo
> 5 | alCMa
> 6 | alVir
> 7 | Trappist-1
> 8 | SgrA*
> 9 | SgrA*
> 10 | GA
> 11 | VC
> 12 | M31
> 13 | M44
> 14 | alUMi
> 15 | M87
> 16 | alCyg
> 17 | alCyg
> 18 | beOri
> 19 | omiCet
> 20 | epTau
> 21 | epCas
> 22 | alAnd
> 23 | alAnd
> 24 | beAnd
> 25 | ga-1And
> 26 | ga-1And
> 27 | ga-1And
> 28 | ga-1And
> 29 | xiAnd
> 30 | upAnd
> 31 | upAnd
> 32 | 14And
> 33 | 51And
> 34 | M31
> 35 | alAql
> 36 | alAql
> 37 | beAql
> 38 | gaAql
> 39 | deAql
> 40 | epAql
> 41 | zeAql
> 42 | zeAql
> 43 | etAql
> 44 | thAql
> 45 | ioAql
> 46 | laAql
> 47 | xiAql
> 48 | 12Aql
> 49 | alAqr
> 50 | alAqr
> 51 | beAqr
> 52 | gaAqr
> 53 | gaAqr
> 54 | deAqr
> 55 | epAqr
> 56 | epAqr
> 57 | ze-1Aqr
> 58 | etAqr
> 59 | etAqr
> 60 | thAqr
> 61 | kaAqr
> 62 | laAqr
> 63 | laAqr
> 64 | laAqr
> 65 | laAqr
> 66 | nuAqr
> 67 | piAqr
> 68 | xiAqr
> 69 | Trappist-1
> 70 | alAra
> 71 | muAra
> 72 | alAri
> 73 | beAri
> 74 | beAri
> 75 | gaAri
> 76 | deAri
> 77 | 39Ari
> 78 | 41Ari
> 79 | alAur
> 80 | alAur
> 81 | beAur
> 82 | deAur
> 83 | epAur
> 84 | epAur
> 85 | epAur
> 86 | zeAur
> 87 | zeAur
> 88 | zeAur
> 89 | zeAur
> 90 | etAur
> 91 | etAur
> 92 | thAur
> 93 | thAur
> 94 | thAur
> 95 | ioAur
> 96 | ioAur
> 97 | ioAur
> 98 | alBoo
> 99 | alBoo
>
> Notice that some symbs are duplicated and I want to select only the
> distinct symbs.
> So, I could use
>
> select distinct on(symb) * from stars;
>
> I get this result:
>
> nb | symb
> ----+-------
> 48 | 12Aql
> 32 | 14And
> 77 | 39Ari
> 78 | 41Ari
> 33 | 51And
> 10 | GA
> 34 | M31
> 13 | M44
> 15 | M87
> 9 | SgrA*
>
> Notice that 12 is missing in the list.
> Even with x as (select distinct on(symb) * from stars) select * from x
> where nb = 12 order by nb;
>

nb = 12 is a duplicate with np = 34

Since your DISTINCT ON *subquery* doesn't specify an ordering which of
those two are chosen as the representative record for M31 is
non-determinstic.

If you want to ensure the lowest valued nb is chosen you need to sort the
*subquery*. The first record the DISTINCT encounters is the one selected
to represent.

Sorting in the outer/main query happens after the DISTINCT and so the
record is already gone.

David J.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Jeffrey Walton 2023-05-05 14:16:27 Re: SSL Enablement in Postgres via Client App like PG-ADMIN-4, Java.
Previous Message Sky Lendar 2023-05-05 13:55:21 sorting problem with distinct on()