Re: Why is GIN index slowing down my query?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: AlexK987 <alex(dot)cue(dot)987(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Why is GIN index slowing down my query?
Date: 2015-02-02 03:15:45
Message-ID: 12830.1422846945@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

AlexK987 <alex(dot)cue(dot)987(at)gmail(dot)com> writes:
> This is a realistic case: everyone have Python and Java skills, but PostGis
> and Haskell and Closure are rare. If we are looking for a person that has
> all the skills required for a task (array[1, 15]), that is "skills <@
> array[1, 15] " and not the opposite, right?

One of us has this backwards. It might be me, but I don't think so.
Consider a person who has the two desired skills plus skill #42:

regression=# select array[1,15,42] <@ array[1,15];
?column?
----------
f
(1 row)

regression=# select array[1,15,42] @> array[1,15];
?column?
----------
t
(1 row)

> Also can you explain why " entries for "0" and "1" swamp everything else so
> that the planner
> doesn't know that eg "15" is really rare. " I thought that if a value is not
> found in the histogram, than clearly that value is rare, correct? What am I
> missing here?

The problem is *how* rare. The planner will take the lowest frequency
seen among the most common elements as an upper bound for the frequency of
unlisted elements --- but if all you have in the stats array is 0 and 1,
and they both have frequency 1.0, that doesn't tell you anything. And
that's what I see for this example:

regression=# select most_common_elems,most_common_elem_freqs from pg_stats where tablename = 'talent' and attname = 'skills';
most_common_elems | most_common_elem_freqs
-------------------+------------------------
{0,1} | {1,1,1,1,0}
(1 row)

With a less skewed distribution, that rule of thumb would work better :-(

regards, tom lane

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Christian Weyer 2015-02-02 06:31:24 Re: Unexpected (bad) performance when querying indexed JSONB column
Previous Message AlexK987 2015-02-02 03:00:05 Re: Why is GIN index slowing down my query?