Re: Seeking help with a query that takes too long

From: "Nick Fankhauser" <nickf(at)ontko(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Manfred Koizar" <mkoi-pg(at)aon(dot)at>, "Pgsql-Performance(at)Postgresql(dot) Org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Seeking help with a query that takes too long
Date: 2003-11-12 16:52:51
Message-ID: NEBBLAAHGLEEPCGOBHDGMEJKJGAA.nickf@ontko.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance


> It looks like you are running with the default statistics target (10).
> Try boosting it to 100 or even more for this column (see ALTER TABLE
> SET STATISTICS, then re-ANALYZE) and see if the estimate gets better.

Here are the results & a few more clues:

prod1=# alter table actor alter column actor_full_name_uppercase set
statistics 1000;
ALTER TABLE
prod1=# analyze actor;
ANALYZE
prod1=# select count(distinct actor_full_name_uppercase) from actor;
count
---------
1453371
(1 row)

prod1=# select count(actor_id) from actor;
count
---------
3386359
(1 row)

This indicates to me that 1 isn't too shabby as an estimate if the whole
name is specified, but I'm not sure how this gets altered in the case of a
"LIKE"

prod1=# \x
Expanded display is on.
prod1=# SELECT * FROM pg_stats
prod1-# WHERE tablename='actor' AND attname='actor_full_name_uppercase';

<Header boilerplate snipped out>

schemaname | public
tablename | actor
attname | actor_full_name_uppercase
null_frac | 0.000586667
avg_width | 21
n_distinct | -0.14701

<Long list of values and frequencies snipped out>

correlation | -0.00211291

Question: What does it mean when n_distinct is negative?

New results of explain analyze:

QUERY PLAN
----------------------------------------------------------------------------
----------------------------------------------------------------------------
--------------------------------------------------
Limit (cost=252683.61..252683.68 rows=28 width=116) (actual
time=169377.32..169378.39 rows=1000 loops=1)
-> Sort (cost=252683.61..252683.68 rows=29 width=116) (actual
time=169377.31..169377.69 rows=1001 loops=1)
Sort Key: max((actor.actor_full_name)::text),
count(case_data.case_id)
-> Aggregate (cost=252678.57..252682.91 rows=29 width=116)
(actual time=169305.79..169354.50 rows=3456 loops=1)
-> Group (cost=252678.57..252680.01 rows=289 width=116)
(actual time=169305.76..169330.00 rows=5879 loops=1)
-> Sort (cost=252678.57..252679.29 rows=289
width=116) (actual time=169305.75..169308.15 rows=5879 loops=1)
Sort Key: actor.actor_id
-> Nested Loop (cost=0.00..252666.74 rows=289
width=116) (actual time=89.27..169273.51 rows=5879 loops=1)
-> Nested Loop (cost=0.00..251608.11
rows=289 width=77) (actual time=57.73..92753.49 rows=5882 loops=1)
-> Index Scan using
actor_full_name_uppercase on actor (cost=0.00..456.88 rows=113 width=42)
(actual time=32.80..3197.28 rows=3501 loops=1)
Index Cond:
((actor_full_name_uppercase >= 'SANDERS'::character varying) AND
(actor_full_name_uppercase < 'SANDERT'::character varying))
Filter:
(actor_full_name_uppercase ~~ 'SANDERS%'::text)
-> Index Scan using
actor_case_assignment_actor_id on actor_case_assignment (cost=0.00..2181.29
rows=2616 width=35) (actual time=22.26..25.57 rows=2 loops=3501)
Index Cond: ("outer".actor_id =
actor_case_assignment.actor_id)
-> Index Scan using case_data_case_id on
case_data (cost=0.00..3.65 rows=1 width=39) (actual time=13.00..13.00
rows=1 loops=5882)
Index Cond: (case_data.case_id =
"outer".case_id)
Total runtime: 169381.38 msec
(17 rows)

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2003-11-12 17:10:35 Re: Seeking help with a query that takes too long
Previous Message Tom Lane 2003-11-12 16:10:34 Re: Seeking help with a query that takes too long