Re: Seeking help with a query that takes too long

From: "Nick Fankhauser" <nickf(at)ontko(dot)com>
To: "Manfred Koizar" <mkoi-pg(at)aon(dot)at>, "Pgsql-Performance(at)Postgresql(dot) Org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Seeking help with a query that takes too long
Date: 2003-11-12 15:55:48
Message-ID: NEBBLAAHGLEEPCGOBHDGKEJFJGAA.nickf@ontko.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

>(actual time=37.62..677.44 rows=3501 loops=1)
^^^^^^^^^

> Nick, can you find out why this row count estimation is so far off?

It's actually correct:

prod1=# select count(actor_id) from actor where actor_full_name_uppercase
like 'SANDERS%';
count
-------
3501
(1 row)

Of course, I merely chose "SANDERS" arbitrarily as a name that falls
somewhere near the middle of the frequency range for names. SMITH or JONES
would represent a worst-case, and something like KOIZAR would probably be
unique.

Here are the stats:

prod1=# SELECT * FROM pg_stats
prod1-# WHERE tablename='actor' AND attname='actor_full_name_uppercase';
-[ RECORD
1 ]-----+-------------------------------------------------------------------
----------------------------------------------------------------------------
----------------------------------------------------------------------------
-------------------------------------------------------------
schemaname | public
tablename | actor
attname | actor_full_name_uppercase
null_frac | 0.000333333
avg_width | 21
n_distinct | 24215
most_common_vals | {"STATE OF INDIANA","INDIANA DEPARTMENT OF
REVENUE","BARTH CONS SCHOOL CORP","HOWARD COUNTY CLERK","ADVANCED RECOVERY
SERVICES","STATE OF INDIANA-DEPT OF REVENUE","ALLIED COLLECTION SERVICE
INC","CREDIT BUREAU OF LAPORTE","MIDWEST COLLECTION SVC INC","NCO FINANCIAL
SYSTEMS INC"}
most_common_freqs |
{0.0153333,0.0143333,0.00433333,0.00433333,0.004,0.00366667,0.00333333,0.003
33333,0.00266667,0.00266667}
histogram_bounds | {"(POE) ESTELLE, DENISE","BRIEN, LIISI","COTTRELL,
CAROL","FAMILY RENTALS","HAYNES, TAMIKA","KESSLER, VICTORIA","MEFFORD,
VERNON L","PHILLIPS, GERALD L","SHELTON, ANTOINETTE","TRICARICO, MELISSA
SUE","ZUEHLKE, THOMAS L"}
correlation | -0.00147395

I think this means that the average is 357 per actor. As you can see, the
range of assignments varies from people with a single parking ticket to
"State of Indiana", which is party to many thousands of cases.

> BTW, there seem to be missing cases:
> > -> Nested Loop (cost=0.00..2214.66 rows=2 width=115)
> > (actual time=59.05..119929.71 rows=5879 loops=1)
> ^^^^
> > -> Nested Loop (cost=0.00..2205.26 rows=3 width=76)
> > (actual time=51.46..66089.04 rows=5882 loops=1)

This is expected- We actually aggregate data from many county court
databases, with varying levels of data "cleanliness".

Regards,
-Nick

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Nick Fankhauser 2003-11-12 16:05:10 Re: Seeking help with a query that takes too long
Previous Message scott.marlowe 2003-11-12 15:53:56 Re: Value of Quad vs. Dual Processor machine