Fwd: Really really slow select count(*)

From: felix <crucialfelix(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Fwd: Really really slow select count(*)
Date: 2011-02-04 16:19:14
Message-ID: AANLkTinRfJKzpSwE55Kpgjh7MkDi6sk-ub_k5x1WVP8C@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

sorry, reply was meant to go to the list.

---------- Forwarded message ----------
From: felix <crucialfelix(at)gmail(dot)com>
Date: Fri, Feb 4, 2011 at 5:17 PM
Subject: Re: [PERFORM] Really really slow select count(*)
To: sthomas(at)peak6(dot)com

On Fri, Feb 4, 2011 at 4:00 PM, Shaun Thomas <sthomas(at)peak6(dot)com> wrote:

> How big is this table when it's acting all bloated and ugly?
>
458MB

Is this the only thing running when you're doing your tests? What does your
> disk IO look like?

this is on a live site. best not to scare the animals.

I have the same config on the dev environment but not the same table size.

> 10k-50k updates per day
>> mostly of this sort: set priority=1 where id=12345
>>
>
> Well... that's up to 16% turnover per day, but even then, regular vacuuming
> should keep it manageable.

something is definitely amiss with this table.

I'm not sure if its something that happened at one point when killing an
task that was writing to it or if its something about the way the app is
updating. it SHOULDN'T be that much of a problem, though I can find ways to
improve it.

No. Don't do that. You'd be better off loading everything into a temp table
> and doing this:
>
> UPDATE fastadder_fastadderstatus s
> SET priority = 1
> FROM temp_statuses t
> WHERE t.id=s.id;
>

ok, that is one the solutions I was thinking about.

are updates of the where id IN (1,2,3,4) generally not efficient ?
how about for select queries ?

"fastadder_fastadderstatus_pkey" PRIMARY KEY, btree (id)
>> "fastadder_fastadderstatus_apt_id_key" UNIQUE, btree (apt_id, service_id)
>> "fastadder_fastadderstatus_agent_priority" btree (agent_priority)
>> "fastadder_fastadderstatus_apt_id" btree (apt_id)
>> "fastadder_fastadderstatus_built" btree (built)
>> "fastadder_fastadderstatus_last_checked" btree (last_checked)
>> "fastadder_fastadderstatus_last_validated" btree (last_validated)
>> "fastadder_fastadderstatus_position_in_queue" btree (position_in_queue)
>> "fastadder_fastadderstatus_priority" btree (priority)
>> "fastadder_fastadderstatus_running_status" btree (running_status)
>> "fastadder_fastadderstatus_service_id" btree (service_id)
>>
>
> Whoh! Hold on, here. That looks like *way* too many indexes.

I actually just added most of those yesterday in an attempt to improve
performance. priority and agent_priority were missing indexes and that was a
big mistake.

overall performance went way up on my primary selects

> Definitely will slow down your insert/update performance.

there are a lot more selects happening throughout the day

> The index on 'built' for example, is a boolean. If it's evenly distributed,
> that's 150k matches for true or false,

ok,

built True is in the minority.

here is the test query that caused me to add indices to the booleans. this
is a 30k table which is doing selects on two booleans constantly. again:
True is the minority

explain analyze SELECT "nsproperties_apt"."id",
"nsproperties_apt"."display_address", "nsproperties_apt"."apt_num",
"nsproperties_apt"."bldg_id", "nsproperties_apt"."is_rental",
"nsproperties_apt"."is_furnished", "nsproperties_apt"."listing_type",
"nsproperties_apt"."list_on_web", "nsproperties_apt"."is_approved",
"nsproperties_apt"."status", "nsproperties_apt"."headline",
"nsproperties_apt"."slug", "nsproperties_apt"."cross_street",
"nsproperties_apt"."show_apt_num", "nsproperties_apt"."show_building_name",
"nsproperties_apt"."external_url", "nsproperties_apt"."listed_on",
"nsproperties_bldg"."id", "nsproperties_bldg"."name" FROM "nsproperties_apt"
LEFT OUTER JOIN "nsproperties_bldg" ON ("nsproperties_apt"."bldg_id" =
"nsproperties_bldg"."id") WHERE ("nsproperties_apt"."list_on_web" = True AND
"nsproperties_apt"."is_available" = True ) ;
QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------
Hash Left Join (cost=408.74..10062.18 rows=3344 width=152) (actual
time=12.688..2442.542 rows=2640 loops=1)
Hash Cond: (nsproperties_apt.bldg_id = nsproperties_bldg.id)
-> Seq Scan on nsproperties_apt (cost=0.00..9602.52 rows=3344
width=139) (actual time=0.025..2411.644 rows=2640 loops=1)
Filter: (list_on_web AND is_available)
-> Hash (cost=346.66..346.66 rows=4966 width=13) (actual
time=12.646..12.646 rows=4966 loops=1)
-> Seq Scan on nsproperties_bldg (cost=0.00..346.66 rows=4966
width=13) (actual time=0.036..8.236 rows=4966 loops=1)
Total runtime: 2444.067 ms
(7 rows)

=>

Hash Left Join (cost=1232.45..9784.18 rows=5690 width=173) (actual
time=30.000..100.000 rows=5076 loops=1)
Hash Cond: (nsproperties_apt.bldg_id = nsproperties_bldg.id)
-> Bitmap Heap Scan on nsproperties_apt (cost=618.23..9075.84 rows=5690
width=157) (actual time=10.000..60.000 rows=5076 loops=1)
Filter: (list_on_web AND is_available)
-> BitmapAnd (cost=618.23..618.23 rows=5690 width=0) (actual
time=10.000..10.000 rows=0 loops=1)
-> Bitmap Index Scan on nsproperties_apt_is_available
(cost=0.00..131.81 rows=6874 width=0) (actual time=0.000..0.000 rows=6545
loops=1)
Index Cond: (is_available = true)
-> Bitmap Index Scan on nsproperties_apt_list_on_web
(cost=0.00..483.32 rows=25476 width=0) (actual time=10.000..10.000
rows=26010 loops=1)
Index Cond: (list_on_web = true)
-> Hash (cost=537.99..537.99 rows=6099 width=16) (actual
time=20.000..20.000 rows=6099 loops=1)
-> Seq Scan on nsproperties_bldg (cost=0.00..537.99 rows=6099
width=16) (actual time=0.000..10.000 rows=6099 loops=1)
Total runtime: 100.000 ms
(12 rows)

> rendering it useless, yet still requiring space and maintenance. I'm
> guessing the story is similar for quite a few of the others.
>
> It doesn't really explain your count speed, but it certainly isn't helping.
>

it shouldn't affect count speed at all
it will affect the updates of course.

>
> Something seems fishy, here.
>

indeed

>
> --
> Shaun Thomas
> OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
> 312-676-8870
> sthomas(at)peak6(dot)com
>
> ______________________________________________
>
> See http://www.peak6.com/email_disclaimer.php
> for terms and conditions related to this email
>

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Achilleas Mantzios 2011-02-04 16:19:46 Re: Talking about optimizer, my long dream
Previous Message Mladen Gogala 2011-02-04 16:06:28 Re: Talking about optimizer, my long dream