Select count(*) on a 2B Rows Tables Takes ~20 Hours

From: Fd Habash <fmhabash(at)gmail(dot)com>
To: "pgsql-performance(at)lists(dot)postgresql(dot)org" <pgsql-performance(at)lists(dot)postgresql(dot)org>
Subject: Select count(*) on a 2B Rows Tables Takes ~20 Hours
Date: 2018-09-13 17:33:54
Message-ID: 5b9a9f81.1c69fb81.43388.0e8b@mx.google.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Based on my research in the forums and Google , it is described in multiple places that ‘select count(*)’ is expected to be slow in Postgres because of the MVCC controls imposed upon the query leading a table scan. Also, the elapsed time increase linearly with table size.

However, I do not know if elapsed time I’m getting is to be expected.

Table reltuples in pg_class = 2,266,649,344 (pretty close)
Query = select count(*) from jim.sttyations ;
Elapsed time (ET) = 18.5 hrs

This is an Aurora cluster running on r4.2xlarge (8 vCPU, 61g). CPU usage during count run hovers around 20% with 20g of freeable memory.

Is this ET expected? If not, what could be slowing it down? I’m currently running explain analyze and I’ll share the final output when done.

I’m familiar with the ideas listed here https://www.citusdata.com/blog/2016/10/12/count-performance/

Table "jim.sttyations"
Column | Type | Modifiers | Storage | Stats target | Description
-------------------+--------------------------+----------------------------+----------+--------------+-------------
stty_id | bigint | not null | plain | |
stty_hitlist_line | text | not null | extended | |
stty_status | text | not null default 'Y'::text | extended | |
stty_status_date | timestamp with time zone | not null | plain | |
vs_number | integer | not null | plain | |
stty_date_created | timestamp with time zone | not null | plain | |
stty_stty_id | bigint | | plain | |
stty_position | bigint | | plain | |
mstty_id | bigint | | plain | |
vsr_number | integer | | plain | |
stty_date_modified | timestamp with time zone | | plain | |
stty_stored | text | not null default 'N'::text | extended | |
stty_sequence | text | | extended | |
stty_hash | text | | extended | |
Indexes:
"stty_pk" PRIMARY KEY, btree (stty_id)
"stty_indx_fk01" btree (stty_stty_id)
"stty_indx_fk03" btree (vsr_number)
"stty_indx_fk04" btree (vs_number)
"stty_indx_pr01" btree (mstty_id, stty_id)
Check constraints:
"stty_cnst_ck01" CHECK (stty_status = ANY (ARRAY['Y'::text, 'N'::text]))
"stty_cnst_ck02" CHECK (stty_stored = ANY (ARRAY['N'::text, 'Y'::text]))
Foreign-key constraints:
"stty_cnst_fk01" FOREIGN KEY (stty_stty_id) REFERENCES sttyations(stty_id) NOT VALID
"stty_cnst_fk02" FOREIGN KEY (mstty_id) REFERENCES master_sttyations(mstty_id)
"stty_cnst_fk03" FOREIGN KEY (vsr_number) REFERENCES valid_status_reasons(vsr_number)

----------------
Thank you

refpep-> select count(*) from jim.sttyations;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Aggregate (cost=73451291.77..73451291.78 rows=1 width=8)
Output: count(*)
-> Index Only Scan using stty_indx_fk03 on jim.sttyations (cost=0.58..67784668.41 rows=2266649344 width=0)
Output: vsr_number
(4 rows)

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Justin Pryzby 2018-09-13 18:05:32 Re: Select count(*) on a 2B Rows Tables Takes ~20 Hours
Previous Message padusuma 2018-09-13 12:57:39 Re: Performance of INSERT into temporary tables using psqlODBC driver