From: Arya F <arya6000(at)gmail(dot)com>
To: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: 600 million rows of data. Bad hardware or need partitioning?
Date: 2020-05-04 03:27:54
Message-ID: CAFoK1ay01_ND0QrCcAAHX7U=rTCssB8vfOiH+iv5b8Qh6WN80g@mail.gmail.com
Lists: pgsql-performance
I have created the following table to reproduce my performance numbers;
I have simplified the table for this question.
I'm running PostgreSQL 12 on the following hardware.
Dual Xeon Quad-Core E5320 1.86GHz
4GB of RAM
The table structure is
id uuid
address_api_url text
check_timestamp timestamp with time zone
address text
Indexes:
"new_table_pkey" PRIMARY KEY, btree (id)
"test_table_check_timestamp_idx" btree (check_timestamp)
Right now the table has 100 million rows, but I expect it to reach
about 600-700 million. I am seeing slow updates/inserts, and the issue
is caused by the indexes, since they have to be maintained on every
insert/update. If I remove the indexes, insert performance stays
excellent even with millions of rows.
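For illustration, dropping the secondary index before a bulk load and
rebuilding it afterwards looks roughly like this (a sketch only; the
generate_series test data is purely illustrative, and gen_random_uuid()
needs the pgcrypto extension on PostgreSQL 12):

-- Drop the secondary index, bulk load, then rebuild it once at the end;
-- building the index in one pass is far cheaper than maintaining it per row.
DROP INDEX IF EXISTS test_table_check_timestamp_idx;

INSERT INTO test_table (id, address_api_url, check_timestamp, address)
SELECT gen_random_uuid(),              -- requires pgcrypto on PostgreSQL 12
       'https://example.invalid/api',  -- placeholder values for illustration
       now(),
       'placeholder address'
FROM generate_series(1, 1000000);

CREATE INDEX test_table_check_timestamp_idx ON test_table (check_timestamp);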
To demonstrate the update performance, I have constructed the following
query, which updates the timestamp of 10,000 rows:

UPDATE test_table set check_timestamp = now()
FROM (select id from test_table limit 10000) AS subquery
where test_table.id = subquery.id;

That update took about 1 minute and 44 seconds:

Time: 104254.392 ms (01:44.254)
Below is the EXPLAIN ANALYZE:

EXPLAIN ANALYZE UPDATE test_table set check_timestamp = now()
FROM (select id from test_table limit 10000) AS subquery
where test_table.id = subquery.id;

                                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
 Update on test_table  (cost=0.57..28234.86 rows=10000 width=160) (actual time=102081.905..102081.905 rows=0 loops=1)
   ->  Nested Loop  (cost=0.57..28234.86 rows=10000 width=160) (actual time=32.286..101678.652 rows=10000 loops=1)
         ->  Subquery Scan on subquery  (cost=0.00..514.96 rows=10000 width=56) (actual time=0.048..45.127 rows=10000 loops=1)
               ->  Limit  (cost=0.00..414.96 rows=10000 width=16) (actual time=0.042..26.319 rows=10000 loops=1)
                     ->  Seq Scan on test_table test_table_1  (cost=0.00..4199520.04 rows=101204004 width=16) (actual time=0.040..21.542 rows=10000 loops=1)
         ->  Index Scan using new_table_pkey on test_table  (cost=0.57..2.77 rows=1 width=92) (actual time=10.160..10.160 rows=1 loops=10000)
               Index Cond: (id = subquery.id)
 Planning Time: 0.319 ms
 Execution Time: 102081.967 ms
(9 rows)

Time: 102122.421 ms (01:42.122)
With the right hardware, can a single (unpartitioned) table handle 600
million rows with good insert/update performance? If so, what kind of
hardware should I be looking at? Or would I need to create partitions?
I'd like to hear some recommendations.
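For concreteness, by "create partitions" I mean declarative range
partitioning along these lines (only a sketch; the partition key,
boundaries, and names are assumptions on my part):

-- Sketch of declarative range partitioning on check_timestamp
-- (the key, boundaries, and names here are assumptions for illustration).
CREATE TABLE test_table_partitioned (
    id               uuid NOT NULL,
    address_api_url  text,
    check_timestamp  timestamp with time zone NOT NULL,
    address          text,
    PRIMARY KEY (id, check_timestamp)  -- the partition key must be part of the primary key
) PARTITION BY RANGE (check_timestamp);

-- One partition per quarter, for example.
CREATE TABLE test_table_2020_q2 PARTITION OF test_table_partitioned
    FOR VALUES FROM ('2020-04-01') TO ('2020-07-01');

CREATE INDEX ON test_table_partitioned (check_timestamp);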