Re: Postgres performance comparing GCP and AWS

From: Gunther Schadow <raj(at)gusw(dot)net>
To: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Postgres performance comparing GCP and AWS
Date: 2021-02-24 13:10:20
Message-ID: 3dcfe362-b9fd-5168-863d-ef5651b74a71@gusw.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi Maurici,

in my experience the key factor about speed in big queries is sequential
scan. There is a huge variance in how the system is tuned. In some cases
I cannot read more than 10 MB/s, in others I get to expect 20-40 MB/s.
But then, when things are tuned well and the parallel workers set in, I
see the throughput spike to 100-200 MB/s.

You may have to enable the parallel workers in your postgresql.conf

So, to me, this is what you want to check first. While the query runs,
have both iostat and top running, with top -j or -c or -a or whatever it
is on that particular OS to see the detail info about the process.
Perhaps even -H to see threads.

Then you should see good flow with high read speed and reasonable CPU
load %. If you get low read speed and low CPU that is a sign of IO
blockage somewhere. If you get high CPU and low IO, that's a planning
mistake (the nested loop trap). You don't have that here apparently. But
index scans I have seen with much worse IO throughput than seq table
scans. Not sure.

Also, on AWS you need to be sure you have enough IOPS provisioned on
your EBS (I use gp3 now where you can have up to 10k IOPS) and also
check bus throughput of the EC2 instance. Needless to say you don't want
a t* instance where you have a limited burst CPU capacity only.

regards,
-Gunther

On 2/23/2021 1:12 PM, Maurici Meneghetti wrote:
> Hi everyone,
>
> I have 2 postgres instances created from the same dump (backup), one
> on a GCP VM and the other on AWS RDS. The first instance takes 18
> minutes and the second one takes less than 20s to run this simples query:
> SELECT "Id", "DateTime", "SignalRegisterId", "Raw" FROM
> "SignalRecordsBlobs" WHERE "SignalSettingId" = 103 AND "DateTime"
> BETWEEN '2019-11-28T14:00:12.540200000' AND
> '2020-07-23T21:12:32.249000000';
> I’ve run this query a few times to make sure both should be reading
> data from cache.
> I expect my postgres on GPC to be at least similar to the one managed
> by AWS RDS so that I can work on improvements parallelly and compare.
>
> *DETAILS:
> Query explain for Postgres on GCP VM:
> *Bitmap Heap Scan on SignalRecordsBlobs SignalRecordsBlobs
>  (cost=18.80..2480.65 rows=799 width=70) (actual time=216.766..776.032
> rows=5122 loops=1)
>     Filter: (("DateTime" >= \'2019-11-28 14:00:12.5402\'::timestamp
> without time zone) AND ("DateTime" <= \'2020-07-23
> 21:12:32.249\'::timestamp without time zone))
>     Heap Blocks: exact=5223
>     Buffers: shared hit=423 read=4821
>   ->  Bitmap Index Scan on IDX_SignalRecordsBlobs_SignalSettingId
>  (cost=0.00..18.61 rows=824 width=0) (actual time=109.000..109.001
> rows=5228 loops=1)
>           Index Cond: ("SignalSettingId" = 103)
>           Buffers: shared hit=3 read=18
> Planning time: 456.315 ms
> Execution time: 776.976 ms
>
> *Query explain for Postgres on AWS RDS:
> *Bitmap Heap Scan on SignalRecordsBlobs SignalRecordsBlobs
>  (cost=190.02..13204.28 rows=6213 width=69) (actual time=2.215..14.505
> rows=5122 loops=1)
>     Filter: (("DateTime" >= \'2019-11-28 14:00:12.5402\'::timestamp
> without time zone) AND ("DateTime" <= \'2020-07-23
> 21:12:32.249\'::timestamp without time zone))
>     Heap Blocks: exact=5209
>     Buffers: shared hit=3290 read=1948
>   ->  Bitmap Index Scan on IDX_SignalRecordsBlobs_SignalSettingId
>  (cost=0.00..188.46 rows=6405 width=0) (actual time=1.159..1.159
> rows=5228 loops=1)
>           Index Cond: ("SignalSettingId" = 103)
>           Buffers: shared hit=3 read=26
> Planning time: 0.407 ms
> Execution time: 14.87 ms
>
> *PostgreSQL version number running:
> • VM on GCP*: PostgreSQL 11.10 (Debian 11.10-0+deb10u1) on
> x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
> *• Managed by RDS on AWS:* PostgreSQL 11.10 on x86_64-pc-linux-gnu,
> compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-11), 64-bit
>
> *How PostgreSQL was installed:
> • VM on GCP*: Already installed when created VM running Debian on
> Google Console.
> *• Managed by RDS on AWS:* RDS managed the installation.
>
> *Changes made to the settings in the postgresql.conf file:
> *Here are some postgres parameters that might be useful:
> *Instance on VM on GCP (2 vCPUs, 2 GB memory, 800 GB disk):*
> • effective_cache_size: 1496MB
> • maintenance_work_mem: 255462kB (close to 249MB)
> • max_wal_size: 1GB
> • min_wal_size: 512MB
> • shared_buffers: 510920kB (close to 499MB)
> • max_locks_per_transaction 1000
> • wal_buffers: 15320kB (close to 15MB)
> • work_mem: 2554kB
> • effective_io_concurrency: 200
> • dynamic_shared_memory_type: posix
> On this instance we installed a postgres extension called timescaledb
> to gain performance on other tables. Some of these parameters were set
> using recommendations from that extension.
>
> *Instance managed by RDS (2 vCPUs, 2 GiB RAM, 250GB disk, 750 de IOPS):*
> • effective_cache_size: 1887792kB (close to 1844MB)
> • maintenance_work_mem: 64MB
> • max_wal_size: 2GB
> • min_wal_size: 192MB
> • shared_buffers: 943896kB (close to 922MB)
> • max_locks_per_transaction 64
>
> *Operating system and version by runing "uname -a":
> • VM on GCP:* Linux {{{my instance name}}} 4.19.0-14-cloud-amd64 #1
> SMP Debian 4.19.171-2 (2021-01-30) x86_64 GNU/Linux
> *• Managed by AWS RDS:* Aparently Red Hay as shown using SELECT version();
>
> *Program used to connect to PostgreSQL:* Python psycopg2.connect() to
> create the connection and pandas read_sql_query() to query using that
> connection.
>
> Thanks in advance

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Milos Babic 2021-02-24 13:15:36 Re: Postgres performance comparing GCP and AWS
Previous Message Maurici Meneghetti 2021-02-23 18:12:19 Postgres performance comparing GCP and AWS