Re: Postgres performance comparing GCP and AWS

From: Hannu Krosing <hannuk(at)google(dot)com>
To: Justin Pitts <justinpitts(at)gmail(dot)com>
Cc: Philip Semanchuk <philip(at)americanefficient(dot)com>, Igor Gois <igor(at)bixtecnologia(dot)com(dot)br>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Maurici Meneghetti <maurici(dot)meneghetti(at)bixtecnologia(dot)com(dot)br>, pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Postgres performance comparing GCP and AWS
Date: 2021-03-01 16:01:06
Message-ID: CAMT0RQSaw23aYQNVdtn3GeX_gJdjNHTeA=vvNugR+MPoDjh5fw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Have you tried to set the instance running on GCP to have similar
shared_buffers as the AWS database ?

What you described has a much lower cache hit rate on GCS and 2X the
shared buffers on AWS which could well explain much of the difference
in execution times.

DETAILS:
Query explain for Postgres on GCP VM:
Buffers: shared hit=423 read=4821

Query explain for Postgres on AWS RDS:
Buffers: shared hit=3290 read=1948

and the configuration :

Instance on VM on GCP (2 vCPUs, 2 GB memory, 800 GB disk):
• shared_buffers: 510920kB (close to 499MB)

Instance managed by RDS (2 vCPUs, 2 GiB RAM, 250GB disk, 750 de IOPS):
• shared_buffers: 943896kB (close to 922MB)

Cheers
Hannu

On Fri, Feb 26, 2021 at 9:16 AM Justin Pitts <justinpitts(at)gmail(dot)com> wrote:
>
> Since this is a comparison to RDS, and the goal presumably is to make the test as even as possible, you will want to pay attention to the network IO capacity for the client and the server in both tests.
>
> For RDS, you will be unable to run the client software locally on the server hardware, so you should plan to do the same for the GCP comparison.
>
> What is the machine size you are using for your RDS instance? Each machine size will specify CPU and RAM along with disk and network IO capacity.
>
> Is your GCP VM where you are running PG ( a GCP VM is the equivalent of an EC2 instance, by the way ) roughly equivalent to that RDS instance?
>
> Finally, is the network topology roughly equivalent? Are you performing these tests with the same region and/or availability zone?
>
>
>
> On Thu, Feb 25, 2021 at 3:32 PM Philip Semanchuk <philip(at)americanefficient(dot)com> wrote:
>>
>>
>>
>> > On Feb 25, 2021, at 4:04 PM, Igor Gois <igor(at)bixtecnologia(dot)com(dot)br> wrote:
>> >
>> > Philip,
>> >
>> > The results in first email in this thread were using explain analyze.
>> >
>> > I thought that you asked to run using only 'explain'. My bad.
>> >
>> > The point is, the execution time with explain analyze is less the 1 second. But the actual execution time (calculated from the python client) is 24 seconds (aws) and 300+ seconds in gcp
>>
>> Oh OK, sorry, I wasn’t following. Yes, network speed sounds like the source of the problem.
>>
>> Under AWS sometimes we log into an EC2 instance if we have to run a query that generates a lot of data so that both server and client are inside AWS. If GCP has something similar to EC2, it might be an interesting experiment to run your query from there and see how much, if any, that changes the time it takes to get results.
>>
>> Hope this helps
>> Philip
>>
>>
>>
>> >
>> > Em qui., 25 de fev. de 2021 às 17:53, Philip Semanchuk <philip(at)americanefficient(dot)com> escreveu:
>> >
>> >
>> > > On Feb 25, 2021, at 3:46 PM, Igor Gois <igor(at)bixtecnologia(dot)com(dot)br> wrote:
>> > >
>> > > Hi, Philip
>> > >
>> > > We ran: EXPLAIN (FORMAT JSON) SELECT "Id", "DateTime", "SignalRegisterId", "Raw" FROM "SignalRecordsBlobs" WHERE "SignalSettingId" = 103 AND "DateTime" BETWEEN '2019-11-28T14:00:12.540200000' AND '2020-07-23T21:12:32.249000000';
>> > >
>> > > but it was really fast. I think the results were discarded.
>> >
>> > EXPLAIN and EXPLAIN ANALYZE are different in an important way. EXPLAIN merely plans the query, EXPLAIN ANALYZE plans *and executes* the query. From the doc —
>> >
>> > "The ANALYZE option causes the statement to be actually executed, not only planned....Keep in mind that the statement is actually executed when the ANALYZE option is used. Although EXPLAIN will discard any output that a SELECT would return, other side effects of the statement will happen as usual. “
>> >
>> > https://www.postgresql.org/docs/12/sql-explain.html
>> >
>> >
>> > >
>> > > AWS Execution time select without explain: 24.96505s (calculated in python client)
>> > > AWS Execution time select with explain but without analyze: 0.03876s (calculated in python client)
>> > >
>> > > https://explain.depesz.com/s/5HRO
>> > >
>> > > Thanks in advance
>> > >
>> > >
>> > > Em qui., 25 de fev. de 2021 às 15:13, Philip Semanchuk <philip(at)americanefficient(dot)com> escreveu:
>> > >
>> > >
>> > > > On Feb 24, 2021, at 10:11 AM, Igor Gois <igor(at)bixtecnologia(dot)com(dot)br> wrote:
>> > > >
>> > > > Hi, Julien
>> > > >
>> > > > Your hypothesis about network transfer makes sense. The query returns a big size byte array blobs.
>> > > >
>> > > > Is there a way to test the network speed against the instances? I have access to the network speed in gcp (5 Mb/s), but don't have access in aws rds.
>> > >
>> > > Perhaps what you should run is EXPLAIN ANALYZE SELECT...? My understanding is that EXPLAIN ANALYZE executes the query but discards the results. That doesn’t tell you the network speed of your AWS instance, but it does isolate the query execution speed (which is what I think you’re trying to measure) from the network speed.
>> > >
>> > > Hope this is useful.
>> > >
>> > > Cheers
>> > > Philip
>> > >
>> > > >
>> > > >
>> > > > Em qua., 24 de fev. de 2021 às 10:35, Julien Rouhaud <rjuju123(at)gmail(dot)com> escreveu:
>> > > > Hi,
>> > > >
>> > > > On Wed, Feb 24, 2021 at 6:14 AM Maurici Meneghetti
>> > > > <maurici(dot)meneghetti(at)bixtecnologia(dot)com(dot)br> wrote:
>> > > > >
>> > > > > I have 2 postgres instances created from the same dump (backup), one on a GCP VM and the other on AWS RDS. The first instance takes 18 minutes and the second one takes less than 20s to run this simples query:
>> > > > > SELECT "Id", "DateTime", "SignalRegisterId", "Raw" FROM "SignalRecordsBlobs" WHERE "SignalSettingId" = 103 AND "DateTime" BETWEEN '2019-11-28T14:00:12.540200000' AND '2020-07-23T21:12:32.249000000';
>> > > > > I’ve run this query a few times to make sure both should be reading data from cache.
>> > > > > I expect my postgres on GPC to be at least similar to the one managed by AWS RDS so that I can work on improvements parallelly and compare.
>> > > > >
>> > > > > DETAILS:
>> > > > > [...]
>> > > > > Planning time: 456.315 ms
>> > > > > Execution time: 776.976 ms
>> > > > >
>> > > > > Query explain for Postgres on AWS RDS:
>> > > > > [...]
>> > > > > Planning time: 0.407 ms
>> > > > > Execution time: 14.87 ms
>> > > >
>> > > > Those queries were executed in respectively ~1s and ~15ms (one thing
>> > > > to note is that the slower one had less data in cache, which may or
>> > > > may note account for the difference). Does those plans reflect the
>> > > > reality of your slow executions? If yes it's likely due to quite slow
>> > > > network transfer. Otherwise we would need an explain plan from the
>> > > > slow execution, for which auto_explain can help you. See
>> > > > https://www.postgresql.org/docs/11/auto-explain.html for more details.
>> > > >
>> > > >
>> > > > --
>> > > > Att,
>> > > >
>> > > > Igor Gois | Sócio Consultor
>> > > > (48) 99169-9889 | Skype: igor_msg
>> > > > Site | Blog | LinkedIn | Facebook | Instagram
>> > > >
>> > > >
>> > >
>> > >
>> > >
>> > > --
>> > > Att,
>> > >
>> > > Igor Gois | Sócio Consultor
>> > > (48) 99169-9889 | Skype: igor_msg
>> > > Site | Blog | LinkedIn | Facebook | Instagram
>> > >
>> > >
>> >
>> >
>> >
>> > --
>> > Att,
>> >
>> > Igor Gois | Sócio Consultor
>> > (48) 99169-9889 | Skype: igor_msg
>> > Site | Blog | LinkedIn | Facebook | Instagram
>> >
>> >
>>
>>
>>

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Hannu Krosing 2021-03-01 16:58:06 Re: Potential performance issues
Previous Message Pavel Stehule 2021-03-01 15:06:16 Re: Potential performance issues