Quick Links

Re: Parallel Select query performance and shared buffers

From:	Merlin Moncure <mmoncure(at)gmail(dot)com>
To:	Metin Doslu <metin(at)citusdata(dot)com>
Cc:	postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject:	Re: Parallel Select query performance and shared buffers
Date:	2013-12-10 20:03:12
Message-ID:	CAHyXU0yZYRgWCUmePrVQxAgSe1cWhs92ZcuKTg+1QxO6s704OA@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

On Wed, Nov 27, 2013 at 2:28 AM, Metin Doslu <metin(at)citusdata(dot)com> wrote:
> We have several independent tables on a multi-core machine serving Select
> queries. These tables fit into memory; and each Select queries goes over one
> table's pages sequentially. In this experiment, there are no indexes or
> table joins.
>
> When we send concurrent Select queries to these tables, query performance
> doesn't scale out with the number of CPU cores. We find that complex Select
> queries scale out better than simpler ones. We also find that increasing the
> block size from 8 KB to 32 KB, or increasing shared_buffers to include the
> working set mitigates the problem to some extent.
>
> For our experiments, we chose an 8-core machine with 68 GB of memory from
> Amazon's EC2 service. We installed PostgreSQL 9.3.1 on the instance, and set
> shared_buffers to 4 GB.
>
> We then generated 1, 2, 4, and 8 separate tables using the data generator
> from the industry standard TPC-H benchmark. Each table we generated, called
> lineitem-1, lineitem-2, etc., had about 750 MB of data. Next, we sent 1, 2,
> 4, and 8 concurrent Select queries to these tables to observe the scale out
> behavior. Our expectation was that since this machine had 8 cores, our run
> times would stay constant all throughout. Also, we would have expected the
> machine's CPU utilization to go up to 100% at 8 concurrent queries. Neither
> of those assumptions held true.
>
> We found that query run times degraded as we increased the number of
> concurrent Select queries. Also, CPU utilization flattened out at less than
> 50% for the simpler queries. Full results with block size of 8KB are below:
>
> Table select count(*) TPC-H Simple (#6)[2]
> TPC-H Complex (#1)[1]
> 1 Table / 1 query 1.5 s 2.5 s
> 8.4 s
> 2 Tables / 2 queries 1.5 s 2.5 s
> 8.4 s
> 4 Tables / 4 queries 2.0 s 2.9 s
> 8.8 s
> 8 Tables / 8 queries 3.3 s 4.0 s
> 9.6 s
>
> We then increased the block size (BLCKSZ) from 8 KB to 32 KB and recompiled
> PostgreSQL. This change had a positive impact on query completion times.
> Here are the new results with block size of 32 KB:
>
> Table select count(*) TPC-H Simple (#6)[2]
> TPC-H Complex (#1)[1]
> 1 Table / 1 query 1.5 s 2.3 s
> 8.0 s
> 2 Tables / 2 queries 1.5 s 2.3 s
> 8.0 s
> 4 Tables / 4 queries 1.6 s 2.4 s
> 8.1 s
> 8 Tables / 8 queries 1.8 s 2.7 s
> 8.3 s
>
> As a quick side, we also repeated the same experiment on an EC2 instance
> with 16 CPU cores, and found that the scale out behavior became worse there.
> (We also tried increasing the shared_buffers to 30 GB. This change
> completely solved the scaling out problem on this instance type, but hurt
> our performance on the hi1.4xlarge instances.)
>
> Unfortunately, increasing the block size from 8 to 32 KB has other
> implications for some of our customers. Could you help us out with the
> problem here?
>
> What can we do to identify the problem's root cause? Can we work around it?

I'm curious if you have a hardware 8 core or better box laying around
to replicate the test on. I've noticed scaling issues on virtual
platforms also.

I'm guessing you're getting lwlock bounced around on either
BufMappingPartitionLock (more likely) or the BufFreelistLock. Can
you/have you run a build with lock stats enabled?

Also, can I see a typical 'top' during poor scaling count(*) activity?
In particular, what's sys cpu%. I'm guessing it's non trivial.

merlin

In response to

Parallel Select query performance and shared buffers at 2013-11-27 08:28:30 from Metin Doslu

Responses

Re: Parallel Select query performance and shared buffers at 2013-12-10 20:06:11 from Claudio Freire

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Claudio Freire	2013-12-10 20:06:11	Re: Parallel Select query performance and shared buffers
Previous Message	Jeff Janes	2013-12-10 18:19:46	Re: select count(distinct ...) is slower than select distinct in about 5x