
Re: Parallel Seq Scan vs kernel read ahead

From:	David Rowley <dgrowleyml(at)gmail(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Parallel Seq Scan vs kernel read ahead
Date:	2020-06-19 02:10:17
Message-ID:	CAApHDvq+mXCDE61qEWHLBCOVxHQMaF1S_Z8vhU_KsvhAowg+5w@mail.gmail.com
Lists:	pgsql-hackers

On Fri, 19 Jun 2020 at 11:34, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
>
> On Fri, 19 Jun 2020 at 03:26, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> >
> > On Thu, Jun 18, 2020 at 6:15 AM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> > > With a 32TB relation, the code will make the chunk size 16GB. Perhaps
> > > I should change the code to cap that at 1GB.
> >
> > It seems pretty hard to believe there's any significant advantage to a
> > chunk size >1GB, so I would be in favor of that change.
>
> I could certainly make that change. With the standard page size, 1GB
> is 131072 pages and a power of 2. That would change for non-standard
> page sizes, so we'd need to decide if we want to keep the chunk size a
> power of 2, or just cap it exactly at whatever number of pages 1GB is.
>
> I'm not sure how much of a difference it'll make, but I also just want
> to note that synchronous scans can mean we'll start the scan anywhere
> within the table, so capping to 1GB does not mean we read an entire
> extent. It's more likely to span 2 extents.
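The point about synchronized scans comes down to simple arithmetic. If chunks are handed out starting from an arbitrary block, a 1GB chunk only lines up with a 1GB-aligned region when the scan happens to start on such a boundary. The sketch below is hypothetical. It treats an "extent" as a 1GB-aligned run of blocks (for example, one 1GB segment file), which is an assumption: the real on-disk extent layout depends on the filesystem. It also ignores the wraparound at the end of the relation.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 131072 blocks = 1GB at the default 8KB page size. */
#define CHUNK_BLOCKS  131072u
/*
 * Hypothetical: treat an "extent" as a 1GB-aligned run of blocks (for
 * example, one segment file). Real extent placement is up to the filesystem.
 */
#define REGION_BLOCKS 131072u

/* Count the REGION_BLOCKS-aligned regions touched by blocks [start, start + len). */
static uint32_t regions_touched(uint64_t start, uint64_t len)
{
    assert(len > 0);
    uint64_t first = start / REGION_BLOCKS;
    uint64_t last = (start + len - 1) / REGION_BLOCKS;
    return (uint32_t) (last - first + 1);
}
```

A chunk of `CHUNK_BLOCKS` blocks starting at block 0 touches one region. A chunk starting at a hypothetical synchronized-scan start block such as 50000 touches two, and so does every later chunk in that scan.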

Here's a patch which caps the maximum chunk size at 131072 blocks. If
someone doubles the page size, then that'll be 2GB instead of 1GB. I'm
not personally worried about that.
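The sizing rule can be sketched in a few lines. This is a hypothetical illustration, not the code from the attached patch. It assumes the chunk size aims for roughly 2048 chunks per relation, which is consistent with a 32TB relation (about 2^32 blocks of 8KB) getting 16GB chunks. It also assumes the size is rounded up to a power of 2 and then capped at 131072 blocks. `TARGET_NCHUNKS`, `next_pow2`, `chunk_size_blocks` and `chunk_size_bytes` are names made up for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: aim for about 2048 chunks per relation (32TB -> 16GB chunks). */
#define TARGET_NCHUNKS   2048u
/* Cap from the message above: 131072 blocks, i.e. 1GB at 8KB pages. */
#define MAX_CHUNK_BLOCKS 131072u

/* Smallest power of 2 that is >= v; requires 1 <= v <= 2^31. */
static uint32_t next_pow2(uint32_t v)
{
    assert(v >= 1 && v <= (UINT32_C(1) << 31));
    uint32_t p = 1;
    while (p < v)
        p <<= 1;
    return p;
}

/* Chunk size, in blocks, for a relation of nblocks blocks. */
static uint32_t chunk_size_blocks(uint32_t nblocks)
{
    uint32_t target = nblocks / TARGET_NCHUNKS;
    if (target < 1)
        target = 1;
    uint32_t size = next_pow2(target);
    return size > MAX_CHUNK_BLOCKS ? MAX_CHUNK_BLOCKS : size;
}

/* Bytes covered by one chunk at a given page size (BLCKSZ). */
static uint64_t chunk_size_bytes(uint32_t nblocks, uint32_t blcksz)
{
    return (uint64_t) chunk_size_blocks(nblocks) * blcksz;
}
```

For a maximum-size relation, the uncapped size would be 2^21 blocks (16GB at 8KB pages). The cap brings it down to 131072 blocks, which is 1GB at 8KB pages and 2GB at 16KB pages. This is the trade-off mentioned above: the cap is a fixed block count rather than a fixed byte count.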

I tested the performance on a Windows 10 laptop using the test case from [1]:

Master:

workers=0: Time: 141175.935 ms (02:21.176)
workers=1: Time: 316854.538 ms (05:16.855)
workers=2: Time: 323471.791 ms (05:23.472)
workers=3: Time: 321637.945 ms (05:21.638)
workers=4: Time: 308689.599 ms (05:08.690)
workers=5: Time: 289014.709 ms (04:49.015)
workers=6: Time: 267785.270 ms (04:27.785)
workers=7: Time: 248735.817 ms (04:08.736)

Patched:

workers=0: Time: 155985.204 ms (02:35.985)
workers=1: Time: 112238.741 ms (01:52.239)
workers=2: Time: 105861.813 ms (01:45.862)
workers=3: Time: 91874.311 ms (01:31.874)
workers=4: Time: 92538.646 ms (01:32.539)
workers=5: Time: 93012.902 ms (01:33.013)
workers=6: Time: 94269.076 ms (01:34.269)
workers=7: Time: 90858.458 ms (01:30.858)

David

[1] https://www.postgresql.org/message-id/CAApHDvrfJfYH51_WY-iQqPw8yGR4fDoTxAQKqn%2BSa7NTKEVWtg%40mail.gmail.com

Attachment	Content-Type	Size
bigger_io_chunks_for_parallel_seqscan_v2.patch	application/x-patch	10.2 KB

In response to

Re: Parallel Seq Scan vs kernel read ahead at 2020-06-18 23:34:15 from David Rowley

Responses

Re: Parallel Seq Scan vs kernel read ahead at 2020-06-19 20:00:07 from Robert Haas
Re: Parallel Seq Scan vs kernel read ahead at 2020-06-22 04:54:22 from David Rowley
Re: Parallel Seq Scan vs kernel read ahead at 2020-06-22 21:52:21 from Thomas Munro
