From: | "CK Tan" <cktan(at)greenplum(dot)com> |
---|---|
To: | "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com> |
Cc: | "Luke Lonergan" <LLonergan(at)greenplum(dot)com>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>, "Jeff Davis" <pgsql(at)j-davis(dot)com>, "Simon Riggs" <simon(at)enterprisedb(dot)com> |
Subject: | Re: Seq scans roadmap |
Date: | 2007-05-10 03:52:24 |
Message-ID: | 30E8D12C-C5C1-48DA-BF06-08353C398C35@greenplum.com |
Lists: | pgsql-hackers |
Hi,
In reference to the seq scans roadmap, I have just submitted a patch
that addresses some of the concerns.
The patch does this:
1. For small relations (smaller than 60% of the buffer pool), use the
current logic.
2. For big relations:
- use a ring buffer in the heap scan
- pin the first 12 pages when the scan starts
- each time 4 pages are consumed, read and pin the next 4 pages
- invalidate pages the scan has already used so they do not force out
other useful pages
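The steps above can be sketched as plain bookkeeping. This is only an illustration, not code from the patch: it performs no I/O and pins nothing, and every name in it (`use_ring_scan`, `RingScan`, `ring_fill`, and so on) is hypothetical. The constants 60%, 12 and 4 are taken from the list above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_PAGES  12            /* pages pinned when the scan starts */
#define BATCH_PAGES 4             /* pages read and pinned per refill */
#define SCAN_DONE   ((size_t)-1)  /* returned once the relation is exhausted */

/* Hypothetical check: relations smaller than 60% of the buffer pool
 * keep the current logic; anything larger uses the ring. */
static bool use_ring_scan(size_t rel_pages, size_t pool_pages)
{
    return rel_pages * 10 >= pool_pages * 6;
}

typedef struct RingScan {
    size_t rel_pages;     /* total pages in the relation */
    size_t next_read;     /* next page number to read and pin */
    size_t next_consume;  /* next page number the scan will consume */
    size_t reads_issued;  /* number of multi-page read batches issued */
} RingScan;

/* Pages currently held by the ring. Consumed pages are released
 * (invalidated) one full batch at a time. */
static size_t ring_pinned(const RingScan *s)
{
    size_t released = (s->next_consume / BATCH_PAGES) * BATCH_PAGES;
    return s->next_read - released;
}

/* Stand-in for "read and pin the next n pages" as one contiguous request. */
static void ring_fill(RingScan *s, size_t n)
{
    size_t left = s->rel_pages - s->next_read;
    if (n > left)
        n = left;
    if (n == 0)
        return;
    s->next_read += n;
    s->reads_issued++;
}

static void ring_scan_begin(RingScan *s, size_t rel_pages)
{
    s->rel_pages = rel_pages;
    s->next_read = 0;
    s->next_consume = 0;
    s->reads_issued = 0;
    ring_fill(s, RING_PAGES);     /* pin the first 12 pages */
}

/* Consume one page; after every 4th page, release that batch and
 * read the next 4 pages into the freed slots. */
static size_t ring_scan_next_page(RingScan *s)
{
    if (s->next_consume >= s->rel_pages)
        return SCAN_DONE;
    size_t page = s->next_consume++;
    assert(page < s->next_read);  /* the page was read ahead of its use */
    if (s->next_consume % BATCH_PAGES == 0)
        ring_fill(s, BATCH_PAGES);
    assert(ring_pinned(s) <= RING_PAGES);
    return page;
}
```

In this sketch the scan never holds more than 12 pages at once, so a large relation cycles through its own small set of buffers instead of pushing other pages out of the pool.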
4 files changed:
bufmgr.c, bufmgr.h, heapam.c, relscan.h
If there is interest, I can submit another scan patch that returns
N tuples at a time instead of the current one-at-a-time interface. This
improves code locality and further improves performance by another
10-20%.
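To make the difference between the two interface shapes concrete, here is a minimal sketch. It is not the proposed patch and does not use PostgreSQL's scan API: `Tuple`, `Scan`, `scan_next` and `scan_next_batch` are hypothetical stand-ins over an in-memory array.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in tuple and scan state; a real heap scan carries far more. */
typedef struct Tuple { int value; } Tuple;

typedef struct Scan {
    const Tuple *tuples;  /* backing array standing in for the heap */
    size_t ntuples;
    size_t pos;           /* index of the next tuple to return */
} Scan;

/* One-at-a-time shape: one call per tuple, NULL at end of scan. */
static const Tuple *scan_next(Scan *s)
{
    return s->pos < s->ntuples ? &s->tuples[s->pos++] : NULL;
}

/* N-at-a-time shape: fill out[] with up to max tuple pointers and
 * return how many were stored; 0 means end of scan. */
static size_t scan_next_batch(Scan *s, const Tuple **out, size_t max)
{
    size_t n = 0;
    while (n < max && s->pos < s->ntuples)
        out[n++] = &s->tuples[s->pos++];
    return n;
}
```

The caller of the batched form can process a whole batch in a tight loop before returning to the scan code, which is where the locality benefit would come from.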
For TPC-H 1 GB tables, we are seeing more than 20% improvement in scan
times on the same hardware.
------------------------------------------------------------------------
----- PATCHED VERSION
------------------------------------------------------------------------
gptest=# select count(*) from lineitem;
  count
---------
 6001215
(1 row)

Time: 2117.025 ms
------------------------------------------------------------------------
----- ORIGINAL CVS HEAD VERSION
------------------------------------------------------------------------
gptest=# select count(*) from lineitem;
  count
---------
 6001215
(1 row)

Time: 2722.441 ms
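For anyone checking the figure against the two timings, the arithmetic can be written out as below. The helper names are made up for this note and are not part of the patch.

```c
#include <assert.h>

/* Percentage reduction in elapsed time going from before_ms to after_ms. */
static double pct_time_reduction(double before_ms, double after_ms)
{
    assert(before_ms > 0.0);
    return (before_ms - after_ms) / before_ms * 100.0;
}

/* Throughput ratio: how many times faster the scan completes. */
static double speedup(double before_ms, double after_ms)
{
    assert(after_ms > 0.0);
    return before_ms / after_ms;
}
```

With 2722.441 ms for CVS HEAD and 2117.025 ms for the patched build, this gives a reduction of about 22% in elapsed time, or about 1.29 times the throughput.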
Suggestions for improvement are welcome.
Regards,
-cktan
Greenplum, Inc.
On May 8, 2007, at 5:57 AM, Heikki Linnakangas wrote:
> Luke Lonergan wrote:
>>> What do you mean with using readahead inside the heapscan?
>>> Starting an async read request?
>> Nope - just reading N buffers ahead for seqscans. Subsequent calls use
>> previously read pages. The objective is to issue contiguous reads to
>> the OS in sizes greater than the PG page size (which is much smaller
>> than what is needed for fast sequential I/O).
>
> Are you filling multiple buffers in the buffer cache with a single
> read-call? The OS should be doing readahead for us anyway, so I
> don't see how just issuing multiple ReadBuffers one after each
> other helps.
>
>> Yes, I think the ring buffer strategy should be used when the table size
>> is > 1 x bufcache and the ring buffer should be of a fixed size smaller
>> than L2 cache (32KB - 128KB seems to work well).
>
> I think we want to let the ring grow larger than that for updating
> transactions and vacuums, though, to avoid the WAL flush problem.
>
> --
> Heikki Linnakangas
> EnterpriseDB http://www.enterprisedb.com