From: | Bruce Momjian <maillist(at)candle(dot)pha(dot)pa(dot)us> |
---|---|
To: | dg(at)illustra(dot)com (David Gould) |
Cc: | scrappy(at)hub(dot)org, chris(at)topdog(dot)pas1(dot)logicon(dot)com, pgsql-hackers(at)postgreSQL(dot)org |
Subject: | Re: [HACKERS] Re: [QUESTIONS] Does Storage Manager support >2GB tables? |
Date: | 1998-03-12 21:13:53 |
Message-ID: | 199803122113.QAA06519@candle.pha.pa.us |
Lists: | pgsql-hackers |
> At least on the systems I am intimately familiar with, the prefetch that the
> OS does (assuming a modern OS like Linux) is pretty hard to beat. If you have
> a table that was bulk loaded in key order, a sequential scan is going to
> result in a sequential access pattern to the underlying file and the OS
> prefetch does the right thing. If you have an unindexed table with rows
> inserted at the end, the OS prefetch still works. If you are using a secondary
> index on some sort of chopped-up table with rows inserted willy-nilly, then
> it may be worth doing async reads in a burst and letting the disk request
> sort make the best of it.
>
> As far as I am aware, Postgres does not do async I/O. Perhaps it should.
I am adding this to the TODO list:
* Do async I/O to do better read-ahead of data
Because we are not threaded, we really can't do anything else while we
are waiting for I/O, but we can pre-request data we know we will need.
>
> > Also nice so you can control what gets written to disk/fsync'ed and what doesn't
> > get fsync'ed.
>
> This is really the big win.
Yep, and this is what we are trying to work around in our buffered
pg_log change. Because we have the transaction ids all compact in one
place, this seems like a workable solution to our lack of write-to-disk
control. We just control the pg_log writes.
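The flush cycle described above can be sketched as follows. The structure and names (LogPage, flush_pg_log, the page count) are hypothetical, not actual Postgres code; only the ordering — snapshot, sync(), write, fsync() — reflects the scheme being discussed.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define LOG_PAGE_SIZE 8192
#define ACTIVE_PAGES  2      /* assumed number of active pg_log pages */

typedef struct {
    char data[LOG_PAGE_SIZE];
} LogPage;

/* Called every 30-60 seconds. */
void flush_pg_log(LogPage *active, int logfd, off_t offset)
{
    LogPage snapshot[ACTIVE_PAGES];

    /* 1. Copy the current active pages so commits that arrive
     *    during the flush are not included. */
    memcpy(snapshot, active, sizeof snapshot);

    /* 2. Force all dirty data blocks to disk first.  Only after
     *    this is it safe to show the snapshotted transactions
     *    as committed after a crash. */
    sync();

    /* 3. Write the saved pages over the pg_log file and fsync. */
    pwrite(logfd, snapshot, sizeof snapshot, offset);
    fsync(logfd);
}
```

The ordering is the whole point: because pg_log is only updated from the snapshot after the sync(), a crash can never show a transaction as committed whose data blocks had not yet reached disk.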
>
> > Our idea is to control when pg_log gets written to disk. We keep active
> > pg_log pages in shared memory, and every 30-60 seconds, we make a memory
> > copy of the current pg_log active pages, do a system sync() (which
> > happens anyway at that interval), update the pg_log file with the saved
> > changes, and fsync() the pg_log pages to disk. That way, after a crash,
> > the current database only shows transactions as committed where we are
> > sure all the data has made it to disk.
>
> OK as far as it goes, but probably bad for concurrency if I have understood
> you.
Interested in hearing your comments.
--
Bruce Momjian | 830 Blythe Avenue
maillist(at)candle(dot)pha(dot)pa(dot)us | Drexel Hill, Pennsylvania 19026
+ If your life is a hard drive, | (610) 353-9879(w)
+ Christ can be your backup. | (610) 853-3000(h)