Re: [HACKERS] Re: [QUESTIONS] Does Storage Manager support >2GB tables?

From: Bruce Momjian <maillist(at)candle(dot)pha(dot)pa(dot)us>
To: dg(at)illustra(dot)com (David Gould)
Cc: scrappy(at)hub(dot)org, chris(at)topdog(dot)pas1(dot)logicon(dot)com, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [HACKERS] Re: [QUESTIONS] Does Storage Manager support >2GB tables?
Date: 1998-03-12 21:13:53
Message-ID: 199803122113.QAA06519@candle.pha.pa.us
Lists: pgsql-hackers

> At least on the systems I am intimately familiar with, the prefetch that the
> OS does (assuming a modern OS like Linux) is pretty hard to beat. If you have
> a table that was bulk loaded in key order, a sequential scan is going to
> result in a sequential access pattern to the underlying file and the OS
> prefetch does the right thing. If you have an unindexed table with rows
> inserted at the end, the OS prefetch still works. If you are using a
> secondary index on some sort of chopped-up table with rows inserted
> willy-nilly, then it may be worth doing async reads in a burst and
> letting the disk request sorting make the best of it.
>
> As far as I am aware, Postgres does not do async I/O. Perhaps it should.

I am adding this to the TODO list:

* Do async I/O to do better read-ahead of data

Because we are not threaded, we really can't do anything else while we
are waiting for I/O, but we can pre-request data we know we will need.

>
> > Also nice so you can control what gets fsync'ed to disk and what
> > doesn't.
>
> This is really the big win.

Yep, and this is what we are trying to work around in our buffered
pg_log change. Because we have the transaction ids all compact in one
place, this seems like a workable solution to our lack of write-to-disk
control. We just control the pg_log writes.

>
> > Our idea is to control when pg_log gets written to disk. We keep active
> > pg_log pages in shared memory, and every 30-60 seconds, we make a memory
> > copy of the current pg_log active pages, do a system sync() (which
> > happens anyway at that interval), update the pg_log file with the saved
> > changes, and fsync() the pg_log pages to disk. That way, after a crash,
> > the current database only shows transactions as committed where we are
> > sure all the data has made it to disk.
>
> OK as far as it goes, but probably bad for concurrency if I have understood
> you.

Interested in hearing your comments.

--
Bruce Momjian | 830 Blythe Avenue
maillist(at)candle(dot)pha(dot)pa(dot)us | Drexel Hill, Pennsylvania 19026
+ If your life is a hard drive, | (610) 353-9879(w)
+ Christ can be your backup. | (610) 853-3000(h)
