Re: hackers-digest V1 #771 (safe/fast I/O)

From: Paul A Vixie <paul(at)vix(dot)com>
To: pgsql-hackers(at)hub(dot)org
Subject: Re: hackers-digest V1 #771 (safe/fast I/O)
Date: 1998-04-12 23:38:02
Message-ID: 199804122338.QAA03476@wisdom.rc.vix.com
Lists: pgsql-hackers

mmap() is cool since it avoids copying data between kernel and user address
spaces. However, mmap() is going to be either synchronous ("won't return 'til
it has set up the page table stuff and maybe allocated backing store") or not
("will return immediately but your process will silently block if you try to
access the address range before the back office work is done for the region").
There is no callback facility and no way to poll for region readiness.
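Here is a minimal sketch of the copy-avoidance point, assuming a POSIX system. count_byte_mapped() is a hypothetical helper, not part of any library. It maps a file read-only and scans the pages in place, so no read() ever copies the data into a user buffer. If a page is not yet resident, the loop simply stalls on the page fault, which is the "silent block" described above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: count occurrences of target in the file open on fd
 * by mapping it read-only and scanning the mapped pages directly. There is
 * no read(), so no kernel-to-user copy. Returns -1 on error. */
long count_byte_mapped(int fd, unsigned char target)
{
	struct stat st;

	assert(fd >= 0);
	if (fstat(fd, &st) == -1)
		return -1;
	if (st.st_size == 0)
		return 0;		/* mmap() of length 0 fails with EINVAL */

	size_t len = (size_t)st.st_size;
	const unsigned char *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		return -1;

	/* Touching a non-resident page blocks here on the page fault. */
	long n = 0;
	for (size_t i = 0; i < len; i++)
		if (p[i] == target)
			n++;

	munmap((void *)p, len);
	return n;
}
```

With MAP_SHARED and PROT_READ the file itself is the backing store, so nothing is allocated or copied up front.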

aio_*() is cool since you can queue a read or write and then either get a
callback when it's complete or poll it. However, there's no way to allocate
the backing store before you start scribbling, so there is always a copy on
aio_write(). And there's no page flipping in aio_read()'s definition, so
unless you allocate your read buffers on page boundaries and unless your
kernel is really smart, you're always going to see a copy in aio_read().
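A minimal sketch of the queue-then-poll pattern, assuming POSIX AIO is available (older glibc also needs -lrt at link time). aio_read_polled() is a hypothetical name. The read buffer is placed on a page boundary with posix_memalign(), which is the precondition mentioned above. Nothing in the standard obliges the kernel to flip pages rather than copy, though.

```c
#define _POSIX_C_SOURCE 200809L
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: read up to len bytes from offset 0 of fd with POSIX
 * AIO, polling for completion instead of blocking. On success returns the
 * byte count and sets *out to a page-aligned buffer the caller must free();
 * returns -1 on error. */
ssize_t aio_read_polled(int fd, size_t len, char **out)
{
	long page = sysconf(_SC_PAGESIZE);
	void *buf;

	assert(len > 0);
	if (page <= 0 || posix_memalign(&buf, (size_t)page, len) != 0)
		return -1;

	struct aiocb cb;
	memset(&cb, 0, sizeof cb);
	cb.aio_fildes = fd;
	cb.aio_buf = buf;
	cb.aio_nbytes = len;
	cb.aio_offset = 0;
	cb.aio_sigevent.sigev_notify = SIGEV_NONE;	/* poll; no signal */

	if (aio_read(&cb) == -1) {
		free(buf);
		return -1;
	}

	/* The read is queued; real code would compute here between polls. */
	struct timespec pause = { 0, 1000000 };	/* 1 ms */
	int err;
	while ((err = aio_error(&cb)) == EINPROGRESS)
		nanosleep(&pause, NULL);

	ssize_t n = aio_return(&cb);	/* call exactly once, after completion */
	if (err != 0 || n < 0) {
		free(buf);
		return -1;
	}
	*out = buf;
	return n;
}
```

A callback variant would set sigev_notify to SIGEV_THREAD or SIGEV_SIGNAL instead of polling with aio_error().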

O_ASYNC and select() are only useful for externally synchronized I/O such as
TTYs and network sockets. For regular files in a file system, and for block or
character special disk files, select() always reports the descriptor as both
readable and writable.
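The select() behaviour is easy to check. This sketch polls a descriptor with a zero timeout; probe_select() is a hypothetical helper. POSIX requires regular files to always select true for both reading and writing, so the answer says nothing about whether a read() would wait on the disk.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: ask select() with a zero timeout whether fd is
 * readable and writable. Returns select()'s count of ready bits, or -1. */
int probe_select(int fd, int *readable, int *writable)
{
	fd_set rfds, wfds;

	assert(fd >= 0 && fd < FD_SETSIZE);
	FD_ZERO(&rfds);
	FD_ZERO(&wfds);
	FD_SET(fd, &rfds);
	FD_SET(fd, &wfds);
	struct timeval tv = { 0, 0 };	/* poll; do not wait */

	int n = select(fd + 1, &rfds, &wfds, NULL, &tv);
	if (n < 0)
		return -1;
	*readable = FD_ISSET(fd, &rfds) != 0;
	*writable = FD_ISSET(fd, &wfds) != 0;
	return n;
}
```

Called on a freshly created temporary file, it returns 2, with both flags set.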

As far as I know, other than on the MASSCOMP (which more or less did what
VMS did and what Win/NT now does in this area), no UNIX system, especially
including POSIX.1B systems, has quite what's wanted for high performance
transactional I/O.

True asynchrony means having the ability to choose when to block, and to
parallelize computation with I/O, and to get more total work done per unit time
by doing application level seek ordering and write buffering (avoiding excess
mechanical movement). In the last I/O intensive system I helped build here,
we decided that mmap(), even with its periodic stalls, gave us better total
throughput because it avoids the copy overhead. It helps if you mmap things
in large, well-grouped regions and access them with high locality of
reference. But it was the savings in memory bus bandwidth that bought us
the most.
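Application-level write buffering and seek ordering can be sketched as follows. This assumes the queued writes do not overlap, since sorting would otherwise change which write lands last. struct pending_write and flush_sorted() are hypothetical names. File offsets are only a proxy for physical placement on disk, so this is a heuristic, not a guarantee that the head moves in one direction.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical queue entry: one buffered write waiting to be issued. */
struct pending_write {
	off_t offset;
	const void *data;
	size_t len;
};

static int by_offset(const void *a, const void *b)
{
	off_t x = ((const struct pending_write *)a)->offset;
	off_t y = ((const struct pending_write *)b)->offset;
	return (x > y) - (x < y);
}

/* Sort the buffered writes by file offset, then issue them in one ascending
 * pass instead of in arrival order. Returns 0, or -1 on a write error. */
int flush_sorted(int fd, struct pending_write *q, size_t n)
{
	assert(q != NULL || n == 0);
	qsort(q, n, sizeof q[0], by_offset);
	for (size_t i = 0; i < n; i++) {
		const char *p = q[i].data;
		size_t left = q[i].len;
		off_t off = q[i].offset;
		while (left > 0) {	/* pwrite() may write less than asked */
			ssize_t w = pwrite(fd, p, left, off);
			if (w < 0)
				return -1;
			p += w;
			left -= (size_t)w;
			off += w;
		}
	}
	return 0;
}
```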

#ifndef BUFFER_H
#define BUFFER_H

#include <stdio.h>		/* FILE, size_t */
#include <sys/types.h>		/* off_t */
#include <sys/uio.h>		/* struct iovec */
#include "misc.h"		/* region (project-local type) */

#define BUF_SIZE 4096

/* Callers see only an opaque handle. */
typedef struct buffer {
	void *opaque;
} buffer;

/* Access mode passed to buf_mget(). */
typedef enum bufprot {
	buf_ro,
	buf_rw
	/* Note that there is no buf_wo since RMW is the processor standard. */
} bufprot;

int buf_init(int nmax, int grow);
int buf_shutdown(FILE *);
int buf_get(buffer *);
int buf_mget(buffer *, int, off_t, bufprot);
int buf_refcount(buffer);
void buf_ref(buffer);
void buf_unref(buffer);
void buf_clear(buffer);
void buf_add(buffer, size_t);
void buf_sub(buffer, size_t);
void buf_shift(buffer, size_t);
size_t buf_used(buffer);
size_t buf_avail(buffer);
void *buf_used_ptr(buffer);
void *buf_avail_ptr(buffer);
struct iovec buf_used_iov(buffer);
struct iovec buf_avail_iov(buffer);
region buf_used_reg(buffer);
region buf_avail_reg(buffer);
int buf_printf(buffer, const char *, ...);

#endif /* !BUFFER_H */
