Re: performance for high-volume log insertion

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: david(at)lang(dot)hm
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: performance for high-volume log insertion
Date: 2009-04-21 02:44:58
Message-ID: 20090421024458.GS8123@tamriel.snowman.net
Lists: pgsql-performance

David,

* david(at)lang(dot)hm (david(at)lang(dot)hm) wrote:
> any idea what sort of difference binary mode would result in?

It depends a great deal on your application.

> currently rsyslog makes use of its extensive formatting capabilities to
> format a string along the lines of
> $DBformat="insert into table X values ('$timestamp','$msg');"

Is this primarily the commands sent to the database? If so, I don't
think you'll get much by going to binary-mode. The text '$msg' isn't
going to be any different in binary. The '$timestamp' would be, but I'm
guessing you'd have to restructure it some to match the PG binary
timestamp format and while that *would* be a win, I don't think it would
end up being all that much of a win.
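For reference, a sketch of what that restructuring would involve. PG's binary timestamp format is a big-endian int64 of microseconds since 2000-01-01 00:00:00, *assuming* the server uses integer datetimes (the integer_datetimes setting; check PQparameterStatus before relying on this, since float-based timestamps use a different encoding). Function names here are illustrative:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <time.h>
#include <arpa/inet.h>  /* htonl */

/* Seconds between the Unix epoch (1970-01-01) and the
 * PostgreSQL epoch (2000-01-01). */
#define PG_EPOCH_OFFSET 946684800L

/* Unix time -> PG binary timestamp value: microseconds since
 * 2000-01-01 (assumes the server has integer_datetimes = on). */
static int64_t pg_timestamp_usecs(time_t unix_secs, int usecs)
{
    return ((int64_t)unix_secs - PG_EPOCH_OFFSET) * 1000000 + usecs;
}

/* The wire value must be big-endian; built from two htonl()
 * halves here to avoid relying on a non-portable htobe64(). */
static void pg_timestamp_to_wire(int64_t v, unsigned char out[8])
{
    uint32_t hi = htonl((uint32_t)(v >> 32));
    uint32_t lo = htonl((uint32_t)(v & 0xffffffff));
    memcpy(out,     &hi, 4);
    memcpy(out + 4, &lo, 4);
}
```

You would then point the corresponding paramValues entry at those 8 bytes, set paramLengths to 8 and paramFormats to 1 for that column, and declare the parameter as timestamp.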

> I proposed a 5 variable replacement for this to allow for N log entries
> to be combined into one string to be sent to the database:
>
> DBinit (one-time things like initializing prepared statements, etc)
> DBstart (string for the start of a transaction)
> DBjoin (string to use to join multiple DBitems together)
> DBend (string for the end of a transaction)
> DBitem (formatting of a single action)
>
> so you could do something like
>
> DBstart = "insert into table X values"
> DBjoin = ","
> DBend = ";"
> DBitem = "('$timestamp','$msg')"
>
> and it would create a string like #2

Using this textual representation for the DBitem would cause difficulty
for any kind of prepared statement usage (Oracle or PG), and so I would
really recommend getting away from it if possible. Instead, I would
encourage going with the PG (and Oracle, as I recall) structure of
having an array of pointers to the values.

Take a look at the documentation for PQexecParams here:
http://www.postgresql.org/docs/8.3/interactive/libpq-exec.html

(note that you'll want to use PQprepare and PQexecPrepared in the end,
but the detailed documentation is under PQexecParams)

Basically, you would have:

DBnParams = 2;
DBparamValues[0] = ptr to $timestamp
DBparamValues[1] = ptr to $msg

If you just use the text format, you don't actually need anything else
for PG, just pass in NULL for paramTypes, paramLengths, and
paramFormats, and 0 for resultFormat.

Of course, if that's your only structure, then you can just make a C
struct that has those two pointers in it and simplify your API by
passing the struct around.
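A minimal sketch of that shape (the struct and function names are illustrative, not from any rsyslog source). The libpq call itself is shown as a comment, since it needs a live connection:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative struct: one log event, both fields as text. */
struct log_event {
    const char *timestamp;
    const char *msg;
};

#define DB_NPARAMS 2

/* Flatten a log_event into the array-of-pointers layout that
 * PQexecPrepared() takes as paramValues. */
static void fill_params(const struct log_event *ev,
                        const char *paramValues[DB_NPARAMS])
{
    paramValues[0] = ev->timestamp;
    paramValues[1] = ev->msg;
    /* With all-text params, the call would then be:
     *   PQexecPrepared(conn, "ins_log", DB_NPARAMS, paramValues,
     *                  NULL, NULL, 0);
     * (NULL paramLengths/paramFormats and 0 resultFormat mean
     * everything is text, per the libpq docs linked above.)
     */
}
```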

> this is extremely flexible. I think it can do everything except binary
> mode operations, including copy. It is also pretty database agnostic.

With that DBitem, I'm not sure how you would do copy easily. You'd have
to strip out the params and possibly the comma depending on what you're
doing, and you might have to adjust your escaping (how is that done
today in $msg?). All in all, not using prepared queries is just messy
and I would recommend avoiding that, regardless of anything else.

> but people are asking about how to do binary mode, and some were thinking
> that you couldn't do prepared statements in Oracle with a string-based
> interface.

Prepared statements pretty much require that you are able to pass in the
items in a non-string-based way (I don't mean to imply that you can't
use *strings*, you can, but it's 1 string per column). Otherwise,
you've got the whole issue of figuring out where one column ends and the
next begins again, which is half the point of prepared statements.

> so I decided to post here to try and get an idea of (1) how much
> performance would be lost by sticking with strings, and (2) of all the
> various ways of inserting the data, what sort of performance differences
> are we talking about

Sticking with strings, if that's the format that's going to end up in
the database, is fine. That's an orthogonal issue to using prepared
statements though, which you should really do. Once you've converted to
using prepared statements of some kind, and batching together inserts in
larger transactions instead of one insert per transaction, then you can
come back to the question of passing things-which-can-be-binary as
binary (eg, timestamps, integers, floats, doubles, etc) and do some
performance testing to see how much of an improvement it will get you.
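The batching half of that can be sketched independently of libpq. A schematic accumulator, where flush() stands in for BEGIN, N PQexecPrepared() calls, and COMMIT (the batch size of 128 is an arbitrary illustration, something you'd tune against real load):

```c
#include <assert.h>
#include <stddef.h>

#define BATCH_SIZE 128  /* illustrative; tune in testing */

struct batcher {
    size_t pending;   /* events queued but not yet committed */
    size_t flushes;   /* transactions issued so far */
};

/* Stand-in for: BEGIN; PQexecPrepared() x pending; COMMIT; */
static void flush(struct batcher *b)
{
    if (b->pending == 0)
        return;
    b->flushes++;
    b->pending = 0;
}

/* Queue one event, paying transaction overhead once per batch
 * instead of once per insert. */
static void add_event(struct batcher *b)
{
    if (++b->pending >= BATCH_SIZE)
        flush(b);
}
```

A real version would also flush on a timer, so a slow trickle of log entries doesn't sit uncommitted.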

Thanks,

Stephen
