From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | "Jim C(dot) Nasby" <decibel(at)decibel(dot)org> |
Cc: | Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Adriaan van Os <postgres(at)microbizz(dot)nl>, pgsql-performance(at)postgresql(dot)org |
Subject: | Re: TRUNCATE TABLE |
Date: | 2007-07-17 00:18:18 |
Message-ID: | 25418.1184631498@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
"Jim C. Nasby" <decibel(at)decibel(dot)org> writes:
> On Fri, Jul 13, 2007 at 09:12:34PM +0200, Pavel Stehule wrote:
>> I tested speed difference between TRUNCATE TABLE and DROP TABLE
>> (tested on my notebook ext3 and Linux fedora 7):
> Are you sure you can ignore the added cost of an EXECUTE? I tried the
> following as a test, but my repeatability sucks... :/
The repeatability was sucky for me too, until I turned off autovacuum.
I am not sure why autovac is interfering with truncate more than with
create/drop, but that seems to be what's happening. Note that all the
tables involved are temp, so autovac really shouldn't be touching them
at all, so this is a bit surprising.
[ investigates awhile... ] There is a fairly serious mistake in Pavel's
test script, which is that it is testing 1000 iterations *within a
single transaction*. We do not drop removable tables until end of
transaction, and that means that the actual filesystem effects of drop
or truncate are not being timed by the script. The part that he's
really timing is:
* for DROP: mark a bunch of system catalog tuples as deleted
* for TRUNCATE: create one new, empty disk file, then mark one pg_class
tuple as deleted and insert a replacement one.
Thus the timing issue (at least as exhibited by this script) has nothing
whatever to do with the time to delete a file, but with the time to
create one. Since the part of DROP being timed has probably got no I/O
involved at all (the tuples being touched are almost surely still in
shared buffers), it's unsurprising that it is consistently fast.
I tried strace -T on the backend while running the TRUNCATE script, and
got a smoking gun: most of the open(O_CREAT) calls take only 130 to 150
microseconds, but the tail of the distribution is awful:
0.000186
0.000187
0.000188
0.000190
0.000193
0.000194
0.000204
0.000208
0.000235
0.000265
0.000274
0.000289
0.000357
0.000387
0.000410
0.000434
0.000435
0.000488
0.000563
0.065674
0.583236
Somehow, autovac is doing something that makes the filesystem go nuts
every so often, and take an astonishingly long time to create an empty
file. But autovac itself doesn't create or delete any files, so what's
up here?
Also, I was able to reproduce the variability in timing on HPUX and
Darwin as well as Linux, so we can't put all the blame on ext3.
(I didn't drill down to the strace level on the other two machines,
though, so it's possible that there is a different mechanism at work
there.)
regards, tom lane
From | Date | Subject | |
---|---|---|---|
Next Message | Marc Mamin | 2007-07-17 14:10:30 | Postgres configuration for 64 CPUs, 128 GB RAM... |
Previous Message | Jim C. Nasby | 2007-07-16 21:39:45 | Re: TRUNCATE TABLE |