Quick Links

Re: Speed of lo_unlink vs. DELETE on BYTEA

From:	"Reuven M(dot) Lerner" <reuven(at)lerner(dot)co(dot)il>
To:	pgsql-general(at)postgresql(dot)org
Subject:	Re: Speed of lo_unlink vs. DELETE on BYTEA
Date:	2011-09-22 22:38:43
Message-ID:	4E7BB8F3.7020608@lerner.co.il
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

<html style="direction: ltr;">
<head>
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
<style>body p { margin-bottom: 10pt; margin-top: 0pt; } </style>
</head>
<body style="direction: ltr;"
bidimailui-detected-decoding-type="UTF-8" bgcolor="#FFFFFF"
text="#000000">
Hi again, everyone. I'm replying to my own posting, to add some
information: I decided to do some of my own benchmarking. And if my
benchmarks are at all accurate, then I'm left wondering why people
use large objects at all, given their clunky API and their extremely
slow speed. I'm posting my benchmarks as a sanity test, because I'm
blown away by the results. 
 
I basically tried three different scenarios, each with 1,000 and
10,000 records. In each scenario, there was a table named
MasterTable that contained a SERIAL "id" column and a "one_value"
integer column, containing a number from generate_series, and a
second table named SecondaryTable containing its own SERIAL "id"
column, a "one_value" value (from generate_series, identical to the
"id" column, and a "master_value" column that's a foreign key back
to the main table. That is, here's the definition of the tables in
the 10,000-record benchmark: 
 
CREATE TABLE MasterTable ( 
 id SERIAL NOT NULL, 
 one_value INTEGER NOT NULL, 
 
 PRIMARY KEY(id) 
); 
INSERT INTO MasterTable (one_value) values
(generate_series(1,10000)); 
 
CREATE TABLE SecondaryTable ( 
 id SERIAL NOT NULL, 
 one_value INTEGER NOT NULL, 
 master_value INTEGER NOT NULL REFERENCES MasterTable ON
DELETE CASCADE 
 
 PRIMARY KEY(id) 
); 
 
INSERT INTO SecondaryTable (master_value, one_value) 
(SELECT s.a, s.a FROM generate_series(1,10000) AS s(a)); 
 
I also had two other versions of SecondaryTable: In one scenario,
there is a my_blob column, of type BYTEA, containing 5 million 'x"
characters. A final version had a 5-million 'x' character document
loaded into a large object in SecionaryTable. 
 
The idea was simple: I wanted to see how much faster or slower it
was to delete (not truncate) all of the records in MasterTable,
given these different data types. Would bytea be significantly
faster than large objects? How would the cascading delete affect
things? And how long does it take to pg_dump with large objects
around? 
 
Here are the results, which were pretty dramatic. Basically,
pg_dump seems to always be far, far slower than BYTEA columns.
Again, I'm wondering whether I'm doing something wrong here, or if
this explains why in my many years of using PostgreSQL, I've neither
used nor been tempted to use large objects before. 
 
<tt>1.1 1,000 records 
================== 
 
 Delete Dump 
---------------+---------+-------- 
Empty content 0.172s 0.057s 
bytea 0.488s 0.066s 
large object 30.833s 9.275s 
 
 
1.2 10,000 records 
=================== 
 
 Delete Dump 
---------------+-----------+----------- 
Empty content 8.162s 0.064s 
bytea 1m0.417s 0.157s 
large object 4m44.501s 1m38.454s 
</tt> 

Any ideas? If this is true, should we be warning people away from
large objects in the documentation, and toward bytea? 
 
Reuven 
</body>
</html>

Attachment	Content-Type	Size
unknown_filename	text/html	4.0 KB

In response to

Speed of lo_unlink vs. DELETE on BYTEA at 2011-09-22 12:26:38 from Reuven M. Lerner

Responses

Re: Speed of lo_unlink vs. DELETE on BYTEA at 2011-09-23 18:12:55 from Daniel Verite

Browse pgsql-general by date

	From	Date	Subject
Next Message	Tim Landscheidt	2011-09-22 22:43:43	Re: Quick Date/Time Index Question
Previous Message	Tom Lane	2011-09-22 22:35:19	Re: Statistics collector failure messages on startup