PostgreSQL runs a query much slower than BDE and MySQL

From: "Peter Hardman" <peter(at)ssbg(dot)zetnet(dot)co(dot)uk>
To: pgsql-performance(at)postgresql(dot)org
Subject: PostgreSQL runs a query much slower than BDE and MySQL
Date: 2006-08-16 16:48:13
Message-ID: 44E35A5D.19933.35F376@peter.ssbg.zetnet.co.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

I'm in the process of migrating a Paradox 7/BDE 5.01 database from single-user
Paradox to a web based interface to either MySQL or PostgreSQL.
The database is a pedigree sheep breed society database recording sheep and
flocks (amongst other things).

My current problem is with one table and an associated query which takes 10
times longer to execute on PostgreSQL than BDE, which in turn takes 10 times
longer than MySQL. The table links sheep to flocks and is created as follows:

CREATE TABLE SHEEP_FLOCK
(
regn_no varchar(7) NOT NULL,
flock_no varchar(6) NOT NULL,
transfer_date date NOT NULL,
last_changed date NOT NULL,
CONSTRAINT SHEEP_FLOCK_pkey PRIMARY KEY (regn_no, flock_no,
transfer_date)
)
WITHOUT OIDS;
ALTER TABLE SHEEP_FLOCK OWNER TO postgres;

I then populate the table with

COPY SHEEP_FLOCK
FROM 'e:/ssbg/devt/devt/export_data/sheep_flock.txt'
WITH CSV HEADER

The table then has about 82000 records

The query I run is:

/* Select all sheep who's most recent transfer was into the subject flock */
SELECT DISTINCT f1.regn_no, f1.transfer_date as date_in
FROM SHEEP_FLOCK f1 JOIN
/* The last transfer date for each sheep */
(SELECT f.regn_no, MAX(f.transfer_date) as last_xfer_date
FROM SHEEP_FLOCK f
GROUP BY f.regn_no) f2
ON f1.regn_no = f2.regn_no
WHERE f1.flock_no = '1359'
AND f1.transfer_date = f2.last_xfer_date

The sub-select on it's own returns about 32000 rows.

Using identically structured tables and the same primary key, if I run this on
Paradox/BDE it takes about 120ms, on MySQL (5.0.24, local server) about 3ms,
and on PostgresSQL (8.1.3, local server) about 1290ms). All on the same
Windows XP Pro machine with 512MB ram of which nearly half is free.

The query plan shows most of the time is spent sorting the 30000+ rows from the subquery, so I added a further
subquery as follows:

/* Select all sheep who's most recent transfer was into the subject flock */
SELECT DISTINCT f1.regn_no, f1.transfer_date as date_in
FROM SHEEP_FLOCK f1 JOIN
/* The last transfer date for each sheep */
(SELECT f.regn_no, MAX(f.transfer_date) as last_xfer_date
FROM SHEEP_FLOCK f
WHERE f.regn_no IN
/* Limit the rows extracted by the outer sub-query to those relevant to the
subject flock */
/* This typically reduces the time from 1297ms to 47ms - from 35000 rows
to 127 rows */
(SELECT s.regn_no FROM SHEEP_FLOCK s where s.flock_no = '1359')
GROUP BY f.regn_no) f2
ON f1.regn_no = f2.regn_no
WHERE f1.flock_no = '1359'
AND f1.transfer_date = f2.last_xfer_date

then as the comment suggests I get a considerable improvement, but it's still an
order of magnitude slower than MySQL.

Can anyone suggest why PostgreSQL performs the original query so much slower than even BDE?
--
Peter Hardman
Acre Cottage, Horsebridge
King's Somborne
Stockbridge
SO20 6PT

== Breeder of Shetland Cattle and Shetland Sheep ==

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Arjen van der Meijden 2006-08-16 18:02:24 Re: PostgreSQL runs a query much slower than BDE and MySQL
Previous Message Carl Youngblood 2006-08-16 15:34:24 Re: Beginner optimization questions, esp. regarding Tsearch2