Re: Request for help with slow query

From: "Woolcock, Sean" <Sean(dot)Woolcock(at)emc(dot)com>
To: salah jubeh <s_jubeh(at)yahoo(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Request for help with slow query
Date: 2012-10-29 19:25:56
Message-ID: 998490E92A198A48BE60F3A9C44CA3F1010A7DCF3903@MX40A.corp.emc.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

I thought that an index was implicitly created for foreign keys, but I see that that's not true. I've just created one now and re-ran the query but it did not change the query plan or run time.

Thanks,
Sean

________________________________________
From: salah jubeh [s_jubeh(at)yahoo(dot)com]
Sent: Monday, October 29, 2012 3:18 PM
To: Woolcock, Sean; pgsql-performance(at)postgresql(dot)org
Subject: Re: [PERFORM] Request for help with slow query

Did you try to add an index on filesystem_id

________________________________
From: "Woolcock, Sean" <Sean(dot)Woolcock(at)emc(dot)com>
To: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Sent: Monday, October 29, 2012 6:41 PM
Subject: [PERFORM] Request for help with slow query

Hi, thanks for any help. I've tried to be thorough, but let me know if I should
provide more information.

A description of what you are trying to achieve and what results you expect:
I have a large (3 million row) table called "tape" that represents files,
which I join to a small (100 row) table called "filesystem" that represents
filesystems. I have a web interface that allows you to sort by a number of
fields in the tape table and view the results 100 at a time (using LIMIT
and OFFSET).

The data only changes hourly and I do a "vacuum analyze" after all changes.

The tables are defined as:

create table filesystem (
id serial primary key,
host varchar(256),
storage_path varchar(2048) not null check (storage_path != ''),
mounted_on varchar(2048) not null check (mounted_on != ''),
constraint unique_fs unique(host, storage_path)
);
create table tape (
id serial primary key,
volser char(255) not null check (volser != ''),
path varchar(2048) not null check (path != ''),
scratched boolean not null default FALSE,
last_write_date timestamp not null default current_timestamp,
last_access_date timestamp not null default current_timestamp,
filesystem_id integer references filesystem not null,
size bigint not null check (size >= 0),
worm_status char,
encryption char,
job_name char(8),
job_step char(8),
dsname char(17),
recfm char(3),
block_size int,
lrecl int,
constraint filesystem_already_has_that_volser unique(filesystem_id, volser)
);

An example query that's running slowly for me is:

select tape.volser,
tape.path,
tape.scratched,
tape.size,
extract(epoch from tape.last_write_date) as last_write_date,
extract(epoch from tape.last_access_date) as last_access_date
from tape
inner join filesystem
on (tape.filesystem_id = filesystem.id<http://filesystem.id/>)
order by last_write_date desc
limit 100
offset 100;

On Postgres 8.1.17 this takes about 60 seconds. I would like it to be faster.

Here's the explain output:
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=3226201.13..3226201.38 rows=100 width=308) (actual time=66311.929..66312.053 rows=100 loops=1)
-> Sort (cost=3226200.88..3234250.28 rows=3219757 width=308) (actual time=66311.826..66311.965 rows=200 loops=1)
Sort Key: date_part('epoch'::text, tape.last_write_date)
-> Hash Join (cost=3.26..242948.97 rows=3219757 width=308) (actual time=3.165..31680.830 rows=3219757 loops=1)
Hash Cond: ("outer".filesystem_id = "inner".id)
-> Seq Scan on tape (cost=0.00..178550.57 rows=3219757 width=312) (actual time=2.824..18175.863 rows=3219757 loops=1)
-> Hash (cost=3.01..3.01 rows=101 width=4) (actual time=0.204..0.204 rows=101 loops=1)
-> Seq Scan on filesystem (cost=0.00..3.01 rows=101 width=4) (actual time=0.004..0.116 rows=101 loops=1)
Total runtime: 66553.643 ms

Here's a depesz link with that output: http://explain.depesz.com/s/AUR

Things I've tried:

1. I added an index on last_write_date with:

create index tape_last_write_date_idx on tape(last_write_date);

and there was no improvement in query time.

2. I bumped:
effective_cache_size to 1/2 system RAM (1GB)
shared_buffers to 1/4 system RAM (512MB)
work_mem to 10MB
and there was no improvement in query time.

3. I ran the query against the same data in Postgres 9.1.6 rather than 8.1.17
using the same hardware and it was about 5 times faster (nice work,
whoever did that!). Unfortunately upgrading is not an option, so this
is more of an anecdote. I would think the query could go much faster
in either environment with some optimization.

The EXACT PostgreSQL version you are running:
PostgreSQL 8.1.17 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20070115 (SUSE Linux)

How you installed PostgreSQL:
Standard SuSE SLES 10-SP3 RPMs:
postgresql-devel-8.1.17-0.3
postgresql-pl-8.1.17-0.4
postgresql-libs-8.1.17-0.3
postgresql-8.1.17-0.3
postgresql-server-8.1.17-0.3
postgresql-contrib-8.1.17-0.3

Changes made to the settings in the postgresql.conf file:
Only the memory changes mentioned above.

Operating system and version:
Linux acp1 2.6.16.60-0.54.5-default #1 Fri Sep 4 01:28:03 UTC 2009 i686 i686 i386 GNU/Linux

SLES 10-SP3

What program you're using to connect to PostgreSQL:
Perl DBI
Perl v5.8.8

What version of the ODBC/JDBC/ADO/etc driver you're using, if any:
perl-DBD-Pg 1.43

If you're using a connection pool, load balancer or application server, which one you're using and its version:
None.

Is there anything remotely unusual in the PostgreSQL server logs?
No, they're empty.

CPU manufacturer and model:
Intel Celeron CPU 440 @ 2.00GHz

Amount and size of RAM installed:
2GB RAM

Storage details (important for performance and corruption questions):

Do you use a RAID controller?
No.
How many hard disks are connected to the system and what types are they?
We use a single Hitachi HDT72102 SATA drive (250GB) 7200 RPM.
How are your disks arranged for storage?
Postgres lives on the same 100GB ext3 partition as the OS.

Thanks,
Sean

--
Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org<mailto:pgsql-performance(at)postgresql(dot)org>)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Joshua D. Drake 2012-10-29 19:32:43 Re: Request for help with slow query
Previous Message salah jubeh 2012-10-29 19:18:19 Re: Request for help with slow query