Re: Request for help with slow query

From: salah jubeh <s_jubeh(at)yahoo(dot)com>
To: "Woolcock, Sean" <Sean(dot)Woolcock(at)emc(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Request for help with slow query
Date: 2012-10-29 19:18:19
Message-ID: 1351538299.55595.YahooMailNeo@web122203.mail.ne1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Did you try to add an index on filesystem_id

________________________________
From: "Woolcock, Sean" <Sean(dot)Woolcock(at)emc(dot)com>
To: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Sent: Monday, October 29, 2012 6:41 PM
Subject: [PERFORM] Request for help with slow query

Hi, thanks for any help. I've tried to be thorough, but let me know if I should
provide more information.

A description of what you are trying to achieve and what results you expect:
    I have a large (3 million row) table called "tape" that represents files,
    which I join to a small (100 row) table called "filesystem" that represents
    filesystems.  I have a web interface that allows you to sort by a number of
    fields in the tape table and view the results 100 at a time (using LIMIT
    and OFFSET).

    The data only changes hourly and I do a "vacuum analyze" after all changes.

    The tables are defined as:

        create table filesystem (
            id          serial primary key,
            host        varchar(256),
            storage_path varchar(2048) not null check (storage_path != ''),
            mounted_on  varchar(2048) not null check (mounted_on != ''),
            constraint  unique_fs unique(host, storage_path)
        );
        create table tape (
            id              serial primary key,
            volser          char(255) not null check (volser != ''),
            path            varchar(2048)  not null check (path != ''),
            scratched        boolean        not null default FALSE,
            last_write_date  timestamp      not null default current_timestamp,
            last_access_date timestamp      not null default current_timestamp,
            filesystem_id    integer references filesystem not null,
            size            bigint not null check (size >= 0),
            worm_status      char,
            encryption      char,
            job_name        char(8),
            job_step        char(8),
            dsname          char(17),
            recfm            char(3),
            block_size      int,
            lrecl            int,
            constraint filesystem_already_has_that_volser unique(filesystem_id, volser)
        );

    An example query that's running slowly for me is:

        select tape.volser,
              tape.path,
              tape.scratched,
              tape.size,
              extract(epoch from tape.last_write_date) as last_write_date,
              extract(epoch from tape.last_access_date) as last_access_date
            from tape
            inner join filesystem
                on (tape.filesystem_id = filesystem.id)
            order by last_write_date desc
            limit 100
            offset 100;

    On Postgres 8.1.17 this takes about 60 seconds. I would like it to be faster.

    Here's the explain output:
                                                                    QUERY PLAN
        ---------------------------------------------------------------------------------------------------------------------------------------
        Limit  (cost=3226201.13..3226201.38 rows=100 width=308) (actual time=66311.929..66312.053 rows=100 loops=1)
        ->  Sort  (cost=3226200.88..3234250.28 rows=3219757 width=308) (actual time=66311.826..66311.965 rows=200 loops=1)
                Sort Key: date_part('epoch'::text, tape.last_write_date)
                ->  Hash Join  (cost=3.26..242948.97 rows=3219757 width=308) (actual time=3.165..31680.830 rows=3219757 loops=1)
                    Hash Cond: ("outer".filesystem_id = "inner".id)
                    ->  Seq Scan on tape  (cost=0.00..178550.57 rows=3219757 width=312) (actual time=2.824..18175.863 rows=3219757 loops=1)
                    ->  Hash  (cost=3.01..3.01 rows=101 width=4) (actual time=0.204..0.204 rows=101 loops=1)
                            ->  Seq Scan on filesystem  (cost=0.00..3.01 rows=101 width=4) (actual time=0.004..0.116 rows=101 loops=1)
        Total runtime: 66553.643 ms

    Here's a depesz link with that output: http://explain.depesz.com/s/AUR

Things I've tried:

    1. I added an index on last_write_date with:

            create index tape_last_write_date_idx on tape(last_write_date);

      and there was no improvement in query time.

    2. I bumped:
            effective_cache_size to 1/2 system RAM (1GB)
            shared_buffers to 1/4 system RAM (512MB)
            work_mem to 10MB
      and there was no improvement in query time.

    3. I ran the query against the same data in Postgres 9.1.6 rather than 8.1.17
      using the same hardware and it was about 5 times faster (nice work,
      whoever did that!).  Unfortunately upgrading is not an option, so this
      is more of an anecdote. I would think the query could go much faster
      in either environment with some optimization.

The EXACT PostgreSQL version you are running:
    PostgreSQL 8.1.17 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20070115 (SUSE Linux)

How you installed PostgreSQL:
    Standard SuSE SLES 10-SP3 RPMs:
        postgresql-devel-8.1.17-0.3
        postgresql-pl-8.1.17-0.4
        postgresql-libs-8.1.17-0.3
        postgresql-8.1.17-0.3
        postgresql-server-8.1.17-0.3
        postgresql-contrib-8.1.17-0.3

Changes made to the settings in the postgresql.conf file:
    Only the memory changes mentioned above.

Operating system and version:
    Linux acp1 2.6.16.60-0.54.5-default #1 Fri Sep 4 01:28:03 UTC 2009 i686 i686 i386 GNU/Linux

    SLES 10-SP3

What program you're using to connect to PostgreSQL:
    Perl DBI
    Perl v5.8.8

What version of the ODBC/JDBC/ADO/etc driver you're using, if any:
    perl-DBD-Pg 1.43

If you're using a connection pool, load balancer or application server, which one you're using and its version:
    None.

Is there anything remotely unusual in the PostgreSQL server logs?
    No, they're empty.

CPU manufacturer and model:
    Intel Celeron CPU 440 @ 2.00GHz

Amount and size of RAM installed:
    2GB RAM

Storage details (important for performance and corruption questions):

    Do you use a RAID controller?
        No.
    How many hard disks are connected to the system and what types are they?
        We use a single Hitachi HDT72102 SATA drive (250GB) 7200 RPM.
    How are your disks arranged for storage?
        Postgres lives on the same 100GB ext3 partition as the OS.

Thanks,
Sean

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Woolcock, Sean 2012-10-29 19:25:56 Re: Request for help with slow query
Previous Message Kevin Grittner 2012-10-29 18:51:14 Re: Setting Statistics on Functional Indexes