Re: Poor performance using CTE

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: David Greco <David_Greco(at)harte-hanks(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Poor performance using CTE
Date: 2012-11-20 15:04:13
Message-ID: CAHyXU0xeEVX2P=HaNxPiKayEhkWp3GobgAXCQPrfsqbawXVV-w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Tue, Nov 13, 2012 at 2:57 PM, David Greco
<David_Greco(at)harte-hanks(dot)com> wrote:
> Have a query using a CTE that is performing very poorly. The equivalent
> query against the same data in an Oracle database runs in under 1 second, in
> Postgres it takes 2000 seconds.
>
>
>
> The smp_pkg.get_invoice_charges queries fedexinvoices for some data and
> normalizes it into a SETOF some record type. It is declared to be STABLE.
> Fedexinvoices consists of about 1.3M rows of medium width. Fedexinvoices.id
> is the primary key on that table, and trim(fedexinvoices.trackno) is indexed
> via the function trim.
>
>
>
> The plan for the equivalent query in Oracle is much smaller and simpler. No
> sequential (or full table) scans on fedexinvoices.
>
>
>
>
>
>
>
> WITH charges as (
>
> SELECT fi2.id, smp_pkg.get_invoice_charges(fi2.id)
> charge_info from fedexinvoices fi2
>
> )
>
> select fedexinvoices.* from
>
> fedexinvoices
>
> inner join charges on charges.id = fedexinvoices.id AND
> (charges.charge_info).charge_name IN ('ADDRESS CORRECTION CHARGE','ADDRESS
> CORRECTION')
>
> where
>
> trim(fedexinvoices.trackno)='799159791643'
>
> ;
>
>
>
> Explain Analyze output, I abbreviated some of the column lists for brevity:
>
>
>
> Nested Loop (cost=457380.38..487940.77 rows=1 width=1024) (actual
> time=1978019.858..1978019.858 rows=0 loops=1)
>
> Output: fedexinvoices.id, ………
>
> Join Filter: (fedexinvoices.id = charges.id)
>
> Buffers: shared hit=20387611, temp written=94071
>
> CTE charges
>
> -> Seq Scan on hits.fedexinvoices fi2 (cost=0.00..457380.38
> rows=1350513 width=8) (actual time=0.613..1964632.763 rows=9007863 loops=1)
>
> Output: fi2.id, smp_pkg.get_invoice_charges(fi2.id,
> NULL::character varying)
>
> Buffers: shared hit=20387606
>
> -> Index Scan using fedexinvoices_trim_track_idx on hits.fedexinvoices
> (cost=0.00..5.46 rows=1 width=1024) (actual time=0.024..0.026 rows=1
> loops=1)
>
> Output: fedexinvoices.id, ………
>
> Index Cond: (btrim((fedexinvoices.trackno)::text) =
> '799159791643'::text)
>
> Buffers: shared hit=5
>
> -> CTE Scan on charges (cost=0.00..30386.54 rows=13471 width=8) (actual
> time=1978019.827..1978019.827 rows=0 loops=1)
>
> Output: charges.id, charges.charge_info
>
> Filter: (((charges.charge_info).charge_name)::text = ANY ('{"ADDRESS
> CORRECTION CHARGE","ADDRESS CORRECTION"}'::text[]))
>
> Buffers: shared hit=20387606, temp written=94071
>
> Total runtime: 1978214.743 ms

The problem here is very clear. Oracle is optimizing through the CTE.
PostgreSQL does not do this by design -- CTE's are used as a forced
materialization step.

merlin

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Claudio Freire 2012-11-20 15:10:11 Re: Poor performance using CTE
Previous Message Kevin Grittner 2012-11-20 13:06:09 Re: help on slow query using postgres 8.4