Re: Oddity with view (now with test case)

From: Jim 'Decibel!' Nasby <jnasby(at)cashnetusa(dot)com>
To: Richard Huxton <dev(at)archonet(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, postgresql performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Oddity with view (now with test case)
Date: 2008-11-10 18:36:12
Message-ID: 1E5A4D7D-3B95-440B-A377-B6B9F8768F7A@cashnetusa.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Nov 10, 2008, at 12:21 PM, Richard Huxton wrote:
> Jim 'Decibel!' Nasby wrote:
>> On Nov 10, 2008, at 7:06 AM, Tom Lane wrote:
>>> "Jim 'Decibel!' Nasby" <jnasby(at)cashnetusa(dot)com> writes:
>>>> loan_tasks effectively does SELECT * FROM loan_tasks_committed
>>>> UNION
>>>> ALL SELECT * FROM loan_tasks_pending;.
>>>
>>> You seem to have neglected to mention a join or two.
>>
>>
>> Yeah, though I did show them at the end of the message...
>>
>> SELECT true AS "committed", loan_tasks_committed.id, ...,
>> loan_tasks_committed.task_amount
>> FROM loan_tasks_committed
>> UNION ALL
>> SELECT false AS "committed", ltp.id, ..., NULL::"unknown" AS
>> task_amount
>> FROM loan_tasks_pending ltp
>> JOIN loan_task_codes ltc ON ltp.loan_task_code_id = ltc.id;
>>
>> Thing is, there's no data to be had on that side. All of the time is
>> going into the seqscan of loan_tasks_committed. But here's what's
>> really
>> disturbing...
>
>> -> Seq Scan on loan_tasks_committed (cost=0.00..929345.35
>> rows=26112135 width=0) (actual time=0.012..5116.776 rows=26115689
>> loops=1)
>
>> -> Seq Scan on loan_tasks_committed
>> (cost=0.00..929345.35 rows=26112135 width=162) (actual
>> time=0.014..22531.902 rows=26115689 loops=1)
>
> It's the width - the view is fetching all the rows. Is the "true as
> committed" bit confusing it?

Turns out, no. I was just writing up a stand-alone test case and
forgot to include that, but there's still a big difference (note what
I'm pasting is now from HEAD as of a bit ago, but I see the effect on
8.2 as well):

decibel(at)platter(dot)local=# explain analyze select count(*) from a;
QUERY PLAN
------------------------------------------------------------------------
---------------------------------------------
Aggregate (cost=137164.57..137164.58 rows=1 width=0) (actual
time=4320.986..4320.986 rows=1 loops=1)
-> Seq Scan on a (cost=0.00..120542.65 rows=6648765 width=0)
(actual time=0.188..2707.433 rows=9999999 loops=1)
Total runtime: 4321.039 ms
(3 rows)

Time: 4344.158 ms
decibel(at)platter(dot)local=# explain analyze select count(*) from v;

QUERY PLAN
------------------------------------------------------------------------
------------------------------------------------------------------
Aggregate (cost=270286.52..270286.53 rows=1 width=0) (actual
time=14766.630..14766.630 rows=1 loops=1)
-> Append (cost=0.00..187150.20 rows=6650905 width=36) (actual
time=0.039..12810.073 rows=9999999 loops=1)
-> Subquery Scan "*SELECT* 1" (cost=0.00..187030.30
rows=6648765 width=36) (actual time=0.039..10581.367 rows=9999999
loops=1)
-> Seq Scan on a (cost=0.00..120542.65 rows=6648765
width=36) (actual time=0.038..5731.748 rows=9999999 loops=1)
-> Subquery Scan "*SELECT* 2" (cost=37.67..119.90
rows=2140 width=40) (actual time=0.002..0.002 rows=0 loops=1)
-> Hash Join (cost=37.67..98.50 rows=2140 width=40)
(actual time=0.002..0.002 rows=0 loops=1)
Hash Cond: (b.c_id = c.c_id)
-> Seq Scan on b (cost=0.00..31.40 rows=2140
width=8) (actual time=0.000..0.000 rows=0 loops=1)
-> Hash (cost=22.30..22.30 rows=1230
width=36) (never executed)
-> Seq Scan on c (cost=0.00..22.30
rows=1230 width=36) (never executed)
Total runtime: 14766.784 ms
(11 rows)

Time: 14767.550 ms

In 8.2, it took 20 seconds to go through the view:

QUERY PLAN
------------------------------------------------------------------------
------------------------------------------------------------------
Aggregate (cost=303960.98..303960.99 rows=1 width=0) (actual
time=20268.877..20268.877 rows=1 loops=1)
-> Append (cost=0.00..211578.98 rows=7390560 width=40) (actual
time=0.038..17112.190 rows=9999999 loops=1)
-> Subquery Scan "*SELECT* 1" (cost=0.00..211467.40
rows=7388620 width=36) (actual time=0.038..13973.782 rows=9999999
loops=1)
-> Seq Scan on a (cost=0.00..137581.20 rows=7388620
width=36) (actual time=0.037..8280.204 rows=9999999 loops=1)
-> Subquery Scan "*SELECT* 2" (cost=36.10..111.58
rows=1940 width=40) (actual time=0.003..0.003 rows=0 loops=1)
-> Hash Join (cost=36.10..92.18 rows=1940 width=40)
(actual time=0.002..0.002 rows=0 loops=1)
Hash Cond: (b.c_id = c.c_id)
-> Seq Scan on b (cost=0.00..29.40 rows=1940
width=8) (actual time=0.000..0.000 rows=0 loops=1)
-> Hash (cost=21.60..21.60 rows=1160
width=36) (never executed)
-> Seq Scan on c (cost=0.00..21.60
rows=1160 width=36) (never executed)
Total runtime: 20269.333 ms
(11 rows)

The results for 8.3 are similar to HEAD.

Here's the commands to generate the test case:

create table a(a int, b text default 'test text');
create table c(c_id serial primary key, c_text text);
insert into c(c_text) values('a'),('b'),('c');
create table b(a int, c_id int references c(c_id));
create view v as select a, b, null as c_id, null as c_text from a
union all select a, null, b.c_id, c_text from b join c on (b.c_id=
c.c_id);
\timing
insert into a(a) select generate_series(1,9999999);
select count(*) from a;
select count(*) from v;
explain analyze select count(*) from a;
explain analyze select count(*) from v;
--
Decibel! jnasby(at)cashnetusa(dot)com (512) 569-9461

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2008-11-10 19:31:39 Re: Oddity with view (now with test case)
Previous Message Tom Lane 2008-11-10 18:31:38 Re: Oddity with view