Re: Firebird 1.5.3 X Postgresql 8.1.3 (linux and windows)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "andremachado" <andremachado(at)techforce(dot)com(dot)br>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Firebird 1.5.3 X Postgresql 8.1.3 (linux and windows)
Date: 2006-05-12 22:19:38
Message-ID: 23876.1147472378@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

"andremachado" <andremachado(at)techforce(dot)com(dot)br> writes:
> continuing the saga,
> http://archives.postgresql.org/pgsql-performance/2006-04/msg00558.php ,
> my coleague created a test database with fake data (see below).

Thanks. I played around with this a bit, and got results like these:
original query, 8.1 branch from a couple weeks back: 945 sec
original query, 8.1 branch tip: 184 sec
modified query, 8.1 branch tip: 15 sec

The first differential is because of this patch:
http://archives.postgresql.org/pgsql-committers/2006-04/msg00355.php
viz
Remove the restriction originally coded into
optimize_minmax_aggregates() that MIN/MAX not be converted to
use an index if the query WHERE clause contains any volatile
functions or subplans.

Allowing the max(DEC2.AM_REFERENCIA) subquery to be converted to an
indexscan makes for about a 5X reduction in the number of times the
EXISTS sub-subquery is executed. But the real problem is that Postgres
isn't excessively smart about EXISTS subqueries. I manually changed it
into an IN to get the 15-second runtime: instead of

(select max(DEC2.AM_REFERENCIA) from DECLARACAO DEC2
where DEC2.IN_FOI_RETIFICADA=0 and
exists (select CAD3.ID_CADASTRO from CADASTRO CAD3 where
CAD3.ID_DECLARACAO=DEC2.ID_DECLARACAO and
CAD3.ID_EMPRESA=CADASTRO.ID_EMPRESA ) )

write

(select max(DEC2.AM_REFERENCIA) from DECLARACAO DEC2
where DEC2.IN_FOI_RETIFICADA=0 and DEC2.ID_DECLARACAO in
(select CAD3.ID_DECLARACAO from CADASTRO CAD3 where
CAD3.ID_EMPRESA=CADASTRO.ID_EMPRESA ) )

I'm not clear on how Firebird is managing to do this query in under
a second --- I can believe that they know how to do EXISTS as a join
but it still seems like the subqueries need to be done many thousand
times. I thought maybe they were caching the results of the overall
subquery for specific values of CADASTRO.ID_EMPRESA, but now that I
see your test data, there are several thousand distinct values of
that, so there's not a lot of traction to be gained that way.

regards, tom lane

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message André Felipe Machado 2006-05-13 09:58:02 Firebird 1.5.3 X Postgresql 8.1.3 (linux and windows)
Previous Message Joshua D. Drake 2006-05-12 19:54:11 Re: Firebird 1.5.3 X Postgresql 8.1.3 (linux and windows)