Re: Recheck condition

From: "Josh Harrison" <joshques(at)gmail(dot)com>
To: "Martijn van Oosterhout" <kleptog(at)svana(dot)org>, "Josh Harrison" <joshques(at)gmail(dot)com>, "Alvaro Herrera" <alvherre(at)alvh(dot)no-ip(dot)org>, "Postgresql General" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Recheck condition
Date: 2007-11-30 14:37:32
Message-ID: 8d89ea1d0711300637j5c567847vece5299ff6f8e8fc@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

>
>
> > > > *Query1*
> > > > SELECT person_id FROM person WHERE (column1=1 AND column2='62')
> > > > INTERSECT
> > > > SELECT person_id FROM person WHERE (column1=1 AND column2='189')
>
> > I get the same plan(see below) with 'sort' for 'intersect all'
> operation
> > too. Why is intersect not an effecient way? Is there any other way this
> > query/index can be written/created so that I can get the intersect
> results
> > in an efficient way?
>
> Set operations are rather inefficient. To find the intersection of two
> arbitrary sets you need to sort them and compare. A query like you
> write would be better expressed as a join, something like:
>
> SELECT a.person_id
> FROM (SELECT person_id FROM person WHERE (column1=1 AND column2='62')
> a,
> (SELECT person_id FROM person WHERE (column1=1 AND column2='189') b
> WHERE a.person_id = b.person_id;
>
> or perhaps:
>
> SELECT a.person_id
> FROM person a, person b
> WHERE a.column1=1 AND a.column2='62'
> AND b.column1=1 AND b.column2='189'
> AND a.person_id = b.person_id;
>
> Which will probably generate a merge join...
>

Thanks. But this query seems to be more expensive than using intersect
operator.
This is the explain analyse plan for this query. It took 5 1/2 minutes to
generate this. I also tried to disable the mergejoin and in that case it
uses hash join and still takes more than 3 minutes (intersect took only 40
sec)

QUERY
PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Merge Join (cost=1610648.92..10280119.99 rows=577856095 width=4) (actual
time=30562.630..264534.677 rows=225145385
loops=1)
Merge Cond: (a.patient_id = b.patient_id)

-> Sort (cost=527974.81..528929.79 rows=381993 width=4) (actual time=
3755.361..3845.134 rows=213435
loops=1)
Sort Key: a.patient_id

Sort Method: quicksort Memory:
15868kB

-> Bitmap Heap Scan on clinical_variable2 a (cost=
17886.42..492557.97 rows=381993 width=4) (actual
time=315.753..3410.366rows=327498 loops=1)
Recheck Cond: ((top_parent_service_sys_id = 1) AND
((top_parent_service_code)::text =
'62'::text))
-> Bitmap Index Scan on clinical_variable_idx_topserv
(cost=0.00..17790.92 rows=381993 width=0) (actual
time=277.185..277.185rows=327498 loops=1)
Index Cond: ((top_parent_service_sys_id = 1) AND
((top_parent_service_code)::text =
'62'::text))
-> Materialize (cost=1082674.11..1124546.38 rows=3349781 width=4)
(actual time=26807.248..99885.620 rows=225148250
loops=1)
-> Sort (cost=1082674.11..1091048.57 rows=3349781 width=4)
(actual time=26807.238..30343.870 rows=3429228
loops=1)
Sort Key: b.patient_id

Sort Method: external merge Disk:
53552kB

-> Bitmap Heap Scan on clinical_variable2 b (cost=
156754.24..679555.96 rows=3349781 width=4) (actual
time=2744.126..20106.160rows=3429228 loops=1)
Recheck Cond: ((top_parent_service_sys_id = 1) AND
((top_parent_service_code)::text =
'189'::text))
-> Bitmap Index Scan on clinical_variable_idx_topserv
(cost=0.00..155916.80 rows=3349781 width=0) (actual
time=2686.456..2686.456rows=3429228 loops=1)
Index Cond: ((top_parent_service_sys_id = 1) AND
((top_parent_service_code)::text =
'189'::text))
Total runtime: 324646.035ms

18 record(s) selected [Fetch MetaData: 0/ms] [Fetch Data: 0/ms] a: 0/ms]

Is there any other way you can think of to solve this problem. May be
creating the indexes in a different way or something?

Thanks
jo

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Ivan Sergio Borgonovo 2007-11-30 14:38:33 Re: postgresql table inheritance
Previous Message Andrew Sullivan 2007-11-30 14:36:50 Re: PostgresSQL vs Ingress