From: rob stone <floriparob(at)gmail(dot)com>
To: Geoff Winkless <pgsqladmin(at)geoff(dot)dj>, Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: More correlated (?) index woes
Date: 2016-03-28 21:01:04
Message-ID: 1459198864.4165.7.camel@gmail.com
Lists: pgsql-general
On Mon, 2016-03-28 at 20:23 +0100, Geoff Winkless wrote:
> So I accept that when using MIN(sc_id) against scdate it makes
> statistical sense to use the sc_id index for a reasonable percentage
> of the full range of scdate, unless we know in advance that scdate is
> closely correlated to sc_id (because using MIN means we can stop
> as soon as we hit a matching value).
>
> However I'm now finding a similar problem when using a multi-table
> DELETE, where the same obviously can't apply.
>
> This query:
>
> DELETE FROM pa USING legs WHERE legs.scdate BETWEEN 20160220 AND
> 20160222 AND legs.sc_id=pa.sc_id;
>
> does what one would hope:
>
> Delete on pa (cost=0.99..705406.24 rows=36570 width=12)
> -> Nested Loop (cost=0.99..705406.24 rows=36570 width=12)
> -> Index Scan using legs_scdate_idx on legs
> (cost=0.43..1171.88 rows=14458 width=10)
> Index Cond: ((scdate >= 20160220) AND (scdate <=
> 20160222))
> -> Index Scan using pa_pkey on pa (cost=0.56..48.33
> rows=38 width=10)
> Index Cond: (sc_id = legs.sc_id)
>
>
> However as soon as I add an extra test for field1 IS NULL, it
> apparently goes insane:
>
> Delete on pa (cost=577260.90..626296.46 rows=23732 width=12)
> (actual time=41870.770..41870.770 rows=0 loops=1)
> -> Hash Join (cost=577260.90..626296.46 rows=23732 width=12)
> (actual time=37886.396..41315.668 rows=44960 loops=1)
> Hash Cond: (legs.sc_id = pa.sc_id)
> -> Index Scan using legs_scdate_idx on legs
> (cost=0.43..1171.88 rows=14458 width=10) (actual time=0.030..13.667
> rows=21281 loops=1)
> Index Cond: ((scdate >= 20160220) AND (scdate <=
> 20160222))
> -> Hash (cost=481691.12..481691.12 rows=5497868 width=10)
> (actual time=37805.756..37805.756 rows=4875870 loops=1)
> Buckets: 131072 Batches: 64 Memory Usage: 4311kB
> -> Seq Scan on pa (cost=0.00..481691.12 rows=5497868
> width=10) (actual time=0.008..35869.304 rows=4875870 loops=1)
> Filter: (field1 IS NULL)
> Rows Removed by Filter: 2688634
> Planning time: 0.447 ms
> Execution time: 41870.832 ms
>
> Running ANALYZE makes no difference.
>
> Table pa has 7522676 rows, 4834042 of which have field1 NULL, so it's
> absolutely not reasonable to expect this to be an optimal strategy.
>
> Any suggestions as to how I can improve this query?
>
> Thanks :)
>
> Geoff
What does:

DELETE FROM pa
WHERE pa.field1 IS NULL
AND pa.sc_id IN (SELECT legs.sc_id FROM legs
                 WHERE legs.scdate BETWEEN 20160220 AND 20160222)

give as a cost when you run EXPLAIN ANALYZE over it?
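
If the planner still insists on the seq scan over pa, a partial index might also be worth a try — a sketch only, and the index name is made up:

```sql
-- Hypothetical partial index: only the rows with field1 IS NULL are
-- indexed, so an index scan on sc_id satisfies both conditions at once.
CREATE INDEX pa_scid_field1_null_idx ON pa (sc_id) WHERE field1 IS NULL;

-- The delete can then be written as a semi-join:
DELETE FROM pa
WHERE pa.field1 IS NULL
  AND EXISTS (SELECT 1 FROM legs
              WHERE legs.sc_id = pa.sc_id
                AND legs.scdate BETWEEN 20160220 AND 20160222);
```

Whether the partial index pays off depends on how many of the ~7.5M rows it has to cover; with ~4.8M NULLs it mainly helps by letting a nested loop probe sc_id without re-checking field1.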
HTH
Rob