From: rob stone <floriparob(at)gmail(dot)com>
To: Geoff Winkless <pgsqladmin(at)geoff(dot)dj>, Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: More correlated (?) index woes
Date: 2016-03-28 21:01:04
Message-ID: 1459198864.4165.7.camel@gmail.com
Lists: pgsql-general
On Mon, 2016-03-28 at 20:23 +0100, Geoff Winkless wrote:
> So I accept that when using MIN(sc_id) against scdate it makes
> statistical sense to use the sc_id index for a reasonable percentage
> of the full range of scdate, unless we know in advance that scdate is
> closely correlated to sc_id (because using MIN means we can stop
> as soon as we hit a matching value).
>
> However I'm now finding a similar problem when using a multi-table
> DELETE, where the same obviously can't apply.
>
> This query:
>
> DELETE FROM pa USING legs WHERE legs.scdate BETWEEN 20160220 AND
> 20160222 AND legs.sc_id=pa.sc_id;
>
> does what one would hope:
>
> Delete on pa (cost=0.99..705406.24 rows=36570 width=12)
> -> Nested Loop (cost=0.99..705406.24 rows=36570 width=12)
> -> Index Scan using legs_scdate_idx on legs
> (cost=0.43..1171.88 rows=14458 width=10)
> Index Cond: ((scdate >= 20160220) AND (scdate <=
> 20160222))
> -> Index Scan using pa_pkey on pa (cost=0.56..48.33
> rows=38 width=10)
> Index Cond: (sc_id = legs.sc_id)
>
>
> However as soon as I add an extra test for field1 IS NULL, it
> apparently goes insane:
>
> Delete on pa (cost=577260.90..626296.46 rows=23732 width=12)
> (actual time=41870.770..41870.770 rows=0 loops=1)
> -> Hash Join (cost=577260.90..626296.46 rows=23732 width=12)
> (actual time=37886.396..41315.668 rows=44960 loops=1)
> Hash Cond: (legs.sc_id = pa.sc_id)
> -> Index Scan using legs_scdate_idx on legs
> (cost=0.43..1171.88 rows=14458 width=10) (actual time=0.030..13.667
> rows=21281 loops=1)
> Index Cond: ((scdate >= 20160220) AND (scdate <=
> 20160222))
> -> Hash (cost=481691.12..481691.12 rows=5497868 width=10)
> (actual time=37805.756..37805.756 rows=4875870 loops=1)
> Buckets: 131072 Batches: 64 Memory Usage: 4311kB
> -> Seq Scan on pa (cost=0.00..481691.12 rows=5497868
> width=10) (actual time=0.008..35869.304 rows=4875870 loops=1)
> Filter: (field1 IS NULL)
> Rows Removed by Filter: 2688634
> Planning time: 0.447 ms
> Execution time: 41870.832 ms
>
> Running ANALYZE makes no difference.
>
> Table pa has 7522676 rows, 4834042 of which have field1 NULL, so it's
> absolutely not reasonable to expect this to be an optimal strategy.
>
> Any suggestions as to how I can improve this query?
>
> Thanks :)
>
> Geoff
What does:

DELETE FROM pa
WHERE pa.field1 IS NULL
AND pa.sc_id IN (SELECT legs.sc_id FROM legs
                 WHERE legs.scdate BETWEEN 20160220 AND 20160222)

give as a cost when you run EXPLAIN ANALYZE over it?
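
If the planner still insists on the seq scan over pa, a partial index might also be worth a try — a sketch only, and the index name is made up:

```sql
-- Hypothetical partial index: only the rows with field1 IS NULL are
-- indexed, so an index scan on sc_id satisfies both conditions at once.
CREATE INDEX pa_scid_field1_null_idx ON pa (sc_id) WHERE field1 IS NULL;

-- The delete can then be written as a semi-join:
DELETE FROM pa
WHERE pa.field1 IS NULL
  AND EXISTS (SELECT 1 FROM legs
              WHERE legs.sc_id = pa.sc_id
                AND legs.scdate BETWEEN 20160220 AND 20160222);
```

Whether the partial index pays off depends on how many of the ~7.5M rows it has to cover; with ~4.8M NULLs it mainly helps by letting a nested loop probe sc_id without re-checking field1.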
HTH
Rob