Re: workaround for column cross-correlation

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: workaround for column cross-correlation
Date: 2017-06-13 04:40:08
Message-ID: 20170613044008.GC4653@telsasoft.com
Lists: pgsql-general

On Mon, Jun 12, 2017 at 08:46:57PM -0700, Jeff Janes wrote:
> On Mon, Jun 12, 2017 at 8:17 PM, Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
>
> > I know PG 10 will have support "CREATE STATISTICS.." for this..
> >
> > ..but I wondered if there's a recommended workaround in earlier versions ?
>
> Not without seeing the query....

With my workaround:
ts=# explain ANALYZE SELECT t1.sect_id, t1.start_time as period, sum (1) FROM enodeb_ncell_view t1, enodeb_ncell_view inc
WHERE ((t1.start_time >= '2017-04-30 00:00:00' AND t1.start_time < '2017-05-01 00:00:00'))
AND ((inc.start_time >= '2017-04-30 00:00:00' AND inc.start_time < '2017-05-01 00:00:00'))
AND t1.start_time = inc.start_time
AND ROW((t1.sect_id,t1.neigh_sect_id))= ROW((inc.neigh_sect_id,inc.sect_id))
GROUP BY t1.sect_id, period;
HashAggregate (cost=63149.59..63371.74 rows=22215 width=10) (actual time=80092.652..80097.521 rows=22464 loops=1)
...

Without:
ts=# explain ANALYZE SELECT t1.sect_id, t1.start_time as period, sum (1) FROM enodeb_ncell_view t1, enodeb_ncell_view inc
WHERE ((t1.start_time >= '2017-04-30 00:00:00' AND t1.start_time < '2017-05-01 00:00:00'))
AND ((inc.start_time >= '2017-04-30 00:00:00' AND inc.start_time < '2017-05-01 00:00:00'))
AND t1.start_time = inc.start_time
AND t1.sect_id=inc.neigh_sect_id AND t1.neigh_sect_id=inc.sect_id
GROUP BY t1.sect_id, period;
GroupAggregate (cost=57847.32..62265.54 rows=402 width=10) (actual time=694.186..952.744 rows=22464 loops=1)
...

This is a small inner subquery of a larger report; sum(1) is a placeholder for
other aggregates I've stripped out.
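
For anyone finding this in the archives: as I understand it, the reason the
ROW() form changes the estimate is that a plain row comparison like
(a, b) = (c, d) is flattened by the planner into a = c AND b = d, and the two
selectivities are multiplied as if the columns were independent. The extra
set of parentheses makes each side a single composite value, which the
planner can't decompose, so it falls back on the default selectivity for
record equality instead of the (badly) multiplied one. Schematically:

-- flattened into two "independent" equality clauses (underestimated):
... AND t1.sect_id = inc.neigh_sect_id AND t1.neigh_sect_id = inc.sect_id

-- compared as one opaque composite value, one selectivity estimate:
... AND ROW((t1.sect_id, t1.neigh_sect_id)) = ROW((inc.neigh_sect_id, inc.sect_id))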

> > 2) memory explosion in hash join (due to poor estimate?) caused OOM.
>
> As far as I know, the only way a hash join should do this is if the join
> includes a huge number of rows with exactly the same 32 bit hash codes.
> Otherwise, it should spill to disk without causing OOM. Hash aggregates,
> on the other hand, are a different matter.

That's almost certainly what I meant.

-> Subquery Scan on data_rx_enb (cost=3409585.76..3422861.74 rows=663799 width=20) (actual time=510475.987..512069.064 rows=2169821 loops=1)
-> HashAggregate (cost=3409585.76..3416223.75 rows=663799 width=16) (actual time=510475.984..511650.337 rows=2169821 loops=1)
Group Key: eric_enodeb_cell_metrics_1.site_id, eric_enodeb_cell_metrics_1.start_time
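
(A sketch of the usual mitigations, in case it helps someone later; these
are my own notes, not from Jeff's reply. HashAggregate in these releases
never spills to disk, so when the group-count estimate is badly low it can
exhaust memory. The planner only picks it when the estimated hash table
fits in work_mem, so the session-level knobs are:

SET enable_hashagg = off;  -- force GroupAggregate + Sort, which does spill
SET work_mem = '64MB';     -- discourage HashAggregate; not reliable if the
                           -- row estimate itself is wrong

Only enable_hashagg = off is actually safe against a bad estimate.)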

Thanks,
Justin
