Re: 600 million rows of data. Bad hardware or need partitioning?

From: Arya F <arya6000(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Michael Lewis <mlewis(at)entrata(dot)com>, Pgsql Performance <pgsql-performance(at)lists(dot)postgresql(dot)org>
Subject: Re: 600 million rows of data. Bad hardware or need partitioning?
Date: 2020-05-06 00:15:14
Message-ID: CAFoK1azUSgoVLQgPd-GP9UJkPn=91Gc=_UwTOV_SzyuWA01F5w@mail.gmail.com
Lists: pgsql-performance

On Mon, May 4, 2020 at 12:44 AM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> How long does it take if you first do:
>
> SET enable_nestloop TO off;
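For anyone replaying this experiment: the setting is session-local and reversible, so it is safe to toggle interactively. A minimal sketch (the actual UPDATE statement is elided, since it does not appear in this message):

```sql
-- enable_nestloop is a session-local planner switch; turning it off
-- steers the planner toward a hash or merge join instead of a nested loop.
SET enable_nestloop TO off;

-- ... run the UPDATE under EXPLAIN ANALYZE here ...

-- Restore the default so later queries in the session are planned normally.
RESET enable_nestloop;
```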

I tried this, but it took much longer:

Time: 318620.319 ms (05:18.620)

Below is the EXPLAIN ANALYZE output:

QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Update on test_table  (cost=639.96..4581378.80 rows=10000 width=160) (actual time=290593.159..290593.159 rows=0 loops=1)
   ->  Hash Join  (cost=639.96..4581378.80 rows=10000 width=160) (actual time=422.313..194430.318 rows=10000 loops=1)
         Hash Cond: (test_table.id = subquery.id)
         ->  Seq Scan on test_table  (cost=0.00..4200967.98 rows=101238898 width=92) (actual time=296.970..177731.611 rows=101189271 loops=1)
         ->  Hash  (cost=514.96..514.96 rows=10000 width=56) (actual time=125.312..125.312 rows=10000 loops=1)
               Buckets: 16384  Batches: 1  Memory Usage: 959kB
               ->  Subquery Scan on subquery  (cost=0.00..514.96 rows=10000 width=56) (actual time=0.030..123.031 rows=10000 loops=1)
                     ->  Limit  (cost=0.00..414.96 rows=10000 width=16) (actual time=0.024..121.014 rows=10000 loops=1)
                           ->  Seq Scan on test_table test_table_1  (cost=0.00..4200967.98 rows=101238898 width=16) (actual time=0.021..120.106 rows=10000 loops=1)
 Planning Time: 0.304 ms
 JIT:
   Functions: 12
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 2.178 ms, Inlining 155.980 ms, Optimization 100.611 ms, Emission 39.481 ms, Total 298.250 ms
 Execution Time: 290595.448 ms
(15 rows)
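The plan above is consistent with an UPDATE driven by a LIMITed subquery: the hash join matches only 10,000 ids, but the outer Seq Scan still reads all ~101 million rows, which is where most of the time goes. A hypothetical reconstruction of that statement shape (the table name, the "subquery" alias, the join on id, and LIMIT 10000 come from the plan; the SET list and column names are assumptions):

```sql
-- Sketch only: the real SET list does not appear in this message.
UPDATE test_table
SET some_col = some_col              -- placeholder; actual columns unknown
FROM (
    SELECT id
    FROM test_table
    LIMIT 10000                      -- matches the Limit node (rows=10000)
) AS subquery
WHERE test_table.id = subquery.id;   -- matches the Hash Cond in the plan
```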

> If you find it's faster then you most likely have random_page_cost set
> unrealistically low. In fact, I'd say it's very unlikely that a nested
> loop join will be a win in this case when random pages must be read
> from a mechanical disk, but by all means, try disabling it with the
> above command and see for yourself.

It's much slower with SET enable_nestloop TO off. Any other suggestions?
