Re: Death postgres

From: Marc Millas <marc(dot)millas(at)mokadb(dot)com>
To: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Cc: "pgsql-general(at)lists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Death postgres
Date: 2023-05-10 14:35:04
Message-ID: CADX_1aZn2NAxKW6mdmRqP8ciyPhztQ2bRZv6UYYYVJt+jDceeQ@mail.gmail.com
Lists: pgsql-general

On Sun, May 7, 2023 at 8:42 PM Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
wrote:

> On Sat, 2023-05-06 at 03:14 +0200, Marc Millas wrote:
> > postgres 14.2 on Linux redhat
> >
> > temp_file_limit set around 210 GB.
> >
> > a select request with 2 left join have crashed the server (oom killer)
> after the postgres
> > disk occupation did grow from 15TB to 16 TB.
> >
> > What are the cases where postgres may grow without caring about
> temp_file_limit ?
>
> That's too little information for a decent answer.
> One obvious answer is: if it is not writing temporary files.
>
> Yours,
> Laurenz Albe
>

Logical ...

so here is a little more info:

db=# analyze myschema.table_a;
ANALYZE

db=# with ta as (select 'myschema' as s, 'table_a' as t),
          p as (select * from information_schema.columns, ta
                where table_schema = ta.s and table_name = ta.t),
          tableid as (select relid from pg_stat_user_tables, ta
                      where schemaname = ta.s and relname = ta.t)
     select staattnum, column_name, stanullfrac, stadistinct
     from tableid, pg_statistic
     join p on p.ordinal_position = staattnum
     where starelid = tableid.relid
     order by staattnum;
 staattnum | column_name | stanullfrac | stadistinct
-----------+-------------+-------------+-------------
         1 | col_ne      |           0 |  -0.6100224
         2 | col_brg     |      0.0208 |           6
         3 | col_ano     |           0 |      447302
         4 | col_ine     |           0 |  -0.5341927
         5 | col_cha     |           0 |          11
(5 rows)

db=# select count(*) from myschema.table_a;
count
----------
13080776
(1 row)

db=# select count(distinct col_ano) from myschema.table_a;
count
----------
10149937
(1 row)

// the stats look somewhat absurd: ANALYZE guesses roughly 20 times fewer
// distinct values for column col_ano than a select count(distinct) finds...
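To read those stadistinct numbers: per the PostgreSQL documentation, a positive value is an absolute distinct-value count, while a negative value is minus the fraction of rows that are distinct (so it scales as the table grows). A quick sketch of the conversion, using the numbers above:

```python
# pg_statistic.stadistinct semantics (per PostgreSQL docs):
#   > 0 : absolute number of distinct values
#   < 0 : negated fraction of rows that are distinct (scales with the table)
def ndistinct(stadistinct: float, reltuples: float) -> float:
    return stadistinct if stadistinct > 0 else -stadistinct * reltuples

# col_ano in table_a: ANALYZE estimated 447302 distinct values,
# versus the 10149937 that count(distinct col_ano) actually found:
print(ndistinct(447302, 13_080_776))     # 447302 (positive = absolute count)
print(round(10_149_937 / 447302, 1))     # 22.7x underestimate
```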
db=# analyze myschema.table_b;
ANALYZE
db=# with ta as (select 'myschema' as s, 'table_b' as t),
          p as (select * from information_schema.columns, ta
                where table_schema = ta.s and table_name = ta.t),
          tableid as (select relid from pg_stat_user_tables, ta
                      where schemaname = ta.s and relname = ta.t)
     select staattnum, column_name, stanullfrac, stadistinct
     from tableid, pg_statistic
     join p on p.ordinal_position = staattnum
     where starelid = tableid.relid
     order by staattnum;
 staattnum | column_name | stanullfrac | stadistinct
-----------+-------------+-------------+-------------
         1 | col_nerg    |           0 |      161828
         2 | col_ibi     |           0 |      362161
         3 | col_imi     |           0 |      381023
         4 | col_ipi     |           0 |      391915
         5 | col_ne      |           0 | -0.53864235
         6 | col_ano     |           0 |      482225
(6 rows)

db=# select count(*) from myschema.table_b;
count
----------
14811485
(1 row)

db=# select count(distinct col_ano) from myschema.table_b;
count
----------
10149937
(1 row)

//same remark
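An alternative I did not try: before hardcoding an n_distinct override (as I do below), a larger per-column ANALYZE sample can sometimes fix this kind of underestimate on its own. A sketch (the 1000 target is an arbitrary example; the default_statistics_target is 100):

```sql
-- Make ANALYZE sample this column roughly 10x more thoroughly
alter table myschema.table_b alter column col_ano set statistics 1000;
analyze myschema.table_b;
```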

db=# explain select distinct t1.col_ine,
         case when t2.col_ibi is null then t3.col_ibi else t2.col_ibi end
     from myschema.table_a t1
     left join myschema.table_b t2 on t1.col_ano = t2.col_ano
     left join myschema.table_b t3 on t1.col_ne = t3.col_ne;
                                                         QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=72377463163.02..201012533981.80 rows=1021522829864 width=97)
   ->  Gather Merge  (cost=72377463163.02..195904919832.48 rows=1021522829864 width=97)
         Workers Planned: 5
         ->  Sort  (cost=72377462162.94..72888223577.87 rows=204304565973 width=97)
               Sort Key: t1.col_ine, (CASE WHEN (t2.col_ibi IS NULL) THEN t3.col_ibi ELSE t2.col_ibi END)
               ->  Parallel Hash Left Join  (cost=604502.76..1276224253.51 rows=204304565973 width=97)
                     Hash Cond: ((t1.col_ano)::text = (t2.col_ano)::text)
                     ->  Parallel Hash Left Join  (cost=300803.38..582295.38 rows=4857277 width=52)
                           Hash Cond: ((t1.col_ne)::text = (t3.col_ne)::text)
                           ->  Parallel Seq Scan on table_a t1  (cost=0.00..184052.35 rows=2616335 width=53)
                           ->  Parallel Hash  (cost=243466.06..243466.06 rows=2965306 width=31)
                                 ->  Parallel Seq Scan on table_b t3  (cost=0.00..243466.06 rows=2965306 width=31)
                     ->  Parallel Hash  (cost=243466.06..243466.06 rows=2965306 width=34)
                           ->  Parallel Seq Scan on table_b t2  (cost=0.00..243466.06 rows=2965306 width=34)
 JIT:
   Functions: 19
   Options: Inlining true, Optimization true, Expressions true, Deforming true
(17 rows)

//so.. the planner guesses that those 2 joins will generate around 1000
//billion rows...

//so, I try to change the stats:
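As a sanity check on where that figure comes from (my own back-of-envelope arithmetic, not the planner internals, which also consult MCV lists):

```python
# Numbers copied from the EXPLAIN output above.
per_worker_rows = 204_304_565_973   # Parallel Hash Left Join estimate
workers = 5
print(per_worker_rows * workers)    # ~1.02e12, the Unique node's estimate

# The textbook equi-join estimate rows_a * rows_b / ndistinct, fed with
# the TRUE distinct count of col_ano, would instead predict ~19 million rows:
rows_a, rows_b, true_nd = 13_080_776, 14_811_485, 10_149_937
print(round(rows_a * rows_b / true_nd))
```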

db=# alter table myschema.table_a alter column col_ano set (n_distinct = -0.8);
ALTER TABLE
db=# analyze myschema.table_a;
ANALYZE
db=# with ta as (select 'myschema' as s, 'table_a' as t),
          p as (select * from information_schema.columns, ta
                where table_schema = ta.s and table_name = ta.t),
          tableid as (select relid from pg_stat_user_tables, ta
                      where schemaname = ta.s and relname = ta.t)
     select staattnum, column_name, stanullfrac, stadistinct
     from tableid, pg_statistic
     join p on p.ordinal_position = staattnum
     where starelid = tableid.relid
     order by staattnum;
 staattnum | column_name | stanullfrac | stadistinct
-----------+-------------+-------------+-------------
         1 | col_ne      |           0 |  -0.6694066
         2 | col_brg     |      0.0224 |           6
         3 | col_ano     |           0 |        -0.8
         4 | col_ine     |           0 |   -0.542651
         5 | col_cha     |           0 |          12
(5 rows)

db=# alter table myschema.table_b alter column col_ano set (n_distinct = -0.8);
ALTER TABLE
db=# analyze myschema.table_b;
ANALYZE
db=# with ta as (select 'myschema' as s, 'table_b' as t),
          p as (select * from information_schema.columns, ta
                where table_schema = ta.s and table_name = ta.t),
          tableid as (select relid from pg_stat_user_tables, ta
                      where schemaname = ta.s and relname = ta.t)
     select staattnum, column_name, stanullfrac, stadistinct
     from tableid, pg_statistic
     join p on p.ordinal_position = staattnum
     where starelid = tableid.relid
     order by staattnum;
 staattnum | column_name | stanullfrac | stadistinct
-----------+-------------+-------------+-------------
         1 | col_nerg    |           0 |      166151
         2 | col_ibi     |           0 |      374699
         3 | col_imi     |           0 |      394296
         4 | col_ipi     |           0 |      404515
         5 | col_ne      |           0 | -0.47193906
         6 | col_ano     |           0 |        -0.8
(6 rows)

//doing the explain of the joins again

db=# explain select distinct t1.col_ine,
         case when t2.col_ibi is null then t3.col_ibi else t2.col_ibi end
     from myschema.table_a t1
     left join myschema.table_b t2 on t1.col_ano = t2.col_ano
     left join myschema.table_b t3 on t1.col_ne = t3.col_ne;
                                                         QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=62824833042.98..174732492390.53 rows=888686328950 width=97)
   ->  Gather Merge  (cost=62824833042.98..170289060745.78 rows=888686328950 width=97)
         Workers Planned: 5
         ->  Sort  (cost=62824832042.91..63269175207.38 rows=177737265790 width=97)
               Sort Key: t1.col_ine, (CASE WHEN (t2.col_ibi IS NULL) THEN t3.col_ibi ELSE t2.col_ibi END)
               ->  Parallel Hash Left Join  (cost=604274.78..1148041043.61 rows=177737265790 width=97)
                     Hash Cond: ((t1.col_ano)::text = (t2.col_ano)::text)
                     ->  Parallel Hash Left Join  (cost=300691.39..581280.67 rows=4424044 width=53)
                           Hash Cond: ((t1.col_ne)::text = (t3.col_ne)::text)
                           ->  Parallel Seq Scan on table_a t1  (cost=0.00..184049.73 rows=2616073 width=54)
                           ->  Parallel Hash  (cost=243427.84..243427.84 rows=2961484 width=31)
                                 ->  Parallel Seq Scan on table_b t3  (cost=0.00..243427.84 rows=2961484 width=31)
                     ->  Parallel Hash  (cost=243427.84..243427.84 rows=2961484 width=34)
                           ->  Parallel Seq Scan on table_b t2  (cost=0.00..243427.84 rows=2961484 width=34)
 JIT:
   Functions: 19
   Options: Inlining true, Optimization true, Expressions true, Deforming true
(17 rows)

//almost equivalent.
//temp_file_limit is set to 210 GB.
//if I run the request, or explain analyze it (same thing), the disk
//occupation grows by 1.1 TB, and RAM usage grows until the OOM killer fires.
//BTW, if I simplify the select to keep only the first join, the estimated
//row count is still "huge", and I didn't try to execute it.
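One way to pin down the real blow-up without running the full query would be to measure the fan-out of the col_ano join directly with aggregates (a diagnostic sketch I have not run; it counts matched pairs only, and for the left join the unmatched table_a keys would add one row each):

```sql
-- Rows produced by the equi-join on col_ano =
-- sum over keys of (occurrences in table_a) * (occurrences in table_b)
select sum(a.n * b.n) as joined_rows
from (select col_ano, count(*) as n
      from myschema.table_a group by col_ano) a
join (select col_ano, count(*) as n
      from myschema.table_b group by col_ano) b
  using (col_ano);
```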

thanks
