Re: Query executed during pg_dump leads to excessive memory usage

From: "Gogala, Mladen" <gogala(dot)mladen(at)gmail(dot)com>
To: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Query executed during pg_dump leads to excessive memory usage
Date: 2021-09-19 19:20:56
Message-ID: deedf5a5-bef3-a15e-bd73-a87e8dde8608@gmail.com
Lists: pgsql-performance


On 9/19/2021 8:05 AM, Ranier Vilela wrote:
> On Sun, Sep 19, 2021 at 07:05, Ulf Lohbrügge
> <ulf(dot)lohbruegge(at)gmail(dot)com <mailto:ulf(dot)lohbruegge(at)gmail(dot)com>> wrote:
>
> Hi there,
>
> A database cluster (PostgreSQL 12.4 running on Amazon Aurora @
> db.r5.xlarge) with a single database of mine consists of 1,656,618
> rows in pg_class. Using pg_dump on that database leads to
> excessive memory usage and sometimes even a kill by signal 9:
>
> 2021-09-18 16:51:24 UTC::@:[29787]:LOG:  Aurora Runtime process
> (PID 29794) was terminated by signal 9: Killed
> 2021-09-18 16:51:25 UTC::@:[29787]:LOG:  terminating any other
> active server processes
> 2021-09-18 16:51:27 UTC::@:[29787]:FATAL:  Can't handle storage
> runtime process crash
> 2021-09-18 16:51:31 UTC::@:[29787]:LOG:  database system is shut down
>
> The query that is being fired by pg_dump is the following:
> SELECT t.tableoid, t.oid, t.typname, t.typnamespace, (SELECT
> pg_catalog.array_agg(acl ORDER BY row_n) FROM (SELECT acl, row_n
> FROM
> pg_catalog.unnest(coalesce(t.typacl,pg_catalog.acldefault('T',t.typowner)))
> WITH ORDINALITY AS perm(acl,row_n) WHERE NOT EXISTS ( SELECT 1
> FROM
> pg_catalog.unnest(coalesce(pip.initprivs,pg_catalog.acldefault('T',t.typowner)))
> AS init(init_acl) WHERE acl = init_acl)) as foo) AS typacl,
> (SELECT pg_catalog.array_agg(acl ORDER BY row_n) FROM (SELECT acl,
> row_n FROM
> pg_catalog.unnest(coalesce(pip.initprivs,pg_catalog.acldefault('T',t.typowner)))
> WITH ORDINALITY AS initp(acl,row_n) WHERE NOT EXISTS ( SELECT 1
> FROM
> pg_catalog.unnest(coalesce(t.typacl,pg_catalog.acldefault('T',t.typowner)))
> AS permp(orig_acl) WHERE acl = orig_acl)) as foo) AS rtypacl, NULL
> AS inittypacl, NULL AS initrtypacl, (SELECT rolname FROM
> pg_catalog.pg_roles WHERE oid = t.typowner) AS rolname, t.typelem,
> t.typrelid, CASE WHEN t.typrelid = 0 THEN ' '::"char" ELSE (SELECT
> relkind FROM pg_class WHERE oid = t.typrelid) END AS typrelkind,
> t.typtype, t.typisdefined, t.typname[0] = '_' AND t.typelem != 0
> AND (SELECT typarray FROM pg_type te WHERE oid = t.typelem) =
> t.oid AS isarray FROM pg_type t LEFT JOIN pg_init_privs pip ON
> (t.oid = pip.objoid AND pip.classoid = 'pg_type'::regclass AND
> pip.objsubid = 0);
>
> The query plan looks like this. It takes almost 13 minutes(!) to
> execute that query:
>             QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------------------------------------
>  Hash Left Join  (cost=4.65..8147153.76 rows=1017962 width=280)
> (actual time=2.526..106999.294 rows=1026902 loops=1)
>    Hash Cond: (t.oid = pip.objoid)
>    ->  Seq Scan on pg_type t  (cost=0.00..36409.62 rows=1017962
> width=122) (actual time=0.008..8836.693 rows=1026902 loops=1)
>    ->  Hash  (cost=4.64..4.64 rows=1 width=45) (actual
> time=2.342..41.972 rows=0 loops=1)
>          Buckets: 1024  Batches: 1  Memory Usage: 8kB
>          ->  Seq Scan on pg_init_privs pip  (cost=0.00..4.64
> rows=1 width=45) (actual time=2.341..22.109 rows=0 loops=1)
>                Filter: ((classoid = '1247'::oid) AND (objsubid = 0))
>                Rows Removed by Filter: 176
>    SubPlan 1
>      ->  Aggregate  (cost=0.38..0.39 rows=1 width=32) (actual
> time=0.031..0.031 rows=1 loops=1026902)
>            ->  Hash Anti Join  (cost=0.24..0.37 rows=1 width=20)
> (actual time=0.008..0.008 rows=0 loops=1026902)
>                  Hash Cond: (perm.acl = init.init_acl)
>                  ->  Function Scan on unnest perm
>  (cost=0.01..0.11 rows=10 width=20) (actual time=0.001..0.001
> rows=2 loops=1026902)
>                  ->  Hash  (cost=0.11..0.11 rows=10 width=12)
> (actual time=0.002..0.002 rows=2 loops=1026902)
>                        Buckets: 1024  Batches: 1  Memory Usage: 9kB
>                        ->  Function Scan on unnest init
>  (cost=0.01..0.11 rows=10 width=12) (actual time=0.001..0.001
> rows=2 loops=1026902)
>    SubPlan 2
>      ->  Aggregate  (cost=0.38..0.39 rows=1 width=32) (actual
> time=0.050..0.050 rows=1 loops=1026902)
>            ->  Hash Anti Join  (cost=0.24..0.37 rows=1 width=20)
> (actual time=0.008..0.008 rows=0 loops=1026902)
>                  Hash Cond: (initp.acl = permp.orig_acl)
>                  ->  Function Scan on unnest initp
>  (cost=0.01..0.11 rows=10 width=20) (actual time=0.001..0.001
> rows=2 loops=1026902)
>                  ->  Hash  (cost=0.11..0.11 rows=10 width=12)
> (actual time=0.002..0.002 rows=2 loops=1026902)
>                        Buckets: 1024  Batches: 1  Memory Usage: 9kB
>                        ->  Function Scan on unnest permp
>  (cost=0.01..0.11 rows=10 width=12) (actual time=0.001..0.001
> rows=2 loops=1026902)
>    SubPlan 3
>      ->  Index Scan using pg_authid_oid_index on pg_authid
>  (cost=0.28..2.29 rows=1 width=64) (actual time=0.002..0.002
> rows=1 loops=1026902)
>            Index Cond: (oid = t.typowner)
>    SubPlan 4
>      ->  Index Scan using pg_class_oid_index on pg_class
>  (cost=0.43..2.45 rows=1 width=1) (actual time=0.003..0.003 rows=1
> loops=671368)
>            Index Cond: (oid = t.typrelid)
>    SubPlan 5
>      ->  Index Scan using pg_type_oid_index on pg_type te
>  (cost=0.42..2.44 rows=1 width=4) (actual time=0.020..0.020 rows=1
> loops=355428)
>            Index Cond: (oid = t.typelem)
>  Planning Time: 0.535 ms
>  Execution Time: 774011.175 ms
> (35 rows)
>
> The high number of rows in pg_class results from more than ~550
> schemata, each containing more than 600 tables. It's part of a
> multi-tenant setup where each tenant lives in its own schema.
>
> I began to move schemata to another database cluster to reduce the
> number of rows in pg_class but I'm having a hard time doing so as
> a call to pg_dump might result in a database restart.
>
> Is there anything I can do to improve that situation?
>
> Can you try:
>
> 1. Limit resource usage by Postgres, with a cgroups configuration.
> 2. pg_dump compression: pg_dump -Z (see man pg_dump).
> 3. Run VACUUM and REINDEX before the dump?
>
> regards,
> Ranier Vilela
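
A minimal sketch of suggestions 2 and 3 above; the database name "mydb" and
the dump file name are placeholders, and the cgroups item is host-level
configuration not shown here:

```shell
# 2. Compressed dump: -Z sets the compression level (0-9); the custom
#    format (-Fc) compresses by default. See man pg_dump.
pg_dump -Fc -Z 9 -f mydb.dump mydb

# 3. Housekeeping before the dump.
psql -d mydb -c "VACUUM ANALYZE;"
psql -d mydb -c "REINDEX DATABASE mydb;"
```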

Try setting enable_seqscan=off for the user executing pg_dump, usually
"postgres".  You have a ton of full table scans and hash joins, which
are the best plan for pg_dump given that you are dumping the entire
database, but they use a lot of memory.  Index scans will be slower,
probably a lot slower, but will avoid allocating large memory areas
for the hashes.
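
A minimal sketch of that suggestion; the role and database names are the
usual defaults and should be adjusted to your setup. The per-role setting
applies to new sessions of that role, so pg_dump picks it up:

```shell
# Force index scans for the role that runs pg_dump.
psql -U postgres -c "ALTER ROLE postgres SET enable_seqscan = off;"

# New sessions for "postgres" now plan with enable_seqscan = off.
pg_dump -U postgres -Fc -f mydb.dump mydb

# Restore the default planner behavior afterwards.
psql -U postgres -c "ALTER ROLE postgres RESET enable_seqscan;"
```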

--
Mladen Gogala
Oracle DBA
Tel: (347) 321-1217
Blog: https://dbwhisperer.wordpress.com
