Re: Query executed during pg_dump leads to excessive memory usage

From: Ulf Lohbrügge <ulf(dot)lohbruegge(at)gmail(dot)com>
To: Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Query executed during pg_dump leads to excessive memory usage
Date: 2021-09-19 18:28:48
Message-ID: CABZYQR+neagRaV5FYp9mgu8S1Bu_1MJM0ENoLgLG=zou-BGdzw@mail.gmail.com
Lists: pgsql-performance

On Sun, Sep 19, 2021 at 2:06 PM Ranier Vilela <ranier(dot)vf(at)gmail(dot)com> wrote:

> Can you try:
>
> 1. Limit resource usage by Postgres, with cgroups configuration.
>
Since the database cluster is running on AWS (Aurora is a managed service),
I have no access to any cgroups configuration.

> 2. pg_dump compression: man pgsql -Z
>
I don't see how this would improve the actual process of dumping the
database. If I understand correctly, the compression is applied to the
output after the data has been read by pg_dump.
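
Just to illustrate what I mean, a compressed dump in my setup would look
roughly like this (host, user, database name, and output file are
placeholders; -Fc and -Z are the usual pg_dump options as far as I know):

pg_dump -Fc -Z 9 \
  -h mycluster.cluster-xxxxxxxx.eu-central-1.rds.amazonaws.com \
  -U myuser -d mydb \
  -f mydb.dump

The -Z level only changes how the dump is stored on the client side; the
catalog queries pg_dump sends to the server (like the one quoted below)
are executed either way, so the memory pressure on the server should be
unchanged.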

> 3. Run vacuum and reindex before?
>
I ran a manual 'VACUUM ANALYZE;' on the whole database 2-3 weeks ago but
haven't checked yet whether a reindex is necessary. The view
pg_stat_user_indexes currently lists 672,244 indexes. But I don't see how
this would help with the query I posted, which takes almost 13 minutes to
finish.
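
For what it's worth, this is roughly how I arrived at that number, plus a
quick (and admittedly crude) check for indexes that have never been
scanned since the last stats reset (column names as documented for the
pg_stat_user_indexes view):

-- total number of user indexes across all tenant schemata
SELECT count(*) FROM pg_stat_user_indexes;

-- indexes that were never scanned (rough heuristic only)
SELECT schemaname, relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY schemaname, relname
LIMIT 50;

If a reindex turns out to be necessary, I could do it per table with
REINDEX TABLE CONCURRENTLY on PostgreSQL 12, but the plan I posted shows a
sequential scan on pg_type, so I doubt reindexing user tables would change
much for that particular query.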

Best regards
Ulf

On Sun, Sep 19, 2021 at 2:06 PM Ranier Vilela <ranier(dot)vf(at)gmail(dot)com> wrote:

> Em dom., 19 de set. de 2021 às 07:05, Ulf Lohbrügge <
> ulf(dot)lohbruegge(at)gmail(dot)com> escreveu:
>
>> Hi there,
>>
>> A database cluster (PostgreSQL 12.4 running on Amazon Aurora @
>> db.r5.xlarge) that holds a single database of mine contains 1,656,618
>> rows in pg_class. Running pg_dump on that database leads to excessive
>> memory usage and sometimes even a kill by signal 9:
>>
>> 2021-09-18 16:51:24 UTC::@:[29787]:LOG: Aurora Runtime process (PID
>> 29794) was terminated by signal 9: Killed
>> 2021-09-18 16:51:25 UTC::@:[29787]:LOG: terminating any other active
>> server processes
>> 2021-09-18 16:51:27 UTC::@:[29787]:FATAL: Can't handle storage runtime
>> process crash
>> 2021-09-18 16:51:31 UTC::@:[29787]:LOG: database system is shut down
>>
>> The query that is being fired by pg_dump is the following:
>> SELECT t.tableoid, t.oid, t.typname, t.typnamespace, (SELECT
>> pg_catalog.array_agg(acl ORDER BY row_n) FROM (SELECT acl, row_n FROM
>> pg_catalog.unnest(coalesce(t.typacl,pg_catalog.acldefault('T',t.typowner)))
>> WITH ORDINALITY AS perm(acl,row_n) WHERE NOT EXISTS ( SELECT 1 FROM
>> pg_catalog.unnest(coalesce(pip.initprivs,pg_catalog.acldefault('T',t.typowner)))
>> AS init(init_acl) WHERE acl = init_acl)) as foo) AS typacl, (SELECT
>> pg_catalog.array_agg(acl ORDER BY row_n) FROM (SELECT acl, row_n FROM
>> pg_catalog.unnest(coalesce(pip.initprivs,pg_catalog.acldefault('T',t.typowner)))
>> WITH ORDINALITY AS initp(acl,row_n) WHERE NOT EXISTS ( SELECT 1 FROM
>> pg_catalog.unnest(coalesce(t.typacl,pg_catalog.acldefault('T',t.typowner)))
>> AS permp(orig_acl) WHERE acl = orig_acl)) as foo) AS rtypacl, NULL AS
>> inittypacl, NULL AS initrtypacl, (SELECT rolname FROM pg_catalog.pg_roles
>> WHERE oid = t.typowner) AS rolname, t.typelem, t.typrelid, CASE WHEN
>> t.typrelid = 0 THEN ' '::"char" ELSE (SELECT relkind FROM pg_class WHERE
>> oid = t.typrelid) END AS typrelkind, t.typtype, t.typisdefined,
>> t.typname[0] = '_' AND t.typelem != 0 AND (SELECT typarray FROM pg_type te
>> WHERE oid = t.typelem) = t.oid AS isarray FROM pg_type t LEFT JOIN
>> pg_init_privs pip ON (t.oid = pip.objoid AND pip.classoid =
>> 'pg_type'::regclass AND pip.objsubid = 0);
>>
>> The query plan looks like this. It takes almost 13 minutes(!) to execute
>> that query:
>>                                                                   QUERY PLAN
>> ----------------------------------------------------------------------------------------------------------------------------------------------
>>  Hash Left Join  (cost=4.65..8147153.76 rows=1017962 width=280) (actual time=2.526..106999.294 rows=1026902 loops=1)
>>    Hash Cond: (t.oid = pip.objoid)
>>    ->  Seq Scan on pg_type t  (cost=0.00..36409.62 rows=1017962 width=122) (actual time=0.008..8836.693 rows=1026902 loops=1)
>>    ->  Hash  (cost=4.64..4.64 rows=1 width=45) (actual time=2.342..41.972 rows=0 loops=1)
>>          Buckets: 1024  Batches: 1  Memory Usage: 8kB
>>          ->  Seq Scan on pg_init_privs pip  (cost=0.00..4.64 rows=1 width=45) (actual time=2.341..22.109 rows=0 loops=1)
>>                Filter: ((classoid = '1247'::oid) AND (objsubid = 0))
>>                Rows Removed by Filter: 176
>>    SubPlan 1
>>      ->  Aggregate  (cost=0.38..0.39 rows=1 width=32) (actual time=0.031..0.031 rows=1 loops=1026902)
>>            ->  Hash Anti Join  (cost=0.24..0.37 rows=1 width=20) (actual time=0.008..0.008 rows=0 loops=1026902)
>>                  Hash Cond: (perm.acl = init.init_acl)
>>                  ->  Function Scan on unnest perm  (cost=0.01..0.11 rows=10 width=20) (actual time=0.001..0.001 rows=2 loops=1026902)
>>                  ->  Hash  (cost=0.11..0.11 rows=10 width=12) (actual time=0.002..0.002 rows=2 loops=1026902)
>>                        Buckets: 1024  Batches: 1  Memory Usage: 9kB
>>                        ->  Function Scan on unnest init  (cost=0.01..0.11 rows=10 width=12) (actual time=0.001..0.001 rows=2 loops=1026902)
>>    SubPlan 2
>>      ->  Aggregate  (cost=0.38..0.39 rows=1 width=32) (actual time=0.050..0.050 rows=1 loops=1026902)
>>            ->  Hash Anti Join  (cost=0.24..0.37 rows=1 width=20) (actual time=0.008..0.008 rows=0 loops=1026902)
>>                  Hash Cond: (initp.acl = permp.orig_acl)
>>                  ->  Function Scan on unnest initp  (cost=0.01..0.11 rows=10 width=20) (actual time=0.001..0.001 rows=2 loops=1026902)
>>                  ->  Hash  (cost=0.11..0.11 rows=10 width=12) (actual time=0.002..0.002 rows=2 loops=1026902)
>>                        Buckets: 1024  Batches: 1  Memory Usage: 9kB
>>                        ->  Function Scan on unnest permp  (cost=0.01..0.11 rows=10 width=12) (actual time=0.001..0.001 rows=2 loops=1026902)
>>    SubPlan 3
>>      ->  Index Scan using pg_authid_oid_index on pg_authid  (cost=0.28..2.29 rows=1 width=64) (actual time=0.002..0.002 rows=1 loops=1026902)
>>            Index Cond: (oid = t.typowner)
>>    SubPlan 4
>>      ->  Index Scan using pg_class_oid_index on pg_class  (cost=0.43..2.45 rows=1 width=1) (actual time=0.003..0.003 rows=1 loops=671368)
>>            Index Cond: (oid = t.typrelid)
>>    SubPlan 5
>>      ->  Index Scan using pg_type_oid_index on pg_type te  (cost=0.42..2.44 rows=1 width=4) (actual time=0.020..0.020 rows=1 loops=355428)
>>            Index Cond: (oid = t.typelem)
>>  Planning Time: 0.535 ms
>>  Execution Time: 774011.175 ms
>> (35 rows)
>>
>> The high number of rows in pg_class results from more than 550 schemata,
>> each containing more than 600 tables. It's part of a multi-tenant setup
>> where each tenant lives in its own schema.
>>
>> I began to move schemata to another database cluster to reduce the number
>> of rows in pg_class but I'm having a hard time doing so as a call to
>> pg_dump might result in a database restart.
>>
>> Is there anything I can do to improve that situation?
>>
> Can you try:
>
> 1. Limit resource usage by Postgres, with cgroups configuration.
> 2. pg_dump compression: man pgsql -Z
> 3. Run vacuum and reindex before?
>
> regards,
> Ranier Vilela
>
