High process memory consumption when running sort

From: Shai Shapira <Shai(dot)Shapira(at)Amdocs(dot)com>
To: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Cc: Shai Shapira <Shai(dot)Shapira(at)Amdocs(dot)com>
Subject: High process memory consumption when running sort
Date: 2022-03-23 14:42:06
Message-ID: VI1PR06MB5567FA134E185D5243CA726CEF189@VI1PR06MB5567.eurprd06.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi,

When running our application, we noticed that some processes are taking a lot of memory ( 10, 15, 20GB or so, of RSS ).
It is also reproduced when running in psql.

PG version is 12.6

2 examples:

* Common table, PG_BUFFERCACHE, when doing group by, session takes 140MB, which is not a lot, but still more than the 20MB that set for work_mem.
* Application table, ~15M rows, grouping by a smallint columns - takes ~1000 MB.

As I wrote before, application processes reached tens of GB.

In the first case, PG also used temp files, at the second case, when more memory was used and also in the application case, temp files were not created.

I will try to add as much details as possible, please let me know if there is something additional that is required.

Thanks,
Shai

More details:

First what I see, and then versions, parameters, etc.
Note: this DB is set with Patroni, replication, etc. but the scenario was reproduce ( up to few hundreds MB, not tens of GB ) on other environment, without it.

Queries:

1. PG_BUFFERCACHE :

* INSERT INTO PGAWR_BUFFERCACHE_SUMMARY( SELECT 1, NOW(), RELFILENODE, RELDATABASE, COUNT(*) AS BUFFERS_COUNT FROM PG_BUFFERCACHE GROUP BY RELFILENODE, RELDATABASE) ;
* Insert 12309 rows.
* Table has 2097152 rows.

1. Application table:

* Query: select cycle_code, count(*) from ape1_subscr_offers group by cycle_code ;

* table has 15318725 rows.

* The cycle_code column is the first column of an index.

* Table is partitioned, 176 partitions.

* The result
cycle_code | count
------------+---------
1 | 3824276
2 | 3824745
3 | 3834609
9 | 3835095
(4 rows)

paaspg=> show work_mem;
work_mem
----------
20MB
(1 row)

Table structure:

paaspg=> \d ape1_subscr_offers
Partitioned table "vm1app.ape1_subscr_offers"
Column | Type | Collation | Nullable | Default
-----------------------+-----------------------------+-----------+----------+---------
cycle_code | smallint | | not null |
customer_segment | smallint | | not null |
subscriber_id | bigint | | not null |
offer_id | integer | | not null |
offer_instance | bigint | | not null |
offer_eff_date | timestamp without time zone | | not null |
sys_creation_date | timestamp without time zone | | not null |
sys_update_date | timestamp without time zone | | |
operator_id | integer | | |
application_id | character(6) | | |
dl_service_code | character(5) | | |
dl_update_stamp | smallint | | | 0
update_id | bigint | | |
offer_exp_date | timestamp without time zone | | |
source_offer_agr_id | bigint | | |
source_offer_instance | bigint | | |
eff_act_code_pror | character varying(25) | | |
exp_act_code_pror | character varying(25) | | |
load_ind | character(1) | | |
Partition key: RANGE (cycle_code, customer_segment)
Indexes:
"ape1_subscr_offers_pkey" PRIMARY KEY, btree (cycle_code, customer_segment, subscriber_id, offer_id, offer_instance, offer_eff_date)
"ape1_subscr_offers_1ix" btree (update_id)
Number of partitions: 176 (Use \d+ to list them.)

Explain:
paaspg=> explain select cycle_code, count(*) from ape1_subscr_offers group by cycle_code ;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Finalize GroupAggregate (cost=385331.98..385382.65 rows=200 width=10)
Group Key: ape1_subscr_offers_p40.cycle_code
-> Gather Merge (cost=385331.98..385378.65 rows=400 width=10)
Workers Planned: 2
-> Sort (cost=384331.96..384332.46 rows=200 width=10)
Sort Key: ape1_subscr_offers_p40.cycle_code
-> Partial HashAggregate (cost=384322.31..384324.31 rows=200 width=10)
Group Key: ape1_subscr_offers_p40.cycle_code
-> Parallel Append (cost=0.00..352347.81 rows=6394900 width=2)
-> Parallel Seq Scan on ape1_subscr_offers_p40 (cost=0.00..5052.94 rows=101094 width=2)
-> Parallel Seq Scan on ape1_subscr_offers_p46 (cost=0.00..5042.73 rows=100972 width=2)
-> Parallel Seq Scan on ape1_subscr_offers_p37 (cost=0.00..5040.12 rows=100912 width=2)
-> Parallel Seq Scan on ape1_subscr_offers_p149 (cost=0.00..5037.25 rows=100825 width=2)
-> Parallel Seq Scan on ape1_subscr_offers_p145 (cost=0.00..5029.36 rows=100536 width=2)

..
-> Parallel Seq Scan on ape1_subscr_offers_p183 (cost=0.00..11.53 rows=153 width=2)
-> Parallel Seq Scan on ape1_subscr_offers_p184 (cost=0.00..11.53 rows=153 width=2)
-> Parallel Seq Scan on ape1_subscr_offers_p185 (cost=0.00..11.53 rows=153 width=2)
(185 rows)

Memory consumption: ( of case 2, application table, using system_stats )

select act.pid, application_name,
backend_type,
pretty_timestamp(xact_start) as xact_start, pretty_timestamp(query_start) as query_start,
pretty_timestamp(backend_start) as backend_start,
cpu_usage,
pg_size_pretty(memory_bytes) as memory_bytes,
pretty_query(query,50 ) as query
from pg_sys_cpu_memory_by_process() stat, pg_stat_activity act
where stat.pid = act.pid
and act.application_name like 'psql%'
order by 1
;
pid | application_name | backend_type | xact_start | query_start | backend_start | cpu_usage | memory_bytes | query
-------+------------------+----------------+---------------------+---------------------+---------------------+-----------+--------------+----------------------------------------------------
10142 | psql | client backend | 2022-03-23 16:32:20 | 2022-03-23 16:32:20 | 2022-03-23 16:32:20 | 8.79 | 8568 kB | select act.pid, application_name, backend_type, pr
15298 | psql | client backend | | 2022-03-23 16:32:11 | 2022-03-23 16:05:44 | 0 | 1134 MB | select cycle_code, count(*) from ape1_subscr_offer

Using top:

top - 16:30:46 up 17 days, 3:10, 3 users, load average: 0.41, 0.35, 0.37
Tasks: 507 total, 1 running, 506 sleeping, 0 stopped, 0 zombie
%Cpu(s): 5.4 us, 0.6 sy, 0.0 ni, 94.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 65804144 total, 5241032 free, 1811912 used, 58751200 buff/cache
KiB Swap: 15728636 total, 13837292 free, 1891344 used. 46488956 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
15298 postgres 20 0 16.8g 1.1g 1.1g S 0.0 1.7 0:02.63 postgres
13524 postgres 20 0 17.1g 777016 510644 S 0.0 1.2 7:35.34 postgres
19971 postgres 20 0 17.1g 776540 517872 S 0.0 1.2 7:22.66 postgres
8514 postgres 20 0 16.8g 639680 638964 S 0.0 1.0 0:53.79 postgres
26120 postgres 20 0 16.8g 574916 557856 S 0.0 0.9 0:20.33 postgres
22529 postgres 20 0 16.9g 572728 556956 S 0.0 0.9 0:04.80 postgres

PG version:

paaspg=> SELECT version()
paaspg-> ;
version
---------------------------------------------------------------------------------------------------------
PostgreSQL 12.6 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit

OS version:

postgres(at)illin7504:pgsql/Users/Shai> uname -a
Linux illin7504 3.10.0-1127.19.1.el7.x86_64 #1 SMP Tue Aug 11 19:12:04 EDT 2020 x86_64 x86_64 x86_64 GNU/Linux

Parameters: ( which are not default )

with params as
(
SELECT name, source, context, substring(setting,1,50) val, unit,
substring(boot_val,1,20) default_val
FROM pg_settings
)
select * from params
WHERE source != 'default'
ORDER BY 1;
name | source | context | val | unit | default_val
-------------------------------------+----------------------+-------------------+----------------------------------------------------+------+----------------------
application_name | client | user | psql | |
archive_command | configuration file | sighup | (disabled) | |
archive_mode | configuration file | postmaster | off | | off
autovacuum_analyze_scale_factor | configuration file | sighup | 0.15 | | 0.1
autovacuum_max_workers | configuration file | postmaster | 4 | | 3
autovacuum_naptime | configuration file | sighup | 15 | s | 60
autovacuum_vacuum_cost_limit | configuration file | sighup | 1200 | | -1
autovacuum_vacuum_scale_factor | configuration file | sighup | 0.05 | | 0.2
checkpoint_completion_target | configuration file | sighup | 0.9 | | 0.5
cluster_name | command line | postmaster | postgres-cluster | |
config_file | override | postmaster | /pgcluster/pgdata/12.6/data/postgresql.conf | |
data_checksums | override | internal | off | | off
data_directory | override | postmaster | /pgcluster/pgdata/12.6/data | |
DateStyle | configuration file | user | ISO, MDY | | ISO, MDY
default_text_search_config | configuration file | user | pg_catalog.english | | pg_catalog.simple
dynamic_shared_memory_type | configuration file | postmaster | posix | | posix
effective_cache_size | configuration file | user | 6291456 | 8kB | 524288
effective_io_concurrency | configuration file | user | 200 | | 1
hba_file | override | postmaster | /pgcluster/pgdata/12.6/data/pg_hba.conf | |
hot_standby | command line | postmaster | on | | on
ident_file | override | postmaster | /pgcluster/pgdata/12.6/data/pg_ident.conf | |
idle_in_transaction_session_timeout | configuration file | user | 3600000 | ms | 0
lc_collate | override | internal | en_US.UTF-8 | | C
lc_ctype | override | internal | en_US.UTF-8 | | C
lc_messages | configuration file | superuser | en_US.UTF-8 | |
lc_monetary | configuration file | user | en_US.UTF-8 | | C
lc_numeric | configuration file | user | en_US.UTF-8 | | C
lc_time | configuration file | user | en_US.UTF-8 | | C
listen_addresses | command line | postmaster | 10.234.167.191,10.234.166.148,127.0.0.1 | | localhost
log_autovacuum_min_duration | configuration file | sighup | 0 | ms | -1
log_checkpoints | configuration file | sighup | on | | off
log_connections | configuration file | superuser-backend | on | | off
log_destination | configuration file | sighup | stderr | | stderr
log_directory | configuration file | sighup | pg_log | | log
log_disconnections | configuration file | superuser-backend | on | | off
log_filename | configuration file | sighup | postgresql-%a-%H.log | | postgresql-%Y-%m-%d_
logging_collector | configuration file | postmaster | on | | off
log_hostname | configuration file | sighup | on | | off
log_line_prefix | configuration file | sighup | %t:%r:%u(at)%d:[%p]: | | %m [%p]
log_lock_waits | configuration file | superuser | on | | off
log_min_duration_statement | configuration file | superuser | 100 | ms | -1
log_rotation_age | configuration file | sighup | 60 | min | 1440
log_rotation_size | configuration file | sighup | 0 | kB | 10240
log_statement | configuration file | superuser | all | | none
log_temp_files | configuration file | superuser | 4096 | kB | -1
log_timezone | configuration file | sighup | Asia/Jerusalem | | GMT
log_transaction_sample_rate | configuration file | superuser | 0 | | 0
log_truncate_on_rotation | configuration file | sighup | on | | off
maintenance_work_mem | configuration file | user | 2097152 | kB | 65536
max_connections | command line | postmaster | 3000 | | 100
max_locks_per_transaction | command line | postmaster | 100 | | 64
max_parallel_maintenance_workers | configuration file | user | 2 | | 2
max_parallel_workers | configuration file | user | 8 | | 8
max_parallel_workers_per_gather | configuration file | user | 2 | | 2
max_prepared_transactions | command line | postmaster | 0 | | 0
max_replication_slots | command line | postmaster | 18 | | 10
max_stack_depth | environment variable | superuser | 2048 | kB | 100
max_wal_senders | command line | postmaster | 10 | | 10
max_wal_size | configuration file | sighup | 8192 | MB | 1024
max_worker_processes | command line | postmaster | 8 | | 8
min_wal_size | configuration file | sighup | 2048 | MB | 80
pg_stat_statements.track | configuration file | superuser | all | | top
port | command line | postmaster | 5432 | | 5432
primary_conninfo | configuration file | postmaster | user=replicator passfile=/tmp/pgpass host=10.234.1 | |
primary_slot_name | configuration file | postmaster | illin7504 | |
random_page_cost | configuration file | user | 1.1 | | 4
recovery_target_lsn | configuration file | postmaster | | |
recovery_target_name | configuration file | postmaster | | |
recovery_target_time | configuration file | postmaster | | |
recovery_target_timeline | configuration file | postmaster | latest | | latest
recovery_target_xid | configuration file | postmaster | | |
server_encoding | override | internal | UTF8 | | SQL_ASCII
shared_buffers | configuration file | postmaster | 2097152 | 8kB | 1024
shared_preload_libraries | configuration file | postmaster | pg_stat_statements,auto_explain | |
synchronous_standby_names | configuration file | sighup | illin7505 | |
temp_buffers | configuration file | user | 8192 | 8kB | 1024
TimeZone | configuration file | user | Asia/Jerusalem | | GMT
track_commit_timestamp | command line | postmaster | on | | off
track_io_timing | configuration file | superuser | on | | off
transaction_deferrable | override | user | off | | off
transaction_isolation | override | user | read committed | | read committed
transaction_read_only | override | user | off | | off
unix_socket_directories | configuration file | postmaster | /var/run/postgresql | | /var/run/postgresql,
wal_buffers | override | postmaster | 2048 | 8kB | -1
wal_keep_segments | configuration file | sighup | 8 | | 0
wal_level | command line | postmaster | logical | | replica
wal_log_hints | command line | postmaster | on | | off
wal_segment_size | override | internal | 16777216 | B | 16777216
wal_sync_method | configuration file | sighup | fdatasync | | fdatasync
work_mem | configuration file | user | 20480 | kB | 4096
(90 rows)

Shai Shapira
* shai(dot)shapira(at)amdocs(dot)com<mailto:shai(dot)shapira(at)amdocs(dot)com>
* +972 9 776 4171

This email and the information contained herein is proprietary and confidential and subject to the Amdocs Email Terms of Service, which you may review at https://www.amdocs.com/about/email-terms-of-service <https://www.amdocs.com/about/email-terms-of-service>

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Lars Aksel Opsahl 2022-03-23 14:49:39 Re: Using system tables directly takes many hours, using temp tables with no indexes takes a few seconds for geometry_columns view.
Previous Message Justin Pryzby 2022-03-23 13:19:21 Re: Using system tables directly takes many hours, using temp tables with no indexes takes a few seconds for geometry_columns view.