Re: Out of Memory errors are frustrating as heck!

From: Gunther <raj(at)gusw(dot)net>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)lists(dot)postgresql(dot)org, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Subject: Re: Out of Memory errors are frustrating as heck!
Date: 2019-04-21 05:31:06
Message-ID: dc4cf865-36f1-1d83-d8ee-d1aaa0bd3224@gusw.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

I  am now running Justin's patch after undoing Tomas' patches and any of
my own hacks (which would not have interfered with Tomas' patch)

On 4/20/2019 15:30, Justin Pryzby wrote:
> With my patch, the number of batches is nonlinear WRT work_mem, and reaches a
> maximum for moderately small work_mem. The goal is to choose the optimal
> number of batches to minimize the degree to which work_mem is exceeded.

Now I seem to be in that slow massive growth phase or maybe still in an
earlier step, but I can see the top RES behavior different already.  The
size lingers around 400 MB.  But then it's growing too, at high CPU%,
goes past 700, 800, 900 MB now 1.5 GB, 1.7 GB, 1.8 GB, 1.9 GB, 2.0 GB,
2.1, and still 98% CPU. 2.4 GB, wow, it has never been that big ... and
BOOM!

TopMemoryContext: 120544 total in 7 blocks; 9760 free (7 chunks); 110784 used
HandleParallelMessages: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
TableSpace cache: 8192 total in 1 blocks; 2096 free (0 chunks); 6096 used
Type information cache: 24352 total in 2 blocks; 2624 free (0 chunks); 21728 used
pgstat TabStatusArray lookup hash table: 8192 total in 1 blocks; 416 free (0 chunks); 7776 used
TopTransactionContext: 8192 total in 1 blocks; 6680 free (0 chunks); 1512 used
RowDescriptionContext: 8192 total in 1 blocks; 6896 free (0 chunks); 1296 used
MessageContext: 32768 total in 3 blocks; 13488 free (10 chunks); 19280 used
Operator class cache: 8192 total in 1 blocks; 560 free (0 chunks); 7632 used
smgr relation table: 32768 total in 3 blocks; 16832 free (8 chunks); 15936 used
TransactionAbortContext: 32768 total in 1 blocks; 32512 free (0 chunks); 256 used
Portal hash: 8192 total in 1 blocks; 560 free (0 chunks); 7632 used
TopPortalContext: 8192 total in 1 blocks; 7664 free (0 chunks); 528 used
PortalHoldContext: 24632 total in 2 blocks; 7392 free (0 chunks); 17240 used
PortalContext: 1482752 total in 184 blocks; 11216 free (9 chunks); 1471536 used:
ExecutorState: 2361536 total in 27 blocks; 1827536 free (3163 chunks); 534000 used
TupleSort main: 3957712 total in 22 blocks; 246792 free (39 chunks); 3710920 used
TupleSort main: 4219912 total in 23 blocks; 246792 free (39 chunks); 3973120 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
HashTableContext: 8192 total in 1 blocks; 7336 free (6 chunks); 856 used
HashBatchContext: 2523874568 total in 76816 blocks; 7936 free (0 chunks); 2523866632 used
TupleSort main: 41016 total in 3 blocks; 6504 free (6 chunks); 34512 used
Caller tuples: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
Relcache by OID: 16384 total in 2 blocks; 3512 free (2 chunks); 12872 used
CacheMemoryContext: 1101328 total in 14 blocks; 288672 free (1 chunks); 812656 used
index info: 2048 total in 2 blocks; 680 free (1 chunks); 1368 used: pg_toast_2619_index
index info: 2048 total in 2 blocks; 968 free (1 chunks); 1080 used: entity_id_fkidx
...
index info: 2048 total in 2 blocks; 680 free (1 chunks); 1368 used: pg_attribute_relid_attnum_index
index info: 2048 total in 2 blocks; 952 free (1 chunks); 1096 used: pg_class_oid_index
WAL record construction: 49768 total in 2 blocks; 6368 free (0 chunks); 43400 used
PrivateRefCount: 8192 total in 1 blocks; 2624 free (0 chunks); 5568 used
MdSmgr: 8192 total in 1 blocks; 6176 free (1 chunks); 2016 used
LOCALLOCK hash: 16384 total in 2 blocks; 4600 free (2 chunks); 11784 used
Timezones: 104120 total in 2 blocks; 2624 free (0 chunks); 101496 used
ErrorContext: 8192 total in 1 blocks; 7936 free (4 chunks); 256 used
Grand total: 2538218304 bytes in 77339 blocks; 3075256 free (3372 chunks); 2535143048 used
2019-04-21 05:27:07.731 UTC [968] ERROR: out of memory
2019-04-21 05:27:07.731 UTC [968] DETAIL: Failed on request of size 32800 in memory context "HashBatchContext".
2019-04-21 05:27:07.731 UTC [968] STATEMENT: explain analyze select * from reports.v_BusinessOperation;

so we're ending up with the same problem.

No cigar. But lots of admiration and gratitude for all your attempts to
pinpoint this.

Also, again, if anyone (of the trusted people) wants access to hack
directly, I can provide.

regards,
-Gunther

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Justin Pryzby 2019-04-21 08:08:22 Re: Out of Memory errors are frustrating as heck!
Previous Message Gunther 2019-04-21 05:03:50 Re: Out of Memory errors are frustrating as heck!