Re: Out of memory condition

From: Carlos Henrique Reimer <carlos(dot)reimer(at)opendb(dot)com(dot)br>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Out of memory condition
Date: 2014-12-11 16:42:56
Message-ID: CAJnnue1BEtddyjAjDhKnKX7Mhay5Qp9Q0x-v+r3ZWz9x0Qx9BQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi,

Yes, I agree, 8.3 is out of support for a long time and this is the reason
we are trying to migrate to 9.3 using SLONY to minimize downtime.

I eliminated the possibility of data corruption as the limit/offset
technique indicated different rows each time it was executed. Actually, the
failure is still happening and as it is running in a virtual machine,
memory size configuration for this virtual machine was increased from 48GB
to 64GB and we have scheduled a server shutdown/restart for the next coming
weekend in order to try to get rid of the failure.

The replication activity was aborted: SLONY triggers removed, SLONY
processes terminated and SLONY schemas removed.

Ulimit output was appended at the end of this note.

Memory statistics dump from postmaster log resulted from a select * from
"8147_spunico"."sincdc"; command:

Thank you!

TopMemoryContext: 80800 total in 9 blocks; 4088 free (10 chunks); 76712 used
Operator class cache: 8192 total in 1 blocks; 1680 free (0 chunks); 6512
used
TopTransactionContext: 8192 total in 1 blocks; 7648 free (1 chunks); 544
used
MessageContext: 57344 total in 3 blocks; 40760 free (6 chunks); 16584 used
smgr relation table: 24576 total in 2 blocks; 13904 free (4 chunks);
10672 used
TransactionAbortContext: 32768 total in 1 blocks; 32736 free (0 chunks);
32 used
Portal hash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
PortalMemory: 8192 total in 1 blocks; 7888 free (0 chunks); 304 used
PortalHeapMemory: 1024 total in 1 blocks; 720 free (0 chunks); 304 used
ExecutorState: 381096528 total in 6 blocks; 49856 free (30 chunks);
381046672 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
Relcache by OID: 24576 total in 2 blocks; 14912 free (3 chunks); 9664 used
CacheMemoryContext: 817392 total in 20 blocks; 230456 free (3 chunks);
586936 used
pg_toast_729119_index: 2048 total in 1 blocks; 608 free (0 chunks);
1440 used
idx_sincdc_situacao: 2048 total in 1 blocks; 752 free (0 chunks); 1296
used
idx_sincdc_esqtab: 2048 total in 1 blocks; 656 free (0 chunks); 1392
used
idx_sincdc_datahoraexp: 2048 total in 1 blocks; 752 free (0 chunks);
1296 used
pk_sincdc: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
pg_index_indrelid_index: 2048 total in 1 blocks; 704 free (0 chunks);
1344 used
pg_attrdef_adrelid_adnum_index: 2048 total in 1 blocks; 608 free (0
chunks); 1440 used
pg_ts_dict_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks);
1328 used
pg_aggregate_fnoid_index: 3072 total in 2 blocks; 1744 free (3 chunks);
1328 used
pg_language_name_index: 3072 total in 2 blocks; 1744 free (3 chunks);
1328 used
pg_statistic_relid_att_index: 3072 total in 2 blocks; 1600 free (2
chunks); 1472 used
pg_ts_dict_dictname_index: 3072 total in 2 blocks; 1648 free (2
chunks); 1424 used
pg_namespace_nspname_index: 3072 total in 2 blocks; 1696 free (2
chunks); 1376 used
pg_opfamily_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks);
1328 used
pg_opclass_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks);
1376 used
pg_ts_parser_prsname_index: 3072 total in 2 blocks; 1648 free (2
chunks); 1424 used
pg_amop_fam_strat_index: 3072 total in 2 blocks; 1384 free (2 chunks);
1688 used
pg_opclass_am_name_nsp_index: 3072 total in 2 blocks; 1624 free (3
chunks); 1448 used
pg_trigger_tgrelid_tgname_index: 3072 total in 2 blocks; 1600 free (2
chunks); 1472 used
pg_cast_source_target_index: 3072 total in 2 blocks; 1648 free (2
chunks); 1424 used
pg_auth_members_role_member_index: 3072 total in 2 blocks; 1648 free (2
chunks); 1424 used
pg_attribute_relid_attnum_index: 3072 total in 2 blocks; 1600 free (2
chunks); 1472 used
pg_ts_config_cfgname_index: 3072 total in 2 blocks; 1648 free (2
chunks); 1424 used
pg_authid_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376
used
pg_ts_config_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks);
1328 used
pg_conversion_default_index: 3072 total in 2 blocks; 1432 free (3
chunks); 1640 used
pg_language_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks);
1328 used
pg_enum_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328
used
pg_proc_proname_args_nsp_index: 3072 total in 2 blocks; 1624 free (3
chunks); 1448 used
pg_ts_parser_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks);
1328 used
pg_database_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks);
1376 used
pg_conversion_name_nsp_index: 3072 total in 2 blocks; 1648 free (2
chunks); 1424 used
pg_class_relname_nsp_index: 3072 total in 2 blocks; 1600 free (2
chunks); 1472 used
pg_attribute_relid_attnam_index: 3072 total in 2 blocks; 1648 free (2
chunks); 1424 used
pg_class_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376
used
pg_amproc_fam_proc_index: 3072 total in 2 blocks; 1384 free (2 chunks);
1688 used
pg_operator_oprname_l_r_n_index: 3072 total in 2 blocks; 1432 free (3
chunks); 1640 used
pg_index_indexrelid_index: 3072 total in 2 blocks; 1696 free (2
chunks); 1376 used
pg_type_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376
used
pg_rewrite_rel_rulename_index: 3072 total in 2 blocks; 1648 free (2
chunks); 1424 used
pg_authid_rolname_index: 3072 total in 2 blocks; 1696 free (2 chunks);
1376 used
pg_auth_members_member_role_index: 3072 total in 2 blocks; 1648 free (2
chunks); 1424 used
pg_enum_typid_label_index: 3072 total in 2 blocks; 1648 free (2
chunks); 1424 used
pg_constraint_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks);
1328 used
pg_conversion_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks);
1328 used
pg_ts_template_tmplname_index: 3072 total in 2 blocks; 1648 free (2
chunks); 1424 used
pg_ts_config_map_index: 3072 total in 2 blocks; 1624 free (3 chunks);
1448 used
pg_namespace_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks);
1328 used
pg_type_typname_nsp_index: 3072 total in 2 blocks; 1648 free (2
chunks); 1424 used
pg_operator_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks);
1328 used
pg_amop_opr_fam_index: 3072 total in 2 blocks; 1648 free (2 chunks);
1424 used
pg_proc_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328
used
pg_opfamily_am_name_nsp_index: 3072 total in 2 blocks; 1624 free (3
chunks); 1448 used
pg_ts_template_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks);
1328 used
MdSmgr: 8192 total in 1 blocks; 7648 free (0 chunks); 544 used
LOCALLOCK hash: 24576 total in 2 blocks; 15984 free (5 chunks); 8592 used
Timezones: 53584 total in 2 blocks; 3744 free (0 chunks); 49840 used
ErrorContext: 8192 total in 1 blocks; 8160 free (3 chunks); 32 used

Ulimit output:
ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 385725
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 1024
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

On Thu, Dec 11, 2014 at 1:30 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Carlos Henrique Reimer <carlos(dot)reimer(at)opendb(dot)com(dot)br> writes:
> > I've facing an out of memory condition after running SLONY several hours
> to
> > get a 1TB database with about 23,000 tables replicated. The error occurs
> > after about 50% of the tables were replicated.
>
> I'd try bringing this up with the Slony crew.
>
> > I guess postgresql is trying to perform an atomic allocation (those which
> > cannot wait for reclaim) to get a continues memory area and is failing
> due
> > to memory fragmentation.
>
> This theory has nothing to do with reality. More likely it's just a
> garden variety memory leak. If it was an out-of-memory error reported
> by Postgres, there should have been a memory statistics dump written in
> the postmaster log --- can you find that and post it?
>
> Another possible theory is that you're just looking at lots of memory
> needed to hold relcache entries for all 23000 tables :-(. If so there
> may not be any easy way around it, except perhaps replicating subsets
> of the tables. Unless you can boost the memory available to the backend
> --- since this is a 64 bit build, the only reason I can see for
> out-of-memory failures would be a restrictive ulimit setting.
>
> > After SLONY gets the out of memory condition, select * of the table also
> > does not work:
> > FiscalWeb=# select * from "8147_spunico"."sincdc";
> > ERROR: out of memory
> > DETAIL: Failed on request of size 268435456.
>
> That's odd ... looks more like data corruption than anything else.
> Does this happen even in a fresh session? What do you have to do
> to get rid of the failure?
>
> > PostgreSQL 8.3.21 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC)
> > 4.4.6 20120305 (Red Hat 4.4.6-4)
>
> You realize of course that this version is years out of support, and that
> even if this problem traces to a bug in Postgres, 8.3 is not going to get
> fixed.
>
> regards, tom lane
>

--
Reimer
47-3347-1724 47-9183-0547 msn: carlos(dot)reimer(at)opendb(dot)com(dot)br

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Jim McLaughlin 2014-12-11 16:48:04 libpq equivalent of jdbc:default:connection
Previous Message Tom Lane 2014-12-11 15:30:54 Re: Out of memory condition