Loading table with indexed jsonb field is stalling

From: Will Hartung <willhartung(at)gmail(dot)com>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Loading table with indexed jsonb field is stalling
Date: 2019-05-17 19:10:24
Message-ID: CAKMEDdxE95SC76wadMzrH454HpvCV4tOdEmuZRO=pwArEw6YVg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

I am trying to load data in to a table with a jsonb field that is indexed
as gin (field jsonb_path_ops).

It's a large table, and I'm loading it 100K rows at a time. Eventually, it
would be 30M+ rows in the table.

Originally I simply loaded the table and then tried to create the index,
but it never finished.

So, I'm trying to load it incrementally.

I have 2.8M rows in the table so far, the jsonb field size is, on average,
1600 bytes, with the largest (of the 2.8M loaded) 1930. Simply, these are
not large structures.

The first batches to load took various times for each file. Most of them <
1m, some took 1/2 hr.

The current file is "stuck", pushing past 20hrs so far.

The VM only has 4G of RAM, it is certainly "busy", but it is not swapping
(not at the OS level).

Here is a recent top:

top - 11:34:01 up 1 day, 1:49, 2 users, load average: 5.84, 4.94, 4.52
Tasks: 103 total, 1 running, 59 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 1.0 sy, 0.0 ni, 0.0 id, 95.3 wa, 0.0 hi, 3.7 si,
0.0 st
KiB Mem : 4040212 total, 152336 free, 181792 used, 3706084 buff/cache
KiB Swap: 4194300 total, 4189948 free, 4352 used. 3443628 avail Mem

Postgres is pretty much default configurations, I have not tweaked any of
the memory settings (such as work memory).

My Mac OS host isn’t that busy either, but the VM adds some load, and it's
not thrashing.

While I was loading the file in 100K row chunks, here are the times of each
respective chunk to actually load:

0:46s
3:17s
8:12s
9:54s
14:09s
12:07s
18:50s
9:01s
25:28s
38:49s
25:24s
1:21s
0:47s
0:32s
0:39s
0:31s
0:31s
0:28s
0:29s
0:28s
0:21s
0:27s
0:36s
0:22s
0:27s
0:20s
0:21s
0:19s
2:16:21s <— the last to date, but this was yesterday, now it's past 20hrs

It stalled early, but then rocketed to the stalling cliff staring climax
that it's at now.

The only reason this is on a Linux VM is that I saw similar behavior
running native Postgres 9.6 on Mac OS (the host). It didn’t make any sense
that Mac OS would be causing this, but, who knows. Try it and see.

Since the start of the load of the stalled piece, something has consumed
over 800M of storage, I can’t say what, I did not check with any higher
fidelity as to where the storage was going.

I do not understand why this suddenly falls over a cliff. The JSONs are not
large, so I don’t see how any individual one could crush the memory
subsystem. I have to assume that PG is somehow internally thrashing or
paging or something. I appreciate that the db is not tuned, but I would not
expect that it would struggle so to create this index, with these values,
and such small JSON payloads.

Also, it’s not unique to the 29th piece. I’ve tried in the past to skip
those, and it still failed. This has been happening for some time (months),
but I keep putting it away.

Any insight is helpful. My biggest fear is that for whatever reason we will
not be able to reload this table during any particular crisis in the future
should it come to that.

Thanks.

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Stefan Keller 2019-05-17 22:26:01 Re: Data entry / data editing tools (more end-user focus).
Previous Message Tom Lane 2019-05-17 17:09:22 Re: FATAL: SMgrRelation hashtable corrupted