Re: Loading table with indexed jsonb field is stalling

From: Tim Cross <theophilusx(at)gmail(dot)com>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Loading table with indexed jsonb field is stalling
Date: 2019-05-18 02:53:53
Message-ID: 878sv4bhhj.fsf@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general


Will Hartung <willhartung(at)gmail(dot)com> writes:

> I am trying to load data in to a table with a jsonb field that is indexed
> as gin (field jsonb_path_ops).
>
> It's a large table, and I'm loading it 100K rows at a time. Eventually, it
> would be 30M+ rows in the table.
>
> Originally I simply loaded the table and then tried to create the index,
> but it never finished.
>
> So, I'm trying to load it incrementally.
>
> I have 2.8M rows in the table so far, the jsonb field size is, on average,
> 1600 bytes, with the largest (of the 2.8M loaded) 1930. Simply, these are
> not large structures.
>
> The first batches to load took various times for each file. Most of them <
> 1m, some took 1/2 hr.
>
> The current file is "stuck", pushing past 20hrs so far.
>
> The VM only has 4G of RAM, it is certainly "busy", but it is not swapping
> (not at the OS level).
>
> Here is a recent top:
>
> top - 11:34:01 up 1 day, 1:49, 2 users, load average: 5.84, 4.94, 4.52
> Tasks: 103 total, 1 running, 59 sleeping, 0 stopped, 0 zombie
> %Cpu(s): 0.0 us, 1.0 sy, 0.0 ni, 0.0 id, 95.3 wa, 0.0 hi, 3.7 si,
> 0.0 st
> KiB Mem : 4040212 total, 152336 free, 181792 used, 3706084 buff/cache
> KiB Swap: 4194300 total, 4189948 free, 4352 used. 3443628 avail Mem
>
> Postgres is pretty much default configurations, I have not tweaked any of
> the memory settings (such as work memory).
>
> My Mac OS host isn’t that busy either, but the VM adds some load, and it's
> not thrashing.
>
> While I was loading the file in 100K row chunks, here are the times of each
> respective chunk to actually load:
>
> 0:46s
> 3:17s
> 8:12s
> 9:54s
> 14:09s
> 12:07s
> 18:50s
> 9:01s
> 25:28s
> 38:49s
> 25:24s
> 1:21s
> 0:47s
> 0:32s
> 0:39s
> 0:31s
> 0:31s
> 0:28s
> 0:29s
> 0:28s
> 0:21s
> 0:27s
> 0:36s
> 0:22s
> 0:27s
> 0:20s
> 0:21s
> 0:19s
> 2:16:21s <— the last to date, but this was yesterday, now it's past 20hrs
>
> It stalled early, but then rocketed to the stalling cliff staring climax
> that it's at now.
>
> The only reason this is on a Linux VM is that I saw similar behavior
> running native Postgres 9.6 on Mac OS (the host). It didn’t make any sense
> that Mac OS would be causing this, but, who knows. Try it and see.
>
> Since the start of the load of the stalled piece, something has consumed
> over 800M of storage, I can’t say what, I did not check with any higher
> fidelity as to where the storage was going.
>
> I do not understand why this suddenly falls over a cliff. The JSONs are not
> large, so I don’t see how any individual one could crush the memory
> subsystem. I have to assume that PG is somehow internally thrashing or
> paging or something. I appreciate that the db is not tuned, but I would not
> expect that it would struggle so to create this index, with these values,
> and such small JSON payloads.
>
> Also, it’s not unique to the 29th piece. I’ve tried in the past to skip
> those, and it still failed. This has been happening for some time (months),
> but I keep putting it away.
>
> Any insight is helpful. My biggest fear is that for whatever reason we will
> not be able to reload this table during any particular crisis in the future
> should it come to that.
>
> Thanks.

Which version of postgres?

How are you loading the data? (application, psql, pg_restore) using
(insert, copy)?

--
Tim Cross

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Peter J. Holzer 2019-05-18 09:35:19 Re: bigint out of range
Previous Message Stefan Keller 2019-05-17 22:26:01 Re: Data entry / data editing tools (more end-user focus).