Bug with GIN index over JSONB data: "ERROR: buffer 10112 is not owned by resource owner"

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: pgsql-bugs <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Bug with GIN index over JSONB data: "ERROR: buffer 10112 is not owned by resource owner"
Date: 2023-11-10 01:44:21
Message-ID: CAMkU=1yCAKtv86dMrD__Ja-7KzjE=uMeKX8y__cx5W-OEWy2ow@mail.gmail.com
Lists: pgsql-bugs

I was looking into a possible scalability problem with GIN indexes under
concurrent insert, but instead I found an uncharacterized bug. One of the
processes will occasionally throw an error "ERROR: buffer 10112 is not
owned by resource owner Portal" where the buffer number changes from run to
run.

I've verified this with both 14.9 and 16.1, on Ubuntu 22.04. I use an AWS
m5.4xlarge machine, and haven't tried to verify it on anything else. I
don't currently have any real hardware with enough CPUs to do a meaningful
test.

I've attached the "user data" file I feed to AWS to run the test; this one
is for v14.9. The v16.1 version is similar, except that I compile PostgreSQL
myself (without JIT) rather than getting it from apt. I stand up an Ubuntu
22.04 m5.4xlarge machine with all the defaults, except for changing the
storage from 8GB to 80GB, and feed it the attached user data cloud-init file.

If you don't want to parse the meat out of the file, the core of the test
is to run this command with some escalating level of concurrency in a
loop. Each call just inserts one JSONB object with highly redundant keys
(the same 10 keys present in every row) but a more distinctive value for
each key.

insert into j (j) select jsonb_object_agg(x::text,
left(md5(random()::text),5)) from generate_series(1,10) f(x);
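For readers who don't want to open the attachment, here is a minimal sketch of the files such a test needs. The table and index definitions are assumptions inferred from the statement above and the subject line (a single jsonb column named j, a GIN index with the default operator class, fastupdate off); the attached cloud-init file is the authoritative version. The file names and WORKDIR are hypothetical.

```shell
#!/bin/sh
# Sketch only: the schema below is an assumption, not copied from the attachment.
WORKDIR=${WORKDIR:-$(mktemp -d)}   # hypothetical scratch directory

# Assumed schema: one jsonb column with a GIN index, fastupdate disabled.
cat > "$WORKDIR/setup.sql" <<'EOF'
create table j (j jsonb);
create index on j using gin (j) with (fastupdate = off);
EOF

# The INSERT from the message, on one line so it can serve as a pgbench script.
cat > "$WORKDIR/insert.sql" <<'EOF'
insert into j (j) select jsonb_object_agg(x::text, left(md5(random()::text),5)) from generate_series(1,10) f(x);
EOF

echo "wrote $WORKDIR/setup.sql and $WORKDIR/insert.sql"
```

The setup file would be loaded once with psql -f before any load is applied; the insert file is what each concurrent client executes repeatedly.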

I've never seen the error occur until the concurrency reaches at least 4,
but the sample size is too small for that to be definitive.

Unless someone has a better idea, my next step will be to switch the
column from jsonb to text[] and see if the bug exists there as well.

I assume synchronous_commit=off is needed only because without it you
couldn't accumulate enough trials to spot the bug, even though the bug
would still exist with it on. I guess I could run the test on a machine
with very fast SSD and leave synchronous_commit=on, but I'm not looking
forward to the cost of renting a machine that can do that or figuring out
how to configure it. I also haven't tried it with fastupdate on. I assume
the test would not work because the pending list would grow without bound
at high concurrencies (it would grow faster than a single-threaded cleaner
could clean it) and so not seeing the bug would not mean it wasn't present.

The test loops the insert for one minute, at each concurrency from 1 to 10,
then starts over at -c 1 again. It seems like if you don't see the bug
within the first 20 minutes (the first two 1-to-10 concurrency cycles) you
are unlikely to see it at all. But that is more a hunch than a formal
analysis.
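One 1-to-10 cycle as described above might look roughly like the following sketch. It assumes the INSERT is stored in a file called insert.sql and that the database is named postgres; both names, the DRY_RUN switch, and the choice of matching -j to -c are illustrative assumptions, not taken from the attachment. By default the script only prints the pgbench commands it would run.

```shell
#!/bin/sh
# Hypothetical driver for one concurrency cycle; names and defaults are assumptions.
SCRIPT=${SCRIPT:-insert.sql}   # file holding the single INSERT statement
DB=${DB:-postgres}             # assumed database name
DURATION=${DURATION:-60}       # one minute per concurrency level
DRY_RUN=${DRY_RUN:-1}          # 1 = print commands only, 0 = actually run pgbench

# Build the pgbench command line for one concurrency level.
# -n skips vacuuming the standard pgbench tables, which this test does not use.
pgbench_cmd() {
    echo "pgbench -n -f $SCRIPT -c $1 -j $1 -T $DURATION $DB"
}

# Run (or print) one pass over concurrency levels 1 through 10.
run_cycle() {
    c=1
    while [ "$c" -le 10 ]; do
        cmd=$(pgbench_cmd "$c")
        if [ "$DRY_RUN" = 1 ]; then
            echo "$cmd"
        else
            $cmd || echo "pgbench reported a failure at -c $c" >&2
        fi
        c=$((c + 1))
    done
}

run_cycle
```

To start over at -c 1 after each pass, as the test does, run_cycle can be wrapped in an outer loop such as `while :; do run_cycle; done`, with DRY_RUN=0. Any "not owned by resource owner" error would then show up in pgbench's error output for the affected client.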

Cheers,

Jeff

Attachment: cloud_init_gin_bug (application/octet-stream, 1.1 KB)
