Re: Bug with GIN index over JSONB data: "ERROR: buffer 10112 is not owned by resource owner"

From: Junwang Zhao <zhjwpku(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: pgsql-bugs <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: Bug with GIN index over JSONB data: "ERROR: buffer 10112 is not owned by resource owner"
Date: 2023-11-10 03:42:08
Message-ID: CAEG8a3LkcfaUw=p2MF4w97RSc=EJpdWVziskH93xTX4Vs8npdA@mail.gmail.com
Lists: pgsql-bugs

On Fri, Nov 10, 2023 at 9:44 AM Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>
> I was looking into a possible scalability problem with GIN indexes under concurrent insert, but instead I found an uncharacterized bug. One of the processes will occasionally throw an error "ERROR: buffer 10112 is not owned by resource owner Portal" where the buffer number changes from run to run.
>
> I've verified this with both 14.9 and 16.1, on ubuntu 22.04. I use an AWS m5.4xlarge machine, and haven't tried to verify it on anything else. I don't currently have any real hardware with enough CPUs to do a meaningful test.
>
> I've attached the "user data" file I feed to AWS to run the test; this one is for v14.9. The v16.1 one is similar, except that I compile PostgreSQL myself (without JIT) rather than getting it from apt. I stand up an Ubuntu 22.04 m5.4xlarge machine with all the defaults, except changing the storage from 8GB to 80GB, and feed it the attached user data cloud-init file.
>
> If you don't want to parse the meat out of the file, the core of the test is to run this command with some escalating level of concurrency in a loop. Each call just inserts one JSONB object with highly redundant keys (the same 10 keys present in every row) but a more distinctive value for each key.
>
> insert into j (j) select jsonb_object_agg(x::text, left(md5(random()::text),5)) from generate_series(1,10) f(x);
>
> I've never seen the error occur until the concurrency reaches at least 4, but sample size is too low for that to be definitive.
>
> Unless someone has some better idea, my next step will be to switch the column from jsonb to text[] and see if it exists there as well.
>
> I assume the synchronous_commit=off is needed because without it you couldn't accumulate enough trials to spot the bug, even though it would exist in that setting. I guess I could run the test on a machine with a very fast SSD and leave synchronous_commit=on, but I'm not looking forward to the cost of renting a machine that can do that or figuring out how to configure it. I also haven't tried it with fastupdate on. I assume the test would not work because the pending list would grow without bound at high concurrencies (it would grow faster than a single-threaded cleaner could clean it), and so not seeing the bug would not mean it wasn't present.
>
> The test loops the insert for one minute, at each concurrency from 1 to 10, then starts over at -c 1 again. It seems like if you don't see the bug within the first 20 minutes (the first two 1-to-10 concurrency cycles) you are unlikely to see it at all. But that is more a hunch than a formal analysis.
>
> Cheers,
>
> Jeff

I can reproduce this by checking out e9f075f9a15593fe31c610e15cfc71a5fa281ede,
but master seems OK, since Heikki committed some ResourceOwner-related
changes after that.
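
For anyone who wants to retry this without parsing the cloud-init file, the
loop Jeff describes (one minute at each concurrency from 1 to 10, then start
over) can be sketched roughly as below. This is only a sketch under
assumptions: it guesses that pgbench is the driver (Jeff mentions "-c 1"), and
the script path, database name, and the helper names build_commands and run
are made up for illustration. It also assumes table j with its GIN index
(fastupdate off) already exists and that synchronous_commit=off is set on the
server, as in Jeff's description.

```python
import subprocess

# The INSERT statement quoted from Jeff's mail, unchanged.
INSERT_SQL = (
    "insert into j (j) select jsonb_object_agg(x::text, "
    "left(md5(random()::text),5)) from generate_series(1,10) f(x);"
)


def build_commands(script_path, dbname, cycles=2, max_clients=10, seconds=60):
    """Return one pgbench argv per (cycle, concurrency) step.

    cycles=2 covers the first two 1-to-10 passes (about 20 minutes).
    script_path and dbname are hypothetical placeholders.
    """
    cmds = []
    for _ in range(cycles):
        for clients in range(1, max_clients + 1):
            cmds.append([
                "pgbench",
                "-n",                    # no vacuum of the standard pgbench tables
                "-f", script_path,       # custom script holding the INSERT
                "-c", str(clients),      # number of concurrent clients
                "-j", str(clients),      # one worker thread per client
                "-T", str(seconds),      # run for this many seconds
                dbname,
            ])
    return cmds


def run(script_path="insert.sql", dbname="postgres"):
    """Write the script file and run every step in order (not called by default)."""
    with open(script_path, "w") as f:
        f.write(INSERT_SQL + "\n")
    for argv in build_commands(script_path, dbname):
        # Keep going on failure: the error under test may abort a pgbench client.
        subprocess.run(argv, check=False)
```

Calling run() executes the full loop; the "buffer ... is not owned by resource
owner" error, if it occurs, should show up in the pgbench output or the server
log.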

--
Regards
Junwang Zhao
