Re: logical decoding : exceeded maxAllocatedDescs for .spill files

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
Cc: Alvaro Herrera from 2ndQuadrant <alvherre(at)alvh(dot)no-ip(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Juan José Santamaría Flecha <juanjo(dot)santamaria(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Subject: Re: logical decoding : exceeded maxAllocatedDescs for .spill files
Date: 2019-12-12 04:19:29
Message-ID: CAA4eK1+CFYAwXxsNwnhLPpeObZu-PBsWgDszB_VJUTxo9Y+yvQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Dec 11, 2019 at 4:17 PM Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com> wrote:
>
> On Sat, 7 Dec 2019 at 11:37, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Fri, Dec 6, 2019 at 5:00 PM Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com> wrote:
> > >
> > > On Fri, 6 Dec 2019 at 15:40, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > >
> > > > 1.
> > > > + /* Now that the state fields are initialized, it is safe to return it. */
> > > > + *iter_state = state;
> > > > +
> > > > /* allocate heap */
> > > > state->heap =
> > > > binaryheap_allocate(state->nr_txns,
> > > > ReorderBufferIterCompare,
> > > >
> > > > Is there a reason for not initializing iter_state after
> > > > binaryheap_allocate? If we do so, then we don't need additional check
> > > > you have added in ReorderBufferIterTXNFinish.
> > >
> > > If iter_state is initialized *after* binaryheap_allocate, then we
> > > won't be able to close the vfds if binaryheap_allocate() ereports().
> > >
> >
> > Is it possible to have vfds opened before binaryheap_allocate(), if so how?
> No it does not look possible for the vfds to be opened before
> binaryheap_allocate(). But actually, the idea behind placing the
> iter_state at the place where I put is that, we should return back the
> iter_state at the *earliest* place in the code where it is safe to
> return.
>

Sure, I get that point, but it seems it is equally good to do this
after binaryheap_allocate(). It will be sligthly better because if
there is any error in binaryheap_allocate, then we don't need to even
call ReorderBufferIterTXNFinish().

>
> >> 4. I think we should also check how much time increase will happen for
> >> test_decoding regression test after the test added by this patch?
> Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com> wrote:
> > Yeah, it currently takes noticeably longer compared to the others.
> > Let's see if setting logical_decoding_work_mem to a min value allows
> > us to reproduce the test with much lesser number of inserts.
>
> The test in the patch takes around 20 seconds, as compared to the max
> time of 2 seconds any of the other tests take in that test suite.
>
> But if we set the max_files_per_process to a very low value (say 26)
> as against the default 1000, we can reproduce the issue with as low as
> 20 sub-transactions as against the 600 that I used in spill.sql test.
> And with this, the test runs in around 4 seconds, so this is good.
>

Do you get 4s even after setting the minimum value of
logical_decoding_work_mem? I think 4s is also too much for this test
which is going to test one extreme scenario. Can you try with some
bigger rows or something else to reduce this time? I think if we can
get it close to 2s or whatever maximum time taken by any other logical
decoding tests, then good, otherwise, it doesn't seem like a good idea
to add this test.

> But
> the problem is : max_files_per_process needs server restart. So either
> we have to shift this test to src/test/recovery in one of the
> logical_decoding test, or retain it in contrib/test_decoding and let
> it run for 20 seconds. Let me know if you figure out any other
> approach.
>

I don't think 20s for one test is acceptable. So, we should just give
up that path, you can try what I suggested above, if we can reduce the
test time, then good, otherwise, I suggest to drop the test from the
patch. Having said that, we should try our best to reduce the test
time as it will be good if we can have such a test.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message asaba.takanori@fujitsu.com 2019-12-12 04:51:48 RE: Conflict handling for COPY FROM
Previous Message Dilip Kumar 2019-12-12 04:14:50 Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions