From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Mithun Cy <mithun(dot)cy(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Subject: Re: Perf Benchmarking and regression.
Date: 2016-05-12 15:58:45
Message-ID: 20160512155845.lcbdg563ikj4p624@alap3.anarazel.de
Lists: pgsql-hackers
On 2016-05-12 10:49:06 -0400, Robert Haas wrote:
> On Thu, May 12, 2016 at 8:39 AM, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com> wrote:
> > Please find the test results for the following set of combinations taken at
> > 128 client counts:
> >
> > 1) Unpatched master, default *_flush_after : TPS = 10925.882396
> >
> > 2) Unpatched master, *_flush_after=0 : TPS = 18613.343529
> >
> > 3) That line removed with #if 0, default *_flush_after : TPS = 9856.809278
> >
> > 4) That line removed with #if 0, *_flush_after=0 : TPS = 18158.648023
>
> I'm getting increasingly unhappy about the checkpoint flush control.
> I saw major regressions on my parallel COPY test, too:
Yes, I'm concerned too.
The workload in this thread is a somewhat "artificial" one (all data is
constantly updated, doesn't fit into shared_buffers, but does fit into
the OS page cache), and it only measures throughput, not latency. But I
agree that that's far too large a regression to accept, and that there
are a significant number of machines out there with badly undersized
shared_buffers settings.
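
For reference, the "*_flush_after=0" runs above amount to turning all of
the new flush-after GUCs off. A minimal, untested sketch of that
configuration (a reload should be enough to apply all four):

  -- sketch only: disable the flush-after GUCs, as in runs (2) and (4) above
  ALTER SYSTEM SET checkpoint_flush_after = 0;
  ALTER SYSTEM SET bgwriter_flush_after = 0;
  ALTER SYSTEM SET backend_flush_after = 0;
  ALTER SYSTEM SET wal_writer_flush_after = 0;
  SELECT pg_reload_conf();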
> http://www.postgresql.org/message-id/CA+TgmoYoUQf9cGcpgyGNgZQHcY-gCcKRyAqQtDU8KFE4N6HVkA@mail.gmail.com
>
> That was a completely different machine (POWER7 instead of Intel,
> lousy disks instead of good ones) and a completely different workload.
> Considering these results, I think there's now plenty of evidence to
> suggest that this feature is going to be horrible for a large number
> of users. A 45% regression on pgbench is horrible.
I asked you over there whether you could benchmark with just different
values for backend_flush_after... I chose the current value because it
gives the best latency / most consistent throughput numbers, but 128kB
isn't a large window. I suspect we might need to disable backend-guided
flushing if that's not sufficient :(
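
Something along these lines before each pgbench run is the kind of sweep
I have in mind (untested sketch; the concrete window sizes are just
examples, backend_flush_after being measured in 8kB blocks):

  -- e.g. 0 (disabled), 16 (~128kB, roughly the current value), 64 (~512kB), ...
  ALTER SYSTEM SET backend_flush_after = 64;
  SELECT pg_reload_conf();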
> > Here, that line points to "AddWaitEventToSet(FeBeWaitSet,
> > WL_POSTMASTER_DEATH, -1, NULL, NULL);" in pq_init().
>
> Given the above results, it's not clear whether that is making things
> better or worse.
Yeah, me neither. I think it's doubtful that you'd see a performance
difference due to the original ac1d7945f866b1928c2554c0f80fd52d7f977772,
independent of the WaitEventSet stuff, at these throughput rates.
Greetings,
Andres Freund