From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: DBT-3 with SF=20 got failed
Date: 2015-09-25 12:51:31
Message-ID: 56054353.8070005@2ndquadrant.com
Lists: pgsql-hackers
On 09/25/2015 02:54 AM, Robert Haas wrote:
> On Thu, Sep 24, 2015 at 1:58 PM, Tomas Vondra
> <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>> Meh, you're right - I got the math wrong. It's 1.3% in both cases.
>>
>> However the question still stands - why should we handle the
>> over-estimate in one case and not the other? We're wasting the
>> same fraction of memory in both cases.
>
> Well, I think we're going around in circles here. It doesn't seem
> likely that either of us will convince the other.
Let's agree to disagree ;-) That's perfectly OK, no hard feelings.
> But for the record, I agree with you that in the scenario you lay
> out, it's about the same problem in both cases. I could argue
> that it's slightly different because of [ tedious and somewhat
> tenuous argument omitted ], but I'll spare you that.
OK, although that kinda prevents further discussion.
> However, consider the alternative scenario where, on the same
> machine, perhaps even in the same query, we perform two hash joins,
> one of which involves hashing a small table (say, 2MB) and one of
> which involves hashing a big table (say, 2GB). If the small query
> uses twice the intended amount of memory, probably nothing bad will
> happen. If the big query does the same thing, a bad outcome is much
> more likely. Say the machine has 16GB of RAM. Well, a 2MB
> over-allocation is not going to break the world. A 2GB
> over-allocation very well might.
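For scale, using the numbers from the quoted scenario, the two over-allocations
are very different fractions of the machine's 16GB of RAM:

\[
\frac{2\,\mathrm{MB}}{16\,\mathrm{GB}} \approx 0.012\%
\qquad \text{vs.} \qquad
\frac{2\,\mathrm{GB}}{16\,\mathrm{GB}} = 12.5\%
\]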
I asked about case A. You've presented case B and shown that the limit
indeed seems to help there. But I don't see how that makes any difference
for case A, which is what I asked about.
> I really don't see why this is a controversial proposition. It seems
> clear as daylight from here.
I wouldn't say controversial, but I do see the proposed solution as
misguided, because we're fixing A and claiming to also fix B. Not only are
we not really fixing B, we may actually make things needlessly slower for
people who don't have any problems with B at all.
We've run into a problem with allocating more than MaxAllocSize. The
proposed fix (imposing an arbitrary limit) is also supposedly fixing the
over-estimation problems, but it actually does not (IMNSHO).
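To illustrate the distinction, here is a minimal sketch (hypothetical names
and numbers, not PostgreSQL's actual executor code) of how an inflated row
estimate pushes the bucket-array request past the ~1GB palloc() limit, and
what a hard cap does and does not change:

/*
 * Illustrative sketch only (hypothetical names and numbers).  An
 * over-estimated row count can push the hash table's bucket array past
 * the ~1GB allocation limit.  Clamping the request avoids the outright
 * failure, but the estimate -- and thus the memory wasted on empty
 * buckets -- stays exactly as wrong as before.
 */
#include <stdio.h>

#define MAX_ALLOC_SIZE ((size_t) 0x3fffffff)    /* ~1GB, like MaxAllocSize */

int
main(void)
{
    double estimated_rows = 200e6;              /* wildly inflated estimate */
    size_t bucket_size = sizeof(void *);        /* one pointer per bucket */
    size_t request = (size_t) estimated_rows * bucket_size;

    if (request > MAX_ALLOC_SIZE)
    {
        /* Without a cap, this palloc-style request simply fails. */
        printf("request of %zu bytes exceeds the cap, clamping\n", request);
        request = MAX_ALLOC_SIZE - (MAX_ALLOC_SIZE % bucket_size);
    }

    printf("allocating %zu bytes for %zu buckets\n",
           request, request / bucket_size);
    return 0;
}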
And I think my view is supported by the fact that solutions which seem to
actually fix the over-estimation properly have emerged - I mean the "let's
not build the buckets at all until the very end" and "let's start with
nbatches=0" approaches discussed yesterday. (And I'm not saying that just
because I proposed those two things.)
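For reference, a very rough sketch of the first of those ideas, with made-up
types and names (not the actual patch): defer building the bucket array until
all tuples have been loaded, so it is sized from the observed tuple count
rather than the planner's estimate.

/*
 * Rough sketch only, with hypothetical types and names.  Tuples are
 * accumulated first; the bucket array is built at the end and sized from
 * the actual tuple count, so a bad planner estimate cannot inflate it.
 */
#include <stdlib.h>

typedef struct Tuple Tuple;

typedef struct HashTable
{
    Tuple **buckets;
    size_t  nbuckets;
    Tuple **tuples;     /* tuples collected before bucketing */
    size_t  ntuples;
} HashTable;

static size_t
next_power_of_two(size_t n)
{
    size_t  p = 1;

    while (p < n)
        p <<= 1;
    return p;
}

/* Called once, after the last tuple has been added. */
static void
build_buckets(HashTable *ht)
{
    ht->nbuckets = next_power_of_two(ht->ntuples > 0 ? ht->ntuples : 1);
    ht->buckets = calloc(ht->nbuckets, sizeof(Tuple *));

    /* ... hash each of ht->tuples[0 .. ntuples-1] into ht->buckets ... */
}

int
main(void)
{
    HashTable   ht = {0};

    /* pretend 1,000,000 tuples were loaded into ht.tuples */
    ht.ntuples = 1000000;
    build_buckets(&ht);
    return 0;
}

The point is simply that the bucket count no longer depends on the estimate
at all, so over-estimation cannot waste bucket memory.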
Anyway, I think you're right that we're going in circles here. I think we've
both presented all the arguments we had and we still disagree. I'm not
going to continue with this - I'm unlikely to win an argument against
two committers if it hasn't happened by now. Thanks for the
discussion, though.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services