From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Yet another abort-early plan disaster on 9.3
Date: 2014-09-29 15:00:26
Message-ID: CAHyXU0wFAKiwgybbEAX=597-MxtPGdf0bmQvRSJq9db1KTZ6nA@mail.gmail.com
Lists: pgsql-hackers pgsql-performance
On Fri, Sep 26, 2014 at 3:06 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> The problem, as I see it, is different. We assume that if there are
> 100 distinct values and you use LIMIT 1 that you would only need to
> scan 1% of rows. We assume that the data is arranged in the table in a
> very homogenous layout. When data is not, and it seldom is, we get
> problems.
Hm, good point -- 'data proximity'. At least in theory, can't this be
measured and quantified? For example, given a number of distinct
values, you could estimate the percentage of pages (or maybe the
number of non-sequential seeks relative to the number of pages) you'd
need to read to fetch all instances of a particular value in the
average (or perhaps the worst) case. One way to calculate that would
be to look at the proximity of values in sampled pages (with maybe a
penalty assigned for high update activity relative to table size).
Data proximity would then become a cost coefficient applied to the
benefits of LIMIT.
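To make the idea concrete, here's a rough sketch (in Python, purely
illustrative -- not proposing this as planner code) of one way to turn
sampled (page, value) pairs into a per-value proximity coefficient.
The function name and the sampling input are my own invention; the
metric compares the minimum number of pages a value's rows *could*
occupy if perfectly packed against the number of pages they actually
appear on:

```python
from collections import defaultdict

def proximity_coefficient(sampled_rows, rows_per_page):
    """Estimate how clustered each distinct value is across sampled pages.

    sampled_rows: iterable of (page_number, value) pairs from a block sample.
    rows_per_page: average number of rows that fit on one page.

    Returns {value: coefficient}, where 1.0 means perfectly clustered
    (all sampled instances packed into the fewest possible pages) and
    values near 0 mean the instances are scattered across many pages.
    """
    pages_seen = defaultdict(set)   # value -> set of pages containing it
    row_count = defaultdict(int)    # value -> number of sampled instances

    for page, value in sampled_rows:
        pages_seen[value].add(page)
        row_count[value] += 1

    coefficients = {}
    for value, pages in pages_seen.items():
        # Minimum pages needed if this value's rows were packed together
        # (ceiling division).
        ideal_pages = max(1, -(-row_count[value] // rows_per_page))
        coefficients[value] = ideal_pages / len(pages)
    return coefficients
```

A value whose coefficient is near 1.0 would keep the current optimistic
LIMIT assumption; a low coefficient would inflate the estimated cost,
since each additional matching row likely means touching a new page.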
merlin