From: | Jeremy Finzel <finzelj(at)gmail(dot)com> |
---|---|
To: | Michael Lewis <mlewis(at)entrata(dot)com> |
Cc: | PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: BRIN index which is much faster never chosen by planner |
Date: | 2019-10-11 14:19:33 |
Message-ID: | CAMa1XUgnHL9760p9ZqSQNG4u=WSbnedgzx453e_QfpXFYDDY3g@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Dear Michael,
On Thu, Oct 10, 2019 at 5:20 PM Michael Lewis <mlewis(at)entrata(dot)com> wrote:
> Since the optimizer is choosing a seq scan over index scan when it seems
> like it has good row estimates in both cases, to me that may mean costs of
> scanning index are expected to be high. Is this workload on SSD? Has the
> random_page_cost config been decreased from default 4 (compared with cost
> of 1 unit for sequential scan)?
>
It's 1.5
> Your buffer hits aren't great. What is shared_buffers set to? How much ram
> on this cluster?
>
shared_buffers is 4GB. It has 500G of RAM, but server has several clusters
on it.
>
> With this table being insert only, one assumes correlation is very high on
> the data in this column as shown in pg_stats, but have your confirmed?
>
Yes, but the issue isn't with the BRIN index performing badly or being
fragmented. It's that it performs great (7x faster than the seq scan) but
postgres doesn't pick using it. I have seen this same issue also in other
attempts I have made to use BRIN.
> To me, distinct ON is often a bad code smell and probably can be
> re-written to be much more efficient with GROUP BY, lateral & order by, or
> some other tool. Same with the window function. It is a powerful tool, but
> sometimes not the right one.
>
I don't really agree, but it's beside the point because the issue is not in
aggregation. It's pre-aggregation. Indeed if I run my query as a simple
select (as I tried) it's the exact same planning issue. (In my experience,
distinct on for given example is the fastest. Same with window functions
which prevent inefficient self-joins)
> Is "source" a function that is called on field1? What is it doing/how is
> it defined?
>
I can't see how that matters either, but the "source" function is a mask
for a built-in pg function that is trivial. This whole query is masked so
as not to expose our actual prod query, but I hope it's still
understandable enough :).
My question is not how to make this query faster in general. It's that I
want to use BRIN indexes very much, but I'm not sure I can trust they will
scale with the right query plan like I know BTREE will.
Thanks!
Jeremy
From | Date | Subject | |
---|---|---|---|
Next Message | Jeremy Finzel | 2019-10-11 14:31:08 | Re: BRIN index which is much faster never chosen by planner |
Previous Message | Tom Lane | 2019-10-11 14:08:37 | Re: Issues with PAM : log that it failed, whether it actually failed or not |