Re: Improving Performance of Query ~ Filter by A, Sort by B

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Lincoln Swaine-Moore <lswainemoore(at)gmail(dot)com>
Cc: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Improving Performance of Query ~ Filter by A, Sort by B
Date: 2018-07-17 15:59:33
Message-ID: CAMkU=1yar573WF7PPCdWWsVOc6sLpWDPdX_-8xC+9guJ033CEA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Mon, Jul 16, 2018 at 5:29 PM, Lincoln Swaine-Moore <
lswainemoore(at)gmail(dot)com> wrote:

> Tom and Jeff,
>
> Thanks very much for the suggestions!
>
> Here's what I've found so far after playing around for a few more days:
>
> What is your default_statistics_target? What can you tell us about the
>> distribution of parent_id? (exponential, power law, etc?). Can you show
>> the results for select * from pg_stats where tablename='a' and
>> attname='parent_id' \x\g\x ?
>
>
> The default_statistics_target is 500, which I agree seems quite
> insufficient for these purposes. I bumped this up to 2000, and saw some
> improvement in the row count estimation, but pretty much the same query
> plans. Unfortunately the distribution of counts is not intended to be
> correlated to parent_id, which is one reason I imagine the histograms might
> not be particularly effective unless theres one bucket for every value.
> Here is the output you requested:
>
> select * from pg_stats where tablename='a' and attname='parent_id';
>
> schemaname | public
> tablename | a
> attname | parent_id
> inherited | t
> null_frac | 0
> avg_width | 4
> n_distinct | 18871
> most_common_vals | {15503,49787,49786,24595,49784,17549, ...} (2000
> values)
> most_common_freqs | {0.0252983,0.02435,0.0241317,
> 0.02329,0.019095,0.0103967,0.00758833,0.004245, ...} (2000 values)
>

You showed the 8 most common frequencies. But could you also show the last
couple of them? When your queried parent_id value is not on the MCV list,
it is the frequency of the least frequent one on the list which puts an
upper limit on how frequent the one you queried for can be.

> A few questions re: statistics:
> 1) would it be helpful to bump column statistics to, say, 20k (the number
> of distinct values of parent_id)?
>

Only one way to find out...
However you can only go up to 10k, not 20k.

> 2) is the discrepancy between the statistics on the parent and child
> table be expected? certainly I would think that the statistics would be
> different, but I would've imagined they would have histograms of the same
> size given the settings being the same.
>

Is the n_distinct estimate accurate for the partition? There is an
algorithm (which will change in v11) to stop the MCV from filling the
entire statistics target size if it thinks adding more won't be useful.
But I don't know why the histogram boundary list would be short. But, I
doubt that that is very important here. The histogram is only used for
inequality/range, not for equality/set membership.

> 3) is there a way to manually specify the the distribution of rows to be
> even? that is, set the frequency of each value to be ~ n_rows/n_distinct.
> This isn't quite accurate, but is a reasonable assumption about the
> distribution, and might generate better query plans.
>

This would be going in the wrong direction. Your queries seem to
preferentially use rare parent_ids, not typical parent_ids. In fact, it
seems like many of your hard-coded parent_ids don't exist in the table at
all. That certainly isn't going to help the planner any. Could you
somehow remove those before constructing the query?

You might also take a step back, where is that list of parent_ids coming
from in the first place, and why couldn't you convert the list of literals
into a query that returns that list naturally?

> You could try reversing the order and adding a column to be (tmstmp,
>> parent_id, id) and keeping the table well vacuumed. This would allow the
>> slow plan to still walk the indexes in tmstmp order but do it as an
>> index-only scan, so it could omit the extra trip to the table. That trip to
>> the table must be awfully slow to explain the numbers you show later in the
>> thread.
>
>
> Just to clarify, do you mean building indexes like:
> CREATE INDEX "a_tmstmp_parent_id_id_idx_[PART_KEY]" on
> "a_partition[PART_KEY]" USING btree("tmstmp", "parent_id", "id")
> That seems promising! Is the intuition here that we want the first key of
> the index to be the one we are ultimately ordering by? Sounds like I make
> have had that flipped initially. My understanding of this whole situation
> (and please do correct me if this doesn't make sense) is the big bottleneck
> here is reading pages from disk (when looking at stopped up queries, the
> wait_event is DataFileRead), and so anything that can be done to minimize
> the pages read will be valuable. Which is why I would love to get the query
> plan to use the tmstmp index without having to filter thereafter by
> parent_id.
>

Yes, that is the index.

You really want it to filter by parent_id in the index, rather than going
to the table to do the filter on parent_id. The index pages with tmstmp as
the leading column are going to be more tightly packed with potentially
relevant rows, while the table pages are less likely to be densely packed.
So filtering in the index leads to less IO. Currently, the only way to
make that in-index filtering happen is with an index-only scan. So the
tmstmp needs to be first, and other two need to be present in either
order. Note that if you change select list of your query to select more
than just "id", you will again defeat the index-only-scan.

>
> What happens when you remove that extra order by phrase that you added?
>> The original slow plan should become much faster when the number of
>> parent_ids is large (it has to dig through fewer index entries before
>> accumulating 20 good ones), so you should try going back to that.
>
>
> Unfortunately, I've found that even when the number of parent_ids is large
> (2000), it's still prohibitively slow (haven't got it to finish), and
> maintains a query plan that involves an Index Scan Backward across the
> a_tmstmp_idxs (with a filter for parent_id).
>

I think the reason for this is that your 2000 literal parent_id values are
preferentially rare or entirely absent from the table. Therefore adding
more parent_ids doesn't cause more rows to meet the filter qualification,
so you don't accumulate a LIMIT worth of rows very fast.

The original index you discussed, '(parent_id, tmpstmp desc)', would be
ideal if the parent_id were tested for equality rather than set
membership. So another approach to speeding this up would be to rewrite
the query to test for equality on each parent_id separately and then
combine the results.

select id from (
(select * from a where parent_id= 34226 order by tmstmp desc limit 20)
union all
(select * from a where parent_id= 24506 order by tmstmp desc limit 20)
union all
(select * from a where parent_id= 40986 order by tmstmp desc limit 20)
union all
(select * from a where parent_id= 27162 order by tmstmp desc limit 20)
) foo
ORDER BY tmstmp DESC LIMIT 20;

I don't know what is going to happen when you go from 4 parent_ids up to
2000, though.

It would be nice if PostgreSQL would plan your original query this way
itself, but it lacks the machinery to do so. I think there was a patch to
add that machinery, but I don't remember seeing any work on it recently.

Cheers,

Jeff

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Mark Kirkwood 2018-07-17 23:04:44 Re: Why HDD performance is better than SSD in this case
Previous Message Tomas Vondra 2018-07-17 14:57:32 Re: Special bloom index of INT, BIGINT, BIT, VARBIT for bitwise operation