From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Adaptive query optimization
Date: 2019-06-12 11:36:08
Message-ID: b7644989-9c35-d0f2-2775-28f9c886c2f5@postgrespro.ru
Lists: pgsql-hackers
On 12.06.2019 0:43, Tomas Vondra wrote:
>
>
> I don't think "learning phase" is an issue, in fact I think that's
> something we need to do - it ensures we have enough data to make good
> decisions.
>
The problem with a learning phase is that it requires DBA assistance:
somebody has to decide when to start learning, provide a relevant
workload, and determine when learning is finished.
One of the most recent trends in DBMSs is autonomous databases with
zero administration effort.
This is especially important for the cloud, and one of the main advantages
of AQO is that it allows queries to be optimized without human interaction.
But unfortunately I really do not know how to avoid the learning phase,
especially if we want to run queries on a replica.
>> 2. It saves collected data in Postgres tables, which turns a read-only
>> transaction executing only queries into a read-write transaction
>> obtaining an XID...
>
> Yeah, that's an issue because it makes it useless on standbys etc. I
> think it'd be enough to do something similar to pg_stat_statements, i.e.
> store it in memory and flush it to disk once in a while.
>
This is why my AQO implementation stores data in a file.
>> 3. It doesn't take into account concrete values of literals used in
>> clauses, so it is not able to address data skew.
>
> Yep. I don't think it's necessarily an issue with all approaches to
> adaptive optimization, though. But I agree we should detect both
> systemic estimation issues, and misestimates for particular parameter
> values. I think that's doable.
>
>> 4. Machine learning can be quite expensive and seems to be overkill
>> if we just want to adjust selectivities according to the actual number of
>> affected rows.
>>
>
> I think that depends - some machine learning approaches are not that
> bad. But I think there's a more serious issue - explainability. We need
> a solution where we can explain/justify why it makes some decisions. I
> really don't want a black box that produces numbers that you just need
> to take at face value.
>
> The good thing is that the simpler the method, the less expensive and
> more explainable it is.
>
>> I tried to create a much simpler version of AQO based on the auto_explain
>> extension.
>> This extension provides all the necessary infrastructure to analyze
>> statements with long execution time.
>> I have added two new modes to auto_explain:
>> 1. Auto generation of multicolumn statistics for variables used in
>> clauses with large estimation error.
>
> Interesting! I probably wouldn't consider this part of adaptive query
> optimization, but it probably makes sense to make it part of this. I
> wonder if we might improve this to also suggest "missing" indexes?
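To illustrate, a minimal sketch of what this statistics-generation mode
boils down to; the table, column and statistics names here are only
examples, not what the extension literally emits:

    -- when a clause on t1(a, b) shows a large estimation error,
    -- create extended statistics on that column combination and
    -- refresh them, so the planner can account for the correlation
    CREATE STATISTICS IF NOT EXISTS t1_a_b_stat (dependencies, ndistinct)
        ON a, b FROM t1;
    ANALYZE t1;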
Regarding index suggestions: I think that should be the next step of
adaptive query optimization:
- auto-generation of indexes
- auto-adjustment of optimizer cost parameters (cpu cost,
random/sequential page access cost, ...)
There is already the hypothetical index extension
https://github.com/HypoPG/hypopg
which can be used to estimate the effect of introducing new indexes.
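For example, a minimal hypopg session (the index definition is just an
example):

    -- create a hypothetical index and check whether the planner would use it;
    -- hypothetical indexes are only visible to plain EXPLAIN (not ANALYZE)
    CREATE EXTENSION IF NOT EXISTS hypopg;
    SELECT * FROM hypopg_create_index('CREATE INDEX ON t1 (a, b)');
    EXPLAIN SELECT * FROM t1 WHERE a = 1 AND b = 2;
    SELECT hypopg_reset();  -- drop all hypothetical indexes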
>
> Right. But I think I might have an idea how to address (some of) this.
>
> As I already mentioned, I was experimenting with something similar,
> maybe two or three years ago (I remember chatting about it with Teodor
> at pgcon last week). I was facing the same issues, and my approach was
> based on hooks too.
>
> But my idea was not to track stats for a plan as a whole, but instead
> decompose it into individual nodes, categorized into three basic groups -
> scans, joins and aggregations. And then apply this extracted information
> to other plans with "matching" nodes.
>
> For example, let's consider a simple "single-scan" query
>
> SELECT * FROM t1 WHERE a = ? AND b = ? AND c < ?;
>
> Now, if you execute this enough times (say, 100x or 1000x), tracking
> the estimates and actual row counts, you may then compute the average
> misestimate (maybe a geometric mean would be more appropriate here?):
>
> AVG(actual/estimate)
Certainly stats should be collected for each plan node, not for the
whole plan, and that is what both Oleg's and my implementations do now.
Oleg is using a gradient descent method. I first tried to calculate an
average, but then found it better to build something like a "histogram",
where the bin is determined by log10 of the estimated number of rows.
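For example, assuming per-node estimates and actual row counts are
collected in a table node_stats (the table and column names are
hypothetical), the per-bin correction coefficient can be computed as a
geometric mean like this:

    -- node_stats(node_signature, estimated_rows, actual_rows) is assumed to
    -- hold one row per executed plan node; bin = log10 of the estimated rows
    SELECT floor(log(10, estimated_rows::numeric))::int AS bin,
           exp(avg(ln(actual_rows::float8 / estimated_rows))) AS correction
      FROM node_stats
     WHERE estimated_rows > 0 AND actual_rows > 0
     GROUP BY bin
     ORDER BY bin;

A correction close to 1.0 means the estimates in that bin are fine;
values far from 1.0 indicate a systemic misestimate for nodes of that size.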
>
> and if this is significantly different from 1.0, then we can say there's
> a systemic misestimate, and we can use this as a correction coefficient
> when computing the scan estimate. (And we need to be careful about
> collecting new data, because the estimates will include this correction.
> But that can be done by tracking "epoch" of the plan.)
>
> Now, if someone uses this same scan in a join, like for example
>
> SELECT * FROM t1 JOIN t2 ON (t1.id = t2.id)
> WHERE (t1.a = ? AND t1.b = ? AND t1.c < ?)
> AND (t2.x = ? AND t2.y = ?)
>
> then we can still apply the same correction to the t1 scan (I think).
> But then we can also collect data for the t1-t2 join, and compute a
> correction coefficient in a similar way. It requires a bit of care
> because we need to compensate for misestimates of inputs, but I think
> that's doable.
>
> Of course, all this is rather speculative, and I never got to anything
> beyond a very simple PoC. So I hope it makes at least some sense.
>
As far as I know, Oleg's AQO is now used by Amazon, so it is something
more than just a PoC. But certainly there are still many problems,
and my experiments with the JOB benchmark showed that there is still
a lot to improve.