Re: Erratically behaving query needs optimization

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Barbu Paul - Gheorghe <barbu(dot)paul(dot)gheorghe(at)gmail(dot)com>
Cc: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Erratically behaving query needs optimization
Date: 2019-08-25 14:50:55
Message-ID: CAMkU=1w6fCk0SeaCqSRFsnWu64sjha5MvrqFLE-RkMNVu4uL_g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Tue, Aug 20, 2019 at 10:22 AM Barbu Paul - Gheorghe <
barbu(dot)paul(dot)gheorghe(at)gmail(dot)com> wrote:

>
> The query takes a long time to execute, specially at first, due to
> cold caches I think, but the performance varies greatly during a run
> of the application (while applying the said action by the user several
> times).
>

Yes, it certainly looks like it is due to cold caches. But you say it is
slow at first, and then say it varies greatly during a run. Is being slow
at first the only way it varies greatly, or is there large variation even
beyond that?

You can use pg_rewarm to overcome the cold cache issue when you first start
up the server.

EXPLAIN (ANALYZE,BUFFERS)
> SELECT DISTINCT ON (results.attribute_id) results.timestamp,
> results.data FROM results
> JOIN scheduler_operation_executions ON
> scheduler_operation_executions.id = results.operation_execution_id
> JOIN scheduler_task_executions ON scheduler_task_executions.id =
> scheduler_operation_executions.task_execution_id
> WHERE scheduler_task_executions.device_id = 97
> AND results.data <> '<NullData/>'
> AND results.data IS NOT NULL
> AND results.object_id = 1955
> AND results.attribute_id IN (4, 5) -- possibly a longer list here
> AND results.data_access_result = 'SUCCESS'
> ORDER BY results.attribute_id, results.timestamp DESC
> LIMIT 2 -- limit by the length of the attributes list
>

If you query only on "results" with only the conditions that apply to
"results", what is the expected number of rows, and what is the actual
number of rows?

...

How can I improve it to be consistently fast (is it possible to get to
> several milliseconds?)?
>

Maybe. Depends on the answer to my previous question.

> What I don't really understand is why the nested loop has 3 loops
> (three joined tables)?
>

Each parallel execution counts as a loop. There are 2 parallel workers,
plus the leader also participates, making three.

> And why does the first index scan indicate ~60k loops? And does it
> really work? It doesn't seem to filter out any rows.
>

The parallel hash join returns about 20,000 rows, but I think that that is
just for one of the three parallel executions, making about 60,000 in
total. I don't know why one of the nodes report combined execution and the
other just a single worker. Parallel queries are hard to understand. When
I want to optimize a query that does parallel execution, I just turn off
parallelism ("set max_parallel_workers_per_gather TO 0;") at first to make
is simpler to understand.

Apparently everything with device_id = 97 just happens to pass all the rest
of your filters. If you need those filters to make sure you get the right
answer in all cases, then you need them. A lifeboat isn't useless just
because your ship didn't happen to sink today.

>
> Should I add an index only on (attribute_id, object_id)? And maybe
> data_access_result?
> Does it make sens to add it on a text column (results.data)?
>

Which parts of query you give are going to change from execution to
execution?

Assuming the parts for object_id and attribute_id are variable and the rest
are static, I think the optimal index would be "create index on results
(object_id, attribute_id) where data IS NOT NULL and data <> '<NullData/>'
and data_access_result = 'SUCCESS'"

Why does results.data have two different "spellings" for null data?

However, if the number of rows from "results" that meet all your criteria
are high, the index won't make much of a difference. The planner has a
fundamental choice to make, should it seek things with device_id = 97, and
then check each of those to see if they satisfy your conditions on
"results" fields conditions. Or, should it seek things that satisfy the
"results" fields conditions, and then check each of those to see if they
satisfy device_id = 97. It is currently doing the first of those. Whether
it should be doing the second, and whether creating the index will cause it
to switch to using the second, are two (separate) questions which can't be
answered with the data given.

Cheers,

Jeff

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Barbu Paul - Gheorghe 2019-08-26 08:25:57 Re: Erratically behaving query needs optimization
Previous Message Andres Freund 2019-08-24 19:54:03 Re: Extremely slow HashAggregate in simple UNION query