Re: Tuning a query with ORDER BY and LIMIT

From: Michael van der Kolff <mvanderkolff(at)gmail(dot)com>
To: "Dirschel, Steve" <steve(dot)dirschel(at)thomsonreuters(dot)com>
Cc: "pgsql-general(at)lists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Tuning a query with ORDER BY and LIMIT
Date: 2022-06-22 20:02:25
Message-ID: CAFBbO2TqqLUBO8qAocqG0bEAUfHcx+azpqBNR1Mx-R2uQY8+=A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

What do you see when you remove the LIMIT clause? It may be possible to
rewrite this using ROW_NUMBER.

--Michael

On Thu, Jun 23, 2022 at 5:39 AM Dirschel, Steve <
steve(dot)dirschel(at)thomsonreuters(dot)com> wrote:

> I am fairly new to tuning Postgres queries. I have a long background
> tuning Oracle queries.
>
>
>
> Posrgres version 10.11
>
>
>
> Here is the DDL for the index the query is using:
>
>
>
> create index workflow_execution_initial_ui_tabs
>
> on workflow_execution (workflow_id asc, status asc, result asc,
> completed_datetime desc);
>
>
>
>
>
> explain (analyze, verbose, costs, buffers, timing, summary, hashes)
>
> select * from workflow_execution
>
> where workflow_id = 14560 and
>
> status = 'COMPLETED' and
>
> result in
> ('SUCCEEDED','REEXECUTED','ABORTED','DISCONTINUED','FAILED','PARTIAL_SUCCESS')
>
> order by completed_datetime desc limit 50;
>
>
>
> --
>
> Limit (cost=56394.91..56395.04 rows=50 width=1676) (actual
> time=3400.608..3400.622 rows=50 loops=1)
>
> " Output: execution_id, state_machine_id, workflow_id, started_datetime,
> completed_datetime, status, execution_context_s3_arn, ol_version,
> created_datetime, updated_datetime, deleted_millis, acquisition_channel_id,
> correlation_id, result, state_machine_execution_arn, created_by_id,
> updated_by_id, acquired_gcs_s3_object, sqs_trigger_id, trigger_message,
> acquired_gcs_s3_object_uuid, api_trigger_id, scheduled_trigger_id,
> notification_trigger_workflow_id, acquired_object_name, subscription_guid"
>
> Buffers: shared hit=142368
>
> -> Sort (cost=56394.91..56432.71 rows=15118 width=1676) (actual
> time=3400.607..3400.615 rows=50 loops=1)
>
> " Output: execution_id, state_machine_id, workflow_id,
> started_datetime, completed_datetime, status, execution_context_s3_arn,
> ol_version, created_datetime, updated_datetime, deleted_millis,
> acquisition_channel_id, correlation_id, result,
> state_machine_execution_arn, created_by_id, updated_by_id,
> acquired_gcs_s3_object, sqs_trigger_id, trigger_message,
> acquired_gcs_s3_object_uuid, api_trigger_id, scheduled_trigger_id,
> notification_trigger_workflow_id, acquired_object_name, subscription_guid"
>
> Sort Key: workflow_execution.completed_datetime DESC
>
> Sort Method: top-N heapsort Memory: 125kB
>
> Buffers: shared hit=142368
>
> -> Index Scan using workflow_execution_initial_ui_tabs on
> workflow.workflow_execution (cost=0.69..55892.70 rows=15118 width=1676)
> (actual time=0.038..2258.579 rows=2634718 loops=1)
>
> " Output: execution_id, state_machine_id, workflow_id,
> started_datetime, completed_datetime, status, execution_context_s3_arn,
> ol_version, created_datetime, updated_datetime, deleted_millis,
> acquisition_channel_id, correlation_id, result,
> state_machine_execution_arn, created_by_id, updated_by_id,
> acquired_gcs_s3_object, sqs_trigger_id, trigger_message,
> acquired_gcs_s3_object_uuid, api_trigger_id, scheduled_trigger_id,
> notification_trigger_workflow_id, acquired_object_name, subscription_guid"
>
> " Index Cond: ((workflow_execution.workflow_id = 14560) AND
> ((workflow_execution.status)::text = 'COMPLETED'::text) AND
> ((workflow_execution.result)::text = ANY
> ('{SUCCEEDED,REEXECUTED,ABORTED,DISCONTINUED,FAILED,PARTIAL_SUCCESS}'::text[])))"
>
> Buffers: shared hit=142368
>
> Planning time: 0.217 ms
>
> Execution time: 3400.656 ms
>
>
>
> With Oracle for a query like this since the index is on the 3 columns
> matching the WHERE clause and the ORDER BY clause is in the 4th position
> Oracle would be able to scan that index and as soon as it finds the first
> matching 50 rows. But as you can see above Postgres is finding 2,634,718
> matching rows for the WHERE clause , sorts them, and then returns the first
> 50 rows.
>
>
>
> I was questioning if the result IN clause was causing the issue so I ran
> the query with result = and see the same results:
>
>
>
> explain (analyze, verbose, costs, buffers, timing, summary, hashes)
>
> select * from workflow_execution
>
> where workflow_id = 14560 and
>
> status = 'COMPLETED' and
>
> result = 'SUCCEEDED'
>
> order by completed_datetime desc limit 50;
>
>
>
> Limit (cost=54268.09..54268.22 rows=50 width=1676) (actual
> time=3372.453..3372.467 rows=50 loops=1)
>
> " Output: execution_id, state_machine_id, workflow_id, started_datetime,
> completed_datetime, status, execution_context_s3_arn, ol_version,
> created_datetime, updated_datetime, deleted_millis, acquisition_channel_id,
> correlation_id, result, state_machine_execution_arn, created_by_id,
> updated_by_id, acquired_gcs_s3_object, sqs_trigger_id, trigger_message,
> acquired_gcs_s3_object_uuid, api_trigger_id, scheduled_trigger_id,
> notification_trigger_workflow_id, acquired_object_name, subscription_guid"
>
> Buffers: shared hit=140313
>
> -> Sort (cost=54268.09..54304.46 rows=14547 width=1676) (actual
> time=3372.452..3372.460 rows=50 loops=1)
>
> " Output: execution_id, state_machine_id, workflow_id,
> started_datetime, completed_datetime, status, execution_context_s3_arn,
> ol_version, created_datetime, updated_datetime, deleted_millis,
> acquisition_channel_id, correlation_id, result,
> state_machine_execution_arn, created_by_id, updated_by_id,
> acquired_gcs_s3_object, sqs_trigger_id, trigger_message,
> acquired_gcs_s3_object_uuid, api_trigger_id, scheduled_trigger_id,
> notification_trigger_workflow_id, acquired_object_name, subscription_guid"
>
> Sort Key: workflow_execution.completed_datetime DESC
>
> Sort Method: top-N heapsort Memory: 125kB
>
> Buffers: shared hit=140313
>
> -> Index Scan using workflow_execution_initial_ui_tabs on
> workflow.workflow_execution (cost=0.69..53784.85 rows=14547 width=1676)
> (actual time=0.034..2238.867 rows=2616747 loops=1)
>
> " Output: execution_id, state_machine_id, workflow_id,
> started_datetime, completed_datetime, status, execution_context_s3_arn,
> ol_version, created_datetime, updated_datetime, deleted_millis,
> acquisition_channel_id, correlation_id, result,
> state_machine_execution_arn, created_by_id, updated_by_id,
> acquired_gcs_s3_object, sqs_trigger_id, trigger_message,
> acquired_gcs_s3_object_uuid, api_trigger_id, scheduled_trigger_id,
> notification_trigger_workflow_id, acquired_object_name, subscription_guid"
>
> Index Cond: ((workflow_execution.workflow_id = 14560) AND
> ((workflow_execution.status)::text = 'COMPLETED'::text) AND
> ((workflow_execution.result)::text = 'SUCCEEDED'::text))
>
> Buffers: shared hit=140313
>
> Planning time: 0.264 ms
>
> Execution time: 3372.511 ms
>
>
>
>
>
> Is Postgres unable to optimize the query similar to Oracle? Is it
> possible this is possible but we are running on too old of a version?
>
>
>
> Thanks in advance for any input.
>
>
>
> Steve
>
>
>
>
> This e-mail is for the sole use of the intended recipient and contains
> information that may be privileged and/or confidential. If you are not an
> intended recipient, please notify the sender by return e-mail and delete
> this e-mail and any attachments. Certain required legal entity disclosures
> can be accessed on our website:
> https://www.thomsonreuters.com/en/resources/disclosures.html
>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Tomas Pospisek 2022-06-22 20:18:08 Re: ERROR: new collation (en_US.UTF-8) is incompatible with the collation of the template database (en_US.utf-8)
Previous Message Dirschel, Steve 2022-06-22 19:39:33 Tuning a query with ORDER BY and LIMIT