Re: slow query with inline function on AWS RDS with RDS 24x large

From: Ayub Khan <ayub(dot)hp(at)gmail(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Pgsql Performance <pgsql-performance(at)lists(dot)postgresql(dot)org>
Subject: Re: slow query with inline function on AWS RDS with RDS 24x large
Date: 2021-06-04 08:51:13
Message-ID: CAHdeyE+BJUwXiyk7Xb5bdbsbke6BMUdH8bEeX6GTtpzmPhXkqA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

You are right, I dropped BRIN and created btree and the performance on 0
rows matching criteria table is good, below is the plan with BTREE. I will
test by inserting lot of data.

Hash Join (cost=50186.91..3765911.10 rows=5397411 width=291) (actual
time=1.501..1.504 rows=0 loops=1)
Hash Cond: (b.restaurant_id = r.restaurant_id)
-> Hash Left Join (cost=49845.88..2078197.48 rows=5397411 width=216)
(actual time=0.079..0.081 rows=0 loops=1)
Hash Cond: (ro.order_id = oom.order_id)
-> Hash Join (cost=933.18..2007856.35 rows=5397411 width=209)
(actual time=0.078..0.080 rows=0 loops=1)
Hash Cond: (b.city_id = c.city_id)
-> Hash Join (cost=930.92..1956181.11 rows=19276467
width=188) (actual time=0.048..0.050 rows=0 loops=1)
Hash Cond: (ro.branch_id = b.branch_id)
-> Nested Loop (cost=0.56..1904639.80 rows=19276467
width=108) (actual time=0.048..0.048 rows=0 loops=1)
-> Function Scan on start_date (cost=0.00..0.01
rows=1 width=8) (actual time=0.002..0.003 rows=1 loops=1)
-> Index Scan using rest_ord_date_brin on
restaurant_order ro (cost=0.56..1711875.12 rows=19276467 width=108)
(actual time=0.042..0.042 rows=0 loops=1)
Index Cond: ((date_time >=
start_date.start_date) AND (date_time <= '2021-06-04
08:48:45.377833+00'::timestamp with time zone))
-> Hash (cost=668.49..668.49 rows=20949 width=88)
(never executed)
-> Seq Scan on branch b (cost=0.00..668.49
rows=20949 width=88) (never executed)
-> Hash (cost=1.56..1.56 rows=56 width=29) (actual
time=0.026..0.027 rows=56 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 12kB
-> Seq Scan on city c (cost=0.00..1.56 rows=56
width=29) (actual time=0.009..0.016 rows=56 loops=1)
-> Hash (cost=33000.09..33000.09 rows=1273009 width=13) (never
executed)
-> Seq Scan on order_offer_map oom (cost=0.00..33000.09
rows=1273009 width=13) (never executed)
-> Hash (cost=233.42..233.42 rows=8609 width=8) (actual
time=1.403..1.403 rows=8609 loops=1)
Buckets: 16384 Batches: 1 Memory Usage: 465kB
-> Index Only Scan using "restaurant_idx$$_274b003d" on restaurant
r (cost=0.29..233.42 rows=8609 width=8) (actual time=0.007..0.634
rows=8609 loops=1)
Heap Fetches: 0
Planning Time: 1.352 ms
Execution Time: 1.571 ms

On Fri, Jun 4, 2021 at 11:41 AM Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
wrote:

> Hi
>
>
> pá 4. 6. 2021 v 10:32 odesílatel Ayub Khan <ayub(dot)hp(at)gmail(dot)com> napsal:
>
>> BRIN index is only on the date_time column, I even tried with btree index
>> with no performance gains.
>>
>
> -> Bitmap Heap Scan on restaurant_order ro (cost=5427.94..3353966.60 rows=19275986 width=108) (actual time=1036.793..1036.793 rows=0 loops=1)
> Recheck Cond: ((date_time >= start_date.start_date) AND (date_time <= '2021-06-04 08:05:32.784199+00'::timestamp with time zone))
> Rows Removed by Index Recheck: 5039976
> Heap Blocks: lossy=275230
>
> When the most rows are removed in recheck, then the effectivity of the index is not good
>
> Pavel
>
>
>
>
>>
>> On Fri, Jun 4, 2021 at 11:23 AM Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
>> wrote:
>>
>>>
>>>
>>> pá 4. 6. 2021 v 10:07 odesílatel Ayub Khan <ayub(dot)hp(at)gmail(dot)com> napsal:
>>>
>>>>
>>>> below query is slow even with no data
>>>>
>>>>
>>>> explain ANALYZE
>>>>
>>>> WITH business AS( SELECT * FROM get_businessday_utc_f() start_date)
>>>> SELECT ro.order_id,
>>>> ro.date_time,
>>>> round(ro.order_amount, 2) AS order_amount,
>>>> b.branch_id,
>>>> b.branch_name,
>>>> st_x(b.location) AS from_x,
>>>> st_y(b.location) AS from_y,
>>>> b.user_id AS branch_user_id,
>>>> b.contact_info,
>>>> r.restaurant_id,
>>>> c.city_id,
>>>> c.city_name,
>>>> c.city_name_ar,
>>>> st_linefromtext(((((((('LINESTRING('::text || st_x(b.location)) || ' '::text) || st_y(b.location)) || ','::text) || st_x(ro.location_geometry)) || ' '::text) || st_y(ro.location_geometry)) || ')'::text, 28355) AS from_to,
>>>> to_char(ro.date_time, 'HH24:MI'::text) AS order_time,
>>>> ro.customer_comment,
>>>> 'N'::text AS is_new_customer,
>>>> ro.picked_up_time,
>>>> ro.driver_assigned_date_time,
>>>> oom.offer_amount,
>>>> oom.offer_type_code AS offer_type,
>>>> ro.uk_vat
>>>> FROM business, restaurant_order ro
>>>>
>>>> JOIN branch b ON b.branch_id = ro.branch_id
>>>> JOIN restaurant r ON r.restaurant_id = b.restaurant_id
>>>> JOIN city c ON c.city_id = b.city_id
>>>> LEFT JOIN order_offer_map oom using (order_id)
>>>> WHERE ro.date_time >= business.start_date AND ro.date_time<= f_now_immutable_with_tz();
>>>>
>>>>
>>>>
>>>> Hash Left Join (cost=55497.32..5417639.59 rows=5397276 width=291) (actual time=1056.926..1056.934 rows=0 loops=1)
>>>> Hash Cond: (ro.order_id = oom.order_id)
>>>> -> Hash Join (cost=6584.61..3674143.44 rows=5397276 width=209) (actual time=1056.926..1056.932 rows=0 loops=1)
>>>> Hash Cond: (ro.branch_id = b.branch_id)
>>>> -> Nested Loop (cost=5427.94..3546726.47 rows=19275986 width=108) (actual time=1036.809..1036.810 rows=0 loops=1)
>>>> -> Function Scan on start_date (cost=0.00..0.01 rows=1 width=8) (actual time=0.006..0.008 rows=1 loops=1)
>>>> -> Bitmap Heap Scan on restaurant_order ro (cost=5427.94..3353966.60 rows=19275986 width=108) (actual time=1036.793..1036.793 rows=0 loops=1)
>>>> Recheck Cond: ((date_time >= start_date.start_date) AND (date_time <= '2021-06-04 08:05:32.784199+00'::timestamp with time zone))
>>>> Rows Removed by Index Recheck: 5039976
>>>> Heap Blocks: lossy=275230
>>>> -> Bitmap Index Scan on rest_ord_date_brin (cost=0.00..608.94 rows=19359111 width=0) (actual time=14.037..14.038 rows=2917120 loops=1)
>>>> Index Cond: ((date_time >= start_date.start_date) AND (date_time <= '2021-06-04 08:05:32.784199+00'::timestamp with time zone))
>>>> -> Hash (cost=1083.35..1083.35 rows=5866 width=109) (actual time=20.106..20.109 rows=20949 loops=1)
>>>> Buckets: 32768 (originally 8192) Batches: 1 (originally 1) Memory Usage: 3112kB
>>>> -> Hash Join (cost=343.29..1083.35 rows=5866 width=109) (actual time=1.620..14.539 rows=20949 loops=1)
>>>> Hash Cond: (b.restaurant_id = r.restaurant_id)
>>>> -> Hash Join (cost=2.26..726.91 rows=5866 width=109) (actual time=0.029..8.597 rows=20949 loops=1)
>>>> Hash Cond: (b.city_id = c.city_id)
>>>> -> Seq Scan on branch b (cost=0.00..668.49 rows=20949 width=88) (actual time=0.004..1.609 rows=20949 loops=1)
>>>> -> Hash (cost=1.56..1.56 rows=56 width=29) (actual time=0.020..0.021 rows=56 loops=1)
>>>> Buckets: 1024 Batches: 1 Memory Usage: 12kB
>>>> -> Seq Scan on city c (cost=0.00..1.56 rows=56 width=29) (actual time=0.004..0.010 rows=56 loops=1)
>>>> -> Hash (cost=233.42..233.42 rows=8609 width=8) (actual time=1.575..1.575 rows=8609 loops=1)
>>>> Buckets: 16384 Batches: 1 Memory Usage: 465kB
>>>> -> Index Only Scan using "restaurant_idx$$_274b003d" on restaurant r (cost=0.29..233.42 rows=8609 width=8) (actual time=0.006..0.684 rows=8609 loops=1)
>>>> Heap Fetches: 0
>>>> -> Hash (cost=33000.09..33000.09 rows=1273009 width=13) (never executed)
>>>> -> Seq Scan on order_offer_map oom (cost=0.00..33000.09 rows=1273009 width=13) (never executed)
>>>> Planning Time: 1.180 ms
>>>> Execution Time: 1057.535 ms
>>>>
>>>> could some one explain why it is slow, if I insert 50k records the execution time reaches 20 seconds
>>>>
>>>>
>>> -> Bitmap Heap Scan on restaurant_order ro
>>> (cost=5427.94..3353966.60 rows=19275986 width=108) (actual
>>> time=1036.793..1036.793 rows=0 loops=1)
>>> Recheck Cond: ((date_time >= start_date.start_date)
>>> AND (date_time <= '2021-06-04 08:05:32.784199+00'::timestamp with time
>>> zone))
>>> Rows Removed by Index Recheck: 5039976
>>> Heap Blocks: lossy=275230
>>> -> Bitmap Index Scan on rest_ord_date_brin
>>> (cost=0.00..608.94 rows=19359111 width=0) (actual time=14.037..14.038
>>> rows=2917120 loops=1)
>>> Index Cond: ((date_time >=
>>> start_date.start_date) AND (date_time <= '2021-06-04
>>> 08:05:32.784199+00'::timestamp with time zone))
>>>
>>> Looks so the BRIN index is not in good condition. Maybe you need
>>> reindex, maybe BRIN index is not good format for your data.
>>>
>>> There are lot of data - few millions of rows
>>>
>>> Regards
>>>
>>> Pavel
>>>
>>>
>>
>>
>>

--
--------------------------------------------------------------------
Sun Certified Enterprise Architect 1.5
Sun Certified Java Programmer 1.4
Microsoft Certified Systems Engineer 2000
http://in.linkedin.com/pub/ayub-khan/a/811/b81
mobile:+966-502674604
----------------------------------------------------------------------
It is proved that Hard Work and kowledge will get you close but attitude
will get you there. However, it's the Love
of God that will put you over the top!!

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Taras Savchuk 2021-06-04 11:53:12 PgSQL 12 on WinSrv ~3x faster than on Linux
Previous Message Pavel Stehule 2021-06-04 08:40:30 Re: slow query with inline function on AWS RDS with RDS 24x large