Re: query planner not using index, instead using squential scan

From: Vijaykumar Jain <vijaykumarjain(dot)github(at)gmail(dot)com>
To: Ayub Khan <ayub(dot)hp(at)gmail(dot)com>
Cc: Pgsql Performance <pgsql-performance(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: query planner not using index, instead using squential scan
Date: 2021-06-05 21:18:10
Message-ID: CAM+6J94tca-6vfONLrvzroUuGeDS7LaHXYfw4zaTk9PJ96_XKA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Yes, slowdown was expected :)

I was just interested in cost estimates.
Also did you try to set random_page_cost to 1 if your storage is not hdd.

On Sun, 6 Jun 2021 at 2:44 AM Ayub Khan <ayub(dot)hp(at)gmail(dot)com> wrote:

>
> by setting enable_sequence_scan=OFF, the query execution time seems to
> have slowed down even though the index is being used for left join of
> order_offer_map
>
> Hash Left Join (cost=72639.74..8176118.25 rows=19276467 width=293)
> (actual time=858.853..3166.994 rows=230222 loops=1)
> Hash Cond: (ro.order_id = oom.order_id)
> -> Hash Join (cost=1947.33..2053190.95 rows=19276467 width=211)
> (actual time=20.550..462.303 rows=230222 loops=1)
> Hash Cond: (b.city_id = c.city_id)
> -> Hash Join (cost=1937.65..1998751.06 rows=19276467 width=190)
> (actual time=20.523..399.979 rows=230222 loops=1)
> Hash Cond: (b.restaurant_id = r.restaurant_id)
> -> Hash Join (cost=1596.61..1947784.40 rows=19276467
> width=190) (actual time=19.047..339.984 rows=230222 loops=1)
> Hash Cond: (ro.branch_id = b.branch_id)
> -> Nested Loop (cost=0.56..1895577.38 rows=19276467
> width=108) (actual time=0.032..240.278 rows=230222 loops=1)
> -> Function Scan on start_date
> (cost=0.00..0.01 rows=1 width=8) (actual time=0.003..0.003 rows=1 loops=1)
> -> Index Scan using "idx$$_00010001" on
> restaurant_order ro (cost=0.56..1702812.70 rows=19276467 width=108)
> (actual time=0.025..117.525 rows=230222 loops=1)
> Index Cond: ((date_time >=
> start_date.start_date) AND (date_time <= '2021-06-05
> 21:09:50.161463+00'::timestamp with time zone))
> -> Hash (cost=1334.19..1334.19 rows=20949 width=90)
> (actual time=18.969..18.969 rows=20949 loops=1)
> Buckets: 32768 Batches: 1 Memory Usage: 2758kB
> -> Index Scan using "branch_idx$$_274b0038" on
> branch b (cost=0.29..1334.19 rows=20949 width=90) (actual
> time=0.008..14.371 rows=20949 loops=1)
> -> Hash (cost=233.42..233.42 rows=8609 width=8) (actual
> time=1.450..1.451 rows=8609 loops=1)
> Buckets: 16384 Batches: 1 Memory Usage: 465kB
> -> Index Only Scan using "restaurant_idx$$_274b003d"
> on restaurant r (cost=0.29..233.42 rows=8609 width=8) (actual
> time=0.011..0.660 rows=8609 loops=1)
> Heap Fetches: 0
> -> Hash (cost=8.98..8.98 rows=56 width=29) (actual
> time=0.021..0.021 rows=56 loops=1)
> Buckets: 1024 Batches: 1 Memory Usage: 12kB
> -> Index Only Scan using "city_idx$$_274b0022" on city c
> (cost=0.14..8.98 rows=56 width=29) (actual time=0.004..0.010 rows=56
> loops=1)
> Heap Fetches: 0
> -> Hash (cost=54779.81..54779.81 rows=1273009 width=15) (actual
> time=836.132..836.133 rows=1273009 loops=1)
> Buckets: 2097152 Batches: 1 Memory Usage: 81629kB
> -> Index Scan Backward using order_offer_map_order_id on
> order_offer_map oom (cost=0.43..54779.81 rows=1273009 width=15) (actual
> time=0.010..578.226 rows=1273009 loops=1)
> Planning Time: 1.229 ms
> Execution Time: 3183.248 ms
>
> On Sat, Jun 5, 2021 at 10:52 PM Vijaykumar Jain <
> vijaykumarjain(dot)github(at)gmail(dot)com> wrote:
>
>> thanks Tom.
>>
>> I was trying to simulate some scenarios to be able to explain how the
>> plan would change with/without
>> *Rows Removed by Filter: 73969 * -- by using a different/correct index.
>>
>> postgres=# \d t
>> Table "public.t"
>> Column | Type | Collation | Nullable | Default
>>
>> ------------+-----------------------------+-----------+----------+---------
>> id | integer | | not null |
>> created_on | timestamp without time zone | | |
>> col1 | text | | |
>> Indexes:
>> "t_pkey" PRIMARY KEY, btree (id)
>> "t_created_on_idx" btree (created_on) WHERE col1 = 'a'::text ---
>> useless index as all rows have col1 = 'a', but to attempt lossy case
>> "t_created_on_idx1" btree (created_on)
>> Referenced by:
>> TABLE "t1" CONSTRAINT "t1_id_fkey" FOREIGN KEY (id) REFERENCES t(id)
>>
>> postgres=# \d t1
>> Table "public.t1"
>> Column | Type | Collation | Nullable | Default
>> --------+---------+-----------+----------+---------
>> t1_id | integer | | not null |
>> id | integer | | |
>> col2 | text | | |
>> Indexes:
>> "t1_pkey" PRIMARY KEY, btree (t1_id)
>> Foreign-key constraints:
>> "t1_id_fkey" FOREIGN KEY (id) REFERENCES t(id)
>>
>>
>>
>> postgres=# update t set col1 = 'a';
>> UPDATE 1000
>>
>> postgres=# explain analyze select 1 from t join t1 on (t.id = t1.id)
>> where created_on = '2021-06-01 12:48:45.141123';
>> QUERY PLAN
>>
>> --------------------------------------------------------------------------------------------------------
>> Hash Join (cost=37.01..39.28 rows=1 width=4) (actual time=0.124..0.125
>> rows=0 loops=1)
>> Hash Cond: (t1.id = t.id)
>> -> Seq Scan on t1 (cost=0.00..2.00 rows=100 width=4) (actual
>> time=0.004..0.008 rows=100 loops=1)
>> -> Hash (cost=37.00..37.00 rows=1 width=4) (actual time=0.109..0.109
>> rows=1 loops=1)
>> Buckets: 1024 Batches: 1 Memory Usage: 9kB
>> -> Seq Scan on t (cost=0.00..37.00 rows=1 width=4) (actual
>> time=0.058..0.107 rows=1 loops=1)
>> Filter: (created_on = '2021-06-01
>> 12:48:45.141123'::timestamp without time zone)
>> *Rows Removed by Filter: 999 --- as no useful
>> index, t_created_on_idx will fetch all pages and then remove rows from
>> them, expensive*
>> Planning Time: 0.111 ms
>> Execution Time: 0.162 ms
>> (10 rows)
>>
>>
>> postgres=# explain analyze select 1 from t join t1 on (t.id = t1.id)
>> where created_on = '2021-06-01 12:48:45.141123';
>> QUERY PLAN
>>
>> ---------------------------------------------------------------------------------------------------------------------------------
>> Hash Join (cost=8.32..33.47 rows=1 width=4) (actual time=0.025..0.026
>> rows=0 loops=1)
>> Hash Cond: (t1.id = t.id)
>> -> Seq Scan on t1 (cost=0.00..22.00 rows=1200 width=4) (actual
>> time=0.009..0.009 rows=1 loops=1)
>> -> Hash (cost=8.31..8.31 rows=1 width=4) (actual time=0.014..0.014
>> rows=0 loops=1)
>> Buckets: 1024 Batches: 1 Memory Usage: 8kB
>> -> Index Scan using t_created_on_idx1 on t (cost=0.29..8.31
>> rows=1 width=4) (actual time=0.014..0.014 rows=0 loops=1)
>> Index Cond: (created_on = '2021-06-01
>> 12:48:45.141123'::timestamp without time zone) -- *exact match using
>> btree index, *
>> Planning Time: 0.255 ms
>> Execution Time: 0.071 ms
>> (9 rows)
>>
>>
>> but from Ayub's plan, the number of rows fetched are a lot, but is also
>> removing rows post index scan.
>> if that can be improved with a btree index that does not filter unwanted
>> rows, the run may be faster ?
>> but i guess if there are 156k rows, planner would a have found a win in
>> seq scan.
>>
>> Ayub,
>> just for the sake of understanding,
>>
>> can you run the query using
>>
>> postgres=# set enable_seqscan TO 0;
>> SET
>> postgres=# -- explain analyze <run the query>
>>
>> postgres=# set enable_seqscan TO 1;
>> SET
>>
>>
>> On Sun, 6 Jun 2021 at 00:59, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>
>>> Ayub Khan <ayub(dot)hp(at)gmail(dot)com> writes:
>>> > could someone clarify why the LEFT JOIN order_offer_map oom using
>>> > (order_id) in the below query is using sequential scan instead of
>>> > using index on order_id which is defined in order_offer_map table.
>>>
>>> Probably because it estimates the hash join to restaurant_order is
>>> faster than a nestloop join would be. I think it's likely right.
>>> You'd need very optimistic assumptions about the cost of an
>>> individual index probe into order_offer_map to conclude that 156K
>>> of them would be faster than the 476ms that are being spent here
>>> to read order_offer_map and join it to the result of the
>>> indexscan on restaurant_order.
>>>
>>> If, indeed, that *is* faster on your hardware, you might want
>>> to dial down random_page_cost to get more-relevant estimates.
>>>
>>> regards, tom lane
>>>
>>>
>>>
>>
>> --
>> Thanks,
>> Vijay
>> Mumbai, India
>>
>
>
> --
> --------------------------------------------------------------------
> Sun Certified Enterprise Architect 1.5
> Sun Certified Java Programmer 1.4
> Microsoft Certified Systems Engineer 2000
> http://in.linkedin.com/pub/ayub-khan/a/811/b81
> mobile:+966-502674604
> ----------------------------------------------------------------------
> It is proved that Hard Work and kowledge will get you close but attitude
> will get you there. However, it's the Love
> of God that will put you over the top!!
>
--
Thanks,
Vijay
Mumbai, India

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Dean Gibson (DB Administrator) 2021-06-06 23:49:24 Re: AWS forcing PG upgrade from v9.6 a disaster
Previous Message Ayub Khan 2021-06-05 21:14:18 Re: query planner not using index, instead using squential scan