Re: query planner not using index, instead using sequential scan

From: Ayub Khan <ayub(dot)hp(at)gmail(dot)com>
To: Vijaykumar Jain <vijaykumarjain(dot)github(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pgsql Performance <pgsql-performance(at)lists(dot)postgresql(dot)org>
Subject: Re: query planner not using index, instead using sequential scan
Date: 2021-06-05 21:14:18
Message-ID: CAHdeyEJXzj870Zy_5BpRYWUsp==Aky-ZfXePSe3YkwYVwunBYQ@mail.gmail.com
Lists: pgsql-performance

By setting enable_seqscan = off, query execution seems to have slowed
down, even though the index is now being used for the left join of
order_offer_map:

Hash Left Join  (cost=72639.74..8176118.25 rows=19276467 width=293) (actual time=858.853..3166.994 rows=230222 loops=1)
  Hash Cond: (ro.order_id = oom.order_id)
  ->  Hash Join  (cost=1947.33..2053190.95 rows=19276467 width=211) (actual time=20.550..462.303 rows=230222 loops=1)
        Hash Cond: (b.city_id = c.city_id)
        ->  Hash Join  (cost=1937.65..1998751.06 rows=19276467 width=190) (actual time=20.523..399.979 rows=230222 loops=1)
              Hash Cond: (b.restaurant_id = r.restaurant_id)
              ->  Hash Join  (cost=1596.61..1947784.40 rows=19276467 width=190) (actual time=19.047..339.984 rows=230222 loops=1)
                    Hash Cond: (ro.branch_id = b.branch_id)
                    ->  Nested Loop  (cost=0.56..1895577.38 rows=19276467 width=108) (actual time=0.032..240.278 rows=230222 loops=1)
                          ->  Function Scan on start_date  (cost=0.00..0.01 rows=1 width=8) (actual time=0.003..0.003 rows=1 loops=1)
                          ->  Index Scan using "idx$$_00010001" on restaurant_order ro  (cost=0.56..1702812.70 rows=19276467 width=108) (actual time=0.025..117.525 rows=230222 loops=1)
                                Index Cond: ((date_time >= start_date.start_date) AND (date_time <= '2021-06-05 21:09:50.161463+00'::timestamp with time zone))
                    ->  Hash  (cost=1334.19..1334.19 rows=20949 width=90) (actual time=18.969..18.969 rows=20949 loops=1)
                          Buckets: 32768  Batches: 1  Memory Usage: 2758kB
                          ->  Index Scan using "branch_idx$$_274b0038" on branch b  (cost=0.29..1334.19 rows=20949 width=90) (actual time=0.008..14.371 rows=20949 loops=1)
              ->  Hash  (cost=233.42..233.42 rows=8609 width=8) (actual time=1.450..1.451 rows=8609 loops=1)
                    Buckets: 16384  Batches: 1  Memory Usage: 465kB
                    ->  Index Only Scan using "restaurant_idx$$_274b003d" on restaurant r  (cost=0.29..233.42 rows=8609 width=8) (actual time=0.011..0.660 rows=8609 loops=1)
                          Heap Fetches: 0
        ->  Hash  (cost=8.98..8.98 rows=56 width=29) (actual time=0.021..0.021 rows=56 loops=1)
              Buckets: 1024  Batches: 1  Memory Usage: 12kB
              ->  Index Only Scan using "city_idx$$_274b0022" on city c  (cost=0.14..8.98 rows=56 width=29) (actual time=0.004..0.010 rows=56 loops=1)
                    Heap Fetches: 0
  ->  Hash  (cost=54779.81..54779.81 rows=1273009 width=15) (actual time=836.132..836.133 rows=1273009 loops=1)
        Buckets: 2097152  Batches: 1  Memory Usage: 81629kB
        ->  Index Scan Backward using order_offer_map_order_id on order_offer_map oom  (cost=0.43..54779.81 rows=1273009 width=15) (actual time=0.010..578.226 rows=1273009 loops=1)
Planning Time: 1.229 ms
Execution Time: 3183.248 ms
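For orientation, here is plain arithmetic on the "actual time" figures in the plan above. The split is only indicative, since each node's actual time already includes its children; no new measurements are involved:

```python
# Rough accounting of where the 3183 ms goes, using the "actual time"
# figures copied from the plan above.
total_ms = 3183.248          # Execution Time
hash_build_ms = 836.133      # Hash over order_offer_map (1,273,009 rows)
inner_join_ms = 462.303      # Hash Join producing the 230,222 driving rows
remainder_ms = 3166.994 - inner_join_ms - hash_build_ms  # probing + output

print(f"hash build : {hash_build_ms:8.1f} ms ({hash_build_ms / total_ms:.0%} of total)")
print(f"inner join : {inner_join_ms:8.1f} ms")
print(f"remainder  : {remainder_ms:8.1f} ms")
```

So even with the index used for order_offer_map, building the hash over all 1.27M of its rows is only about a quarter of the runtime; most of the time is spent elsewhere, which is consistent with forcing the index not helping.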

On Sat, Jun 5, 2021 at 10:52 PM Vijaykumar Jain <
vijaykumarjain(dot)github(at)gmail(dot)com> wrote:

> thanks Tom.
>
> I was trying to simulate some scenarios to be able to explain how the plan
> would change with/without
> *Rows Removed by Filter: 73969 * -- by using a different/correct index.
>
> postgres=# \d t
>                          Table "public.t"
>    Column   |            Type             | Collation | Nullable | Default
> ------------+-----------------------------+-----------+----------+---------
>  id         | integer                     |           | not null |
>  created_on | timestamp without time zone |           |          |
>  col1       | text                        |           |          |
> Indexes:
>     "t_pkey" PRIMARY KEY, btree (id)
>     "t_created_on_idx" btree (created_on) WHERE col1 = 'a'::text
>         -- useless index, as all rows have col1 = 'a', but kept to attempt the lossy case
>     "t_created_on_idx1" btree (created_on)
> Referenced by:
>     TABLE "t1" CONSTRAINT "t1_id_fkey" FOREIGN KEY (id) REFERENCES t(id)
>
> postgres=# \d t1
>              Table "public.t1"
>  Column |  Type   | Collation | Nullable | Default
> --------+---------+-----------+----------+---------
>  t1_id  | integer |           | not null |
>  id     | integer |           |          |
>  col2   | text    |           |          |
> Indexes:
>     "t1_pkey" PRIMARY KEY, btree (t1_id)
> Foreign-key constraints:
>     "t1_id_fkey" FOREIGN KEY (id) REFERENCES t(id)
>
>
>
> postgres=# update t set col1 = 'a';
> UPDATE 1000
>
> postgres=# explain analyze select 1 from t join t1 on (t.id = t1.id)
> where created_on = '2021-06-01 12:48:45.141123';
>                                         QUERY PLAN
> --------------------------------------------------------------------------------------------------------
>  Hash Join  (cost=37.01..39.28 rows=1 width=4) (actual time=0.124..0.125 rows=0 loops=1)
>    Hash Cond: (t1.id = t.id)
>    ->  Seq Scan on t1  (cost=0.00..2.00 rows=100 width=4) (actual time=0.004..0.008 rows=100 loops=1)
>    ->  Hash  (cost=37.00..37.00 rows=1 width=4) (actual time=0.109..0.109 rows=1 loops=1)
>          Buckets: 1024  Batches: 1  Memory Usage: 9kB
>          ->  Seq Scan on t  (cost=0.00..37.00 rows=1 width=4) (actual time=0.058..0.107 rows=1 loops=1)
>                Filter: (created_on = '2021-06-01 12:48:45.141123'::timestamp without time zone)
>                *Rows Removed by Filter: 999*
>                    -- as there is no useful index, all pages are fetched and rows then removed from them: expensive
>  Planning Time: 0.111 ms
>  Execution Time: 0.162 ms
> (10 rows)
>
>
> postgres=# explain analyze select 1 from t join t1 on (t.id = t1.id)
> where created_on = '2021-06-01 12:48:45.141123';
>                                         QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------------------------
>  Hash Join  (cost=8.32..33.47 rows=1 width=4) (actual time=0.025..0.026 rows=0 loops=1)
>    Hash Cond: (t1.id = t.id)
>    ->  Seq Scan on t1  (cost=0.00..22.00 rows=1200 width=4) (actual time=0.009..0.009 rows=1 loops=1)
>    ->  Hash  (cost=8.31..8.31 rows=1 width=4) (actual time=0.014..0.014 rows=0 loops=1)
>          Buckets: 1024  Batches: 1  Memory Usage: 8kB
>          ->  Index Scan using t_created_on_idx1 on t  (cost=0.29..8.31 rows=1 width=4) (actual time=0.014..0.014 rows=0 loops=1)
>                Index Cond: (created_on = '2021-06-01 12:48:45.141123'::timestamp without time zone)
>                    -- *exact match using btree index*
>  Planning Time: 0.255 ms
>  Execution Time: 0.071 ms
> (9 rows)
>
>
> But from Ayub's plan, the number of rows fetched is large, and rows are
> also being removed after the index scan.
> If that could be avoided with a btree index that does not have to filter
> unwanted rows, might the run be faster?
> But I guess with 156k rows the planner would have found a win in the
> seq scan anyway.
>
> Ayub,
> just for the sake of understanding,
>
> can you run the query using
>
> postgres=# set enable_seqscan TO 0;
> SET
> postgres=# -- explain analyze <run the query>
>
> postgres=# set enable_seqscan TO 1;
> SET
>
>
> On Sun, 6 Jun 2021 at 00:59, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
>> Ayub Khan <ayub(dot)hp(at)gmail(dot)com> writes:
>> > could someone clarify why the LEFT JOIN order_offer_map oom using
>> > (order_id) in the below query is using sequential scan instead of
>> > using index on order_id which is defined in order_offer_map table.
>>
>> Probably because it estimates the hash join to restaurant_order is
>> faster than a nestloop join would be. I think it's likely right.
>> You'd need very optimistic assumptions about the cost of an
>> individual index probe into order_offer_map to conclude that 156K
>> of them would be faster than the 476ms that are being spent here
>> to read order_offer_map and join it to the result of the
>> indexscan on restaurant_order.
>>
>> If, indeed, that *is* faster on your hardware, you might want
>> to dial down random_page_cost to get more-relevant estimates.
>>
>> regards, tom lane
>>
>>
>>
>
> --
> Thanks,
> Vijay
> Mumbai, India
>
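Tom's argument above comes down to the planner's cost model: each index probe is charged roughly random_page_cost per page touched, while a sequential pass pays seq_page_cost per page. A toy sketch of that trade-off follows; the table page count and one-page-per-probe assumption are made-up illustrative values, not taken from Ayub's database, and PostgreSQL's real model also weighs CPU costs, correlation, and caching:

```python
# Toy version of the index-probe vs. seq-scan trade-off Tom describes.
seq_page_cost = 1.0
random_page_cost = 4.0        # PostgreSQL's default
table_pages = 12_000          # hypothetical size of order_offer_map

def index_probe_cost(matching_rows, pages_per_probe=1):
    # Each probe modeled as at least one random page fetch.
    return matching_rows * pages_per_probe * random_page_cost

def seq_scan_cost():
    return table_pages * seq_page_cost

for rows in (1_000, 3_000, 10_000, 156_000):
    probes = index_probe_cost(rows)
    winner = "index probes" if probes < seq_scan_cost() else "seq scan"
    print(f"{rows:>7} rows: probes={probes:>9.0f} vs seqscan={seq_scan_cost():.0f} -> {winner}")
```

At 156K matching rows the probe total dwarfs the single pass, which is why preferring the hash join over 156K nestloop probes is plausible unless random_page_cost is lowered to reflect fast storage, as Tom suggests.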

--
--------------------------------------------------------------------
Sun Certified Enterprise Architect 1.5
Sun Certified Java Programmer 1.4
Microsoft Certified Systems Engineer 2000
http://in.linkedin.com/pub/ayub-khan/a/811/b81
mobile:+966-502674604
----------------------------------------------------------------------
It is proved that Hard Work and knowledge will get you close but attitude
will get you there. However, it's the Love
of God that will put you over the top!!
