Simple JOIN on heavy table not using expected index

From: kimaidou <kimaidou(at)gmail(dot)com>
To: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Simple JOIN on heavy table not using expected index
Date: 2024-02-09 14:14:31
Message-ID: CAMKXKO4tUH=nTZ-g09=wng+wSxtkj7K8yePMkfTiOpW4zT4Jdg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi all,

I have performance issue for a pretty simple request in a PostgreSQL server
14.10

* Request

SELECT p.id_parcelle
FROM private.parcelles p
WHERE (p.dep IN ( '08', '10', '54', '57', '67', '68'))
;

* Table definition (extract)

Table « private.parcelles »
Colonne | Type | Collationnement |
NULL-able | Par défaut
-----------------------+-----------------------------+-----------------+-----------+------------
id | integer | |
|
geom | geometry(MultiPolygon,2154) | |
|
fid | bigint | |
|
id_parcelle | character varying(14) | |
not null |
insee_col | character varying(5) | |
|
nom_col | character varying | |
|
section | character varying(2) | |
|
numero | character varying(4) | |
|
contenance | bigint | |
|
epci_nom | character varying | |
|
dep | character varying | |
|
dep_nom | character varying | |
|
Index :
"foncier_pkey" PRIMARY KEY, btree (id_parcelle)
"idx_extension_eol_parcelle" btree (extension_eol)
"idx_lien_hubspot_parcelels" btree (lien_hubspot)
"idx_reg_parcelle" btree (reg)
"idx_type_ener_parcelles" btree (type_d_energie)
"parcelles_dep_idx" btree (dep)
"parcelles_id_parcelle_idx" btree (id_parcelle)
"parcelles_inseecol_idx" btree (insee_col)
"parcelles_object_id_idx" btree (hs_object_id)
"parcelles_pipelinestage_idx" btree (hs_pipeline_stage)
"parcelles_synctohubspot_idx" btree (synctohubspot)
"sidx_foncier_geom" gist (geom)

-> First comment, the primary Key should be on id (integer) and not on
id_parcelle (a text code)

* Statistiques

lizmap_synerdev_carto=# SELECT * FROM pg_stat_all_tables WHERE schemaname =
'private' AND relname = 'parcelles';
-[ RECORD 1 ]-------+------------------------------
relid | 2364725
schemaname | private
relname | parcelles
seq_scan | 1891
seq_tup_read | 552509679
idx_scan | 19144304
idx_tup_fetch | 38926631
n_tup_ins | 3
n_tup_upd | 3073182
n_tup_del | 0
n_tup_hot_upd | 2996591
n_live_tup | 92876681
n_dead_tup | 1836882
n_mod_since_analyze | 769313
n_ins_since_vacuum | 3
last_vacuum |
last_autovacuum |
last_analyze | 2024-02-08 15:33:14.008286+01
last_autoanalyze |
vacuum_count | 0
autovacuum_count | 0
analyze_count | 1
autoanalyze_count | 0

* Plan :
https://explain.dalibo.com/plan/47391e3g8c2589cf#plan/node/2

It seems PostgreSQL does not use the index parcelles_dep_idx on "dep" (text
field), even if the corresponding number of lines for this WHERE clause is
a smal subset of the entire data:
approx 6M against 80M in total

Thanks in advance for any hint regarding this cumbersome query.

Regards
Kimaidou

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Burçin Yazıcı 2024-02-09 14:19:54 Re: Simple JOIN on heavy table not using expected index
Previous Message James Pang (chaolpan) 2024-02-06 06:59:11 RE: huge SubtransSLRU and SubtransBuffer wait_event