Re: Separate 100 M spatial data in 100 tables VS one big table

From: Marc Millas <marc(dot)millas(at)mokadb(dot)com>
To: kimaidou <kimaidou(at)gmail(dot)com>
Cc: postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Separate 100 M spatial data in 100 tables VS one big table
Date: 2024-03-05 12:47:03
Message-ID: CADX_1abPkN=R5w48r_OsX_3b8AA9Dba=jorCG=g249WnxC0HBg@mail.gmail.com
Lists: pgsql-performance

Hi Kimaidou,

Why not a partitioned table, with the department as the partitioning key?

Each year, just detach the obsolete data department by department (i.e.
detach the partition, which is almost instantaneous), then drop or keep
the obsolete data. No deletes, and it's quite easy to maintain. For each
index defined on the partitioned table, Postgres creates one index per
partition, and those are detached along with the partition when you
detach a department's partition.

So when importing, first create an appropriate table, load the data, and
attach it to the main partitioned table. Postgres will automatically
create any missing indexes.
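
A minimal sketch of what this could look like (table, column, and
partition names are assumptions, as is the Lambert-93 SRID):

-- Requires the PostGIS extension for the geometry type.
-- Parent table, list-partitioned by department. Note that a unique
-- constraint on a partitioned table must include the partition key,
-- hence the composite primary key.
CREATE TABLE parcels (
    id         bigint NOT NULL,
    code       text   NOT NULL,
    department text   NOT NULL,
    geom       geometry(MultiPolygon, 2154),
    PRIMARY KEY (id, department)
) PARTITION BY LIST (department);

-- An index created on the parent is created on each partition as well.
CREATE INDEX parcels_geom_gist ON parcels USING gist (geom);

-- One partition per department.
CREATE TABLE parcels_d34 PARTITION OF parcels FOR VALUES IN ('34');

-- Yearly refresh of one department:
ALTER TABLE parcels DETACH PARTITION parcels_d34;

CREATE TABLE parcels_d34_new (LIKE parcels INCLUDING DEFAULTS);
-- Load the new data (COPY, ogr2ogr, ...), then add a matching CHECK
-- constraint so that ATTACH can skip its validation scan:
ALTER TABLE parcels_d34_new
    ADD CONSTRAINT parcels_d34_new_dep CHECK (department = '34');
ALTER TABLE parcels ATTACH PARTITION parcels_d34_new
    FOR VALUES IN ('34');
-- Missing partition indexes are created automatically during ATTACH.

Queries filtered with WHERE department IN ('2A', '34', ...) then benefit
from partition pruning, while clients fetching by ID still see a single
table.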

Marc MILLAS
Senior Architect
+33607850334
www.mokadb.com

On Tue, Mar 5, 2024 at 8:45 AM kimaidou <kimaidou(at)gmail(dot)com> wrote:

> Hi list,
>
> In France, the total number of cadastral parcels is around 100 000 000
>
> The data can be heavy, because each parcel stores a geometry (PostGIS
> geometry data type inside a geom column).
> Indexes must be created to increase performance of day-to-day requests:
>
> * GIST index on geom for spatial filtering and intersection with other
> geometries (other tables)
> * A primary key, plus probably another unique code to index
> * one index on the "department" field. There are around 100 "departments"
> (admin boundaries) in France, and the parcels are evenly distributed
> (~1M parcels per "department")
>
> The initial import of this data is done one department at a time (the
> data source is distributed per department by the French authorities). And
> each year the data must be deleted and reimported, since it changes every
> year, and this is also often done one department at a time.
>
> * Sometimes requests are made with a department filter (for example WHERE
> department IN ('2A', '34', '30', '48'))
> * Sometimes other database clients must be able to query the whole
> dataset (for example, get the parcels for a list of known IDs)
>
> I would like to ask the list about the following 2 strategies for
> maintaining such data:
>
> 1/ Put the whole dataset into one big table
> 2/ Create one table per department, and create a VIEW with 100 UNION ALL
> to gather all the parcels
>
> 1/ Seems simpler for the database clients, but it seems to me this can be
> a pain to maintain. For example, each time we need to replace last year's
> data for one department with the new data, we will have to delete 1M rows,
> reimport the new 1M rows, and VACUUM FULL to regain space. Indexes will be
> huge, and I may run into issues like this one:
>
> https://www.postgresql.org/message-id/CAMKXKO7yXmduSs4zzMfdRaPUn2kOKtQ6KMnDe1GxEr56Vr8hxA%40mail.gmail.com
>
> I often need to use pg_repack to regain space on this kind of table.
> VACUUM FULL cannot be used because it locks the table, and it takes time (!)
>
> 2/ Seems more KISS, but only if queries on the UNION ALL view can use the
> underlying tables' indexes (geom, department) and perform as well as the
> big table would.
>
>
> Any hint appreciated!
> Regards
>
> Kimaidou
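
For comparison, the UNION ALL view of your option 2 would look something
like this (per-department table names hypothetical):

CREATE VIEW parcels_all AS
          SELECT * FROM parcels_d2a
UNION ALL SELECT * FROM parcels_d30
UNION ALL SELECT * FROM parcels_d34;
-- ... 97 more branches

-- Each branch keeps its own GiST index, and those indexes remain usable
-- through the view. But to skip branches on a department filter, every
-- table needs a CHECK (department = '...') constraint and the planner
-- must prove exclusion for each query; declarative partitioning gives
-- you that pruning for free.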
