Re: Best Strategy for Large Number of Images

From: Imre Samu <pella(dot)samu(at)gmail(dot)com>
To: Estevan Rech <softrech(at)gmail(dot)com>
Cc: "pgsql-general(at)lists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Best Strategy for Large Number of Images
Date: 2021-12-20 11:32:49
Message-ID: CAJnEWw=SR8oKyEqKZo2q=R8_27jzL-mMG06iFQy=u3c_pqC_gw@mail.gmail.com
Lists: pgsql-general

> ... I have about 2 million images ...
> folder structure

The "Who's On First" gazetteer, with ~26M GeoJSON records, uses a
subfolder structure based on 3-digit chunks of the record ID.

"Given a Who's On First ID its (relative) URI can be derived by splitting
the ID in to 3-number chunks representing nested subdirectories, followed
by filename consisting of the ID followed by .geojson. For example the
ID for Montréal is 101736545 which becomes: 101/736/545/101736545.geojson"
https://whosonfirst.org/docs/uris/
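
A minimal Python sketch of that derivation, in case it helps. The function
name id_to_relpath and the chunk_size parameter are my own (hypothetical),
not part of any Who's On First library. I am assuming the ID is chunked
left to right, so an ID whose length is not a multiple of 3 ends with a
shorter last chunk.

```python
def id_to_relpath(record_id: int, ext: str = ".geojson", chunk_size: int = 3) -> str:
    """Derive a nested relative path from a numeric ID.

    Hypothetical helper: splits the decimal digits of the ID into
    chunk_size-digit pieces (left to right) and uses them as nested
    subdirectories, followed by "<id><ext>" as the filename.
    """
    digits = str(record_id)
    if not digits.isdigit():
        raise ValueError(f"expected a non-negative integer ID, got {record_id!r}")
    # Slice the digit string into consecutive chunks; the last one may be shorter.
    chunks = [digits[i:i + chunk_size] for i in range(0, len(digits), chunk_size)]
    return "/".join(chunks + [digits + ext])


if __name__ == "__main__":
    # The Montréal example from the quoted documentation.
    print(id_to_relpath(101736545))
```

The same idea should carry over to image files by passing a different
extension, e.g. id_to_relpath(101736545, ext=".jpg").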

It works, but it is also not optimal:

"As of this writing it remains clear that this approach (lots of tiny files
parented by lots of nested directories) can be problematic. We may be
forced to choose another approach, like fewer subdirectories but nothing
has been decided and anything we do will be backwards compatible." ( from
https://whosonfirst.org/data/principles/ )

The data has since been migrated to per-country repositories (
https://whosonfirst.org/blog/2019/05/09/changes/ )
so the US structure is:
https://github.com/whosonfirst-data/whosonfirst-data-admin-us/tree/master/data
or
https://github.com/whosonfirst-data/whosonfirst-data-admin-us/blob/master/data/907/132/693/907132693.geojson

Maybe you can adopt some ideas.
IMHO: with 3-digit chunks as nested subdirectories, you can
choose among more file systems / hosting solutions.

regards,
Imre

Estevan Rech <softrech(at)gmail(dot)com> wrote (on Mon, Dec 20, 2021,
11:30):

> How is this folder structure like 10,000 folders? and the backup of it,
> how long does it take?
>
