Re: unorthodox use of PG for a customer

From: Olivier Gautherot <olivier(at)gautherot(dot)net>
To: David Gauthier <davegauthierpg(at)gmail(dot)com>
Cc: Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: unorthodox use of PG for a customer
Date: 2018-08-27 14:17:29
Message-ID: CAJ7S9TUDk2AU12tVQq=jzAxu6X6kz1JhanwYr1oid4ArbtrByQ@mail.gmail.com
Lists: pgsql-general

Hi Dave, lots of nice inputs but let's try to summarize your user's needs.

On Fri, Aug 24, 2018 at 3:18 PM, David Gauthier <davegauthierpg(at)gmail(dot)com>
wrote:

> [...]
>
> He's writing a script/program that runs on a workstation and needs to
> write data to a DB. This process also sends work to a batch system on a
> server farm external to the workstation that will create multiple, parallel
> jobs/processes that also have to write to the DB as well. The workstation
> may have many of these jobs running at the same time. And there are 58
> workstations which all have/use locally mounted disks for this work.
>

Do the workstations work independently or do they need to share
information? If they share information, does it need to be ACID or is
"eventually consistent" enough? Also, what is the size of the database? A
local database is viable as long as it does not overload the workstation,
but if you have many instances of the job per workstation, times 58
workstations, I would rely on an optimized central DB.

Are the jobs mainly INSERT or a mix of INSERT and SELECT/UPDATE?

Things to consider: amount of RAM required, CPU, disk I/O... and, of
course, cost.
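If the workload turns out to be INSERT-heavy against a central DB, batching rows into multi-row INSERTs cuts the number of round trips considerably. A minimal sketch (table and column names are just illustrative assumptions, not from Dave's setup):

```python
# Hypothetical sketch: group rows from parallel jobs into multi-row
# INSERT statements so a central PostgreSQL server sees fewer round
# trips. The "job_results" table and its columns are assumptions.

def build_batched_insert(rows, batch_size=100):
    """Group rows into (sql, params) pairs, one per multi-row INSERT."""
    statements = []
    for i in range(0, len(rows), batch_size):
        batch = rows[i:i + batch_size]
        # One placeholder tuple per row in the batch.
        placeholders = ", ".join(["(%s, %s, %s)"] * len(batch))
        sql = ("INSERT INTO job_results (job_id, host, result) "
               "VALUES " + placeholders)
        # Flatten the row tuples into a single parameter list.
        params = [value for row in batch for value in row]
        statements.append((sql, params))
    return statements

stmts = build_batched_insert(
    [(1, "ws01", "ok"), (2, "ws01", "fail"), (3, "ws02", "ok")],
    batch_size=2)
print(len(stmts))  # 2 statements instead of 3 single-row round trips
```

Each (sql, params) pair could then be handed to a driver such as psycopg2; the point is simply that 58 workstations hammering a central server with single-row INSERTs is a very different load from the same data arriving in batches.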

At first blush, this is easy. Just create a DB on a server and have all
> those clients work with it. But he's also adamant about having the DB on
> the same server(s) that ran the script AND on the locally mounted disk. He
> said he doesn't want the overhead, dependencies and worries of anything
> like an external DB with a DBA, etc... . He also wants this to be fast.
>

If he insists on having a local copy of the database, suggest PG10 with
logical replication - ideally bidirectional replication. Once he realizes
that every INSERT causes 57 replications and will most likely kill the
network, he may reconsider the requirement... Queries to a central DB are
likely to cause less stress and will be easier to handle.
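For reference, wiring up PG10 logical replication between two nodes looks roughly like this (table, database and connection names here are illustrative assumptions; the subscriber side also needs wal_level = logical on the publisher):

```sql
-- On the publishing node; "job_results" is an assumed table name.
CREATE PUBLICATION job_results_pub FOR TABLE job_results;

-- On the subscribing node; the connection string is illustrative.
CREATE SUBSCRIPTION job_results_sub
    CONNECTION 'host=ws01 dbname=jobs user=repuser'
    PUBLICATION job_results_pub;
```

Now multiply that pairing across 58 workstations and the replication fan-out problem becomes obvious.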

Now, if speed is critical, you may also consider an in-memory DB like
Redis. There are deployment schemes that run multi-master.

My first thought was SQLite. Apparently, they now have some sort of
> multiple, concurrent write ability. But there's no way those batch jobs on
> remote machines are going to be able to get at the locally mounted disk on
> the workstation. So I dismissed that idea. Then I thought about having 58
> PG installs, one per workstation, each serving all the jobs pertaining to
> that workstation. That could work. But 58 DB instances? If he didn't
> like the idea of one DBA, 58 can't be good. Still, the DB would be on the
> workstation which seems to be what he wants.
>

A distributed database is likely to cause failures at some point if not
handled properly - and it sounds like there won't be any DBA at all. I
would be extremely cautious in this case.
