Re: Can I get some PostgreSQL developer feedback on these five general issues I have with PostgreSQL and its ecosystem?

From: Joshua Drake <jd(at)commandprompt(dot)com>
To: tutiluren(at)tutanota(dot)com
Cc: Discuss List Postgres <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Can I get some PostgreSQL developer feedback on these five general issues I have with PostgreSQL and its ecosystem?
Date: 2020-09-21 17:53:08
Message-ID: CAJvJg-T7nwjzZGMdDK_CDGh8FNWcuiiQznXRmUC_HJ-34u5aeg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Howdy,

First let me say thanks for the feedback! It is rare that we receive
detailed "user feedback" on these lists so it is good to hear from the
outside world. I am only going to address a few things as others have
addressed the rest.

> 2. pg_dump misinterprets non-ANSI values for the "--exclude-*" options (at
> least the --exclude-table-data one, which is the one I've tested) on
> Windows, resulting in it being impossible to make more "sophisticated"
> backups of PostgreSQL databases; it's either all or nothing. Other
> programs, including my own test scripts and commands, are perfectly able to
> use any Unicode character sent from/through both cmd.exe and PHP CLI, but
> not pg_dump, so the idea that "Windows it at fault" here just doesn't seem
> true. (Although I don't doubt for a second that it often *is* the case...
> Microsoft is not a nice entity in any way.) I spent a lot of time and
> efforts experimenting with and asking about this, but eventually gave up
> and concluded that it was yet another bug in an open source project "only"
> on Windows with no real/pressing interest in fixing it. For me, this means
> that I lose a ton of fresh data every day, or have to make *gigantic*
> backups. (I have several huge "temporary debug log" tables whose data have
> zero long-term value but tons of short-term value.) It makes me feel
> crippled and excluded in an uncomfortable manner.
>
>
I have to agree that pg_dump is largely a step child backup program. It has
consistently been found over the years to be lacking in a number of areas.
Unfortunately, working on pg_dump isn't sexy and it is difficult to get
volunteers or even paid resources to do such a thing. The real solution for
pg_dump is a complete refactor which includes pg_dumpall and it is not a
small undertaking. It should be noted that it is also a less and less used
program. On our team it is normally used for only very specific needs
(grabbing a schema) and we use binary backups or logical replication to
receive specific data.

> 3. The ability to embed PG to run in an automatic, quiet manner as part of
> something else. I know about SQLite, but it's extremely limited to the
> point of being virtually useless IMO, which is why I cannot use that for
> anything nontrivial. I want my familiar PostgreSQL, only not require it to
> be manually and separately installed on the machine where it is to run as
> part of some "application". If I could just "embed" it, this would allow me
> to create a single EXE which I can simply put on a different machine to run
> my entire "system" which otherwise takes *tons* of tedious, error-prone
> manual labor to install, set up and maintain. Of course, this is probably
> much easier said than done, but I don't understand why PG's architecture
> necessarily dictates that PG must be a stand-alone, separate thing. Or
> rather, why some "glue" cannot enable it to be used just like SQLite from a
> *practical* perspective, even if it still is a "server-client model"
> underneath the hood. (Which doesn't matter at all to me, nor should it
> matter to anyone else.)
>
>
This is really using the wrong tool for the job type of issue. PG was never
designed for such a scenario.

> 4. There is no built-in means to have PG manage (or even suggest) indexes
> on its own. Trying to figure out what indexes to create/delete/fine-tune,
> and determine all the extremely complex rules for this art (yes, I just
> called index management an *art*, because it is!), is just utterly hopeless
> to me. It never gets any easier. Not even after many years. It's the by far
> worst part of databases to me (combined with point five). Having to use
> third-party solutions ensures that it isn't done in practice, at least for
> me. I don't trust, nor do I want to deal with, external software and
> extensions in my databases. I still have nightmares from PostGIS, which I
> only keep around, angrily, out of absolute necessity. I fundamentally don't
> like third-party add-ons to things, but want the core product to properly
> support things. Besides, this (adding/managing indexes) is not even some
> niche/obscure use-case, but something which is crucial for basically any
> nontrivial database of any kind!
>

I think you are looking at this from a very windows centric way. Open
Source has its origins from the Unix paradigm where each tool was designed
to solve one type of problem and you used multiple tools to create a
"solution". Though we have strayed from that on some items due to the
evolving nature of software needs, that is still at our core and for good
reason. Having tools, flags etc... to do such things (including your point
#3) creates complexity best left to "vendors" not the software project.

>
> 5. Ever since my early days with PG in the mid-2000s, I've tried numerous
> times to read the manual, wikis and comments for the configuration files,
> specifically the performance directives, and asked many, many times for
> help about that, yet never been able to figure out what they want me to
> enter for all the numerous options. At this point, it isn't me being
> lazy/stupid; it's objectively very difficult to understand all of that.
>

This is absolutely true. The Postgresql documentation is FANTASTIC if you
already understand what is going on or you need a reference. We have
improved this a bit in 13 with the glossary but we still don't have
definitive "5 steps to make your PostgreSQL server fast" and that is very
much because it is a complicated question and it takes a lot of knowledge
to do it correctly.

> The practical end result of this is that I've always gone back to using
> the untouched default configuration file (except for the logging-related
> options), which, especially in the past on FreeBSD, *severely* crippled my
> PG database to not even come close to taking advantage of the full power of
> the hardware. Instead, it felt like I was using maybe 1% of the machine's
> power, even with a proper database design and indexes and all of that
> stuff, simply because the default config was so "conservative" and it
> couldn't be just set to "use whatever resources are available".
>

Not to be unkind but this does seem lazy. There are literally hundreds of
"how to make postgres go fast", "how to optimize postgres" if you take 15
minutes to Google. It is true that the project (outside of the wiki)
doesn't have much information in the official documentation but that
doesn't mean that the information is not available.

> I wish so much for PG to have a mode where it self-tunes itself as needed,
> over time, based on the actual workload, or at least allowed some kind of
> abstract "performance mode" such as: "you are allowed to use significant
> system resources, PG", or: "You are one of my most important applications.
> Just use as much power as you currently need, but at least save about 10%
> for the rest of the system, will you?" Maybe this is also harder than it
> sounds to accomplish, but for somebody like me who has zero funding, I
> cannot hire some professional to sit down with me and fine-tune my system
> for $899/hour.
>

See my comment about Google. The information is out there and easy to find.
There is a lot of fresh and free content right here:

https://postgresconf.org/conferences/postgres-webinar-series/schedule/events

I hope you find the information you are looking for. There is a very
helpful community on Discord and Slack as well:

- https://discord.gg/E7WjZhz
- https://postgres-slack.herokuapp.com/

Thanks,

JD

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2020-09-21 18:11:00 Re: CRL reloading info - is it refreshed?
Previous Message Laurenz Albe 2020-09-21 14:10:06 Re: Foreign tables, user mappings and privilege setup