Re: Can I get some PostgreSQL developer feedback on these five general issues I have with PostgreSQL and its ecosystem?

From: Sam Gendler <sgendler(at)ideasculptor(dot)com>
To: tutiluren(at)tutanota(dot)com
Cc: Pgsql General <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Can I get some PostgreSQL developer feedback on these five general issues I have with PostgreSQL and its ecosystem?
Date: 2020-09-26 03:09:49
Message-ID: CAEV0TzD7bvmSURG4S7XbgRvSPX93WO0e7zPVtRZi5THvd_Cgkg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Thu, Sep 24, 2020 at 10:40 PM <tutiluren(at)tutanota(dot)com> wrote:

>
> Well not partial as in incremental. Instead dump only some portion of the
> schema with or without its associated data.
>
> It's funny that you should bring that up, considering how it was one of my
> points... See the point about pg_dump's bug on Windows.
>

And you seem to have ignored the fact that one of the core developers
pointed out that it likely isn't a pg_dump bug - if your terminal is using
the same locale as the database, it should have no difficulty dealing with
the characters you are having trouble with. It seems likely that you
simply need to learn how to get your terminal set up correctly for it to
work.

> I'm saying that PostGIS has a bug due to incorrectly constructed internal
> queries which makes it impossible to properly name the schema where PostGIS
> is to reside, causing my database to look very ugly when it has to say
> "postgis" instead of "PostGIS" for PostGIS's schema. And that was an
> example of how sloppy/bad third-party things always are, and is one reason
> why I don't like it when I have to rely on "extensions".
>
> They are extensions so you aren't required to use them and rely on their
> way of doing things. You have the choice of writing your own code/extension
> or do without completely.
>
> It sure is great to have such choices... I can't take it seriously when
> people say things like this. It's similar to "it's open source so you can
> easily vet it yourself". It's not taking reality into consideration at all.
>
> As for doing without it, that would make it impossible to deal with GPS
> coordinates/maps. So it's not really a choice at all.
>

Never mind that your tone in your emails is remarkably rude for someone who
is doing nothing but complain about perceived shortfalls in a product that
is entirely free and of which you appear to be fairly far from an expert
user, has it occurred to you that YOUR issue is that you have absolutely no
understanding of the variety of uses that people put a database like
postgres to out in the world? Imagine if the core database server was
subject to the development schedule of every extension that you happen to
think ought to be included in the core product - or vice versa. Someone
finds a bug in PostGIS and it can't be fixed until the next major or minor
release of the core postgres server? That would result in a terrible user
experience and force development of all extensions to move in lockstep with
the core server. The vast majority of users do not ever use GIS extensions
and have absolutely no use for their presence in the core product. The fact
that you need it is your one individual use case. I've been using Postgres
professionally since 2005 and I've only ever used postgis for a single
project. Who are you to say what should and should not be included in a
product to which you contribute absolutely nothing? And in the same breath,
you complain about the quality of that extension. If it is so lacking in
quality, why would you want it embedded in the core service where it could
potentially impact the quality of every user's experience? Maybe consider
that the dev team has good reasons for keeping non-essential functionality
in extensions instead of the core codebase.

> That would entail building an AI into the code that would deal with
> all the possible OS(versions), Postgres(versions), hardware
> permutations.
>
> I... guess. If "AI" means "a series of ifs". Which is what software... is?
> I doubt that people who can make the world's most advanced open source
> database cannot check the amount of RAM and see how fast the CPU/disk is.
>
>
> It is more then that. It would have to take into account the behavior
> changes that happen in Postgres between major versions. It also would have
> to account for OS specific parameters and the changes that happen there
> between OS versions. It also would need to 'know' how the database was
> going to be used; readonly, heavy writes, etc. Also how the database should
> play with other programs on the same machine. Add to the mix containers,
> cloud instances and so on and you are outrunning the ability of 'ifs' to
> handle it.
>
> If it changes that much, it's far, far worse than I even thought, and it
> sounds like it will be pointless to even *try* to learn it as it keeps
> changing between versions/OSes/other stuff.
>

> I can't help but feel as if people just don't want to answer this and
> other concerns I have. As if there's some silent agreement along the lines
> of "securing PG DBAs' jobs".
>

You really come off as something of an ass with this commentary. DBAs have
plenty of far more interesting things to do with their time than
performance tune a server config and certainly don't need that in order to
be secure in their role. What is clear is that you have absolutely no
understanding of the variety of ways in which a database server might be
configured depending on the way the database is being used, the quantity of
data, the rate of queries, the size of queries, the other applications that
must share the host, the speed and quantity of storage systems, etc. If it
is such an easy problem to solve, then I'm sure everyone here would
encourage you to build it yourself and contribute it back to the project -
all of us who have spent years learning how to fine tune the performance of
postgresql would surely thank you for taking a big chunk of what is
otherwise basically toil off our plates. Take a look at the performance
mailing list some time and look at the effort even people with decades of
postgres experience go to in order to determine optimal settings for new
hardware and then tell us how easy it would be to just automatically
configure a server with magic values that work for everyone. You are just
flat-out wrong, here, and you lack the expertise or experience to even
understand that you don't know enough to have a valid opinion on this
point. Everyone else has tried to let you know politely, but your tone has
ticked me off so I'm just telling it to you straight.

>
> The thing is 'general mode' is going to mean something different to
> someone running a database in the MB-low GB range vs. high GB vs. TB vs. PB.
>
> I don't mean this to sound rude, but it's like talking to a wall... What I
> mean is that there are obviously technical means for software to know
> whether they are exhausting the system they are running on or not, and
> expecting people to understand all these intricate internal parameters is
> just... bizarre. There ought to be some kind of "abstract" setting for
> those of us who aren't able to (or even *wish* to) comprehend all the PG
> internals, and just want an efficient database using (roughly) as much of
> our machine as we want.
>

Well, you most definitely DO sound rude, and you have since your first
email on this thread. And there are most definitely NOT technical means for
software to know whether it is exhausting a system or not. Just for
starters, it has no idea if it is sharing server resources with other
processes or if all resources are dedicated to itself. If disk response is
suddenly slow, is that because some index or table just hit critical mass
and caused a knee in a performance curve that might be tuned out or is it
because someone just started bulk copying a bunch of data to the same
network attached storage that the database is using? There is no way to
tell if a query is slow because other queries were occupying all of the
CPU, because something else sucked up all the memory and the OS is
swapping, or because some other process just consumed 90% of CPU resources
on the host. The database doesn't know if 99% of your queries are going to
be to a bunch of 1MB tables or to the one 40TB table that is actually just
being written for auditing purposes and will rarely, if ever be queried on
the production host. Never mind the differences between data warehousing
workloads vs OLTP workloads.

Just maybe, the folks around here, many with decades of postgresql
experience, know more than you do about the variety of uses a database
might be put to, not to mention the variety of settings that go into tuning
for them, and have come to the conclusion that it is a problem too
complicated to solve via automation. Instead of arguing with them, and then
complaining about how hard it is for YOU - someone who almost certainly is
smarter than any algorithm is likely to be - cannot figure it out. Maybe
take some time to do some reading and learning, set up a system and do some
benchmarking of various config changes, and figure it out. Honestly, it
really isn't all that hard, but it IS time consuming. Skill acquisition
usually is. That's why employers pay well the people who do it. Not only
are there already tools available for 'automatically' tuning your database
based on your projected utilization and available resources, but many of us
have somehow managed to acquire the knowledge you seek via the
documentation and the shared knowledgebase that is the internet. Instead
of berating the developers, how about just asking for assistance and then
using that assistance to further your own knowledge. No one is going to be
quick to volunteer to assist someone who just angrily writes accusatory
complaints that not enough was done to proactively assist them - as if the
folks on this mailing list don't have anything better to do with their time
than to drop everything to help you - and yet they do. Instead of being
angry about it, maybe thank them instead.

The postgresql docs are reference docs. They are written in a manner which
documents available APIs and capabilities in a thorough manner. As a
result, they aren't that useful for learning which bits and pieces are most
useful for doing the thing that you want to do. But there are countless
books, blog posts, mailing lists, and other resources available to you -
just as they are for almost every other open source project of
significance. Can you name a database that is even remotely as capable as
PostgreSQL which is significantly more easily administered than PostgreSQL?
Oracle? Nope, it is MASSIVELY more difficult. SQL Server? Nope, also a
pain in the neck to tune and, in my opinion as something of a novice user
of that system, a whole heck of a lot more poorly documented when it comes
to administrative tasks. MySQL? Not really. Maybe slightly simpler to
configure if you don't need all of the functionality that postgres brings
to the table, but otherwise every bit as hard to manage. If you want
Microsoft Access simplicity, you should use Microsoft Access. There's a
reason all of those databases have manually managed configurations that
take lots of experience to become fully competent with. The only way to
simplify configuration is to simplify functionality, and you would
inevitably complain about that if it happened.

> "There's plenty of guides" and "the information is out there" doesn't help
> me and all the other people who have stuck with the default config and thus
> a massively restricted PG database for all these years. Just because it's
> easy to you doesn't mean it's easy to everyone else. Just dealing with
> composing efficient-enough SQL queries and designing an optimized database
> structure is (way) more than enough work for most of us. I don't have the
> luxury of some hired DBA who sits all day tuning the PG server. Besides,
> I've already explained the privacy issues with that even if I had the
> money...
>

None of us had the luxury of having some hired DBA who sat all day tuning
the PG server. That's why we sat down and acquired those skills for
ourselves. And I'm reasonably certain that none of us sit around all day
idly modifying database configs during our copious spare time. Complex
systems are hard. Welcome to your career. The only thing that will keep
you gainfully employed for the next decades is if you learn how to quickly
and effectively acquire new skills (and not piss of whole communities of
potential support in the process). And you know what is a damn good way to
do that? Read books about the tech. Read the mailing lists for the tech.
And then contribute something useful to that tech. Feel free to fix any
and/or all of the perceived defects that you seem to think are so trivial
to resolve. I can promise you that if your implementations are of high
quality and don't break other functionality or diverge from the project's
core mission, there's a high likelihood that they'll be merged into the
codebase eventually.

I just typed PostgreSQL into my oreilly bookshelf and came up with over
3000 hits. The first page is entirely high quality books published by
reputable publishers within the last 3 years, covering pg 11 or 12. The
first one is called "*Mastering PostgreSQL 12*" and seems a likely
candidate. A few rows farther down, I find "*PostgreSQL Configuration:
Best Practices for Performance and Security*" which was published less than
6 months ago and is surely entirely up to date. I refuse to believe that
you could possibly have so much difficulty learning how to configure a
system from a book like that, which almost certainly gives you step by step
guidance for determining optimal values for the most relevant settings for
use cases similar to yours. It's probably worth pointing out that there's
likely a 99% chance that the authors of those books are reading your emails
on this list. So maybe go buy, borrow, or check out one of their books and
learn how to configure your system. Unless you pay them an hourly rate, no
one here is going to configure your system for you.

There's my $0.02 (and then some).

--sam

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Jack Douglas 2020-09-26 07:17:21 Yum repository RPM behind release
Previous Message Bruce Momjian 2020-09-25 19:50:13 Re: Can I get some PostgreSQL developer feedback on these five general issues I have with PostgreSQL and its ecosystem?