Re: Is there a good discussion of optimizations?

From: Rob Sargent <robjsargent(at)gmail(dot)com>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Is there a good discussion of optimizations?
Date: 2021-01-07 21:54:58
Message-ID: d8991708-a402-06b3-fda7-f21db4c9f6c6@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general


On 1/7/21 2:48 PM, Guyren Howe wrote:
> On Jan 7, 2021, 13:42 -0800, Florents Tselai
> <florents(dot)tselai(at)gmail(dot)com>, wrote:
>
> Apologies for the shameless self-promotion :)
>
>
> Around a year ago I collected my thoughts on this topic. You can
> read the post here Modern Data Practice and the SQL Tradition
> <https://tselai.com/modern-data-practice-and-the-sql-tradition.html> .
>
> It looks like it resonated with a lot of folks in the community.
> HN Discussion https://news.ycombinator.com/item?id=21482114
> <https://news.ycombinator.com/item?id=21482114>
>
>
> I would specifically underline the fact that the newer generation
> of programmers & data pros (my former self included) don't really
> appreciate things like triggers and server-side programming.
> Triggers and DB-side functions are considered something like
> Assembly code.
>
>
> Not many neophytes have been shown with use cases why for example
> writing 2-3 lines of PL/SQL can save you a huge overhead of back
> and forth and environment set up to write the same thing in say
> Pandas.
>
>
> I would focus on Triggers, Indices on expressions, and
> time-related functions. Probably on generated columns too. They
> may be considered a new feature, but the reasoning of building
> successively columns on top of a few base ones is quite appealing
> nowadays, especially for ML purposes.
>
>
> I also wouldn't hesitate to discuss advanced topics. They are
> usually considered obscure because people don't present useful
> examples, rather toy and arbitrary ones.
>
>
> In a recent O'Reilly training, I was skeptical of talking about
> triggers for optimization but it looks like it was probably the
> most useful part of my training as students could actually "steal
> and tweak" my code.
> <https://github.com/Florents-Tselai/SQLite-for-Data-Scientists/blob/master/notebooks/5_advanced_SQL.ipynb>
>
>
> Thanks for this. May I steal some of your examples if they prove
> useful? I’ll credit you of course.
>
> I’m planning on somewhat emphasizing that a relational database is a
> logic engine. Viewed through this lens, a query or view is a
> “backward” implication and a trigger is a “forward” one. This leads to
> considering triggers (and the moral equivalent in external code) as
> requiring “truth maintenance”, and is a great way to think about when
> the database is the appropriate place for some bit of logic.
>
> On Thu, Jan 7, 2021 at 11:21 PM Guyren Howe <guyren(at)gmail(dot)com
> <mailto:guyren(at)gmail(dot)com>> wrote:
>
> On Jan 7, 2021, 13:07 -0800, Kevin Brannen <KBrannen(at)efji(dot)com
> <mailto:KBrannen(at)efji(dot)com>>, wrote:
>
> From: Guyren Howe <guyren(at)gmail(dot)com <mailto:guyren(at)gmail(dot)com>>
>
> >Most folks, in my experience, who use relational
> databases don’t really understand the basic theory or even
> more important the why - the philosophy - of what a
> relational database is and how to get the most out of
> them. I see a lot of folks trying to use SQL in an
> imperative manner - make this temp table, then update it
> some, then make this other temp table, etc...
>
>
> Actually, I’m mostly going to talk about the relational model,
> rather than SQL. Our industry seems to always settle for
> third-best, and SQL is the worst of all the examples of this.
> The world desperately needs a good relational database based
> on a better query language — datalog, for example.
>
>
> I put up with SQL so I can use the relational model, and I
> think that understanding SQL has to start with that.
>
>
> Anyhow.
>
> An example of this is that we have a report we're trying
> to write that I'd
>
> like to think can be done in SQL, but I can't think of a
> way to do it. Yet,
>
> if I do the base query and pull the data back into my
> application, I can do
>
> the last bit with 3 lines of Perl very easily. The problem
> here revolves
>
> around comparing a row of data to the previous row to know
> whether the data
>
> changed "significantly enough" to keep the new row.
>
> Another example is doing running totals. A couple of years
> ago I would have
>
> said SQL can't do that. Now I know about the OVER clause,
> something that I
>
> would categorize as somewhat obscure, I can do it as needed.
>
>
> Actually, Window functions might be “advanced”, but are
> certainly not obscure. Your example sounds like it’s trivially
> solved with LAG().
>
> As Michael Lewis pointed out, large dataset can also cause
> you to choose not
>
> to use SQL in 1 big statemet for everything (i.e.
> advocating the use to temp
>
> tables). In some ways, using a CTE is a type of temp
> table, or at least I
>
> view it as such. That allows a person to solve a problem
> in bite-sized chunks.
>
> I will agree that optimization can do it better at times,
> but the code also has
>
> to be maintained as well – a balancing act.
>
>
> This appears to be good advice with SQL Server, which I’m
> coming to learn has a fairly poor query optimizer. But I would
> have thought Postgres’s optimizer would usually use a
> temporary table where appropriate.
>
>
Please include, for all those front-end coders who might want to hit the
database, the expense/overhead involved.   I've seen "foreach id, read
database, process record" all too often.

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Kevin Brannen 2021-01-07 22:17:09 RE: Is there a good discussion of optimizations?
Previous Message Guyren Howe 2021-01-07 21:48:01 Re: Is there a good discussion of optimizations?