Re: Is there a good discussion of optimizations?

From: Florents Tselai <florents(dot)tselai(at)gmail(dot)com>
To: Guyren Howe <guyren(at)gmail(dot)com>, "pgsql-general(at)lists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>, Kevin Brannen <KBrannen(at)efji(dot)com>
Subject: Re: Is there a good discussion of optimizations?
Date: 2021-01-07 21:41:52
Message-ID: CA+v5N41tdtEEjwM3s31i++5s=j4BYWurEgbJqcknHtPA9WePYA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Apologies for the shameless self-promotion :)

Around a year ago I collected my thoughts on this topic. You can read the
post here Modern Data Practice and the SQL Tradition
<https://tselai.com/modern-data-practice-and-the-sql-tradition.html> .
It looks like it resonated with a lot of folks in the community. HN
Discussion https://news.ycombinator.com/item?id=21482114

I would specifically underline the fact that the newer generation of
programmers & data pros (my former self included) don't really appreciate
things like triggers and server-side programming. Triggers and DB-side
functions are considered something like Assembly code.

Not many neophytes have been shown with use cases why for example writing
2-3 lines of PL/SQL can save you a huge overhead of back and forth and
environment set up to write the same thing in say Pandas.

I would focus on Triggers, Indices on expressions, and time-related
functions. Probably on generated columns too. They may be considered a new
feature, but the reasoning of building successively columns on top of a few
base ones is quite appealing nowadays, especially for ML purposes.

I also wouldn't hesitate to discuss advanced topics. They are usually
considered obscure because people don't present useful examples, rather toy
and arbitrary ones.

In a recent O'Reilly training, I was skeptical of talking about triggers
for optimization but it looks like it was probably the most useful part of
my training as students could actually "steal and tweak" my code.
<https://github.com/Florents-Tselai/SQLite-for-Data-Scientists/blob/master/notebooks/5_advanced_SQL.ipynb>

*Florents Tselai*

*Data Scientist & Engineer*

Jack of All Trades and Master of Some

http://tselai.com
<http://tselai.com/?utm_source=email_signature&utm_medium=email&utm_campaign=Email_Signature>

If I'm late for a meeting: I'll be there in 5'. If not, reread this.

Sent from mobile excuse any...Nah... typos are due to sheer laziness or
neglect.

On Thu, Jan 7, 2021 at 11:21 PM Guyren Howe <guyren(at)gmail(dot)com> wrote:

> On Jan 7, 2021, 13:07 -0800, Kevin Brannen <KBrannen(at)efji(dot)com>, wrote:
>
> From: Guyren Howe <guyren(at)gmail(dot)com>
>
>
>
> >Most folks, in my experience, who use relational databases don’t really
> understand the basic theory or even more important the why - the philosophy
> - of what a relational database is and how to get the most out of them. I
> see a lot of folks trying to use SQL in an imperative manner - make this
> temp table, then update it some, then make this other temp table, etc...
>
>
> Actually, I’m mostly going to talk about the relational model, rather than
> SQL. Our industry seems to always settle for third-best, and SQL is the
> worst of all the examples of this. The world desperately needs a good
> relational database based on a better query language — datalog, for example.
>
> I put up with SQL so I can use the relational model, and I think that
> understanding SQL has to start with that.
>
> Anyhow.
>
> An example of this is that we have a report we're trying to write that I'd
>
> like to think can be done in SQL, but I can't think of a way to do it. Yet,
>
> if I do the base query and pull the data back into my application, I can do
>
> the last bit with 3 lines of Perl very easily. The problem here revolves
>
> around comparing a row of data to the previous row to know whether the data
>
> changed "significantly enough" to keep the new row.
>
>
>
> Another example is doing running totals. A couple of years ago I would have
>
> said SQL can't do that. Now I know about the OVER clause, something that I
>
> would categorize as somewhat obscure, I can do it as needed.
>
>
> Actually, Window functions might be “advanced”, but are certainly not
> obscure. Your example sounds like it’s trivially solved with LAG().
>
> As Michael Lewis pointed out, large dataset can also cause you to choose
> not
>
> to use SQL in 1 big statemet for everything (i.e. advocating the use to
> temp
>
> tables). In some ways, using a CTE is a type of temp table, or at least I
>
> view it as such. That allows a person to solve a problem in bite-sized
> chunks.
>
> I will agree that optimization can do it better at times, but the code
> also has
>
> to be maintained as well – a balancing act.
>
>
> This appears to be good advice with SQL Server, which I’m coming to learn
> has a fairly poor query optimizer. But I would have thought Postgres’s
> optimizer would usually use a temporary table where appropriate.
>
> Curious to hear if that’s wrong.
>
> I think your idea is a good one, but I do hope you present that SQL can't
>
> solve everything ... else why do we have plpgsql. :) You’re correct though,
>
> SQL isn’t used as much as it should be in many places.
>
>
> An important consideration will be when relational is inappropriate. My
> biggest goal, though, is to get folks to understand how much relations
> *can* do — far too many devs in my experience don’t use the power of SQL
> because they don’t understand it.
>
> Thanks for taking the time to give me this feedback.
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Guyren Howe 2021-01-07 21:48:01 Re: Is there a good discussion of optimizations?
Previous Message Guyren Howe 2021-01-07 21:21:19 RE: Is there a good discussion of optimizations?