From: | Merlin Moncure <mmoncure(at)gmail(dot)com> |
---|---|
To: | Jay Levitt <jay(dot)levitt(at)gmail(dot)com> |
Cc: | Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, David Johnston <polobo(at)yahoo(dot)com>, Robert James <srobertjames(at)gmail(dot)com>, Postgres General <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Controlling complexity in queries |
Date: | 2011-12-14 15:33:04 |
Message-ID: | CAHyXU0wH43hcKk4e0J_bSVYGuwL-YcTpg66CaeB2cifK5gZpgQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
On Tue, Dec 13, 2011 at 4:27 PM, Jay Levitt <jay(dot)levitt(at)gmail(dot)com> wrote:
> Merlin Moncure wrote:
>>
>> Breaking your large queries into functions OTOH can make significant
>> changes to the plan, often to the worse.
>
>
> As an end-user, I think this is an area where PostgreSQL could really stand
> out (that and the moon launch). In Rails-land, you don't have The DBA that
> writes queries. You have a developer and an ORM, and when they step outside
> that ORM to do the cool things SQL can do, all their DRY habits fall apart,
> because it's 1990 again and you can either write clear code or fast code but
> not both.
As far as ORMs are concerned, I'll take 1990 over 2011 all day long.
The heavy emphasis on procedural/OO coding tactics to model business
applications IMNSHO is and always been a total catastrophe. The
reasons for that are numerous: insane defect rates, brittle functional
relationships, poor concurrency model etc. Enterprises move off SQL
because they hate paying the dollars top SQL talent demands only to
pay the ultimate price when development gets crushed under the weight
of maintaining all that crappy code. This sad state of affairs has
been encouraged by some of the top software vendors.
> But having to write one big query for performance feels exactly like having
> to write one big C function with unrolled loops. I'm currently taking a
> well-factored, function-based query and turning it INTO what Robert James is
> trying to get OUT of: a monolithic query.
SQL has a very powerful abstraction feature: it's called a view. Good
use of views is a key design feature for complex databases.
Functions are generally not a good choice for query abstraction unless:
*) you are working with scalars (string manipulation etc)
*) you need non relational features like plpgsql exception
handling/notice printing, etc
*) this particular operation is not really optimizable anyways and you
want to wrap it (WITH RECURSIVE for example)
*) your function is inline-able (generally, a one liner sql function
that is stable or immutable)
etc
>> In the end, the performance of your queries is going to be directly
>> related to how well you map the problem into relational logic
>
>
> It's not just that, though; it's quite possible to think relationally and
> still fall down. There are plenty of cases where the human eye can see that
> a modular function can be inlined, but the optimizer can't. I have a
> pathological case: a query against a database with just a few thousand users
> takes 1.5 seconds on fast hardware, because it ends up scanning a cartesian
> product to get 16 rows, even before you get to the nested loops. In fact,
> most of the time the optimizer does a great job of inlining all my
> set-returning functions, once 9.0.6/9.1.2 rolled out.
>
> I've seen at least three equally ominous pieces that would have to happen to
> allow DRY, composable SQL:
>
> 1. Optional optimization of non-recursive WITH
> 2. Optional pushdown of WHERE clauses into GROUP BY[1]
> 3. LATERAL
>
> AFAIK, none of these are on anyone's short-term to-do list, and I'm sure
> none are easy.
>
> [1] Since this is my current favorite problem, the pathological case is:
>
> select questions.id
> from questions
> join (
> select u.id
> from users as u
> group by u.id
> ) as s
> on s.id = questions.user_id
> where questions.id = 1;
>
> With users.id as a primary key, it's obvious that this can return only one
> row, but it has to scan the users table to get there. See the "Subjquery in
> a JOIN not getting restricted?" thread on pgsql-performance for Tom's
> explanation of why that's a hard problem to solve.
Yeah -- here and there you run into difficult to optimize queries.
(For my part, I'd just have converted that to WHERE EXISTS for the
semi-join).
merlin
From | Date | Subject | |
---|---|---|---|
Next Message | Albe Laurenz | 2011-12-14 16:01:11 | Re: Philosophical question |
Previous Message | Tom Lane | 2011-12-14 15:13:27 | Re: why can't my account be used at wiki.postgresql.org after having registered in www.postgresql.org? |