From: | Jay Levitt <jay(dot)levitt(at)gmail(dot)com> |
---|---|
To: | Merlin Moncure <mmoncure(at)gmail(dot)com> |
Cc: | Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, David Johnston <polobo(at)yahoo(dot)com>, Robert James <srobertjames(at)gmail(dot)com>, Postgres General <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Controlling complexity in queries |
Date: | 2011-12-14 20:23:29 |
Message-ID: | 4EE905C1.4000508@gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Merlin Moncure wrote:
> SQL has a very powerful abstraction feature: it's called a view. Good
> use of views is a key design feature for complex databases.
>
> Functions are generally not a good choice for query abstraction unless:
One more:
* You want contextual queries.
(I guess this is a special case of "you need non relational features".)
In my case, I want all queries against content to be filtered by their
relevance to the current user. That can't go into a view, because views
don't have parameters; I need a computed column that may be different every
time I run the query, and depends on a piece of information (the current
user ID) that Postgres can't know.
Relevance itself is a weighed average of a bunch of factors which *also*
need the user ID, like "how similar are you to the user who authored that
content?"
As far as I can tell, the only way to accomplish this is through pure-SQL
functions, or by hand-crafting the ultimate SQL query that accomplishes
those functions.
It's much easier to work with
select content.*, relevance(content.id, current_user.id) as r
order by r
than the underlying query. I'm not doing fancy OO stuff; I'm
compartmentalizing the knowledge of "what is relevance", "what is
similarity", and "how do I fetch a collection of content". I'm not even in
1990. I'm in (quick Google) 1946, when subroutines were invented:
"We also wish to be able to arrange for the splitting up of operations
into subsidiary operations."
If views had parameters, I'd use views, but I suspect that a parameterized
view would be very much like a pure-SQL set-returning function, ya?
I'd love to find a better way to do this. Having just read Thinking In
Sets, I am sure a *real* SQL programmer would create a view that cross joins
every content row with every user, materialize it, and then restrict your
eventual query with the user id. But reading that book always leaves me
with the same two questions: "Why is Joe Celko yelling at me? And
what's SNOBOL?"
>> [1] Since this is my current favorite problem, the pathological case is:
>>
>> select questions.id
>> from questions
>> join (
>> select u.id
>> from users as u
>> group by u.id
>> ) as s
>> on s.id = questions.user_id
>> where questions.id = 1;
>>
>> With users.id as a primary key, it's obvious that this can return only one
>> row, but it has to scan the users table to get there. See the "Subjquery in
>> a JOIN not getting restricted?" thread on pgsql-performance for Tom's
>> explanation of why that's a hard problem to solve.
>
> Yeah -- here and there you run into difficult to optimize queries.
> (For my part, I'd just have converted that to WHERE EXISTS for the
> semi-join).
I think I'm about to learn a very important relational-algebra
equivalence... could you elaborate?
Jay
From | Date | Subject | |
---|---|---|---|
Next Message | Marti Raudsepp | 2011-12-14 20:56:31 | Re: when was pg_stat_reset() used in my server for the last time |
Previous Message | Simon Windsor | 2011-12-14 20:01:42 | Vacuum and Large Objects |