From: | Daniel Farina <daniel(at)heroku(dot)com> |
---|---|
To: | Simon Riggs <simon(at)2ndquadrant(dot)com> |
Cc: | pgsql-general <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Finding latest record for a number of groups in an INSERT-only table |
Date: | 2011-07-05 07:43:08 |
Message-ID: | CAAZKuFYcMV3eEdTXQMBDH9GdLk5imYZUgkyLq5grmX5-WoScJA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
On Tue, Jul 5, 2011 at 12:32 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> I think its a pretty common requirement and we should be looking to
> optimize it if it isn't handled well.
I agree; although I wanted to be sure that it is not in fact handled
well by some mechanism I haven't seen yet.
> The only problem is that there is a few ways of writing that in SQL
> and we would probably need to recognise how to transform between query
> types to achieve a common form.
I think that'd be good for completeness, although I also think that
having a 'design pattern' of sorts for dealing with this workload
would be totally acceptable for quite some time, even if
canonicalization from other forms is technically possible
> For example, the post you cite uses a correlated subquery whereas its
> possible to write it using an IN subselect.
I liked the brevity of DISTINCT ON form, even if semantically it feels
a little weird, yet, I think the biggest bang for the buck is looking
at the "row_number() = N" form of the problem: I've seen people do
this a number of times for different reasons. In my case N = 1, but
if I wanted to take an average of a metric of three records or so then
one could easily imagine wanting row_number() >= 3, followed by
aggregations. I've seen some other real-world use cases fly by where
people wanted some N where N != 1, too. When such a predicate is
around, one can be sure that no more than N rows are generated per
partition, and if there's an appropriate index then some kind of
costing can take place to see if it's worth using.
Interestingly, this may also be a way to (in a degenerate case)
acquire a path to enable skip scan, but last I checked skip scan was
problematic somehow for other reasons, which makes me wonder how
feasible it'd be to embark on this optimizer enhancement, should it
prove necessary.
--
fdr
From | Date | Subject | |
---|---|---|---|
Next Message | Albe Laurenz | 2011-07-05 08:19:17 | Re: cannot "make USE_PGXS=1" b/c no pgxs.mk exists |
Previous Message | Sim Zacks | 2011-07-05 07:37:13 | Re: Read MS-SQL data into Postgres via ODBC link? |