From: Ladislav Lenart <lenartlad(at)volny(dot)cz>
To: Jonathan Vanasco <postgres(at)2xlp(dot)com>, PostgreSQL mailing lists <pgsql-general(at)postgresql(dot)org>
Subject: Re: newsfeed type query
Date: 2015-04-29 17:18:32
Message-ID: 55411268.1060008@volny.cz
Lists: pgsql-general
On 29.4.2015 18:54, Jonathan Vanasco wrote:
>
> On Apr 29, 2015, at 12:25 PM, Ladislav Lenart wrote:
>
>> Could you please explain to me the error(s) in my reasoning?
>
> Let me just flip your list in reverse... and add in some elements (marked with a *):
>
> posting ts context
> p60 60 friend
> p55 55 friend*
> p54 54 friend*
> p50 50 group
> p49 49 group*
> p40 40 friend
> p30 30 group
> p20 20 friend
> p10 10 group
> p0 0 friend
>
> With the 2 limited subqueries, the results would be:
> 60F, 55F, 50G, 49G
>
> But the "most recent" data is
> 60F, 55F, 54F, 50G
>
> So we end up showing 49 which is less relevant than 54.
I would expect the overall query to return only 60F and 55F as the most recent
data. No? You expect it to return 4 items when the LIMIT is only 2. Remember
that the overall query should also be ordered by ts and limited to 2.
I thought you wanted the most recent items across all contexts, not the 2 most
recent items from friends plus the 2 most recent items from groups...
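The two readings can be sketched in a few lines (a hypothetical illustration, not from the thread, using the timestamps from the example table above):

```python
# Timestamps from Jonathan's example table (hypothetical data).
friends = [60, 55, 54, 40, 20, 0]   # ts of "friend" postings
groups = [50, 49, 30, 10]           # ts of "group" postings

# Reading 1: LIMIT 2 per context, then merge the two result sets.
per_context = sorted(friends[:2] + groups[:2], reverse=True)
# -> [60, 55, 50, 49]

# Reading 2: merge all contexts first, then apply the overall LIMIT 2.
overall = sorted(friends + groups, reverse=True)[:2]
# -> [60, 55]
```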
Ladislav Lenart
> In some situations this isn't much of an issue, but in others it is detrimental.
> For example, one of my "feeds" contains a distribution of events according-to-type that is very uneven. While "friend" and "group" might be relatively close in time to one another, "system" or other events may be months old -- and that older content gets pulled in with this style of query.
>
> If you need to paginate the data and select the next 10 overall items, it gets even more complicated.
>
> IIRC, the best mix of performance and "product" that I've found is do something like this:
>
> SELECT * FROM (
>     (SELECT a, b, c FROM table_a ORDER BY timestamp DESC LIMIT 10000)
>     UNION ALL
>     (SELECT a, b, c FROM table_b ORDER BY timestamp DESC LIMIT 10000)
> ) AS unioned
> ORDER BY unioned.timestamp DESC LIMIT 100 OFFSET 0;
>
> By creating an artificial limit on the inner queries, you can save postgres from doing a lot of intensive i/o work and memory usage (like a lot).
> Then, merging a few lists and sorting 20k (or even 100k) items is really cheap.
> The downside is that you effectively limit the 'relevancy' of the query to whatever the inner limit is (i.e. 10000 -- not the combined total of 20000), but that number can be made arbitrarily high enough that it is irrelevant while still showing the right amount of content for people.
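The inner-limit pattern above can be run end to end as a small sketch (using SQLite via Python purely for a self-contained example; the table names, columns, and data are made up, and SQLite requires the per-branch ORDER BY/LIMIT to be wrapped in subqueries):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (a INTEGER, ts INTEGER);
    CREATE TABLE table_b (a INTEGER, ts INTEGER);
    INSERT INTO table_a VALUES (1, 60), (2, 55), (3, 40);
    INSERT INTO table_b VALUES (4, 50), (5, 49), (6, 30);
""")

# Each branch is capped before the outer sort, so only
# (inner limit * number of branches) rows reach the final ORDER BY.
rows = conn.execute("""
    SELECT * FROM (
        SELECT * FROM (SELECT a, ts FROM table_a ORDER BY ts DESC LIMIT 2)
        UNION ALL
        SELECT * FROM (SELECT a, ts FROM table_b ORDER BY ts DESC LIMIT 2)
    ) AS unioned
    ORDER BY ts DESC LIMIT 3 OFFSET 0
""").fetchall()
print(rows)  # [(1, 60), (2, 55), (4, 50)]
```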