Re: newsfeed type query

From: Ladislav Lenart <lenartlad(at)volny(dot)cz>
To: Jonathan Vanasco <postgres(at)2xlp(dot)com>, PostgreSQL mailing lists <pgsql-general(at)postgresql(dot)org>
Subject: Re: newsfeed type query
Date: 2015-04-29 17:18:32
Message-ID: 55411268.1060008@volny.cz
Lists: pgsql-general

On 29.4.2015 18:54, Jonathan Vanasco wrote:
>
> On Apr 29, 2015, at 12:25 PM, Ladislav Lenart wrote:
>
>> Could you please explain to me the error(s) in my reasoning?
>
> Let me just flip your list in reverse... and add in some elements (marked with a *):
>
> posting ts context
> p60 60 friend
> p55 55 friend*
> p54 54 friend*
> p50 50 group
> p49 49 group*
> p40 40 friend
> p30 30 group
> p20 20 friend
> p10 10 group
> p0 0 friend
>
> With the 2 limited subqueries, the results would be:
> 60F, 55F, 50G, 49G
>
> But the "most recent" data is
> 60F, 55F, 54F, 50G
>
> So we end up showing 49 which is less relevant than 54.

I would expect the overall query to return only 60F and 55F as the most recent
data. No? You expect it to return 4 items when the LIMIT is only 2. Remember
that the overall query should also be ordered by ts and limited to 2.

I thought you wanted the most recent items across all contexts, not the 2 most
recent items from friends plus the 2 most recent items from groups...
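To illustrate, here is a minimal sketch in SQLite using the example rows from
your table (the schema, column names and exact row set are my assumptions for
the sake of the sketch). The union of the two per-context LIMIT 2 subqueries,
re-sorted by ts and limited to 2 overall, returns only 60F and 55F:

```python
import sqlite3

# Hypothetical feed table built from the p0..p60 example rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE feed (posting TEXT, ts INTEGER, context TEXT)")
rows = [
    ("p60", 60, "friend"), ("p55", 55, "friend"), ("p54", 54, "friend"),
    ("p50", 50, "group"),  ("p49", 49, "group"),
    ("p40", 40, "friend"), ("p30", 30, "group"),
    ("p20", 20, "friend"), ("p10", 10, "group"), ("p0", 0, "friend"),
]
conn.executemany("INSERT INTO feed VALUES (?, ?, ?)", rows)

# Two per-context subqueries limited to 2 rows each, then an overall
# ORDER BY ts DESC LIMIT 2 on the union.  (SQLite requires the inner
# ORDER BY/LIMIT to be wrapped in a derived table.)
query = """
SELECT posting, ts, context FROM (
    SELECT * FROM (SELECT * FROM feed WHERE context = 'friend'
                   ORDER BY ts DESC LIMIT 2)
    UNION ALL
    SELECT * FROM (SELECT * FROM feed WHERE context = 'group'
                   ORDER BY ts DESC LIMIT 2)
) ORDER BY ts DESC LIMIT 2
"""
print(conn.execute(query).fetchall())
# -> [('p60', 60, 'friend'), ('p55', 55, 'friend')]
```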

Ladislav Lenart

> In some situations this isn't much of an issue, but in others it is detrimental.
> For example, one of my "feeds" contains a distribution of events according-to-type that is very uneven. While "friend" and "group" might be relatively close in time to one another, "system" or other events may be months old -- and that older content gets pulled in with this style of query.
>
> If you need to paginate the data and select the next 10 overall items, it gets even more complicated.
>
> IIRC, the best mix of performance and "product" that I've found is do something like this:
>
> SELECT * FROM (
>     (SELECT a, b, c, TIMESTAMP FROM table_a ORDER BY TIMESTAMP DESC LIMIT 10000)
>     UNION
>     (SELECT a, b, c, TIMESTAMP FROM table_b ORDER BY TIMESTAMP DESC LIMIT 10000)
> ) AS unioned
> ORDER BY unioned.TIMESTAMP DESC LIMIT 100 OFFSET 0;
>
> By creating an artificial limit on the inner queries, you can save Postgres from doing a lot of intensive I/O work and memory usage (like, a lot).
> Then, unioning a few lists and sorting 20k (or even 100k) items is really cheap.
> The downside is that you effectively limit the 'relevancy' of the query to whatever the inner limit is (i.e., 10000 -- not the combined total of 20000), but that number can be made arbitrarily high so that it is irrelevant while still showing the right amount of content for people.
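A runnable sketch of this inner-limit pattern, using SQLite and made-up tables
and columns for illustration (table_a/table_b, a `ts` timestamp column, and the
row counts are all assumptions): each source is capped with an inner LIMIT, and
the small union is then sorted and paginated with LIMIT/OFFSET.

```python
import sqlite3

# Two made-up source tables with interleaved timestamps.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_a (a INTEGER, ts INTEGER)")
conn.execute("CREATE TABLE table_b (a INTEGER, ts INTEGER)")
conn.executemany("INSERT INTO table_a VALUES (?, ?)",
                 [(i, 2 * i) for i in range(50)])       # even ts 0..98
conn.executemany("INSERT INTO table_b VALUES (?, ?)",
                 [(i, 2 * i + 1) for i in range(50)])   # odd ts 1..99

# Cap each source at 20 rows, then sort and paginate the union.
query = """
SELECT a, ts FROM (
    SELECT * FROM (SELECT a, ts FROM table_a ORDER BY ts DESC LIMIT 20)
    UNION ALL
    SELECT * FROM (SELECT a, ts FROM table_b ORDER BY ts DESC LIMIT 20)
) ORDER BY ts DESC LIMIT 5 OFFSET 0
"""
print(conn.execute(query).fetchall())
# -> [(49, 99), (49, 98), (48, 97), (48, 96), (47, 95)]
```

Pagination past the caps (OFFSET near 20 + 20) would start missing rows, which
is the "relevancy" trade-off described above.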
