Re: newsfeed type query

From: Jonathan Vanasco <postgres(at)2xlp(dot)com>
To: PostgreSQL mailing lists <pgsql-general(at)postgresql(dot)org>
Subject: Re: newsfeed type query
Date: 2015-04-29 16:54:25
Message-ID: 2E228BD0-0EE0-46AD-BFBF-14E5B630A169@2xlp.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general


On Apr 29, 2015, at 12:25 PM, Ladislav Lenart wrote:

> Could you please explain to me the error(s) in my reasoning?

Let me just flip your list in reverse... and add in some elements (marked with a *):

posting ts context
p60 60 friend
p55 55 friend*
p54 54 friend*
p50 50 group
p50 49 group*
p50 49 group*
p40 40 friend
p30 30 group
p20 20 friend
p10 10 group
p0 0 friend

With the 2 limited subqueries, the results would be:
60F, 55F, 50G, 49G

But the "most recent" data is
50F, 55F, 54F, 50G

So we end up showing 49 which is less relevant than 54.

In some situations this isn't much of an issue, but in others it is detrimental.
For example, one of my "feeds" contains a distribution of events according-to-type that is very uneven. While "friend" and "group" might be relatively close in time to one another, "system" or other events may be months old -- and that older content gets pulled in with this style of query.

If you need to paginate the data and select the next 10 overall items, it gets even more complicated.

IIRC, the best mix of performance and "product" that I've found is do something like this:

SELECT * FROM (
SELECT a,b,c FROM table_a ORDER BY TIMESTAMP DESC LIMIT 10000;
UNION
SELECT a,b,c FROM table_b ORDER BY TIMESTAMP DESC LIMIT 10000;
) as unioned
order by unioned TIMESTAMP DESC LIMIT 100 OFFSET 0;

by creating an artificial limit on the inner queries, you can save postgres from doing a lot of intensive i/o work and memory usage (like a lot)
then, joining a few lists and sorting 20k (or even 100k) items is really cheap.
the downside is that you effectively limit the 'relevancy' of the query to whatever the inner limit is (ie, 10000 -- not the combined total of 20000), but that number can be arbitrarily high enough that it is irrelevant while still showing the right amount of content for people.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message John McKown 2015-04-29 17:03:20 New column modifier?
Previous Message Ladislav Lenart 2015-04-29 16:25:42 Re: newsfeed type query