In need of help with message fetching query

From: Max Strömberg <max(dot)stromberg(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: In need of help with message fetching query
Date: 2009-02-10 22:19:13
Message-ID: 27753bae0902101419i60b81d2anb30ed324046edb32@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hello everyone.

I'm working on a small project of mine, which basically revolves
around messages.

These messages are to be ordered in a very standard fashion of single-depth
threads. That is, a message can be a reply, and a so-called "head".

A head is simply the head of a chain. To put it more eloquently, the table is:

create table messages (
id serial not null primary key,
author_id integer null references account (id) deferrable
initially deferred,
text varchar(200) not null,
timestamp timestamp with time zone not null,
reply_to_id integer null,
reply_to_account_id integer not null references account (id)
deferrable initially deferred,
unique (account_id, reply_to_id)
);

That is, a message is made by an account, and *can* be made in reply
to another message. If it is, reply_to_account_id will equal the
author_id of the corresponding row where reply_to_id = id.

So, for example, a simple case:

#1 by:A1 "Hello world!" reply_to:null
#2 by:A2 "Hey there." reply_to:#1
#3 by:A3 "'Sup?" reply_to:#1

Would be:

Hello world!
+-- Hey there.
+-- 'Sup?

This is a simple case -- the head in this particular instance is #1.

Now, given the above, imagine we were to have:

#1 by:A1 "Hello world!" reply_to:null
#2 by:A2 "Hey there." reply_to:#1
#3 by:A3 "'Sup?" reply_to:#1
#4 by:A1 "Not much!" reply_to:#3

It immediately becomes more complicated -- #4 is a reply to #3, which
is a reply to #1. The desired resultset would be:

#3 by:A3 "'Sup?" reply_to:#1
#4 by:A1 "Not much!" reply_to:#3
#1 by:A1 "Hello world!" reply_to:null
#2 by:A2 "Hey there." reply_to:#1
#3 by:A3 "'Sup?" reply_to:#1

Now, #3 has become a head in its first appearance. In the second, at
the end, it is a mere reply. Thus:

'Sup?
+-- Not much!
Hello world!
+-- Hey there.
+-- 'Sup?

So the head is really "has one or more replies, or reply_to is null."

The query could thus be summed up into, "Get all rows which have
reply_to_id null, or at least one other row refers to in its
reply_to_id. Then, for every such row, include all messages which have
reply_to_id = current row's id"

It is entirely possible that this results in two queries -- one for
fetching the heads, one for fetching all replies.

Further, there are other constraints which would appear trivial: the
heads fetching part will have a limit, and a where clause telling it
"get things older than this", as well as "get rows with author_id in
(x, y, ...)".

There is also a very quirky issue -- the sorting. A head's placement
in the resultset is dependant on its latest reply. That is, if a new
row is added to our table:

#5 by:A4 "O hi!" reply_to:#1

With its timestamp set to the current time, this will result in the #1
"chain" getting bubbled up to the top!

I implemented this as: order by (select b.timestamp from messages b
where reply_to_id = o.id order by timestamp desc limit 1) desc

Obviously, this isn't a very cheap query at all! Even with me being
humble, my hardware simply won't cut it after some amount of messages.

So, I think that sums my problem up fairly well. I've been trying to
get this to work the way I want for days, but I'm simply not skilled
enough in SQL to make this work. I did read manuals, and I think I
really exhausted most of my options.

(As an aside: while I deploy using PostgreSQL (what else?), I also use
SQLite on my MacBook, but I could give that up.)

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Raymond O'Donnell 2009-02-10 22:25:56 Re: createdb.exe and psql.exe without Promting Password
Previous Message Sam Mason 2009-02-10 21:47:46 Re: createdb.exe and psql.exe without Promting Password