Best way to store a threaded message list/tree in SQL

From: Mike Christensen <imaudi(at)comcast(dot)net>
To: pgsql-general(at)postgresql(dot)org
Subject: Best way to store a threaded message list/tree in SQL
Date: 2009-03-26 23:57:10
Message-ID: 49CC1656.5050100@comcast.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi guys -

I'm looking for the best way to store a set of "posts" as well as
comments on those posts in SQL. Imagine a design similar to a "Wall" on
Facebook where users can write posts on their wall and other users can
comment on those posts. I need to be able to display all wall posts as
well as the comments.

When I first started out, I came up with a table such as:

CREATE Table wallposts
(
id uuid NOT NULL,
posted timestamp NOT NULL,
userid uuid NOT NULL,
posterid uuid NOT NULL,
parentid uuid NOT NULL,
comment text NOT NULL
)

id is unique, parentid will be null on original posts and point to an id
if the row is a comment on an existing post. Easy enough and super fast
to insert new data. However, doing a select which would return me:

POST 1
COMMENT 1
COMMENT 2
POST 2
COMMENT 1
COMMENT 2

Regardless of which order the rows existed in the database proved to be
extremely difficult. I obviously can't just order by date, as someone
might comment on post 1 after post 2 has been posted. If I do a LEFT
JOIN to get the parent post on all rows, and then sort by that date
first, all the original posts group together as they'd have a value of null.

Then I got this idea:

CREATE TABLE wallposts
(
id uuid NOT NULL,
threadposted timestamp,
posted timestamp,
...
comment text
)

On an original post, threadposted and posted would be the same. On a
comment, timestamp would be the time the original post was posted and
"posted" would be the time the comment on that thread was posted. Now I
can just do:

select * from wallposts order by threadposted, posted;

This works great, however one thing irks me. If two people create a
post at the same time, comments on the two posts would get munged
together as they'd have the same timestamp. I could use "ticks" instead
of a datetime, but still the accuracy is only 1/1000 of a second. I
could also setup a unique constraint on threadposted and posted which
makes inserts a bit more expensive, but if I had multiple database
servers in a farm, the chance of a collision is still there. I almost
went ahead with this anyway since the chances of this happening are
extremely small, but I wanted to see if I could eat my cake and still
have it too. Mostly for my own educational curiosity.

Third solution would be to store this data in the form of a graph. Each
node would have a v-left and v-right pointer. I could order by "left"
which would traverse the tree in the order I need. However, every time
someone inserts a comment I'd have to re balance the whole tree. This
would create a ton of row locking, and all sorts of problems if the site
was very busy. Plus, it's kinda extreme and also causes replication
problems. So I tossed this idea quickly.

I also thought about just storing the original posts and then
serializing the comments in a binary form, since who cares about
individual comments. This would be very fast, however if a user wants
to delete their comment or append a new comment to the end, I have to
deserialize this data, modify the structure, then serialize it back and
update the row. If a bunch of people are commenting on the same post at
the same time, I might have random issues with that.

So here's what I eventually did. I query for all the posts ordered by
date entered. In the middle ware layer, I loop through the recordset
and create a "stack" of original posts, each node on the stack points to
a linked list of comments. When I come across an original post, I push
a new node on the stack and when I come across a comment I add a node to
the linked list. I organize this in memory so I can traverse the
recordset once and have O(n). After I create the in-memory
representation of the wall, I traverse through this data structure again
and write out HTML. This works great and has super fast inserts and
super fast selects, and no weird row locking issues; however it's a bit
heavier on my presentation layer and requires me to build an in memory
representation of the user's wall to move stuff around so it's in the
right order. Still, I believe this is the best approach I've found so far.

I thought I'd check with some other SQL experts and see if 1) Postgres
has some super nifty tree functions that I've never heard of and 2) see
if there's actually a way to do this using joins or unions or something
which would still be performant with millions of users. Thoughts?

Mike

Responses

Browse pgsql-general by date

  From Date Subject
Next Message David Fetter 2009-03-27 00:55:18 Re: Enumerating a row set
Previous Message Scott Marlowe 2009-03-26 23:37:07 Re: Is there a meaningful benchmark?