Re: Dealing with ordered hierarchies

From: Tim Uckun <timuckun(at)gmail(dot)com>
To: Alban Hertroys <haramrae(at)gmail(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Dealing with ordered hierarchies
Date: 2017-07-24 13:15:56
Message-ID: CAGuHJrPG0xsJA19Ds0NpU4zrHHomqr2T7bZgVcFgjSLMJ96u4Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

I don't like the approach with a large increment. It would mean complicated
logic to see if you filled the gap and then update all the other peers if
you did. It sounds like the re-order is going to be expensive no matter
what. My primary concern are race conditions though. What if two or more
users are trying to update the hierarchy either by inserts or updates? I
can definitely see a situation where we have issues transactions trip over
each other.

On Mon, Jul 24, 2017 at 10:32 PM, Alban Hertroys <haramrae(at)gmail(dot)com> wrote:

>
> > On 24 Jul 2017, at 9:02, Tim Uckun <timuckun(at)gmail(dot)com> wrote:
> >
> > I have read many articles about dealing with hierarchies in postgres
> including nested sets, ltree, materialized paths, using arrays as
> parentage, CTEs etc but nobody talks about the following scenario.
> >
> > Say I have a hierarchy like this
> >
> > 1
> > 1.1
> > 1.1.1
> > 1.1.2
> > 1.2
> > 1.3
> > 2
> > 2.1
> >
> > In this hierarchy the order is very important and I want to run
> frequent(ish) re-ordering of both subsets and entire trees and even more
> frequent inserts.
>
> Since they're hierarchies, the order is already in the structure of the
> data. Do you really need to add it to the data or would it suffice to add
> it to the query result?
>
> If that's the case, you only need a simple ordering number per branch,
> like 1, 2, 3, etc. The full path (ie. '1.1.3') gets generated in the query.
>
> I regularly generate structures like your above example using recursive
> CTE's. The "path" helps to get the results in the correct order for
> starters (although you're in for a surprise if any of your levels go past 9
> in the above). It's great how you can "trickle" all kinds of calculations
> through the hierarchy using CTE's.
>
> Something like this should help to get you started (untested, I usually do
> this in Oracle, which has several peculiarities):
>
> with recursive hierarchy (parent, node, sequence_number, path) as (
> select null, node, sequence_number, sequence_number::text from
> table
> union all
> select h.node, t.node, t.sequence_number, h.path || '.' ||
> t.sequence_number::text
> from table t
> join hierarchy h on (t.parent = h.node)
> )
> select node, path
> from hierarchy
>
> Where the table "table" has fields:
> parent -- parent node
> node -- actual node
> sequence_number -- Order of sequence of this node within its
> parent branch
>
> You may need to add a surrogate key if your parent/child combinations are
> otherwise not unique. That would then also be the way to address a node
> directly (otherwise it would be (parent, node)).
>
> For the sequence_number I'd probably just use an actual sequence generator
> with a large enough gap to prevent problems with reordering items later on
> (increment by 10 for example). You will also want to pad the sequence
> numbers in the "path" column with leading zeroes (otherwise 10 sorts
> between 1 and 2, etc.), enough that you won't run out of numbers per level.
>
> If you require your sequence numbers to be subsequent in the result: You
> can add a field with such numbering based on the existing sequence_numbers,
> by using a windowing function in each branch of the union - it's down to a
> fairly basic row numbering problem at this point.
>
> > Scenario 1: I want to insert a child into the 1.1 subtree. The next
> item should be 1.1.3 and I can't figure out any other way to do this other
> than to subquery the children and to figure out the max child ID, add one
> to it which is a race condition waiting to happen.
>
> You would first need to determine which node is the parent node by
> traversing the hierarchy up to the point of insertion and use the (parent,
> node) or surrogate key fields to append under. Similar to using '1.1',
> really.
>
> > Scenario 2: I now decide the recently inserted item is the second most
> important so I reset the ID to 1.1.2 and then increment 1.1.2 (and possibly
> everything below). Again this is both prone to race conditions and
> involves a heavy update.
>
> No need to bother with that (much) with the above approach. And if you do
> run out of gaps, you can fairly simply update all the sequence numbers
> under the same parent without causing concurrency issues and without
> requiring locks/synchronisation.
>
> > Is there a better way to deal with this or is the complexity unavoidable?
>
> I think it's better, but I don't think its ideal. It's fairly complicated
> to understand, for one thing, which can cause problems for maintenance (I
> have colleagues who don't dare to touch my queries, for example).
>
> > I should state that like most database reads will be much more frequent
> than writes and inserts will be more frequent than updates (re-ordering)
>
> More of the logic (and thus system load) gets moved to the read-side of
> things, that's probably a drawback, but most of it is just keeping state
> and counting. I don't expect that to be all that much.
>
> Alban Hertroys
> --
> If you can't see the forest for the trees,
> cut the trees and you'll find there is no forest.
>
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message PAWAN SHARMA 2017-07-24 13:47:21 Re: monitoring PostgreSQL
Previous Message PAWAN SHARMA 2017-07-24 12:20:20 Re: monitoring PostgreSQL