death of array?

From: Rob Sargent <robjsargent(at)gmail(dot)com>
To: "pgsql-sql(at)postgresql(dot)org" <pgsql-sql(at)postgresql(dot)org>
Subject: death of array?
Date: 2017-04-07 02:32:31
Message-ID: 73B8E1AF-BD4A-4E6A-B192-8394D59EF47C@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

I believe I have an appropriate use[1] for an array column, but I’m having a hard time using that array in a join clause.
The SQL question is how to use an array value in the join clause? I’m using postgres 9.6 on ubuntu 16.04

create table probandset (id UUID, probands UUID[])
create table segment(id uuid, chr int, sbp int, epb int, probandset_id)
create table people_member(people_id uuid, person_id uuid)
create table people(id uuid, name text)
create table person(id uuid, name text)

probandset.probands is a set of person.id

I need to gather all segments whose probandset is within in a specified people.
select s.* from segment s
join probandset ps on s.probandset_id = ps.id
--PROBLEM: WOULD LIKE SOMETHING BETTER THAN THE FOLLOWING:
join (select id, unnest(probands) as proband from probandset as l) as pu on s.probandset_id = pu.id
join people_member pm on pu.proband = pm.person_id
join people pl on pm.people_id = pl.id
where pl.name = ‘target population name’

The query I have works (showing only half of it here) and I’m not pushing the performance of it too much right now as I’m more interested in the SQL problem. However, I am getting a seq scan on people_member, not surprisingly. People_member will be blocks of people, 50 to 1000 per block and each block loaded in a single transaction, no editing: should this be clustered (and reclustered)? Over time would the seq scan go away as people_member.people_id becomes more discriminating? (There is an index on people_member.people_id). A people has a know set of probands (and we need all subsets of those probands)

Current discussions at our end on whether or not a probandset may be filled with members of more than one people. If not I might be able to add people_id to probandset and I am home free.
But that still doesn’t answer the SQL question.

Thanks for reading, sorry it’s a tad wordy.
rjs

[1] I’m dealing with power sets. A given set of N element has 2**N subsets and the number of subset grows exponentially (obviously). But to model this ‘normally’ would require an even larger number of subset member records. (That summation left to the student[2]). My choice was to list each subset once with and array of elements. Still an exponential problem, but only one exponential problem.
It all breaks down somewhere between 20 and 30 elements but we'll burn that bridge when we get there.
[2] Something like the sum over i=0..N of ((N choose i) * i maybe?

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message David G. Johnston 2017-04-07 03:02:53 Re: death of array?
Previous Message Adrian Klaver 2017-04-06 00:01:47 Re: Crosstab function