Re: max length of sql select statement (long!)

From: markus brosch <brosch(at)gmx(dot)de>
To: pgsql-sql(at)postgresql(dot)org
Subject: Re: max length of sql select statement (long!)
Date: 2003-07-14 12:57:19
Message-ID: 1058187439.5983.2.camel@sam
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql


It's me again - with the easy and complex statement at the same time ;-)
Who can help me?!

As far, my solution is still unsatisfactory.
I'll explain my full task this time,
otherwise it is hard to explain!

My tables:
(redundant for speed reasons - doesn't matter - only requests!)

===============================================================

table: goGraph
primarykey(parent,child)
index (B-tree) on parent column and on child column.

parent | child
--------------------------
GO:0000001 | GO:0000002
GO:0000001 | GO:0000003
GO:0000002 | GO:0000005
GO:0000002 | GO:0000006
GO:0000003 | GO:0000006
GO:0000003 | GO:0000007
... | ...
about 15000 elements more!

This represents a direct acyclic graph:

GO:0000001
/ \
v v
GO:0000002 GO:0000003
/ \ / \
v v v v
GO:0000005 GO:0000006 GO:0000007

===============================================================

table: binaryInteraction
primarykey(baitProtein, preyProtein, gogoKey)
index (B-tree) on gogokey.

baitProtein | preyProtein | "baitGoId" | "preyGoId" | gogoKey
--------------------------------------------------------------
123 | 123 | GO:0000001 | GO:0000001 | 11
123 | 123 | GO:0000020 | GO:0000001 | 120
123 | 123 | GO:0000020 | GO:0000020 | 2020
123 | 345 | GO:0000001 | GO:0000001 | 11
123 | 345 | GO:0000020 | GO:0000001 | 120
... | ... | ... | ... | ...
up to several millions entries!

===============================================================

So, what I have to do is:
Someone choose 2 goIds. For these ids and all their children
(information from goGraph) I want the count(*) of "fits" within the
table binaryInteraction. That means, baitGoId and preyGoId must be
either the two choosen goIds or one of the related children. I use the
gogoKey for this task: I compute Integers (within Java) from the goIds
and sort smallerGoId.concatenate(biggerGoId) - so I've unique keys for
baitGo / preyGo pairs. One more problem: The goGraph (see table) is a
graph ... and for that reason one node can have more parents. If I
choose e.g. GO:000002 and GO:000003 from the example above and want to
compute their children, I don't want to take GO:000006 into account (two
parents in which I am intested in). That menas, whenever I ask for
children of two nodes, I want a DISTINCT SET of children.

Example how I am processing the data at the moment:

User chosse GoId1: GO:0000005 and GO:0000008;

Java:
- get all childs of GO:000005 by jdbc from goGraph - collection "one"
- get all childs of GO:000006 by jdbc from goGraph - collection "two"
- compute intersection of the two collections
- substract of each collection the intersection
- result: distinct children of two goIds

Now the binaryInteraction table came into our game ;-)
I apply the one distinct collection "one" and collection "two" to
gogoKey:

-----------------------

1. SELECT count(*)
FROM (SELECT DISTINCT bait, prey
FROM binaryInteraction
WHERE gogoKey IN (gogokey1, gogokey2, gogokey3, ... )
)
AS foo;

gogokeyX will be processed by java -> it generates for each
go-go-combination the unique gogokey.

Speed: medium

Limit: only 10000 Elements within the IN statement can be processed by
default. In postgres.conf I can change this limit, as Rod Taylor already
posted. But in the manual there is a warning of BufferOverflow if the
value it too high ...

If I reach the limit, I get something like:
java.sql.SQLException: ERROR: Expression too complex: nesting depth
exceeds max_expr_depth = 10000

BTW: Why nested depth? Within IN I compute thousands of "OR"s, but why
nested ... ???

-----------------------

2. for all gogokeys I generate a temporary table and do a join with the
binaryInteraction table.

SELECT COUNT(*)
FROM (SELECT DISTINCT binaryInteraction.bait binaryInteraction.prey
FROM binaryInteraction, tempTable
WHERE binaryInteraction.gogokey = tempTable.gogokey)
AS temp)

Speed: very slow, but no limit!

-----------------------

After all, I want to compute and precalculate every possible
goId-goId-combination ;-) So, each of these things I explained here,
I've to do millions of time! I tried to find a recursive solution -
impossible!

Who can help? Maybe there is a way to compute it in one but more complex
sql-statemnt. Me as far I am still new to sql I don't know the "tricks"!
If there would be a way to calculate the intersection ... then maybe a
solution is not that far ... Also I was thinking about a denormalized
goGraph table: parent -> allChilds. Then I compute (within SQL?) the
intersection between the 2 GoIds and then I apply it to
binaryInteraction.

Thanks for any ideas and thanks at least
for reading that long story ;-)

Cheers Markus

PS: If you have any additional questions: please feel free to ask!

In response to

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Steve Wampler 2003-07-14 13:38:18 Re: [SQL] Replacing a simple nested query?
Previous Message Joe Conway 2003-07-13 22:01:13 Re: Replacing a simple nested query?