From: | markus brosch <brosch(at)gmx(dot)de> |
---|---|
To: | pgsql-sql(at)postgresql(dot)org |
Subject: | Re: max length of sql select statement (long!) |
Date: | 2003-07-14 12:57:19 |
Message-ID: | 1058187439.5983.2.camel@sam |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-sql |
It's me again - with the easy and complex statement at the same time ;-)
Who can help me?!
As far, my solution is still unsatisfactory.
I'll explain my full task this time,
otherwise it is hard to explain!
My tables:
(redundant for speed reasons - doesn't matter - only requests!)
===============================================================
table: goGraph
primarykey(parent,child)
index (B-tree) on parent column and on child column.
parent | child
--------------------------
GO:0000001 | GO:0000002
GO:0000001 | GO:0000003
GO:0000002 | GO:0000005
GO:0000002 | GO:0000006
GO:0000003 | GO:0000006
GO:0000003 | GO:0000007
... | ...
about 15000 elements more!
This represents a direct acyclic graph:
GO:0000001
/ \
v v
GO:0000002 GO:0000003
/ \ / \
v v v v
GO:0000005 GO:0000006 GO:0000007
===============================================================
table: binaryInteraction
primarykey(baitProtein, preyProtein, gogoKey)
index (B-tree) on gogokey.
baitProtein | preyProtein | "baitGoId" | "preyGoId" | gogoKey
--------------------------------------------------------------
123 | 123 | GO:0000001 | GO:0000001 | 11
123 | 123 | GO:0000020 | GO:0000001 | 120
123 | 123 | GO:0000020 | GO:0000020 | 2020
123 | 345 | GO:0000001 | GO:0000001 | 11
123 | 345 | GO:0000020 | GO:0000001 | 120
... | ... | ... | ... | ...
up to several millions entries!
===============================================================
So, what I have to do is:
Someone choose 2 goIds. For these ids and all their children
(information from goGraph) I want the count(*) of "fits" within the
table binaryInteraction. That means, baitGoId and preyGoId must be
either the two choosen goIds or one of the related children. I use the
gogoKey for this task: I compute Integers (within Java) from the goIds
and sort smallerGoId.concatenate(biggerGoId) - so I've unique keys for
baitGo / preyGo pairs. One more problem: The goGraph (see table) is a
graph ... and for that reason one node can have more parents. If I
choose e.g. GO:000002 and GO:000003 from the example above and want to
compute their children, I don't want to take GO:000006 into account (two
parents in which I am intested in). That menas, whenever I ask for
children of two nodes, I want a DISTINCT SET of children.
Example how I am processing the data at the moment:
User chosse GoId1: GO:0000005 and GO:0000008;
Java:
- get all childs of GO:000005 by jdbc from goGraph - collection "one"
- get all childs of GO:000006 by jdbc from goGraph - collection "two"
- compute intersection of the two collections
- substract of each collection the intersection
- result: distinct children of two goIds
Now the binaryInteraction table came into our game ;-)
I apply the one distinct collection "one" and collection "two" to
gogoKey:
-----------------------
1. SELECT count(*)
FROM (SELECT DISTINCT bait, prey
FROM binaryInteraction
WHERE gogoKey IN (gogokey1, gogokey2, gogokey3, ... )
)
AS foo;
gogokeyX will be processed by java -> it generates for each
go-go-combination the unique gogokey.
Speed: medium
Limit: only 10000 Elements within the IN statement can be processed by
default. In postgres.conf I can change this limit, as Rod Taylor already
posted. But in the manual there is a warning of BufferOverflow if the
value it too high ...
If I reach the limit, I get something like:
java.sql.SQLException: ERROR: Expression too complex: nesting depth
exceeds max_expr_depth = 10000
BTW: Why nested depth? Within IN I compute thousands of "OR"s, but why
nested ... ???
-----------------------
2. for all gogokeys I generate a temporary table and do a join with the
binaryInteraction table.
SELECT COUNT(*)
FROM (SELECT DISTINCT binaryInteraction.bait binaryInteraction.prey
FROM binaryInteraction, tempTable
WHERE binaryInteraction.gogokey = tempTable.gogokey)
AS temp)
Speed: very slow, but no limit!
-----------------------
After all, I want to compute and precalculate every possible
goId-goId-combination ;-) So, each of these things I explained here,
I've to do millions of time! I tried to find a recursive solution -
impossible!
Who can help? Maybe there is a way to compute it in one but more complex
sql-statemnt. Me as far I am still new to sql I don't know the "tricks"!
If there would be a way to calculate the intersection ... then maybe a
solution is not that far ... Also I was thinking about a denormalized
goGraph table: parent -> allChilds. Then I compute (within SQL?) the
intersection between the 2 GoIds and then I apply it to
binaryInteraction.
Thanks for any ideas and thanks at least
for reading that long story ;-)
Cheers Markus
PS: If you have any additional questions: please feel free to ask!
From | Date | Subject | |
---|---|---|---|
Next Message | Steve Wampler | 2003-07-14 13:38:18 | Re: [SQL] Replacing a simple nested query? |
Previous Message | Joe Conway | 2003-07-13 22:01:13 | Re: Replacing a simple nested query? |