Re: Help with queries.

From: Adrian Johnson <oriolebaltimore(at)gmail(dot)com>
To: "Oliveiros d'Azevedo Cristina" <oliveiros(dot)cristina(at)marktest(dot)pt>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: Help with queries.
Date: 2010-09-21 23:16:55
Message-ID: AANLkTikkstpxYtKukzU_gDWTwE7qmjShkGZpRBLqPs4y@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

Hello Oliver.
thanks for your reply. Here are my answers. Sorry I shot e-mail in
morning and I was not clear about it.

I am afraid, I am not jamming you with all information.

Thank you very much for your help.

====================================== your questions
====================================

First, answers to questions you asked:

> Please advice me,
> What do you call coordinates to, exactly? Sorry, my genetics knowledge is
> very poor.
== I call coordinates cto - crom

> You call the from and to fields coordinates?
==yes

> Do they have the same meaning as the from and to fields from table snps ?
== cfrom - cto fields have yes.

> What does it mean to map all coordinates in table snps?
== Means I want to take snps.cfrom and snps.cto and gene.gene by
checking if they are in range of gene.cfrom and gene.cto from gene
table.

> Do you wanna know which entries from snps map in the SRC gene?
== yes and more than that.

> Coordinates are the segment where the gene/sample fit in the chromosome? Is
> my understanding correct ?
== yes I am calling on "chr19" (cfrom - cto) (38162736 -
38163007) are coordinates.

=================================== illustration of example
==========================
I know basic queries, but I am finding it difficult to get what I am
describing below.
I am illustrating an example here .

Here I query for a gene RHPN2 from gene table:

gid | chromosome | cfrom | cto | frame | gene
-------+------------+----------+----------+-------+-------
93538 | chr19 | 38162736 | 38163007 | - | RHPN2
93540 | chr19 | 38173254 | 38173420 | - | RHPN2
93541 | chr19 | 38174563 | 38174720 | - | RHPN2
93542 | chr19 | 38176717 | 38176804 | - | RHPN2
93543 | chr19 | 38178766 | 38178971 | - | RHPN2
93544 | chr19 | 38182326 | 38182456 | - | RHPN2
93545 | chr19 | 38184987 | 38185154 | - | RHPN2
93546 | chr19 | 38185553 | 38185751 | - | RHPN2
93547 | chr19 | 38190754 | 38190931 | - | RHPN2
93548 | chr19 | 38194419 | 38194554 | - | RHPN2
93549 | chr19 | 38195387 | 38195475 | - | RHPN2
93551 | chr19 | 38204311 | 38204397 | - | RHPN2
93553 | chr19 | 38209244 | 38209383 | - | RHPN2
93555 | chr19 | 38226989 | 38227115 | - | RHPN2
93556 | chr19 | 38247524 | 38247603 | - | RHPN2

Now taking:
cto = 38162736 first row in above table
cfrom = 38247603 last row in above table
I want to get all those entries from snp table

# select sample_id, chromosome,cfrom, cto, refbase, consbase from snps
where cto > 38162736 and cfrom < 38247603 and chromosome = 'chr19' ;

sample_id | chromosome | cfrom | cto | refbase | consbase
-----------+------------+----------+----------+---------+----------
2 | chr19 | 38178828 | 38178829 | C | Y
5 | chr19 | 38182405 | 38182406 | T | Y
5 | chr19 | 38182424 | 38182425 | G | R
5 | chr19 | 38185101 | 38185102 | C | Y
3 | chr19 | 38182424 | 38182425 | G | R
1 | chr19 | 38178828 | 38178829 | C | Y
1 | chr19 | 38182424 | 38182425 | G | R
7 | chr19 | 38185101 | 38185102 | C | Y
6 | chr19 | 38185101 | 38185102 | C | Y
4 | chr19 | 38178828 | 38178829 | C | Y
4 | chr19 | 38182424 | 38182425 | G | R

From this result, we get that all samples range from 1 to 7 (sample_id).

Now, from above result (snps table) I want to filter:
1. Those that have any letter in consbase other than ATGC (although in
this case there are no A or T or G or C they are there)
2. Those entries that have sample_id 1, 2 and 3.
3. Unique entries that have cfrom and cto common to 1 and 2 and 1 and
3 but not 2 and 3.

For example see below: (say this is result X)
gene | sample_id | chromosome | cfrom | cto
-------+-----------+------------+----------+----------+---------+----------
RHPN2 | 2 | chr19 | 38178828 | 38178829
RHPN2 | 1 | chr19 | 38178828 | 38178829
RHPN2 | 3 | chr19 | 38182424 | 38182425
RHPN2 | 1 | chr19 | 38182424 | 38182425

Here for gene RHPN2:
cfrom - cto (38178828 | 38178829) is common to samples 1 and 2
cfrom - cto ( 38182424| 38182425) is common to samples 1 and 3.

But both these samples 1 and 2 and 1 and 3 belong same gene (RHPN2 ).

Now I have another table coverage, where for sample positions above,
I have the following data:
(say this is result y)
sample_id | chromosome | cfrom | cto | abase | tbase | gbase | cbase
-----------+------------+----------+----------+-------+-------+-------+-------
2 | chr19 | 38178828 | 38178829 | 0 | 29 | 2 | 44
1 | chr19 | 38178828 | 38178829 | 0 | 52 | 0 | 32
3 | chr19 | 38178828 | 38178829 | 0 | 0 | 0
| 28
1 | chr19 | 38182424 | 38182425 | 14 | 0 | 54 | 0
2 | chr19 | 38182424 | 38182425 | 17 | 0 | 55 | 0
3 | chr19 | 38182424 | 38182425 | 13 | 0 | 26 | 0

Now from table snps, I want to get results that satisfy conditions
that gave result X and combine this with results Y.

How is it possible to do this?

Thank you for your help.

Adrian.

In response to

Browse pgsql-sql by date

  From Date Subject
Next Message Nicholas I 2010-09-22 07:32:28 insert into help
Previous Message Devrim GÜNDÜZ 2010-09-21 20:22:38 Re: available RPM package for 8.4.4?