Quick Links

Re: Designing a DB for storing biological data

From:	Rich Shepard <rshepard(at)appl-ecosys(dot)com>
To:	pgsql-general(at)postgresql(dot)org
Subject:	Re: Designing a DB for storing biological data
Date:	2014-06-14 18:24:58
Message-ID:	alpine.LNX.2.11.1406141107100.17243@localhost
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

On Sat, 14 Jun 2014, Damir Dezeljin wrote:

> This is more a theoretical or better to say, conceptual question; still, I
> hope to get some feed backs from you folks. Additionally this is going to
> be a very long post :) off-topic: I asked a similar question on the MySQL
> forum as I'm still undecided if going with PostgreSQL or MySQL << I'm
> tempted at PostGIS.

Postgres. You can link it to spatial attributes with postgis if you want.
But, how to structure your database and tables is independent of the
analytical methods you use.

> I am designing a database for storing various biological and ecological
> data. Although there is no clear dividing line, it is possible to group
> the data into two groups, namely the measured (physical) and quantitative
> (mostly biological) data; I uploaded both a data sample and an initial
> draft of a DB model to this link
> <https://www.dropbox.com/sh/9gm2ezwrwhkz6xv/AAB3koD6Xzi48-2BhIEdwmlZa>.

Of course there's a clear division between biological, chemical, physical,
geomorphical, and other data.

I may be wrong, but this looks like a question from a student for a
project or master's degree.

> *Biological / quantitative data*

You have many choices: EPA's STORET or EMAP projects, ITIS (International
Taxonomic Identification System), or your own. Each type of biological data
should have its own table; e.g., vegegation, mammals, birds, fish, benthic
macroinvertebrates, microbes.

Each table should have a compound key: site_id, sample_date, species. This
makes each row unique, when possible. Otherwise. use a sequential key.

The table I use for fish data has this schema:

The table I use for benthos has this schema:

Table "public.benthos"
Column | Type | Modifiers

> *Measured / physical data*

> - Physical quantities may be outside the detection range of the measured
> instrument; in such a case, this needs to be recorded. I still do not have
> a clear idea how to do it. NULL’s do not seem to be a good choice to mark
> such data.

Here, too, you have choices. You can incorrectly record and analyze
chemical data below detection limits like EPA, states, and most regulatory
permit holders do, or you can do it correctly. This is a schema I use for
water quality data:

The key is the set (site, sampdate, param) so you can have multiple
samples, each of a different chemical constituent, at the same location and
date. (This works for air, soil, and other media chemistry, too). The
concentrations must be in the same units (e.g., mg/L). The column named
'ceneq1' is a flag: when the quant value is below the analytical laboratory's
reporting limit, set the flag to '1'; otherwise 0. Now you have a consistent
way to identify rows with less-than/non-detect/censored values. Since
reporting levels change as instruments become more sensitive and analysts
become better trained, you can have multiple reporting limits in your data.
That's where the 'floor' and 'ceiling' columns come in.

This should get you started.

Rich

--
Richard B. Shepard, Ph.D.
Applied Ecosystem Services, Inc. | Troutdale, OR 97060 USA
www.appl-ecosys.com Voice: 503-667-4517 Fax: 503-667-8863

In response to

Designing a DB for storing biological data at 2014-06-14 15:52:43 from Damir Dezeljin

Responses

Re: Designing a DB for storing biological data at 2014-06-14 19:03:11 from Rich Shepard

Browse pgsql-general by date

	From	Date	Subject
Next Message	Rich Shepard	2014-06-14 19:03:11	Re: Designing a DB for storing biological data
Previous Message	Kevin Grittner	2014-06-14 18:15:57	Re: Using pg_start_backup() and pg_stop_backup() - using 9.1.2.2