Re: Recursive Arrays 101

From: Rob Sargent <robjsargent(at)gmail(dot)com>
To: David Blomstrom <david(dot)blomstrom(at)gmail(dot)com>
Cc: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Recursive Arrays 101
Date: 2015-10-26 20:59:48
Message-ID: 562E9444.7050006@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 10/26/2015 02:51 PM, David Blomstrom wrote:
> I'm focusing primarily on vertebrates at the moment, which have a
> total of (I think) about 60,000-70,000 rows for all taxons (species,
> families, etc.). My goal is to create a customized database that does
> a really good job of handling vertebrates first, manually adding a few
> key invertebrates and plants as needed.
>
> I couldn't possibly repeat the process with invertebrates or plants,
> which are simply overwhelming. So, if I ever figure out the Catalogue
> of Life's database, then I'm simply going to modify its tables so they
> work with my system. My vertebrates database will override their
> vertebrate rows (except for any extra information they have to offer).
>
> As for "hand-entry," I do almost all my work in spreadsheets. I spent
> a day or two copying scientific names from the Catalogue of Life into
> my spreadsheet. Common names and slugs (common names in a URL format)
> is a project that will probably take years. I might type a scientific
> name or common name into Google and see where it leads me. If a
> certain scientific name is associated with the common name "yellow
> birch," then its slug becomes yellow-birch. If two or more species are
> called yellow birch, then I enter yellow-birch in a different table
> ("Floaters"), which leads to a disambiguation page.
>
> For organisms with two or more popular common names - well, I haven't
> really figured that out yet. I'll probably have to make an extra table
> for additional names. Catalogue of Life has common names in its
> database, but they all have upper case first letters - like American
> Beaver. That works fine for a page title but in regular text I need to
> make beaver lowercase without changing American. So I'm just starting
> from square one and recreating all the common names from scratch.

Multiple names can be handled in at least two ways. A child table of
species which has species id and alternate name per record - then you
can get all other-names back by species id. Of course going from
altername-name back to species may get you multiple species. Or, welcome
to postgres' arrays-as-column: you can have one column, maybe called
aliases which is an array of string.
>
> It gets still more complicated when you get into "specialist names."
> ;) But the system I've set up so far seems to be working pretty nicely.
>
> On Mon, Oct 26, 2015 at 1:41 PM, Rob Sargent <robjsargent(at)gmail(dot)com
> <mailto:robjsargent(at)gmail(dot)com>> wrote:
>
> On 10/26/2015 02:29 PM, David Blomstrom wrote:
>
> Sorry for the late response. I don't have Internet access at
> home, so I only post from the library or a WiFi cafe.
>
> Anyway, where do I begin?
>
> Regarding my "usage patterns," I use spreadsheets (Apple's
> Numbers program) to organize data. I then save it as a CSV
> file and import it into a database table. It would be very
> hard to break with that tradition, because I don't know of any
> other way to organize my data.
>
> On the other hand, I have a column (Rank) that identifies
> different taxonomic levels (kingdom, class, etc.). So I can
> easily sort a table into specific taxonomic levels and save
> one level at a time for a database table.
>
> There is one problem, though. I can easily put all the
> vertebrate orders and even families into a table. But genera
> might be harder, and species probably won't work; there are
> simply too many. My spreadsheet program is almost overwhelmed
> by fish species alone. The only solution would be if I could
> import Mammals.csv, then import Birds.csv, Reptiles.csv, etc.
> But that might be kind of tedious, especially if I have to
> make multiple updates.
>
> Yes I suspect you spreadsheet will be limited in rows, but of
> course you can send all the spreadsheets to a single table in the
> database. If that's what you want. You don't have to, but you see
> mention of tables millions of records routinely. On the other
> hand, if performance becomes an issue with the single table
> approach you might want to look at "partitioning". But I would be
> surprised if you had to go there.
>
> What is your data source? How much hand-entry are you doing?
> There are tools which (seriously) upgrade the basic 'COPY into
> <table>' command.
>
>
> As for "attributes," I'll post my table's schema, with a
> description, next.
>
>
>
>
>
> --
> David Blomstrom
> Writer & Web Designer (Mac, M$ & Linux)
> www.geobop.org <http://www.geobop.org>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message David Blomstrom 2015-10-26 21:02:11 Re: Recursive Arrays 101
Previous Message David Blomstrom 2015-10-26 20:51:19 Re: Recursive Arrays 101