Re: Recursive Arrays 101

From: David Blomstrom <david(dot)blomstrom(at)gmail(dot)com>
To: Rob Sargent <robjsargent(at)gmail(dot)com>
Cc: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Recursive Arrays 101
Date: 2015-10-26 20:51:19
Message-ID: CAA54Z0ifU4NFxQgdJ2a0zusYbdZW--_23mrf3gNc1fWv2jZ3GQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

I'm focusing primarily on vertebrates at the moment, which have a total of
(I think) about 60,000-70,000 rows for all taxons (species, families,
etc.). My goal is to create a customized database that does a really good
job of handling vertebrates first, manually adding a few key invertebrates
and plants as needed.

I couldn't possibly repeat the process with invertebrates or plants, which
are simply overwhelming. So, if I ever figure out the Catalogue of Life's
database, then I'm simply going to modify its tables so they work with my
system. My vertebrates database will override their vertebrate rows (except
for any extra information they have to offer).

As for "hand-entry," I do almost all my work in spreadsheets. I spent a day
or two copying scientific names from the Catalogue of Life into my
spreadsheet. Common names and slugs (common names in a URL format) is a
project that will probably take years. I might type a scientific name or
common name into Google and see where it leads me. If a certain scientific
name is associated with the common name "yellow birch," then its slug
becomes yellow-birch. If two or more species are called yellow birch, then
I enter yellow-birch in a different table ("Floaters"), which leads to a
disambiguation page.

For organisms with two or more popular common names - well, I haven't
really figured that out yet. I'll probably have to make an extra table for
additional names. Catalogue of Life has common names in its database, but
they all have upper case first letters - like American Beaver. That works
fine for a page title but in regular text I need to make beaver lowercase
without changing American. So I'm just starting from square one and
recreating all the common names from scratch.

It gets still more complicated when you get into "specialist names." ;) But
the system I've set up so far seems to be working pretty nicely.

On Mon, Oct 26, 2015 at 1:41 PM, Rob Sargent <robjsargent(at)gmail(dot)com> wrote:

> On 10/26/2015 02:29 PM, David Blomstrom wrote:
>
>> Sorry for the late response. I don't have Internet access at home, so I
>> only post from the library or a WiFi cafe.
>>
>> Anyway, where do I begin?
>>
>> Regarding my "usage patterns," I use spreadsheets (Apple's Numbers
>> program) to organize data. I then save it as a CSV file and import it into
>> a database table. It would be very hard to break with that tradition,
>> because I don't know of any other way to organize my data.
>>
>> On the other hand, I have a column (Rank) that identifies different
>> taxonomic levels (kingdom, class, etc.). So I can easily sort a table into
>> specific taxonomic levels and save one level at a time for a database table.
>>
>> There is one problem, though. I can easily put all the vertebrate orders
>> and even families into a table. But genera might be harder, and species
>> probably won't work; there are simply too many. My spreadsheet program is
>> almost overwhelmed by fish species alone. The only solution would be if I
>> could import Mammals.csv, then import Birds.csv, Reptiles.csv, etc. But
>> that might be kind of tedious, especially if I have to make multiple
>> updates.
>>
>> Yes I suspect you spreadsheet will be limited in rows, but of course you
> can send all the spreadsheets to a single table in the database. If that's
> what you want. You don't have to, but you see mention of tables millions
> of records routinely. On the other hand, if performance becomes an issue
> with the single table approach you might want to look at "partitioning".
> But I would be surprised if you had to go there.
>
> What is your data source? How much hand-entry are you doing? There are
> tools which (seriously) upgrade the basic 'COPY into <table>' command.
>
>
> As for "attributes," I'll post my table's schema, with a description, next.
>>
>>
>

--
David Blomstrom
Writer & Web Designer (Mac, M$ & Linux)
www.geobop.org

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Rob Sargent 2015-10-26 20:59:48 Re: Recursive Arrays 101
Previous Message David Blomstrom 2015-10-26 20:42:07 Re: Recursive Arrays 101