Re: Recursive Arrays 101

From: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
To: David Blomstrom <david(dot)blomstrom(at)gmail(dot)com>, Rob Sargent <robjsargent(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Recursive Arrays 101
Date: 2015-10-26 21:53:42
Message-ID: 562EA0E6.5090500@aklaver.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 10/26/2015 01:51 PM, David Blomstrom wrote:
> I'm focusing primarily on vertebrates at the moment, which have a total
> of (I think) about 60,000-70,000 rows for all taxons (species, families,
> etc.). My goal is to create a customized database that does a really
> good job of handling vertebrates first, manually adding a few key
> invertebrates and plants as needed.
>
> I couldn't possibly repeat the process with invertebrates or plants,
> which are simply overwhelming. So, if I ever figure out the Catalogue of
> Life's database, then I'm simply going to modify its tables so they work
> with my system. My vertebrates database will override their vertebrate
> rows (except for any extra information they have to offer).
>
> As for "hand-entry," I do almost all my work in spreadsheets. I spent a
> day or two copying scientific names from the Catalogue of Life into my
> spreadsheet. Common names and slugs (common names in a URL format) is a
> project that will probably take years. I might type a scientific name or
> common name into Google and see where it leads me. If a certain
> scientific name is associated with the common name "yellow birch," then
> its slug becomes yellow-birch. If two or more species are called yellow
> birch, then I enter yellow-birch in a different table ("Floaters"),
> which leads to a disambiguation page.
>
> For organisms with two or more popular common names - well, I haven't
> really figured that out yet. I'll probably have to make an extra table
> for additional names. Catalogue of Life has common names in its
> database, but they all have upper case first letters - like American
> Beaver. That works fine for a page title but in regular text I need to
> make beaver lowercase without changing American. So I'm just starting
> from square one and recreating all the common names from scratch.

I think there has to be a better way as this is just a formatting issue
Can't remember what programming language you are working in, but in Python:

In [13]: s = 'American Beaver'

In [14]: s.capitalize()
Out[14]: 'American beaver'

In [15]: s.lower()
Out[15]: 'american beaver'

>
> It gets still more complicated when you get into "specialist names." ;)
> But the system I've set up so far seems to be working pretty nicely.
>
> On Mon, Oct 26, 2015 at 1:41 PM, Rob Sargent <robjsargent(at)gmail(dot)com
> <mailto:robjsargent(at)gmail(dot)com>> wrote:
>
> On 10/26/2015 02:29 PM, David Blomstrom wrote:
>
> Sorry for the late response. I don't have Internet access at
> home, so I only post from the library or a WiFi cafe.
>
> Anyway, where do I begin?
>
> Regarding my "usage patterns," I use spreadsheets (Apple's
> Numbers program) to organize data. I then save it as a CSV file
> and import it into a database table. It would be very hard to
> break with that tradition, because I don't know of any other way
> to organize my data.
>
> On the other hand, I have a column (Rank) that identifies
> different taxonomic levels (kingdom, class, etc.). So I can
> easily sort a table into specific taxonomic levels and save one
> level at a time for a database table.
>
> There is one problem, though. I can easily put all the
> vertebrate orders and even families into a table. But genera
> might be harder, and species probably won't work; there are
> simply too many. My spreadsheet program is almost overwhelmed by
> fish species alone. The only solution would be if I could import
> Mammals.csv, then import Birds.csv, Reptiles.csv, etc. But that
> might be kind of tedious, especially if I have to make multiple
> updates.
>
> Yes I suspect you spreadsheet will be limited in rows, but of course
> you can send all the spreadsheets to a single table in the database.
> If that's what you want. You don't have to, but you see mention of
> tables millions of records routinely. On the other hand, if
> performance becomes an issue with the single table approach you
> might want to look at "partitioning". But I would be surprised if
> you had to go there.
>
> What is your data source? How much hand-entry are you doing? There
> are tools which (seriously) upgrade the basic 'COPY into <table>'
> command.
>
>
> As for "attributes," I'll post my table's schema, with a
> description, next.
>
>
>
>
>
> --
> David Blomstrom
> Writer & Web Designer (Mac, M$ & Linux)
> www.geobop.org <http://www.geobop.org>

--
Adrian Klaver
adrian(dot)klaver(at)aklaver(dot)com

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Gavin Flower 2015-10-26 21:57:14 Re: Importing CSV File
Previous Message Gavin Flower 2015-10-26 21:52:59 Re: Recursive Arrays 101