From: | Martin Mueller <martinmueller(at)northwestern(dot)edu> |
---|---|
To: | "pgsql-general(at)lists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org> |
Subject: | a simple-minded question about updating |
Date: | 2023-05-19 04:08:58 |
Message-ID: | CH2PR05MB67903EA4F305103B6BC28373C47C9@CH2PR05MB6790.namprd05.prod.outlook.com |
Lists: | pgsql-general |
I work with Postgres and wonder whether for my purposes there is a good-enough reason to update one of these days.
I’m an editor working with some 60,000 Early Modern texts, many of them in need of some editorial attention. The texts are XML-encoded documents. Each word is wrapped in a <w> element with attributes for various linguistic metadata. Typically a type of error occurs several or many times, and at the margins the occurrences need individual attention. I use Python scripts to extract tokens from the main corpus (sometimes dozens, sometimes thousands or millions), turn them into keyword-in-context lines, and import them into Postgres. I basically use Postgres as a giant spreadsheet. Its excellent string-handling routines make it relatively easy to perform search and sort operations that identify tokens in need of correction. Once the corrections are made in Postgres, typically as batch updates, I move them as a data frame into Python, and from Python I move them back into the texts.
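For readers unfamiliar with this kind of workflow, the extraction step might look roughly like the sketch below. It is not the actual script: the sample text, the `id` attribute, the column names, and the helper names `kwic_rows` and `rows_to_csv` are all assumptions made for illustration, and real corpus files would likely use XML namespaces and richer attributes.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document; the attribute name "id" is an assumption.
SAMPLE = (
    '<text>'
    '<w id="w1">I</w> <w id="w2">loue</w> '
    '<w id="w3">thee</w> <w id="w4">stil</w>'
    '</text>'
)


def kwic_rows(xml_text, width=3):
    """Return one (word_id, left, keyword, right) tuple per <w> element."""
    root = ET.fromstring(xml_text)
    tokens = [(w.get("id"), w.text or "") for w in root.iter("w")]
    rows = []
    for i, (word_id, form) in enumerate(tokens):
        # Up to `width` tokens of context on either side of the keyword.
        left = " ".join(t for _, t in tokens[max(0, i - width):i])
        right = " ".join(t for _, t in tokens[i + 1:i + 1 + width])
        rows.append((word_id, left, form, right))
    return rows


def rows_to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["word_id", "left_context", "keyword", "right_context"])
    writer.writerows(rows)
    return buf.getvalue()


rows = kwic_rows(SAMPLE, width=1)
print(rows_to_csv(rows))
```

The resulting CSV could then be loaded into a table, for example with Postgres's `COPY` command, and keeping the word identifier in each row is what would let corrected values be matched back to the source texts later.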
I do this on a recent Mac with 64 GB of memory and a 6-core i7 processor. I use Data Studio as an editing interface.
Unless a more recent version of Postgres has additional string-handling routines, or indexing routines that speed up working with tables with rows in the low millions, or other features that are likely to speed up operations, I don’t see any reason to update.
I could imagine a table that has up to 40 million rows. That would be pretty sluggish on my current equipment, which handles up to 10 million rows quite comfortably.
Am I right in thinking that given my tasks and equipment it would be a waste of time to update? Or is there something I’m missing?
Martin Mueller
Professor emeritus of English and Classics
Northwestern University