RE: Ideas for building a system that parses medical research publications/articles [EXT]

From: Daniel Perrett <dp13(at)sanger(dot)ac(dot)uk>
To: Achilleas Mantzios <achill(at)matrix(dot)gatewaynet(dot)com>, "pgsql-general(at)lists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: RE: Ideas for building a system that parses medical research publications/articles [EXT]
Date: 2021-06-07 11:22:05
Message-ID: a79013c5c1d94d85b3498fc298218efc@sanger.ac.uk
Lists: pgsql-general

I think the key word that will help you here is "biocuration". It is an established field involving people with scientific, computational, and linguistic backgrounds who are familiar with the problem space. I would suggest talking to people working in this area first to get an idea of what's feasible, what's already out there, etc., as they will know this better than the Postgres community.

You can see an example of the sort of annotation that is fully automated at the moment here:

https://monarchinitiative.org/tools/text-annotate

Given the potential impact on human health, some level of manual involvement in annotation is frequently part of the workflow.

Daniel

-----Original Message-----
From: Achilleas Mantzios <achill(at)matrix(dot)gatewaynet(dot)com>
Sent: 05 June 2021 10:49
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Ideas for building a system that parses medical research publications/articles [EXT]

Hello

I am imagining a system that can parse papers from various sources (web, files, etc.) and in various formats (text, PDF, etc.) and can store metadata for each paper: some kind of global ID if applicable, authors, areas of research, whether the paper is "new", "highlighted", or "historical", type (e.g. case reports, clinical trials), symptoms (e.g. tics, GI pain, psychological changes, anxiety), and other key attributes (I guess dynamic). It must be full-text searchable, etc.

I am at the very beginning of this, and it is done on a fully volunteer basis.

Lots of questions: is there any scientific/scholarly analysis software already available? If yes, and it is really good and open source, then this will influence the rest of the decisions. Otherwise, I'll have to form a team that can write one, in which case I'll have to decide on DB, language, etc. I have worked with pgsql for 20 years, so it is the natural choice for any kind of data; I just ask this for the sake of completeness.

All ideas welcome.

--
The Wellcome Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.
