Re: Ideas for building a system that parses medical research publications/articles

From: Achilleas Mantzios <achill(at)matrix(dot)gatewaynet(dot)com>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Ideas for building a system that parses medical research publications/articles
Date: 2021-06-05 17:03:01
Message-ID: bf8ec5de-81f2-d096-981d-a2c98debdc83@matrix.gatewaynet.com
Lists: pgsql-general


On 5/6/21 4:45 PM, Vijaykumar Jain wrote:
> http://tika.apache.org/
>
I checked. It behaves better with downloaded PDFs than with PDFs fetched
by URL; in the second case the metadata are poor.

It does not work with NIH articles (but this is a general problem, not Tika's).

> To get started with collecting document metadata, it looks like this
> tool can help.
> Postgres does support fuzzy text search, so I think dumping the
> metadata and abstracts into PostgreSQL and then using trigram, tsearch
> and similar extensions should work well for a POC.
> This being a pg mailing list :) what are your expectations for the type
> of data, the growth of data, and your queries?
> If you store data to support papers in multiple languages, will
> PostgreSQL be able to handle it?
> Ideally the docs would be stored somewhere on object storage, and the
> link to each would be stored in the db for when someone requests to
> read the whole paper.
> A long time ago I read this:
> https://www.citusdata.com/blog/2017/04/20/analyzing-postgresql-email-archives/
>
> So if this could work, your POC should too :) with postgresql.
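
To make the trigram idea above concrete, here is a rough sketch in plain Python of the kind of score a trigram match produces. It only approximates what the pg_trgm extension documents for similarity() (lowercase, split on non-alphanumerics, pad each word with two leading spaces and one trailing space, then shared trigrams divided by all distinct trigrams). It handles ASCII only, and the function names, the 0.3 threshold and the example titles are my own assumptions for illustration, not anything taken from a real dataset.

```python
import re


def trigrams(text):
    """Approximate pg_trgm trigram extraction (ASCII-only sketch)."""
    grams = set()
    # Lowercase, then treat any run of non-alphanumerics as a word separator.
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "  # two spaces before the word, one after
        for i in range(len(padded) - 2):
            grams.add(padded[i:i + 3])
    return grams


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def rank_titles(query, titles, threshold=0.3):
    """Return (score, title) pairs at or above threshold, best match first."""
    scored = [(similarity(query, t), t) for t in titles]
    return sorted([pair for pair in scored if pair[0] >= threshold], reverse=True)


if __name__ == "__main__":
    # Hypothetical titles, for illustration only.
    for score, title in rank_titles("neuroblastoma",
                                    ["Neuroblastoma outcomes", "Unrelated paper"]):
        print(round(score, 3), title)
```

In a real POC the scoring would be left to the database (the pg_trgm extension with an index on the title or abstract column); the sketch is only meant to show why misspelled or partial titles can still match.
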
>
>
> On Sat, 5 Jun 2021 at 5:14 PM Laura Smith
> <n5d9xq3ti233xiyif2vp(at)protonmail(dot)ch> wrote:
>
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On Saturday, 5 June 2021 12:14, Achilleas Mantzios
> <achill(at)matrix(dot)gatewaynet(dot)com> wrote:
>
>
> >
> > I know it's a huge amount of work, but you are missing a point. Nobody
> > wishes to compete with anyone. This is about a project, a parent-advocacy
> > non-profit that ONLY aims to save the sick children (or maybe also
> > very young adults) of a certain spectrum. So the goal is to make the
> > right tools for researchers, clinicians and parents. This market is too
> > small to even consider making any money out of it, but the research is
> > still very expensive and the progress slower than optimal.
>
>
> Unfortunately I'm not "missing a point", your final paragraph
> summarises your position.
>
> You have been taken in by the very charitable goal of saving sick
> children.
>
> Unfortunately your head has been disconnected from your heart.
>
> If we put the charitable purpose to one side and take a purely
> objective view of what you want to do, my original statement still
> stands, i.e. the certainty that you are grossly underestimating
> the technical and practical complexities of what you want to achieve.
>
>
> --
> Thanks,
> Vijay
> Mumbai, India
