Re: How to perform a long running dry run transaction without blocking

From: Robert Leach <rleach(at)princeton(dot)edu>
To: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: How to perform a long running dry run transaction without blocking
Date: 2025-02-07 22:02:03
Message-ID: 15A999B4-9D53-4F35-84B1-7B8696256EE9@princeton.edu

>> Anyway, thanks so much for your help. This discussion has been very useful, and I think I will proceed at first, exactly how you suggested, by queuing every validation job (using celery). Then I will explore whether or not I can apply the "on timeout" strategy in a small patch.
>> Incidentally, during our Wednesday meeting this week, we actually opened our public instance to the world for the first time, in preparation for the upcoming publication. This discussion is about the data submission interface, but that interface is actually disabled on the public-facing instance. The other part of the codebase that I was primarily responsible for was the advanced search. Everything else was primarily by other team members. If you would like to check it out, let me know what you think: http://tracebase.princeton.edu
>
> I would have to hit the books again to understand all of what is going on here.

It's a mass spec tracing database. Animals are infused with radiolabeled compounds, and mass spec is used to see what the animal's biochemistry turns those compounds into. (My undergrad was in biochem, so I've been resurrecting my biochem knowledge as needed for this project. Since undergrad I've mostly been doing RNA and DNA sequence analysis, and most of that was prokaryotic.)

> One quibble with the Download tab: there is no indication of the size of the datasets. I generally like to know what I am getting into before I start a download. Also, is there explicit throttling going on? I am seeing 10.2kb/sec, whereas from https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page I downloaded a 47.65M file at 41.9MB/s.

Thank you! Not knowing the download size is exactly a complaint I had as well. That download actually goes through my advanced search interface (in browse mode), and the download buttons on the advanced search have the same issue. Because the response is streamed, we don't deal with temp files, which is nice, at least for the advanced search, but it also means we can't know the download size up front. I had wanted a progress bar that at least shows progress (current record per total), and I even explored estimating the size for a few days. Eventually I proposed a Celery-based solution for that, but I was overruled.
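
To make the streaming point concrete, this is roughly the shape of it. It's only a minimal sketch assuming a Django-style streaming view, not our actual code, and the helper and field names are made up:

    # Sketch only, assuming a Django-style streaming view; names are illustrative.
    import csv

    from django.http import StreamingHttpResponse


    class Echo:
        """Pseudo-buffer: csv.writer only needs an object with a write() method."""
        def write(self, value):
            return value


    def streamed_csv(queryset, fields, filename="results.csv"):
        # One extra COUNT(*) up front; this is the number a "record N of M"
        # progress indicator would be fed from.
        total = queryset.count()
        writer = csv.writer(Echo())

        def rows():
            yield writer.writerow(fields)
            for rec in queryset.values_list(*fields).iterator():
                # Rows are produced on the fly: no temp file, but also no
                # Content-Length, so the browser can't show byte-based progress.
                yield writer.writerow(rec)

        response = StreamingHttpResponse(rows(), content_type="text/csv")
        response["Content-Disposition"] = f'attachment; filename="{filename}"'
        # Illustrative only: a custom header a JS front end could read to show
        # record-based progress.
        response["X-Total-Records"] = str(total)
        return response

The count() up front is the "current record per total" idea; the cost is one extra query per download.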

As for the download in the nav bar, we have an open issue to change that to a listing of actual files broken down by study (three files per study). There's not much utility, from a user perspective, in downloading everything anyway; we've just been focused on other things. In fact, we have a request from a user for that specific feature, done in a way that's compatible with curl/scp. We just have to figure out how to avoid CAS-authenticating every command, which is something I don't have experience with.
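
For anyone following the original topic, the queued-validation plan from the quoted text at the top of this message would look roughly like the following. Again, just a sketch assuming Django + Celery; load_submission is a hypothetical stand-in for the real loading code, not an actual function in our repo:

    # Sketch only, assuming Django + Celery; load_submission is hypothetical.
    from celery import shared_task
    from django.db import transaction


    class DryRunRollback(Exception):
        """Raised purely to force the transaction to roll back after validation."""


    @shared_task
    def validate_submission(submission_id):
        errors = []
        try:
            with transaction.atomic():
                # Run the real load so every constraint and trigger is exercised,
                # collecting problems into `errors` along the way...
                load_submission(submission_id, errors)
                # ...then force a rollback: this is only a dry run.
                raise DryRunRollback()
        except DryRunRollback:
            pass
        return errors  # stored as the Celery task result for the web UI to poll

The "on timeout" refinement I mentioned would presumably mean first trying this same dry run synchronously with a short lock_timeout (e.g. SET LOCAL lock_timeout inside the atomic block) and only handing it to the queue when that trips.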
