From: | Jiří Nádvorník <nadvornik(dot)ji(at)gmail(dot)com> |
---|---|
To: | "'Vitalii Tymchyshyn'" <vit(at)tym(dot)im> |
Cc: | <pgsql-performance(at)postgresql(dot)org> |
Subject: | Re: Cursor + upsert (astronomical data) |
Date: | 2014-07-27 14:35:21 |
Message-ID: | 04b801cfa9a7$ffdcb280$ff961780$@gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Hi Vitalii, thank you for your reply.
The problem you suggested can in the most pathological way be, that these observations are on one line. As you suggested it, the B would be in the middle. So A and C are not in 1 arcsec range of each other, but they must be within 1 arcsec of their common average coordinates. If the distances between A,B,C are 1 arcsec for each, the right solution is to pick B as reference identifier and assign A and C to it.
We already tried the approach you suggest with applying a grid based on the Q3C indexing of the database. We were not just rounding the results, but using the center of the Q3C “square” in which the observation took place. The result was poor however – 22% of the identifiers were closer to each other than 1 arcsec. That means that when you crossmatch the original observations to them, you don’t know which one to use and you have duplicates. The reason for this is that nearly all of the observations are from SMC (high density of observations), which causes that you have more than 2 “rounded” positions in a row and don’t know which ones to join together (compute average coordinates from it). If it is not clear enough I can draw it on an image for you.
Maybe the simple round up would have better results because the squares are not each the same size and you can scale them only by 2 (2-times smaller, or larger square). We used a squre with the side cca 0.76 arcsec which approximately covers the 1 arcsec radius circle.
Oh and one more important thing. The difficulty of our data is not that it is 3e8 rows. But in the highest density, there are cca 1000 images overlapping. Which kills you when you try to self-join the observations to find neighbours for each of them – the quadratic complexity is based on the overlappingon the image (e.g. 10000 observations on one image with another 999 images overlapping it means 10000 *1000^2).
Best regards,
Jiri Nadvornik
From: tivv00(at)gmail(dot)com [mailto:tivv00(at)gmail(dot)com] On Behalf Of Vitalii Tymchyshyn
Sent: Sunday, July 27, 2014 8:06 AM
To: Jiří Nádvorník
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: [PERFORM] Cursor + upsert (astronomical data)
I am not sure I understand the problem fully, e.g. what to do if there are observations A,B and C with A to B and B to C less then treshold and A to C over treshold, but anyway.
Could you first apply a kind of grid to your observations? What I mean is to round your coords to, say, 1/2 arcsec on each axe and group the results. I think you will have most observations grouped this way and then use your regular algorithm to combine the results.
Best regards, Vitalii Tymchyshyn
From | Date | Subject | |
---|---|---|---|
Next Message | Craig James | 2014-07-27 15:35:25 | Re: Cursor + upsert (astronomical data) |
Previous Message | Vitalii Tymchyshyn | 2014-07-27 06:05:42 | Re: Cursor + upsert (astronomical data) |