Re: Fwd: Tricky join and update with same table

From: Michael Swierczek <mike(dot)swierczek(at)gmail(dot)com>
To: James David Smith <james(dot)david(dot)smith(at)gmail(dot)com>
Cc: "pgsql-novice(at)postgresql(dot)org" <pgsql-novice(at)postgresql(dot)org>
Subject: Re: Fwd: Tricky join and update with same table
Date: 2013-07-26 14:53:17
Message-ID: CAHp1f1MrE0eZ_aHRRKTauJ-7gJ8bkPLzkbRCKtSjz1uk43qo1g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-novice

On Fri, Jul 26, 2013 at 10:36 AM, James David Smith
<james(dot)david(dot)smith(at)gmail(dot)com> wrote:
> Hi Mike,
>
> About half an hour before you replied I actually managed to come up with
> that myself! :-)
>
> Using it in an update query is troubling me though. I've done it a few times
> now and it keeps updating the wrong fields.
>
> Luca sent me an email a short while ago with some code to try however, and
> I'm doing that at the moment. Though it's been running for about 30 minutes
> now and isn't done. Though there are 230,000 rows, so perhaps that isn't
> that surprising after all. Here it is for posterity:
>
> PDATE hlhs_day hd
> SET point_geom = ( SELECT hd2.point_geom
> FROM hlhs_day hd2
> WHERE hd.spid = hd2.spid
> AND hd2.point_geom IS NOT NULL
> AND hd2.point_time = ( SELECT min( hd3.point_time )
> FROM hlhs_day hd3
> WHERE hd.spid = hd3.spid
> AND hd3.point_geom IS NOT NULL
> AND hd3.point_time >
> hd.point_time
> )
> )
> WHERE hd.point_geom IS null
>
> Thanks
>
> James
>
>

This is a bit of a kluge solution, but if it were me I would take the
query I suggested (and you figured out on your own), put the results
in a text file, and then use Excel or some regular expressions in
Vim/Emacs/Notepad++/whatever to translate each line from
spidx,point_timex,point_geomx
to
update hlhs_day set point_geom = point_geomx where spid = spidx and
point_time = point_timex;
Then run it through psql or pgAdmin3.

I believe that would probably run much more quickly, and you can break
it up into batches.

Good luck either way.
-Mike

>
>
>
>
> On 26 July 2013 14:37, Michael Swierczek <mike(dot)swierczek(at)gmail(dot)com> wrote:
>>
>> On Fri, Jul 26, 2013 at 5:24 AM, James David Smith
>> <james(dot)david(dot)smith(at)gmail(dot)com> wrote:
>> > Hi all,
>> >
>> > Some help if you can please. I have GPS data for a number of people.
>> > Each
>> > person turned the GPS on, at home, at a different time of the day.
>> > Before
>> > that time, I presume that the person was in their house. I have made a
>> > table
>> > for each person for a whole 24 hours (one record per minute), and I now
>> > want
>> > to 'fill in' their location for the rows before they turned the GPS on.
>> > So
>> > for each person I want to take the first row where the point_geom is not
>> > null, and update all of the rows above it with that value. It's driving
>> > me
>> > nuts.
>> >
>> > spid | point_time | point_geom
>> > -----------------------------------------------------------
>> > 1 | 2012-01-01 00:01:00 |
>> > 1 | 2012-01-01 00:02:00 |
>> > 1 | 2012-01-01 00:03:00 | POINT(X, Y)
>> > 1 | 2012-01-01 00:04:00 | POINT(X, Y)
>> > 1 | 2012-01-01 00:05:00 | POINT(X, Y)
>> > 2 | 2012-01-01 00:01:00 |
>> > 2 | 2012-01-01 00:02:00 |
>> > 2 | 2012-01-01 00:03:00 |
>> > 2 | 2012-01-01 00:04:00 |
>> > 2 | 2012-01-01 00:05:00 | POINT(X, Y)
>> > 3 | 2012-01-01 00:01:00 |
>> > 3 | 2012-01-01 00:02:00 | POINT(X, Y)
>> > 3 | 2012-01-01 00:03:00 | POINT(X, Y)
>> > 3 | 2012-01-01 00:04:00 | POINT(X, Y)
>> > 3 | 2012-01-01 00:05:00 | POINT(X, Y)
>> >
>> > I've managed to select the correct row using this:
>> >
>> > SELECT spid, min(point_time) as point_time
>> > FROM hlhs_day
>> > WHERE point_geom IS NOT NULL
>> > GROUP BY spid;
>> >
>> > However when I try to add in the column 'point_geom' to the query, it
>> > won't
>> > work.
>> >
>> > Thanks for your help and suggestions.
>> >
>> > James
>>
>> Usually when I chime in on questions like this, someone comes along
>> with a better solution after I finish. But I think you can get the
>> information you want by "wrapping" that query:
>>
>> SELECT hd1.spid, hd1.point_time, hd2.point_geom
>> FROM
>> (SELECT spid, min(point_time) as point_time
>> FROM hlhs_day
>> WHERE point_geom IS NOT NULL GROUP BY spid) as hd1
>> INNER JOIN hlhs_day hd2 ON hd1.spid = hd2.spid AND hd1.point_time =
>> hd2.point_time
>> ORDER BY hd1.spid;
>>
>> I hope this helps.
>> --Mike
>
>

In response to

Browse pgsql-novice by date

  From Date Subject
Next Message Luca Ferrari 2013-07-26 16:05:28 Re: Fwd: Tricky join and update with same table
Previous Message James David Smith 2013-07-26 14:36:04 Re: Fwd: Tricky join and update with same table