Re: Update with last known location?

From: James David Smith <james(dot)david(dot)smith(at)gmail(dot)com>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: Erik Darling <edarling80(at)gmail(dot)com>, PGSQL-Novice <pgsql-novice(at)postgresql(dot)org>
Subject: Re: Update with last known location?
Date: 2014-01-31 14:50:21
Message-ID: CAMu32ADBsx7Bx3G1az__7xVNL-ATcR0QyLwRmd4F+f+45=i+_g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-novice

Hi Kevin / all,

I tried to run the update with your code but got the following error
(which I realise is now diverging from my original question):

ERROR: could not write to hash-join temporary file: No space left on device
********** Error **********

Any thoughts? I didn't tweak the memory allocation or cost factors
before I ran it - maybe that would help? I didn't bother doing it
initially as I wasn't too fussed if it took say 4 hours instead of 3
etc. I'm not in a huge rush.

Regarding the system, it's a virtual Ubuntu 12.04 desktop with
PostgreSQL 9.3 and PostGIS 2.0. I asked our IT guy for the details
about it and he gave the below. I'm the only user of the visualization
set-up at the moment, so all the resource *should* be available to me
he says:

20 cores of Intel Xeon E-2690 v2 processor @ 3GHz
48GB of memory at 1866 MHz
Hard drive based on 10 x 15K RPM SAS hard disks

Cheers

James

On 30 January 2014 20:08, Kevin Grittner <kgrittn(at)ymail(dot)com> wrote:
> James David Smith <james(dot)david(dot)smith(at)gmail(dot)com> wrote:
>> On 30 January 2014 16:45, Kevin Grittner <kgrittn(at)ymail(dot)com> wrote:
>
>>> update test_data x
>>> set the_geom = y.the_geom
>>> from test_data y
>>> where x.the_geom is null
>>> and y.ppid = x.ppid
>>> and y.the_geom is not null
>>> and y.point_time < x.point_time
>>> and not exists
>>> (
>>> select * from test_data z
>>> where z.ppid = y.ppid
>>> and z.the_geom is not null
>>> and z.point_time > y.point_time
>>> and z.point_time < x.point_time
>>> )
>>> ;
>
>> I can't run it on my actual data at the moment, or rather I don't want
>> too, as I set the below query going an hour or so ago and thought I
>> should let it finish really.
>>
>> If it hasn't finished when I come into work tomorrow (I'm leaving the
>> office shortly) then I'll cancel it and give yours a crack instead. To
>> my novice eye, your code looks like it'll be quicker than the below
>> anyway.
>
>> SELECT
>> data.ppid,
>> data.point_time,
>> CASE
>> WHEN data.the_geom IS NULL
>> THEN (
>> --Get all locations with an earlier time stamp for that ppid
>> SELECT geom.the_geom
>> FROM hybrid_location geom
>> WHERE data.ppid = geom.ppid
>> AND geom.point_time < data.point_time
>> AND geom.the_geom IS NOT NULL
>> AND NOT EXISTS (
>> -- Cull all but the most recent one
>> SELECT *
>> FROM hybrid_location cull
>> WHERE cull.ppid = geom.ppid
>> AND geom.the_geom IS NOT NULL
>> AND cull.point_time < data.point_time
>> AND cull.point_time > geom.point_time
>> AND cull.the_geom IS NOT NULL
>> )
>> )
>> ELSE data.the_geom
>> end
>> FROM hybrid_location data;
>
> Yeah, that's basically the same approach, but it uses a subquery
> which I don't think can get pulled up -- so I think it will need to
> do a lot of the work once for each row with a NULL the_geom column
> where my version can do it once, period.
>
> If you haven't tuned your configuration, you can probably speed up
> any of these versions with a few tweaks to memory allocation and
> cost factors. The most significant for this query would probably
> be to set work_mem to something around 25% of machine RAM divided
> by the number of active connections you can have.
>
> --
> Kevin Grittner
> EDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-novice by date

  From Date Subject
Next Message Anh Pham 2014-02-01 02:31:00 lots of errors from fmgr.h when I try to write a C UDF
Previous Message Kevin Grittner 2014-01-30 20:08:30 Re: Update with last known location?