Re: SQL problem: bank account

From: Dmitry Tkach <dmitry(at)openratings(dot)com>
To: "Erik G(dot) Burrows" <eburrows(at)erikburrows(dot)com>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: SQL problem: bank account
Date: 2003-06-02 14:28:16
Message-ID: 3EDB5F00.8080407@openratings.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

I am afraid, this looks even uglier then your second solution, but
should work, and be quicker...

-- You need this to avoid having to rescan the whole table for each
customerid every time and resort the results
create index customer_txstamp_idx on bank_account (customer_id, ts);

select ba.* from bank_account ba where
transaction_id = (select transaction_id from bank_account where
customer_id = ba.customer_id order by customer_id desc, ts desc limit 1);

Now, note that, if you have lots of different customers in that table,
it will still take a while to fetch them all (although, it should still
be a lot quicker then half an hour) - in that case, you may consider
either getting them one-by-one (by adding ... and customer_id=? to the
above query) or using cursors...

I hope, it helps...

Dima

Erik G. Burrows wrote:

>It seems to me this is a simple problem, but the solution eludes me.
>
>I have a table:
>
>bank_account (
> transaction_id int not null serial,
> customer_id int not null references customer(id),
> ts timestamp not null default now(),
> amount float not null,
> balance float not null,
> primary key(transaction_id)
>)
>
>I need to get the most recent transaction for each customer. I need only
>the transaction ID, but the entire row would be best.
>
>
>
>I have two solutions, both of which are too slow for use in my
>interactive web-based interface:
>
>Solution1: Outer left self join:
>SELECT
> ba1.*
>FROM
> bank_account ba1
> LEFT OUTER JOIN bank_account ba2
> ON ba1.customer_id = ba2.customer_id AND ba1.ts < ba2.ts
>WHERE
> ba2.ts IS NULL;
>
>This query works great on tables of less than a few thousand rows. For
>my 300k row table, it takes several hours.
>
>
>Solution2: max-concat trick
>SELECT
> split_part(max(
> extract(EPOCH from ts)::VARCHAR || ' ' ||
> transaction_id::VARCHAR), ' ', 2)::INT
>FROM
> bank_account
>GROUP BY
> customer_id
>
>This is an ugly and obviously inefficient solution, but it does the job
>in about 1/2 hour. Still too long though.
>
>
>
>I've been working on this problem for days, and consulting friends. No
>elegant, fast solution is presenting itself. As I said, I feel I'm not
>seeing the obvious solution in front of my face.
>
>In the mean-time I can use this query to do the job on a per-customer
>basis:
>
>select
> *
>from
> bank_account
>where
> id = <the customer's ID>
> and ts = (select max(ts) from bank_account ba2 where ba2.customer_id =
>bank_account.customer_id);
>
>
>However, doing this for all 40,000 customers is not workable as a manual
>process. My last resort is to do it this way to pre-generate the report,
>but I'd far rather do it real-time.
>
>Help! My brain hurts!
>
>
>

In response to

Browse pgsql-sql by date

  From Date Subject
Next Message Dmitry Tkach 2003-06-02 14:31:33 Re: Using & - operator
Previous Message Tom Lane 2003-06-02 13:42:37 Re: SQL problem: bank account