Re: Query question

From: mailinglists(at)net-virtual(dot)com
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Query question
Date: 2009-01-15 18:56:06
Message-ID: 57839.69.109.177.118.1232045766.squirrel@69.109.177.118
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

> On Wed, Jan 14, 2009 at 07:36:03PM -0800, mailinglists(at)net-virtual(dot)com
> wrote:
>> CREATE TABLE listings (
>> trans_id SERIAL,
>> mode CHAR(1),
>> listing_id INT,
>> region_id INT,
>> category INT
>> );
>>
>> "SELECT * FROM listings ORDER BY region_id, category, listing_id,
>> trans_id"
>
>> [...] what I want to do is get only the last transaction for
>> a given listing_id, because the earlier ones don't matter.
>
> If you have an index on (region_id,category,listing_id,trans_id) you
> should be able to do:
>
> SELECT region_id,category,listing_id,MAX(trans_id)
> FROM listings
> GROUP BY region_id,category,listing_id;
>
> And have PG answer this using the index (it'll only do this if it thinks
> there are many transactions for each group though).
>
>> On top of
>> that, each region_id and category_id has its own index. I need to be
>> able
>> to process the indexes in-full, one-at-a-time because there are too many
>> to hold that many open filehandles/processes at one time.
>
> Not sure what you mean by "index" here; I'm assuming you're talking
> about something outside PG, or am I missing some context?

Yes, sorry I wasn't clear... This is to keep track of incremental updates
to an index outside of Postgres

>> So, my question is, is there some way to return the rows in a
>> deterministic order, without actually having to do an explicit sort on
>> the
>> data? What I mean is, I don't care if category_id 4 / region_id 10 /
>> listing_id 10000 comes before category_id 1 / region_id 1 / lisitng_id 1
>> -- I just need them returned to me in that sort of grouped order
>> (although
>> sorted by trans_id).
>
> If you want to know all the transaction ids then you need to do the
> sort, if you only want the largest/latest then you're probably better
> off telling PG that's what you want (i.e. using GROUP BY and MAX
> aggregate and letting it make an appropiate decision).
>

I only need to know the max id, but the problem is that for each region_id
has N categories and I need to process each one individually. The query
you suggested does, indeed, sort them by category, but it does not sort
them by region_id.

Here's an example: (select region_id,category,listing_id,max(trans_id)
from listings_pending where region_id IN('3134000000', '2222000000')
group by region_id,category,listing_id;)

region_id | category_id | listing_id | max
------------+----------------+------------+--------
2222000000 | 1 | 2221473 | 640799
2222000000 | 1 | 2426142 | 845468
2222000000 | 1 | 2103599 | 522925
3134000000 | 1 | 2146326 | 565652
2222000000 | 1 | 2462112 | 881438
2222000000 | 1 | 1947690 | 367016
2222000000 | 1 | 2526731 | 946057
2222000000 | 1 | 2217864 | 637190
2222000000 | 1 | 2288420 | 707746

As you can see, at transaction 565652, I would close the index (which is
very expensive, because it actually has to do a "merge" of the newly
created index with old one), then reopen it at transaction 881438.

Thanks for your help!

- Greg

Browse pgsql-general by date

  From Date Subject
Next Message Andreas Wenk 2009-01-15 19:33:32 Re: MD5 password issue
Previous Message Alvaro Herrera 2009-01-15 18:54:38 Re: MD5 password issue