From: | Janning Vygen <vygen(at)gmx(dot)de> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | getting the ranks out of items with SHARED |
Date: | 2005-07-12 08:48:51 |
Message-ID: | 200507121048.52113.vygen@gmx.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Hi,
in postgresql you have several possibilites to get the rank of items. A thread
earlier this year shows correlated subqueries (not very performant) and other
tricks and techniques to solve the ranking problem:
http://archives.postgresql.org/pgsql-general/2005-05/msg00157.php
The possibility to use a SHARED variable in plperl can be another nice way to
get a rank of items. ( good example showing SHARED in use is at
http://www.varlena.com/varlena/GeneralBits/114.php)
So i wrote two functions for my purpose:
CREATE OR REPLACE function ranking(int4) RETURNS int4 LANGUAGE 'plperl' AS $$
my %this;
$this{'punkte'} = shift;
$this{'ranking'} = $_SHARED{'prev'}{'ranking'}; # defaults to 0
$this{'count'} = $_SHARED{'prev'}{'count'} + 1; # defaults to 1
$this{'ranking'} = $this{'count'} unless
$this{'punkte'} == $_SHARED{'prev'}{'punkte'};
$_SHARED{'prev'} = \%this;
return $this{'ranking'};
$$;
CREATE OR REPLACE FUNCTION reset_ranking() RETURNS void LANGUAGE 'plperl' AS
$$
$_SHARED{'prev'} = undef;
$$;
Nice Thing: the function drops rankings which other ranking solutions in the
given thread can't. Like this:
rank | points
-------------
1 | 10
2 | 9
2 | 9
4 | 8
5 | 7
It drops rank 3 because we have to entries for second rank.
It would be even nice if you can write a genric ranking() function which takes
anyarray as an argument, but as far as i know you can't pass an "anyarray" to
a plperl function, right?
Now i can do the following in plpsql Function which updates a caching table
for me and it works fine:
PERFORM reset_ranking();
CREATE TEMP TABLE ranking AS
SELECT
*,
ranking(r1.gc_gesamtpunkte) AS rank
FROM (
SELECT
mg_name,
gc_gesamtpunkte
FROM temp_gc
ORDER BY gc_gesamtpunkte DESC, mg_name ASC
) AS r1
;
EXECUTE '
UPDATE temp_gc
SET gc_rank = ranking.rank
FROM ranking
WHERE temp_gc.mg_name = ranking.mg_name;
';
DROP TABLE ranking;
Problems arrise when you try to do the select and update step together without
any temporary table in between:
PERFORM reset_ranking();
UPDATE temp_gc SET gc_rank = ranking.rank
FROM (
SELECT
*,
ranking(r1.gc_gesamtpunkte) AS rank
FROM (
SELECT
mg_name,
gc_gesamtpunkte
FROM temp_gc
ORDER BY gc_gesamtpunkte DESC, mg_name ASC
) AS r1
) AS ranking
WHERE temp_gc.mg_name = ranking.mg_name;
';
I have a guess, what happens here: The order of the subselect statement is
dropped by the optimizer because the optimizer doesn't see the "side-effect"
of the ranking function. that's ok because using such functions isn't SQLish,
i guess.
Is there a way to FORCE the optimizer to keep things orders like the sql
statement author wanted it?
kind regards,
janning
From | Date | Subject | |
---|---|---|---|
Next Message | Bjørn T Johansen | 2005-07-12 10:22:33 | Windows version of PostgreSQL 8? |
Previous Message | Marco Gaiarin | 2005-07-12 08:19:37 | 7.2 -> 7.4: horrible performance hit! |