From: wkipjohn(at)gmail(dot)com
To: Rob Sargent <robjsargent(at)gmail(dot)com>, wkipjohn(at)gmail(dot)com
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: SQL report
Date: 2009-07-31 03:55:52
Message-ID: 0016364c637589ebcb046ff865c1@google.com
Lists: pgsql-sql
Hi Rob,
I have default B-tree indexes created for each of the indexed columns and
primary key columns (no multi-column indexes, NULLS FIRST, or DESC/ASC
options). I am using PostgreSQL 8.3 with the autovacuum daemon on. I
assume ANALYZE will be run automatically to collect statistics for use by
the planner, and that a B-tree index needs no maintenance once it is
created. (Please correct me if I am wrong about this.)
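Whether autovacuum has actually been analyzing the tables can be checked
from the statistics views (8.3 has last_autovacuum/last_autoanalyze in
pg_stat_user_tables; the table name below is the one from my schema):

```sql
-- when autovacuum last vacuumed/analyzed the status table (PostgreSQL 8.3)
SELECT relname, last_autovacuum, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'statusrecord';
```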
I will probably try to partition the status table to group more recent
status records together to minimize the dataset I am querying.
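In 8.3 that would be inheritance-based partitioning; a minimal sketch of
what I have in mind (the range boundary and child-table names are made up
for illustration):

```sql
-- child tables partitioned on the datetime column (boundary is illustrative)
CREATE TABLE statusrecord_recent (
    CHECK (datetime >= 1200000000)
) INHERITS (statusrecord);

CREATE TABLE statusrecord_old (
    CHECK (datetime < 1200000000)
) INHERITS (statusrecord);

-- let the planner skip partitions whose CHECK cannot match the WHERE clause
SET constraint_exclusion = on;
```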
Thx
John
On Jul 31, 2009 1:16am, Rob Sargent <robjsargent(at)gmail(dot)com> wrote:
> I would be curious to know the performance curve for, let's say, 20K,
> 40K, 60K, 80K, and 100K records, and what sort of indexing you have,
> whether or not it's clustered, rebuilt, and so on.
> One could envision partitioning the status table such that recent records
> were grouped together (on the assumption that they will be most
> frequently "reported").
> wkipjohn(at)gmail(dot)com wrote:
> I have the following scenario.
> I have a tracking system. The system records the status of each object
> regularly, and all status records are stored in one table. It keeps a
> history of at most 1,000 status records per object, and it tracks at
> most 100,000 objects, which means the table can potentially grow to 100
> million records.
> I have to generate a report on the latest status of all tracked objects
> as of a particular point in time, and I also have to allow the user to
> sort and filter on different columns of the status records displayed in
> the report.
> The following is a brief description of the status records (this is not
> actual code):
> ObjectRecord(
> objectId bigint PrimaryKey
> desc varchar
> )
> StatusRecord (
> id bigint PrimaryKey
> objectId bigint indexed
> datetime bigint indexed
> capacity double
> reliability double
> efficiency double
> )
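For reference, that pseudo-schema maps to roughly this DDL (a sketch;
note that "desc" is a reserved word and needs quoting, and I've assumed a
composite index on (objectId, datetime) to serve the point-in-time
lookup, rather than the two single-column indexes):

```sql
CREATE TABLE objectrecord (
    objectid bigint PRIMARY KEY,
    "desc"   varchar             -- "desc" is reserved; must be quoted
);

CREATE TABLE statusrecord (
    id          bigint PRIMARY KEY,
    objectid    bigint REFERENCES objectrecord,
    datetime    bigint,
    capacity    double precision,
    reliability double precision,
    efficiency  double precision
);

-- serves "latest row per object up to a point in time" in one index scan
CREATE INDEX statusrecord_obj_dt ON statusrecord (objectid, datetime DESC);
```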
> I have tried the following. It works very well with around 20,000
> objects (the query returns in less than 10 s), but with 100,000 objects
> it becomes very, very slow. (I didn't even have the patience to wait
> for it to return; I killed it after 30 minutes.)
> select * from statusrecord s1 INNER JOIN ( SELECT objectId ,
> MAX(datetime) AS msdt FROM statusrecord WHERE startDatetime
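The quoted query got cut off above. For what it's worth, the same result
(latest row per object at or before a point in time) can also be
expressed without the self-join using DISTINCT ON; a sketch, with the
point-in-time value written as a placeholder literal:

```sql
-- latest status per object at or before the point in time
-- 1200000000 stands in for the actual point-in-time parameter
SELECT DISTINCT ON (objectid) *
FROM statusrecord
WHERE datetime <= 1200000000
ORDER BY objectid, datetime DESC;
```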
> I did try to write a stored procedure like the one below; for 100,000
> objects and 1,000 status records per object, it returns in around 30
> minutes.
> CREATE OR REPLACE FUNCTION getStatus(pitvalue BIGINT) RETURNS SETOF
> statusrecord AS $BODY$
> DECLARE
> object objectrecord%ROWTYPE;
> status statusrecord%ROWTYPE;
> BEGIN
> FOR object IN SELECT * FROM objectrecord
> LOOP
> -- latest status for this object at or before the point in time
> SELECT * INTO status FROM statusrecord
> WHERE objectId = object.objectId AND datetime <= pitvalue
> ORDER BY datetime DESC LIMIT 1;
> IF FOUND THEN
> RETURN NEXT status;
> END IF;
> END LOOP;
> RETURN;
> END
> $BODY$ LANGUAGE plpgsql;
> Just wanted to know if anyone has a different approach to my scenario.
> Thanks a lot.
> John