Quick Links

Re: Compare rows

From:	Josh Berkus <josh(at)agliodbs(dot)com>
To:	Greg Spiegelberg <gspiegelberg(at)cranel(dot)com>, PgSQL Performance ML <pgsql-performance(at)postgresql(dot)org>
Subject:	Re: Compare rows
Date:	2003-10-08 17:10:26
Message-ID:	200310081010.26576.josh@agliodbs.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

Greg,

> The data represents metrics at a point in time on a system for
> network, disk, memory, bus, controller, and so-on. Rx, Tx, errors,
> speed, and whatever else can be gathered.
>
> We arrived at this one 642 column table after testing the whole
> process from data gathering, methods of temporarily storing then
> loading to the database. Initially, 37+ tables were in use but
> the one big-un has saved us over 3.4 minutes.

Hmmm ... if few of those columns are NULL, then you are probably right ...
this is probably the most normalized design. If, however, many of columns
are NULL the majority of the time, then the design you should be using is a
vertial child table, of the form ( value_type | value ).

Such a vertical child table would also make your comparison between instances
*much* easier, as it could be executed via a simple 4-table-outer-join and 3
where clauses. So even if you don't have a lot of NULLs, you probably want
to consider this.

> The reason for my initial question was this. We save changes only.
> In other words, if system S has row T1 for day D1 and if on day D2
> we have another row T1 (excluding our time column) we don't want
> to save it.

If re-designing the table per the above is not a possibility, then I'd suggest
that you locate 3-5 columns that:
1) are not NULL for any row;
2) combined, serve to identify a tiny subset of rows, i.e. 3% or less of the
table.

Then put a multi-column index on those columns, and do your comparison.
Hopefully the planner should pick up on the availablity of the index and scan
only the rows retrieved by the index. However, there is the distinct
possibility that the presence of 637 WHERE criteria will confuse the planner,
causing it to resort to a full table seq scan; in that case, you will want to
use a subselect to force the issue.

Or, as Joe Conway suggested, you could figure out some kind of value hash that
uniquely identifies your rows.

--
Josh Berkus
Aglio Database Solutions
San Francisco

In response to

Re: Compare rows at 2003-10-08 16:27:41 from Greg Spiegelberg

Responses

Re: Compare rows at 2003-10-08 19:10:53 from Greg Spiegelberg

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Jason Hihn	2003-10-08 17:13:26	Re: Compare rows
Previous Message	Jeff	2003-10-08 17:05:28	Re: Presentation