Quick Links

Re: Compare rows

From:	Jason Hihn <jhihn(at)paytimepayroll(dot)com>
To:	Greg Spiegelberg <gspiegelberg(at)cranel(dot)com>, PgSQL Performance ML <pgsql-performance(at)postgresql(dot)org>
Subject:	Re: Compare rows
Date:	2003-10-08 17:13:26
Message-ID:	NGBBLHANMLKMHPDGJGAPAEBACOAA.jhihn@paytimepayroll.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

Comment interjected below.

> -----Original Message-----
> From: pgsql-performance-owner(at)postgresql(dot)org
> [mailto:pgsql-performance-owner(at)postgresql(dot)org]On Behalf Of Greg
> Spiegelberg
> Sent: Wednesday, October 08, 2003 12:28 PM
> To: PgSQL Performance ML
> Subject: Re: [PERFORM] Compare rows
>
>
> Josh Berkus wrote:
> > Greg,
> >
> >
> >>Anyone have any suggestions on how to efficiently compare
> >>rows in the same table? This table has 637 columns to be
> >>compared and 642 total columns.
> >
> >
> > 637 columns? Are you sure that's normalized? It's hard for
> me to conceive
> > of a circumstance where that many columns would be necessary.
> >
> > If this isn't a catastrophic normalization problem (which it
> sounds like),
> > then you will probably still need to work through procedureal
> normalization
> > code, as SQL simply doesn't offer any way around naming all the
> columns by
> > hand. Perhaps you could describe the problem in more detail?
> >
>
> The data represents metrics at a point in time on a system for
> network, disk, memory, bus, controller, and so-on. Rx, Tx, errors,
> speed, and whatever else can be gathered.
>
> We arrived at this one 642 column table after testing the whole
> process from data gathering, methods of temporarily storing then
> loading to the database. Initially, 37+ tables were in use but
> the one big-un has saved us over 3.4 minutes.
>
> The reason for my initial question was this. We save changes only.
> In other words, if system S has row T1 for day D1 and if on day D2
> we have another row T1 (excluding our time column) we don't want
> to save it.

Um, isn't this a purpose of a key? And I am confused. Do you want to UPDATE
the changed columns? or skip it all together?
You have: (System, Day, T1 | T2 |...Tn )
But should use:
Master: (System, Day, Table={T1, T2, .. Tn)) [Keys: sytem, day, table]
T1 { System, Day, {other fields}} [foreign keys [system, day]

This should allow you to find your dupes very fast (indexes!) and save a lot
of space (few/no null columns), and now you don't have to worry about
comparing fields, and moving huge result sets around.

> That said, if the 3.4 minutes gets burned during our comparison which
> saves changes only we may look at reverting to separate tables. There
> are only 1,700 to 3,000 rows on average per load.
>
> Oh, PostgreSQL 7.3.3, PHP 4.3.1, RedHat 7.3, kernel 2.4.20-18.7smp,
> 2x1.4GHz PIII, 2GB memory, and 1Gbs SAN w/ Hitachi 9910 LUN's.
>
> Greg
>
> --
> Greg Spiegelberg
> Sr. Product Development Engineer
> Cranel, Incorporated.
> Phone: 614.318.4314
> Fax: 614.431.8388
> Email: gspiegelberg(at)Cranel(dot)com
> Cranel. Technology. Integrity. Focus.
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo(at)postgresql(dot)org
>

In response to

Re: Compare rows at 2003-10-08 16:27:41 from Greg Spiegelberg

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Bruce Momjian	2003-10-08 17:28:53	Re: PostgreSQL vs. MySQL
Previous Message	Josh Berkus	2003-10-08 17:10:26	Re: Compare rows