RE: [HACKERS] sort on huge table

From: "Ansley, Michael" <Michael(dot)Ansley(at)intec(dot)co(dot)za>
To: "'t-ishii(at)sra(dot)co(dot)jp'" <t-ishii(at)sra(dot)co(dot)jp>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: RE: [HACKERS] sort on huge table
Date: 1999-11-04 08:41:57
Message-ID: 1BF7C7482189D211B03F00805F8527F748C208@S-NATH-EXCH2
Lists: pgsql-hackers

Now that's as close to linear as you are going to get. Pretty good, I think:
a sort of one billion rows in half an hour.
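
For anyone who wants to check the "close to linear" claim against the numbers quoted below, here is a rough Python sketch. The timings are copied from Tatsuo's table; the function names (to_seconds, seconds_per_million, linear_fit) are my own for illustration and have nothing to do with the PostgreSQL source.

```python
# Timings copied from the quoted table: (millions of tuples, "min:sec").
TIMINGS = [
    (100, "1:31"), (200, "4:24"), (300, "7:27"), (400, "11:11"),
    (500, "14:01"), (600, "18:31"), (700, "22:24"), (800, "24:36"),
    (900, "28:12"), (1000, "32:14"),
]


def to_seconds(mmss):
    """Convert a "min:sec" string to a number of seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def seconds_per_million(timings):
    """Return (size, seconds per million tuples) for each measurement."""
    return [(n, to_seconds(t) / n) for n, t in timings]


def linear_fit(timings):
    """Least-squares fit of seconds = slope * size + intercept.

    Returns (slope, intercept, r_squared).
    """
    xs = [n for n, _ in timings]
    ys = [to_seconds(t) for _, t in timings]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


if __name__ == "__main__":
    for size, cost in seconds_per_million(TIMINGS):
        print(f"{size:5d} million: {cost:.2f} s per million tuples")
    slope, intercept, r2 = linear_fit(TIMINGS)
    print(f"slope={slope:.3f} s/million, intercept={intercept:.1f} s, R^2={r2:.4f}")
```

An R^2 close to 1 supports the straight-line reading. A per-million cost that keeps creeping up with table size is what you would expect from the log factor in an n log n sort.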

Mikea

>> -----Original Message-----
>> From: Tatsuo Ishii [mailto:t-ishii(at)sra(dot)co(dot)jp]
>> Sent: Thursday, November 04, 1999 10:30 AM
>> To: Tom Lane
>> Cc: t-ishii(at)sra(dot)co(dot)jp; pgsql-hackers(at)postgreSQL(dot)org
>> Subject: Re: [HACKERS] sort on huge table
>>
>>
>> >
>> >Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp> writes:
>> >> I have compared current with 6.5 using 1000000 tuple-table (243MB) (I
>> >> wanted to try 2GB+ table but 6.5 does not work in this case). The
>> >> result was strange in that current is *faster* than 6.5!
>> >
>> >> RAID5
>> >> current 2:29
>> >> 6.5.2 3:15
>> >
>> >> non-RAID
>> >> current 1:50
>> >> 6.5.2 2:13
>> >
>> >> Seems my previous testing was done in the wrong way, or the behavior
>> >> of sorting might be different if the table size is changed?
>> >
>> >Well, I feel better now, anyway ;-). I thought that my first cut
>> >ought to have been about the same speed as 6.5, and after I added
>> >the code to slurp up multiple tuples in sequence, it should've been
>> >faster than 6.5. The above numbers seem to be in line with that
>> >theory. Next question: is there some additional effect that comes
>> >into play once the table size gets really huge? I am thinking maybe
>> >there's some glitch affecting performance once the temp file size
>> >goes past one segment (1Gb). Tatsuo, can you try sorts of say
>> >0.9 and 1.1 Gb to see if something bad happens at 1Gb? I could
>> >try rebuilding here with a small RELSEG_SIZE, but right at the
>> >moment I'm not certain I'd see the same behavior you do...
>>
>> Ok. I have run some tests with various amounts of data.
>>
>> RedHat Linux 6.0
>> Kernel 2.2.5-smp
>> 512MB RAM
>> Sort mem: 80MB
>> RAID5
>>
>> Tuples (millions)   Time (min:sec)
>>   100                 1:31
>>   200                 4:24
>>   300                 7:27
>>   400                11:11  <-- 970MB
>>   500                14:01  <-- 1.1GB (segmented files)
>>   600                18:31
>>   700                22:24
>>   800                24:36
>>   900                28:12
>>  1000                32:14
>>
>> I didn't see anything bad at 1.1GB (500 million).
>> --
>> Tatsuo Ishii
>>
>> ************
>>
