Re: Sorting Improvements for 8.4

From: "Dann Corbit" <DCorbit(at)connx(dot)com>
To: "Gregory Stark" <stark(at)enterprisedb(dot)com>, "Jeff Davis" <pgsql(at)j-davis(dot)com>
Cc: "Mark Mielke" <mark(at)mark(dot)mielke(dot)cc>, Michał Zaborowski <michal(dot)zaborowski(at)gmail(dot)com>, "Simon Riggs" <simon(at)2ndquadrant(dot)com>, "Ron Mayer" <rm_pg(at)cheapcomplexdevices(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Sorting Improvements for 8.4
Date: 2007-12-20 01:42:02
Message-ID: D425483C2C5C9F49B5B7A41F8944154701000B4A@postal.corporate.connx.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> -----Original Message-----
> From: pgsql-hackers-owner(at)postgresql(dot)org [mailto:pgsql-hackers-
> owner(at)postgresql(dot)org] On Behalf Of Gregory Stark
> Sent: Wednesday, December 19, 2007 5:26 PM
> To: Jeff Davis
> Cc: Mark Mielke; Michał Zaborowski; Simon Riggs; Ron Mayer; pgsql-
> hackers(at)postgresql(dot)org
> Subject: Re: [HACKERS] Sorting Improvements for 8.4
>
>
> "Jeff Davis" <pgsql(at)j-davis(dot)com> writes:
>
> > test=> explain analyze select * from sorter order by t;
> > test=> explain analyze select * from sorter order by b;
> > test=> explain analyze select * from sorter order by f;
> >
> > On my machine this table fits easily in memory (so there aren't any disk
> > reads at all). Sorting takes 7 seconds for floats, 9 seconds for binary
> > data, and 20 seconds for localized text. That's much longer than it
> > would take to read that data from disk, since it's only 70MB (which
> > takes a fraction of a second on my machine).
> >
> > I think this disproves your hypothesis that sorting happens at disk
> > speed.
>
> I suspect most of that is spent just copying the data around. Which would
> not
> be helped by having multiple threads doing the copying -- and in fact
> might be
> exacerbated if it required an extra copy to consolidate all the data in
> the
> end.

Benchmarking a single system will really only explain that system.
Someone may have a disk farm with 2GB/Sec throughput:
http://www.sql-server-performance.com/articles/per/system_storage_configuration_p1.aspx

But such a configuration is very unlikely.

Someone may have 10GB/S NIC cards, but those too, are rare.

So for any benchmark, we will really just end up with a number for that system.

Typically, disk is the bottleneck.
I found this on the net somewhere, but it's quite a useful table for capacity planning (to find the weak link in the chain using back of the envelope calculations):

Interface Width Frequency Bytes/Sec Bits/Sec
4-way interleaved PC1600 (DDR200) SDRAM 4 x 64bits 100 MHz DDR 6.4 GB/s 51 Gbps
Opteron HyperTransport memory bus 128bits 200 MHz DDR 6.4 GB/s 51 Gbps
Pentium 4 "800 MHz" FSB 64bits 200 MHz QDR 6.4 GB/s 51 Gbps
PC2 6400 (DDR-II 800) SDRAM 64bits 400 MHz DDR 6.4 GB/s 51 Gbps
PC2 5300 (DDR-II 667) SDRAM 64bits 333 MHz DDR 5.3 GB/s 43 Gbps
Pentium 4 "533 MHz" FSB 64bits 133 MHz QDR 4.3 GB/s 34 Gbps
PC2 4300 (DDR-II 533) SDRAM 64bits 266 MHz DDR 4.3 GB/s 34 Gbps
2-channel PC1066 RDRAM 2 x 16bits 533 MHz DDR 4.3 GB/s 34 Gbps
PCI-X 533 64bits 533 MHz 4.3 GB/s 34 Gbps
PCI-Express x16 serial/16lanes 2.5 GHz 4 GB/s 32 Gbps
Pentium 4 "400 MHz" FSB 64bits 100 MHz QDR 3.2 GB/s 25.6 Gbps
2-channel PC800 RDRAM 2 x 16bits 400 MHz DDR 3.2 GB/s 25.6 Gbps
2-way interleaved PC1600 (DDR200) SDRAM 2 x 64bits 100 MHz DDR 3.2 GB/s 25.6 Gbps
PC2 3200 (DDR-II 400) SDRAM 64bits 200 MHz DDR 3.2 GB/s 25.6 Gbps
PC3200 (DDR400) SDRAM 64bits 200 MHz DDR 3.2 GB/s 25.6 Gbps
PC2700 (DDR333) SDRAM 64bits 167 MHz DDR 2.7 GB/s 21 Gbps
PC2100 (DDR266) SDRAM 64bits 133 MHz DDR 2.1 GB/s 17 Gbps
AGP 8x 32bits 533 MHz 2.1 GB/s 17 Gbps
PCI-X 266 64bits 266 MHz 2.1 GB/s 17 Gbps
PCI-Express x8 serial/8lanes 2.5 GHz 2 GB/s 16 Gbps
EV6 bus (Athlon/Duron FSB) 64bits 100 MHz DDR 1.6 GB/s 13 Gbps
PC1600 (DDR200) SDRAM 64bits 100 MHz DDR 1.6 GB/s 13 Gbps
PC800 RDRAM 16bits 400 MHz DDR 1.6 GB/s 13 Gbps
PC150 SDRAM 64bits 150 MHz 1.3 GB/s 10.2 Gbps
10 gigabit ethernet serial 10 GHz 1.25 GB/s 10 Gbps
OC-192 serial 9.953 GHz 1.24 GB/s 9.953 Gbps
133 MHz FSB 64bits 133 MHz 1.06 GB/s 8.5 Gbps
PC133 SDRAM 64bits 133 MHz 1.06 GB/s 8.5 Gbps
AGP 4x 32bits 266 MHz 1.06 GB/s 8.5 Gbps
PCI-X 64bits 133 MHz 1.06 GB/s 8.5 Gbps
PCI-Express x4 serial/4lanes 2.5 GHz 1 GB/s 8 Gbps
100 MHz FSB 64bits 100 MHz 800 MB/s 6.4 Gbps
PC100 SDRAM 64bits 100 MHz 800 MB/s 6.4 Gbps
PC66 SDRAM 64bits 66 MHz 533 MB/s 4.3 Gbps
fast/wide PCI 64bits 66 MHz 533 MB/s 4.3 Gbps
AGP 2x 32bits 133 MHz 533 MB/s 4.3 Gbps
single-link DVI 12bits 165 MHz DDR 495 MB/s 3.96 Gbps
Ultra-320 SCSI 16bits 160 MHz 320 MB/s 2.6 Gbps
OC-48 network serial 2.488 GHz 311 MB/s 2.488 Gbps
AGP 32bits 66 MHz 266 MB/s 2.1 Gbps
PCI-Express x1 serial 2.5 GHz 250 MB/s 2 Gbps
Serial ATA/1500 disk serial 1.5 GHz 187 MB/s 1.5 Gbps
Ultra-160 SCSI 16bits 80 MHz 160 MB/s 1.3 Gbps
OC-24 network serial 1.244 GHz 155 MB/s 1.244 Gbps
PCI 32bits 33 MHz 133 MB/s 1.06 Gbps
ATA/133 disk 8bits 66 MHz DDR 133 MB/s 1.06 Gbps
gigabit ethernet serial 1 GHz 125 MB/s 1 Gbps
ATA/100 disk 8bits 50 MHz DDR 100 MB/s 800 Mbps
IEEE 1394b serial 800 MHz 100 MB/s 800 Mbps
Ultra-2 Wide SCSI 16bits 40 MHz 80 MB/s 640 Mbps
OC-12 network serial 622.08 MHz 77.7 MB/s 622.08 Mbps
ATA/66 disk 8bits 33 MHz DDR 66 MB/s 533 Mbps
USB-2 serial 480 MHz 60 MB/s 480 Mbps
IEEE 1394 serial 400 MHz 50 MB/s 400 Mbps
Ultra Wide SCSI 16bits 20 MHz 40 MB/s 320 Mbps
ATA/33 disk 8bits 16.6 MHz DDR 33 MB/s 266 Mbps
Fast Wide SCSI 16bits 10 MHz 20 MB/s 160 Mbps
OC-3 network serial 155.52 MHz 19.4 MB/s 155.52 Mbps
100baseT ethernet serial 100 MHz 12.5 MB/s 100 Mbps
OC-1 network serial 51.84 MHz 6.5 MB/s 51.84 Mbps
T-3 network serial 45 MHz 5.6 MB/s 44.736 Mbps
USB serial 12 MHz 1.5 MB/s 12 Mbps
10baseT ethernet serial 10 MHz 1.25 MB/s 10 Mbps
IrDA-2 serial 4 MHz 500 KB/s 4 Mbps
T-1 network serial 1.5 MHz 193 KB/s 1.544 Mbps

> How long does a "explain analyze sinmple select * from sorter" take?
>
> And assuming you're doing disk sorts (in disk cache) you're doing quite a
> lot
> of copying to temporary files (in disk cache) and then back to memory.
>
>
> Note that speeding up a query from 20s to 5s isn't terribly useful. If
> it's
> OLTP you can't be using all your cores for each user anyways. And if it's
> DSS
> 20s isn't a problem.

Unless (of course) there are 20,000 users doing the queries that would take 20 seconds but now they take 5 (when run single-user). They will still have a bit of a wait, of course.

> Where parallel processing like this becomes attractive is when you're
> running
> a 2 hour query on a machine sequentially running scheduled batch jobs
> which
> can be sped up to 30 minutes. But in that case you're almost certainly
> being
> limited by your disk bandwidth, not your cpu speed.

A linear speedup of 2 or more is always worth while[*]. Since sorting (e.g. for 'group by' and 'order by') and sort joins are a major database task, I guess that a linear speedup by a factor of 2 might make the database operations on the whole be 10% faster or so {OK, it's a SWAG}. I guess it would look good on the benchmarks, if nothing else.

[*] unless it is already fast enough. If, at peak load a query takes 1 ms, then making the query take 0.5 ms is not going to win you any medals, especially if the improvement costs $10,000.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Greg Smith 2007-12-20 02:25:46 Re: Sorting Improvements for 8.4
Previous Message Gregory Stark 2007-12-20 01:26:13 Re: Sorting Improvements for 8.4