From: | Paul Parker <pparker(at)cs(dot)rice(dot)edu> |
---|---|
To: | pgsql-sql(at)postgresql(dot)org |
Subject: | Problems implementing TPC-D queries |
Date: | 1998-06-04 03:37:10 |
Message-ID: | 199806040337.WAA09687@noel.cs.rice.edu |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-sql |
Hi,
I've been working for a while on a research project that attempts to
"parallelize" Postgres via a software DSM (more work than expected and not
actually useful for any end-user). We want to use TPC-D to model commercial
workloads to analyze the speedup and memory behavior. I've gotten most of the
queries to work (the inclusion of subselects in 6.3 was a big help and is
earnestly appreciated), but I'm still having problems with a couple.
Query 12 analyzes the performance of a couple of shipping methods, finding
out how many late high-priority and low-priority shipments were made using
each method. TPC's definitional query uses CASE in an aggregate, which
isn't implemented. One variant uses aggregates (COUNT) inside views, which
of course doesn't work. A second variant uses temporary tables instead of
views, which I was more hopeful about. Unfortunately, I have been unable
to use INSERT INTO to actually insert the tuples from the SELECT. That is:
tpc_test=> SELECT
tpc_test-> SHIPMODE, COUNT(*)
tpc_test-> FROM LATEORDRS0
tpc_test-> WHERE ORDRPRIORITY IN ('1-URGENT', '2-HIGH')
tpc_test-> AND shipmode = 'RAIL'
tpc_test-> GROUP BY SHIPMODE;
shipmode |count
----------+-----
RAIL | 44
(1 row)
as compared to:
tpc_test=> CREATE TABLE QUICK0 (SHIPMODE CHAR(10), HIGH_LINE_COUNT INTEGER);
CREATE
tpc_test=>
tpc_test=> INSERT INTO QUICK0
tpc_test-> SELECT
tpc_test-> SHIPMODE, COUNT(*)
tpc_test-> FROM LATEORDRS0
tpc_test-> WHERE ORDRPRIORITY IN ('1-URGENT', '2-HIGH')
tpc_test-> AND shipmode = 'RAIL'
tpc_test-> GROUP BY SHIPMODE;
INSERT 105206 1
tpc_test=> SELECT * FROM QUICK0;
shipmode|high_line_count
--------+---------------
| 0
Note the types are correct (assuming count returns an integer, which is
validated below):
CREATE VIEW LATEORDRS0 AS
SELECT
L_SHIPMODE AS SHIPMODE, O_ORDRPRIORITY AS ORDRPRIORITY
FROM ORDR, LINEITEM
WHERE O_ORDRKEY = L_ORDRKEY
AND L_SHIPMODE IN ('TRUCK','RAIL')
AND L_COMMITDATE < L_RECEIPTDATE
AND L_SHIPDATE < L_COMMITDATE
AND L_RECEIPTDATE >= '1993-01-01'::DATETIME
AND L_RECEIPTDATE < ('1993-01-01'::DATETIME + '1 YEAR'::TIMESPAN);
tpc_test-> \d lineitem
Table = lineitem
+----------------------------------+----------------------------------+-------+
| Field | Type | Length|
+----------------------------------+----------------------------------+-------+
...
| l_shipmode | char() | 10 |
In fact, this query works (assuming quick0 has at least one row with a
non-NULL shipmode):
insert into quick0
select shipmode, count(*)
from quick0
group by shipmode ;
But this query only adds a blank and zero tuple:
insert into quick0
select shipmode, count(*)
from lateordrs0
group by shipmode ;
select * from quick0;
Quite likely this is actually a bug, but I don't know SQL well enough to
pronounce it that for sure.
The other query's problem is COUNT DISTINCT (unimplemented, I believe).
I've seen this mentioned in the archives, but I haven't seen any solutions
that would work. They mentioned you could produce a DISTINCT view, but
COUNT(*) failed on it (related to the view-aggregate problem??). I suspect
INSERT INTO SELECT DISTINCT... might work, however even if it did, I would
need to do it for each of some arbitrary number of cases, as per the query,
which counts the number of suppliers for each part meeting some
specifications:
SELECT P_BRAND, P_TYPE, P_SIZE, COUNT(DISTINCT PS_SUPPKEY) AS SUPPLIER_CNT
FROM PARTSUPP, PART
WHERE
P_PARTKEY = PS_PARTKEY AND
P_BRAND <> 'Brand#14' AND
P_TYPE NOT LIKE 'LARGE PLATED%' AND
P_SIZE IN (45, 22, 2, 48, 4, 49, 26, 10) AND
PS_SUPPKEY NOT IN
(SELECT S_SUPPKEY FROM SUPPLIER WHERE S_COMMENT LIKE
'%Better Business Bureau%Complaints%')
GROUP BY P_BRAND, P_TYPE, P_SIZE
ORDER BY SUPPLIER_CNT DESC, P_BRAND, P_TYPE, P_SIZE;
Also, for no reason readily evident to myself, this last query takes 41 cpu
hours on a P6-150 w/ 64 M RAM running FreeBSD 2.2 over a 20 M database,
after having done vacuum analyze (which improved several queries one or two
orders of magnitude), whereas the other queries take a few cpu seconds, a
difference of about 1000x. True, the VM size does swell to 57 M, but the
working set seems content at <30M, which the otherwise empty machine easily
gives it.
CREATE TABLE ALL_NATIONS0
(YEAR_CREATE INTEGER,
VOLUME FLOAT8,
NATION CHAR(25));
INSERT INTO ALL_NATIONS0
SELECT EXTRACT (YEAR FROM O_ORDRDATE)::INTEGER AS YEAR_SEL, L_EXTENDEDPRICE*(1-L_DISCOUNT) AS VOLUME,
N2.N_NAME AS NATION
FROM PART, SUPPLIER, LINEITEM, ORDR, CUSTOMER,
NATION N1, NATION N2, REGION
WHERE P_PARTKEY = L_PARTKEY
AND S_SUPPKEY = L_SUPPKEY
AND L_ORDRKEY = O_ORDRKEY
AND O_CUSTKEY = C_CUSTKEY
AND C_NATIONKEY = N1.N_NATIONKEY
AND N1.N_REGIONKEY = R_REGIONKEY
AND R_NAME = 'ASIA'
AND S_NATIONKEY = N2.N_NATIONKEY
AND O_ORDRDATE > ('1995-01-01'::DATETIME)
AND O_ORDRDATE < ('1996-12-31'::DATETIME)
AND P_TYPE = 'ECONOMY POLISHED COPPER';
Any help anyone can give is appreciated,
Paul
-----------
Paul Parker\___________________________________
pparker(at)cs(dot)rice(dot)edu a.k.a. "The other Paul" \_____________
Second-year grad-student seeking a decent signature quotation
From | Date | Subject | |
---|---|---|---|
Next Message | Konstantin S. Kuznetsov | 1998-06-04 04:05:33 | Need ODBC driver |
Previous Message | Stoeppel | 1998-06-03 19:39:00 | problems with CAST :-{ |