Problems implementing TPC-D queries

From: Paul Parker <pparker(at)cs(dot)rice(dot)edu>
To: pgsql-sql(at)postgresql(dot)org
Subject: Problems implementing TPC-D queries
Date: 1998-06-04 03:37:10
Message-ID: 199806040337.WAA09687@noel.cs.rice.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

Hi,

I've been working for a while on a research project that attempts to
"parallelize" Postgres via a software DSM (more work than expected and not
actually useful for any end-user). We want to use TPC-D to model commercial
workloads to analyze the speedup and memory behavior. I've gotten most of the
queries to work (the inclusion of subselects in 6.3 was a big help and is
earnestly appreciated), but I'm still having problems with a couple.

Query 12 analyzes the performance of a couple of shipping methods, finding
out how many late high-priority and low-priority shipments were made using
each method. TPC's definitional query uses CASE in an aggregate, which
isn't implemented. One variant uses aggregates (COUNT) inside views, which
of course doesn't work. A second variant uses temporary tables instead of
views, which I was more hopeful about. Unfortunately, I have been unable
to use INSERT INTO to actually insert the tuples from the SELECT. That is:

tpc_test=> SELECT
tpc_test-> SHIPMODE, COUNT(*)
tpc_test-> FROM LATEORDRS0
tpc_test-> WHERE ORDRPRIORITY IN ('1-URGENT', '2-HIGH')
tpc_test-> AND shipmode = 'RAIL'
tpc_test-> GROUP BY SHIPMODE;
shipmode |count
----------+-----
RAIL | 44
(1 row)

as compared to:

tpc_test=> CREATE TABLE QUICK0 (SHIPMODE CHAR(10), HIGH_LINE_COUNT INTEGER);
CREATE
tpc_test=>
tpc_test=> INSERT INTO QUICK0
tpc_test-> SELECT
tpc_test-> SHIPMODE, COUNT(*)
tpc_test-> FROM LATEORDRS0
tpc_test-> WHERE ORDRPRIORITY IN ('1-URGENT', '2-HIGH')
tpc_test-> AND shipmode = 'RAIL'
tpc_test-> GROUP BY SHIPMODE;
INSERT 105206 1
tpc_test=> SELECT * FROM QUICK0;
shipmode|high_line_count
--------+---------------
| 0

Note the types are correct (assuming count returns an integer, which is
validated below):

CREATE VIEW LATEORDRS0 AS
SELECT
L_SHIPMODE AS SHIPMODE, O_ORDRPRIORITY AS ORDRPRIORITY
FROM ORDR, LINEITEM
WHERE O_ORDRKEY = L_ORDRKEY
AND L_SHIPMODE IN ('TRUCK','RAIL')
AND L_COMMITDATE < L_RECEIPTDATE
AND L_SHIPDATE < L_COMMITDATE
AND L_RECEIPTDATE >= '1993-01-01'::DATETIME
AND L_RECEIPTDATE < ('1993-01-01'::DATETIME + '1 YEAR'::TIMESPAN);

tpc_test-> \d lineitem

Table = lineitem
+----------------------------------+----------------------------------+-------+
| Field | Type | Length|
+----------------------------------+----------------------------------+-------+
...
| l_shipmode | char() | 10 |

In fact, this query works (assuming quick0 has at least one row with a
non-NULL shipmode):

insert into quick0
select shipmode, count(*)
from quick0
group by shipmode ;

But this query only adds a blank and zero tuple:

insert into quick0
select shipmode, count(*)
from lateordrs0
group by shipmode ;
select * from quick0;

Quite likely this is actually a bug, but I don't know SQL well enough to
pronounce it that for sure.

The other query's problem is COUNT DISTINCT (unimplemented, I believe).
I've seen this mentioned in the archives, but I haven't seen any solutions
that would work. They mentioned you could produce a DISTINCT view, but
COUNT(*) failed on it (related to the view-aggregate problem??). I suspect
INSERT INTO SELECT DISTINCT... might work, however even if it did, I would
need to do it for each of some arbitrary number of cases, as per the query,
which counts the number of suppliers for each part meeting some
specifications:

SELECT P_BRAND, P_TYPE, P_SIZE, COUNT(DISTINCT PS_SUPPKEY) AS SUPPLIER_CNT
FROM PARTSUPP, PART
WHERE
P_PARTKEY = PS_PARTKEY AND
P_BRAND <> 'Brand#14' AND
P_TYPE NOT LIKE 'LARGE PLATED%' AND
P_SIZE IN (45, 22, 2, 48, 4, 49, 26, 10) AND
PS_SUPPKEY NOT IN
(SELECT S_SUPPKEY FROM SUPPLIER WHERE S_COMMENT LIKE
'%Better Business Bureau%Complaints%')
GROUP BY P_BRAND, P_TYPE, P_SIZE
ORDER BY SUPPLIER_CNT DESC, P_BRAND, P_TYPE, P_SIZE;

Also, for no reason readily evident to myself, this last query takes 41 cpu
hours on a P6-150 w/ 64 M RAM running FreeBSD 2.2 over a 20 M database,
after having done vacuum analyze (which improved several queries one or two
orders of magnitude), whereas the other queries take a few cpu seconds, a
difference of about 1000x. True, the VM size does swell to 57 M, but the
working set seems content at <30M, which the otherwise empty machine easily
gives it.

CREATE TABLE ALL_NATIONS0
(YEAR_CREATE INTEGER,
VOLUME FLOAT8,
NATION CHAR(25));

INSERT INTO ALL_NATIONS0
SELECT EXTRACT (YEAR FROM O_ORDRDATE)::INTEGER AS YEAR_SEL, L_EXTENDEDPRICE*(1-L_DISCOUNT) AS VOLUME,
N2.N_NAME AS NATION
FROM PART, SUPPLIER, LINEITEM, ORDR, CUSTOMER,
NATION N1, NATION N2, REGION
WHERE P_PARTKEY = L_PARTKEY
AND S_SUPPKEY = L_SUPPKEY
AND L_ORDRKEY = O_ORDRKEY
AND O_CUSTKEY = C_CUSTKEY
AND C_NATIONKEY = N1.N_NATIONKEY
AND N1.N_REGIONKEY = R_REGIONKEY
AND R_NAME = 'ASIA'
AND S_NATIONKEY = N2.N_NATIONKEY
AND O_ORDRDATE > ('1995-01-01'::DATETIME)
AND O_ORDRDATE < ('1996-12-31'::DATETIME)
AND P_TYPE = 'ECONOMY POLISHED COPPER';

Any help anyone can give is appreciated,

Paul

-----------
Paul Parker\___________________________________
pparker(at)cs(dot)rice(dot)edu a.k.a. "The other Paul" \_____________
Second-year grad-student seeking a decent signature quotation

Browse pgsql-sql by date

  From Date Subject
Next Message Konstantin S. Kuznetsov 1998-06-04 04:05:33 Need ODBC driver
Previous Message Stoeppel 1998-06-03 19:39:00 problems with CAST :-{