Quick Links

Re: hashed crosstab

From:	Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To:	Joe Conway <mail(at)joeconway(dot)com>
Cc:	"Patches (PostgreSQL)" <pgsql-patches(at)postgresql(dot)org>
Subject:	Re: hashed crosstab
Date:	2003-03-20 06:46:27
Message-ID:	200303200646.h2K6kRC21244@candle.pha.pa.us
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-patches

Patch applied. Thanks.

---------------------------------------------------------------------------

Joe Conway wrote:
> Attached is an update to contrib/tablefunc. It implements a new hashed
> version of crosstab. This fixes a major deficiency in real-world use of
> the original version. Easiest to undestand with an illustration:
>
> Data:
> -------------------------------------------------------------------
> select * from cth;
> id | rowid | rowdt | attribute | val
> ----+-------+---------------------+----------------+---------------
> 1 | test1 | 2003-03-01 00:00:00 | temperature | 42
> 2 | test1 | 2003-03-01 00:00:00 | test_result | PASS
> 3 | test1 | 2003-03-01 00:00:00 | volts | 2.6987
> 4 | test2 | 2003-03-02 00:00:00 | temperature | 53
> 5 | test2 | 2003-03-02 00:00:00 | test_result | FAIL
> 6 | test2 | 2003-03-02 00:00:00 | test_startdate | 01 March 2003
> 7 | test2 | 2003-03-02 00:00:00 | volts | 3.1234
> (7 rows)
>
> Original crosstab:
> -------------------------------------------------------------------
> SELECT * FROM crosstab(
> 'SELECT rowid, attribute, val FROM cth ORDER BY 1,2',4)
> AS c(rowid text, temperature text, test_result text, test_startdate
> text, volts text);
> rowid | temperature | test_result | test_startdate | volts
> -------+-------------+-------------+----------------+--------
> test1 | 42 | PASS | 2.6987 |
> test2 | 53 | FAIL | 01 March 2003 | 3.1234
> (2 rows)
>
> Hashed crosstab:
> -------------------------------------------------------------------
> SELECT * FROM crosstab(
> 'SELECT rowid, attribute, val FROM cth ORDER BY 1',
> 'SELECT DISTINCT attribute FROM cth ORDER BY 1')
> AS c(rowid text, temperature int4, test_result text, test_startdate
> timestamp, volts float8);
> rowid | temperature | test_result | test_startdate | volts
> -------+-------------+-------------+---------------------+--------
> test1 | 42 | PASS | | 2.6987
> test2 | 53 | FAIL | 2003-03-01 00:00:00 | 3.1234
> (2 rows)
>
> Notice that the original crosstab slides data over to the left in the
> result tuple when it encounters missing data. In order to work around
> this you have to be make your source sql do all sorts of contortions
> (cartesian join of distinct rowid with distinct attribute; left join
> that back to the real source data). The new version avoids this by
> building a hash table using a second distinct attribute query.
>
> The new version also allows for "extra" columns (see the README) and
> allows the result columns to be coerced into differing datatypes if they
> are suitable (as shown above).
>
> In testing a "real-world" data set (69 distinct rowid's, 27 distinct
> categories/attributes, multiple missing data points) I saw about a
> 5-fold improvement in execution time (from about 2200 ms old, to 440 ms
> new).
>
> I left the original version intact because: 1) BC, 2) it is probably
> slightly faster if you know that you have no missing attributes.
>
> README and regression test adjustments included. If there are no
> objections, please apply.
>
> Thanks,
>
> Joe

> Index: contrib/tablefunc/README.tablefunc
> ===================================================================
> RCS file: /opt/src/cvs/pgsql-server/contrib/tablefunc/README.tablefunc,v
> retrieving revision 1.5
> diff -c -r1.5 README.tablefunc
> *** contrib/tablefunc/README.tablefunc 23 Nov 2002 01:54:09 -0000 1.5
> --- contrib/tablefunc/README.tablefunc 3 Mar 2003 03:37:39 -0000
> ***************
> *** 333,338 ****
> --- 333,457 ----
> ==================================================================
> Name
>
> + crosstab(text, text) - returns a set of row_name, extra, and
> + category value columns
> +
> + Synopsis
> +
> + crosstab(text source_sql, text category_sql)
> +
> + Inputs
> +
> + source_sql
> +
> + A SQL statement which produces the source set of data. The SQL statement
> + must return one row_name column, one category column, and one value
> + column. It may also have one or more "extra" columns.
> +
> + The row_name column must be first. The category and value columns
> + must be the last two columns, in that order. "extra" columns must be
> + columns 2 through (N - 2), where N is the total number of columns.
> +
> + The "extra" columns are assumed to be the same for all rows with the
> + same row_name. The values returned are copied from the first row
> + with a given row_name and subsequent values of these columns are ignored
> + until row_name changes.
> +
> + e.g. source_sql must produce a set something like:
> + SELECT row_name, extra_col, cat, value FROM foo;
> +
> + row_name extra_col cat value
> + ----------+------------+-----+---------
> + row1 extra1 cat1 val1
> + row1 extra1 cat2 val2
> + row1 extra1 cat4 val4
> + row2 extra2 cat1 val5
> + row2 extra2 cat2 val6
> + row2 extra2 cat3 val7
> + row2 extra2 cat4 val8
> +
> + category_sql
> +
> + A SQL statement which produces the distinct set of categories. The SQL
> + statement must return one category column only. category_sql must produce
> + at least one result row or an error will be generated. category_sql
> + must not produce duplicate categories or an error will be generated.
> +
> + e.g. SELECT DISTINCT cat FROM foo;
> +
> + cat
> + -------
> + cat1
> + cat2
> + cat3
> + cat4
> +
> + Outputs
> +
> + Returns setof record, which must be defined with a column definition
> + in the FROM clause of the SELECT statement, e.g.:
> +
> + SELECT * FROM crosstab(source_sql, cat_sql)
> + AS ct(row_name text, extra text, cat1 text, cat2 text, cat3 text, cat4 text);
> +
> + the example crosstab function produces a set something like:
> + <== values columns ==>
> + row_name extra cat1 cat2 cat3 cat4
> + ---------+-------+------+------+------+------
> + row1 extra1 val1 val2 val4
> + row2 extra2 val5 val6 val7 val8
> +
> + Notes
> +
> + 1. source_sql must be ordered by row_name (column 1).
> +
> + 2. The number of values columns is determined at run-time. The
> + column definition provided in the FROM clause must provide for
> + the correct number of columns of the proper data types.
> +
> + 3. Missing values (i.e. not enough adjacent rows of same row_name to
> + fill the number of result values columns) are filled in with nulls.
> +
> + 4. Extra values (i.e. source rows with category not found in category_sql
> + result) are skipped.
> +
> + 5. Rows with a null row_name column are skipped.
> +
> +
> + Example usage
> +
> + create table cth(id serial, rowid text, rowdt timestamp, attribute text, val text);
> + insert into cth values(DEFAULT,'test1','01 March 2003','temperature','42');
> + insert into cth values(DEFAULT,'test1','01 March 2003','test_result','PASS');
> + insert into cth values(DEFAULT,'test1','01 March 2003','volts','2.6987');
> + insert into cth values(DEFAULT,'test2','02 March 2003','temperature','53');
> + insert into cth values(DEFAULT,'test2','02 March 2003','test_result','FAIL');
> + insert into cth values(DEFAULT,'test2','02 March 2003','test_startdate','01 March 2003');
> + insert into cth values(DEFAULT,'test2','02 March 2003','volts','3.1234');
> +
> + SELECT * FROM crosstab
> + (
> + 'SELECT rowid, rowdt, attribute, val FROM cth ORDER BY 1',
> + 'SELECT DISTINCT attribute FROM cth ORDER BY 1'
> + )
> + AS
> + (
> + rowid text,
> + rowdt timestamp,
> + temperature int4,
> + test_result text,
> + test_startdate timestamp,
> + volts float8
> + );
> + rowid | rowdt | temperature | test_result | test_startdate | volts
> + -------+--------------------------+-------------+-------------+--------------------------+--------
> + test1 | Sat Mar 01 00:00:00 2003 | 42 | PASS | | 2.6987
> + test2 | Sun Mar 02 00:00:00 2003 | 53 | FAIL | Sat Mar 01 00:00:00 2003 | 3.1234
> + (2 rows)
> +
> + ==================================================================
> + Name
> +
> connectby(text, text, text, text, int[, text]) - returns a set
> representing a hierarchy (tree structure)
>
> Index: contrib/tablefunc/tablefunc.c
> ===================================================================
> RCS file: /opt/src/cvs/pgsql-server/contrib/tablefunc/tablefunc.c,v
> retrieving revision 1.11
> diff -c -r1.11 tablefunc.c
> *** contrib/tablefunc/tablefunc.c 23 Nov 2002 01:54:09 -0000 1.11
> --- contrib/tablefunc/tablefunc.c 3 Mar 2003 02:57:50 -0000
> ***************
> *** 39,44 ****
> --- 39,49 ----
>
> #include "tablefunc.h"
>
> + static int load_categories_hash(char *cats_sql, MemoryContext per_query_ctx);
> + static Tuplestorestate *get_crosstab_tuplestore(char *sql,
> + int num_categories,
> + TupleDesc tupdesc,
> + MemoryContext per_query_ctx);
> static void validateConnectbyTupleDesc(TupleDesc tupdesc, bool show_branch);
> static bool compatCrosstabTupleDescs(TupleDesc tupdesc1, TupleDesc tupdesc2);
> static bool compatConnectbyTupleDescs(TupleDesc tupdesc1, TupleDesc tupdesc2);
> ***************
> *** 95,100 ****
> --- 100,166 ----
> /* sign, 10 digits, '\0' */
> #define INT32_STRLEN 12
>
> + /* hash table support */
> + static HTAB *crosstab_HashTable;
> +
> + /* The information we cache about loaded procedures */
> + typedef struct crosstab_cat_desc
> + {
> + char *catname;
> + int attidx; /* zero based */
> + } crosstab_cat_desc;
> +
> + #define MAX_CATNAME_LEN NAMEDATALEN
> + #define INIT_CATS 64
> +
> + #define crosstab_HashTableLookup(CATNAME, CATDESC) \
> + do { \
> + crosstab_HashEnt *hentry; char key[MAX_CATNAME_LEN]; \
> + \
> + MemSet(key, 0, MAX_CATNAME_LEN); \
> + snprintf(key, MAX_CATNAME_LEN - 1, "%s", CATNAME); \
> + hentry = (crosstab_HashEnt*) hash_search(crosstab_HashTable, \
> + key, HASH_FIND, NULL); \
> + if (hentry) \
> + CATDESC = hentry->catdesc; \
> + else \
> + CATDESC = NULL; \
> + } while(0)
> +
> + #define crosstab_HashTableInsert(CATDESC) \
> + do { \
> + crosstab_HashEnt *hentry; bool found; char key[MAX_CATNAME_LEN]; \
> + \
> + MemSet(key, 0, MAX_CATNAME_LEN); \
> + snprintf(key, MAX_CATNAME_LEN - 1, "%s", CATDESC->catname); \
> + hentry = (crosstab_HashEnt*) hash_search(crosstab_HashTable, \
> + key, HASH_ENTER, &found); \
> + if (hentry == NULL) \
> + elog(ERROR, "out of memory in crosstab_HashTable"); \
> + if (found) \
> + elog(ERROR, "trying to use a category name more than once"); \
> + hentry->catdesc = CATDESC; \
> + } while(0)
> +
> + #define crosstab_HashTableDelete(CATNAME) \
> + do { \
> + crosstab_HashEnt *hentry; char key[MAX_CATNAME_LEN]; \
> + \
> + MemSet(key, 0, MAX_CATNAME_LEN); \
> + snprintf(key, MAX_CATNAME_LEN - 1, "%s", CATNAME); \
> + hentry = (crosstab_HashEnt*) hash_search(crosstab_HashTable, \
> + key, HASH_REMOVE, NULL); \
> + if (hentry == NULL) \
> + elog(WARNING, "trying to delete function name that does not exist."); \
> + } while(0)
> +
> + /* hash table */
> + typedef struct crosstab_hashent
> + {
> + char internal_catname[MAX_CATNAME_LEN];
> + crosstab_cat_desc *catdesc;
> + } crosstab_HashEnt;
> +
> /*
> * normal_rand - return requested number of random values
> * with a Gaussian (Normal) distribution.
> ***************
> *** 593,598 ****
> --- 659,999 ----
> }
>
> /*
> + * crosstab_hash - reimplement crosstab as materialized function and
> + * properly deal with missing values (i.e. don't pack remaining
> + * values to the left)
> + *
> + * crosstab - create a crosstab of rowids and values columns from a
> + * SQL statement returning one rowid column, one category column,
> + * and one value column.
> + *
> + * e.g. given sql which produces:
> + *
> + * rowid cat value
> + * ------+-------+-------
> + * row1 cat1 val1
> + * row1 cat2 val2
> + * row1 cat4 val4
> + * row2 cat1 val5
> + * row2 cat2 val6
> + * row2 cat3 val7
> + * row2 cat4 val8
> + *
> + * crosstab returns:
> + * <===== values columns =====>
> + * rowid cat1 cat2 cat3 cat4
> + * ------+-------+-------+-------+-------
> + * row1 val1 val2 null val4
> + * row2 val5 val6 val7 val8
> + *
> + * NOTES:
> + * 1. SQL result must be ordered by 1.
> + * 2. The number of values columns depends on the tuple description
> + * of the function's declared return type.
> + * 2. Missing values (i.e. missing category) are filled in with nulls.
> + * 3. Extra values (i.e. not in category results) are skipped.
> + */
> + PG_FUNCTION_INFO_V1(crosstab_hash);
> + Datum
> + crosstab_hash(PG_FUNCTION_ARGS)
> + {
> + char *sql = GET_STR(PG_GETARG_TEXT_P(0));
> + char *cats_sql = GET_STR(PG_GETARG_TEXT_P(1));
> + ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
> + TupleDesc tupdesc;
> + MemoryContext per_query_ctx;
> + MemoryContext oldcontext;
> + int num_categories;
> +
> + /* check to see if caller supports us returning a tuplestore */
> + if (!rsinfo || !(rsinfo->allowedModes & SFRM_Materialize))
> + elog(ERROR, "crosstab: materialize mode required, but it is not "
> + "allowed in this context");
> +
> + per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
> + oldcontext = MemoryContextSwitchTo(per_query_ctx);
> +
> + /* get the requested return tuple description */
> + tupdesc = CreateTupleDescCopy(rsinfo->expectedDesc);
> +
> + /*
> + * Check to make sure we have a reasonable tuple descriptor
> + *
> + * Note we will attempt to coerce the values into whatever
> + * the return attribute type is and depend on the "in"
> + * function to complain if needed.
> + */
> + if (tupdesc->natts < 2)
> + elog(ERROR, "crosstab: query-specified return tuple and " \
> + "crosstab function are not compatible");
> +
> + /* load up the categories hash table */
> + num_categories = load_categories_hash(cats_sql, per_query_ctx);
> +
> + /* let the caller know we're sending back a tuplestore */
> + rsinfo->returnMode = SFRM_Materialize;
> +
> + /* now go build it */
> + rsinfo->setResult = get_crosstab_tuplestore(sql,
> + num_categories,
> + tupdesc,
> + per_query_ctx);
> +
> + /*
> + * SFRM_Materialize mode expects us to return a NULL Datum. The actual
> + * tuples are in our tuplestore and passed back through
> + * rsinfo->setResult. rsinfo->setDesc is set to the tuple description
> + * that we actually used to build our tuples with, so the caller can
> + * verify we did what it was expecting.
> + */
> + rsinfo->setDesc = tupdesc;
> + MemoryContextSwitchTo(oldcontext);
> +
> + return (Datum) 0;
> + }
> +
> + /*
> + * load up the categories hash table
> + */
> + static int
> + load_categories_hash(char *cats_sql, MemoryContext per_query_ctx)
> + {
> + HASHCTL ctl;
> + int ret;
> + int proc;
> + MemoryContext SPIcontext;
> + int num_categories = 0;
> +
> + /* initialize the category hash table */
> + ctl.keysize = MAX_CATNAME_LEN;
> + ctl.entrysize = sizeof(crosstab_HashEnt);
> +
> + /*
> + * use INIT_CATS, defined above as a guess of how
> + * many hash table entries to create, initially
> + */
> + crosstab_HashTable = hash_create("crosstab hash", INIT_CATS, &ctl, HASH_ELEM);
> +
> + /* Connect to SPI manager */
> + if ((ret = SPI_connect()) < 0)
> + elog(ERROR, "load_categories_hash: SPI_connect returned %d", ret);
> +
> + /* Retrieve the category name rows */
> + ret = SPI_exec(cats_sql, 0);
> + num_categories = proc = SPI_processed;
> +
> + /* Check for qualifying tuples */
> + if ((ret == SPI_OK_SELECT) && (proc > 0))
> + {
> + SPITupleTable *spi_tuptable = SPI_tuptable;
> + TupleDesc spi_tupdesc = spi_tuptable->tupdesc;
> + int i;
> +
> + /*
> + * The provided categories SQL query must always return one column:
> + * category - the label or identifier for each column
> + */
> + if (spi_tupdesc->natts != 1)
> + elog(ERROR, "load_categories_hash: provided categories SQL must " \
> + "return 1 column of at least one row");
> +
> + for (i = 0; i < proc; i++)
> + {
> + crosstab_cat_desc *catdesc;
> + char *catname;
> + HeapTuple spi_tuple;
> +
> + /* get the next sql result tuple */
> + spi_tuple = spi_tuptable->vals[i];
> +
> + /* get the category from the current sql result tuple */
> + catname = SPI_getvalue(spi_tuple, spi_tupdesc, 1);
> +
> + SPIcontext = MemoryContextSwitchTo(per_query_ctx);
> +
> + catdesc = (crosstab_cat_desc *) palloc(sizeof(crosstab_cat_desc));
> + catdesc->catname = catname;
> + catdesc->attidx = i;
> +
> + /* Add the proc description block to the hashtable */
> + crosstab_HashTableInsert(catdesc);
> +
> + MemoryContextSwitchTo(SPIcontext);
> + }
> + }
> + else
> + {
> + /* no qualifying tuples */
> + SPI_finish();
> + elog(ERROR, "load_categories_hash: provided categories SQL must " \
> + "return 1 column of at least one row");
> + }
> +
> + if (SPI_finish() != SPI_OK_FINISH)
> + elog(ERROR, "load_categories_hash: SPI_finish() failed");
> +
> + return num_categories;
> + }
> +
> + /*
> + * create and populate the crosstab tuplestore using the provided source query
> + */
> + static Tuplestorestate *
> + get_crosstab_tuplestore(char *sql,
> + int num_categories,
> + TupleDesc tupdesc,
> + MemoryContext per_query_ctx)
> + {
> + Tuplestorestate *tupstore;
> + AttInMetadata *attinmeta = TupleDescGetAttInMetadata(tupdesc);
> + char **values;
> + HeapTuple tuple;
> + int ret;
> + int proc;
> + MemoryContext SPIcontext;
> +
> + /* initialize our tuplestore */
> + tupstore = tuplestore_begin_heap(true, SortMem);
> +
> + /* Connect to SPI manager */
> + if ((ret = SPI_connect()) < 0)
> + elog(ERROR, "get_crosstab_tuplestore: SPI_connect returned %d", ret);
> +
> + /* Now retrieve the crosstab source rows */
> + ret = SPI_exec(sql, 0);
> + proc = SPI_processed;
> +
> + /* Check for qualifying tuples */
> + if ((ret == SPI_OK_SELECT) && (proc > 0))
> + {
> + SPITupleTable *spi_tuptable = SPI_tuptable;
> + TupleDesc spi_tupdesc = spi_tuptable->tupdesc;
> + int ncols = spi_tupdesc->natts;
> + char *rowid;
> + char *lastrowid = NULL;
> + int i, j;
> + int result_ncols;
> +
> + /*
> + * The provided SQL query must always return at least three columns:
> + *
> + * 1. rowname the label for each row - column 1 in the final result
> + * 2. category the label for each value-column in the final result
> + * 3. value the values used to populate the value-columns
> + *
> + * If there are more than three columns, the last two are taken as
> + * "category" and "values". The first column is taken as "rowname".
> + * Additional columns (2 thru N-2) are assumed the same for the same
> + * "rowname", and are copied into the result tuple from the first
> + * time we encounter a particular rowname.
> + */
> + if (ncols < 3)
> + elog(ERROR, "get_crosstab_tuplestore: provided source SQL must " \
> + "return at least 3 columns; a rowid, a category, " \
> + "and a values column");
> +
> + result_ncols = (ncols - 2) + num_categories;
> +
> + /* Recheck to make sure we tuple descriptor still looks reasonable */
> + if (tupdesc->natts != result_ncols)
> + elog(ERROR, "get_crosstab_tuplestore: query-specified return " \
> + "tuple has %d columns but crosstab returns %d",
> + tupdesc->natts, result_ncols);
> +
> + /* allocate space */
> + values = (char **) palloc(result_ncols * sizeof(char *));
> +
> + /* and make sure it's clear */
> + memset(values, '\0', result_ncols * sizeof(char *));
> +
> + for (i = 0; i < proc; i++)
> + {
> + HeapTuple spi_tuple;
> + crosstab_cat_desc *catdesc;
> + char *catname;
> +
> + /* get the next sql result tuple */
> + spi_tuple = spi_tuptable->vals[i];
> +
> + /* get the rowid from the current sql result tuple */
> + rowid = SPI_getvalue(spi_tuple, spi_tupdesc, 1);
> +
> + /* if rowid is null, skip this tuple entirely */
> + if (rowid == NULL)
> + continue;
> +
> + /*
> + * if we're on a new output row, grab the column values up to
> + * column N-2 now
> + */
> + if ((lastrowid == NULL) || (strcmp(rowid, lastrowid) != 0))
> + {
> + /*
> + * a new row means we need to flush the old one first,
> + * unless we're on the very first row
> + */
> + if (lastrowid != NULL)
> + {
> + /* switch to appropriate context while storing the tuple */
> + SPIcontext = MemoryContextSwitchTo(per_query_ctx);
> +
> + /* rowid changed, flush the previous output row */
> + tuple = BuildTupleFromCStrings(attinmeta, values);
> + tuplestore_puttuple(tupstore, tuple);
> + for (j = 0; j < result_ncols; j++)
> + xpfree(values[j]);
> +
> + /* now reset the context */
> + MemoryContextSwitchTo(SPIcontext);
> + }
> +
> + values[0] = rowid;
> + for (j = 1; j < ncols - 2; j++)
> + values[j] = SPI_getvalue(spi_tuple, spi_tupdesc, j + 1);
> + }
> +
> + /* look up the category and fill in the appropriate column */
> + catname = SPI_getvalue(spi_tuple, spi_tupdesc, ncols - 1);
> +
> + if (catname != NULL)
> + {
> + crosstab_HashTableLookup(catname, catdesc);
> +
> + if (catdesc)
> + values[catdesc->attidx + ncols - 2] =
> + SPI_getvalue(spi_tuple, spi_tupdesc, ncols);
> + }
> +
> + xpfree(lastrowid);
> + lastrowid = pstrdup(rowid);
> + }
> +
> + /* switch to appropriate context while storing the tuple */
> + SPIcontext = MemoryContextSwitchTo(per_query_ctx);
> +
> + /* flush the last output row */
> + tuple = BuildTupleFromCStrings(attinmeta, values);
> + tuplestore_puttuple(tupstore, tuple);
> +
> + /* now reset the context */
> + MemoryContextSwitchTo(SPIcontext);
> +
> + }
> + else
> + {
> + /* no qualifying tuples */
> + SPI_finish();
> + }
> +
> + if (SPI_finish() != SPI_OK_FINISH)
> + elog(ERROR, "get_crosstab_tuplestore: SPI_finish() failed");
> +
> + tuplestore_donestoring(tupstore);
> +
> + return tupstore;
> + }
> +
> + /*
> * connectby_text - produce a result set from a hierarchical (parent/child)
> * table.
> *
> ***************
> *** 668,674 ****
> attinmeta = TupleDescGetAttInMetadata(tupdesc);
>
> /* check to see if caller supports us returning a tuplestore */
> ! if (!rsinfo->allowedModes & SFRM_Materialize)
> elog(ERROR, "connectby requires Materialize mode, but it is not "
> "allowed in this context");
>
> --- 1069,1075 ----
> attinmeta = TupleDescGetAttInMetadata(tupdesc);
>
> /* check to see if caller supports us returning a tuplestore */
> ! if (!rsinfo || !(rsinfo->allowedModes & SFRM_Materialize))
> elog(ERROR, "connectby requires Materialize mode, but it is not "
> "allowed in this context");
>
> Index: contrib/tablefunc/tablefunc.h
> ===================================================================
> RCS file: /opt/src/cvs/pgsql-server/contrib/tablefunc/tablefunc.h,v
> retrieving revision 1.4
> diff -c -r1.4 tablefunc.h
> *** contrib/tablefunc/tablefunc.h 4 Sep 2002 20:31:08 -0000 1.4
> --- contrib/tablefunc/tablefunc.h 2 Mar 2003 22:32:15 -0000
> ***************
> *** 34,39 ****
> --- 34,40 ----
> */
> extern Datum normal_rand(PG_FUNCTION_ARGS);
> extern Datum crosstab(PG_FUNCTION_ARGS);
> + extern Datum crosstab_hash(PG_FUNCTION_ARGS);
> extern Datum connectby_text(PG_FUNCTION_ARGS);
>
> #endif /* TABLEFUNC_H */
> Index: contrib/tablefunc/tablefunc.sql.in
> ===================================================================
> RCS file: /opt/src/cvs/pgsql-server/contrib/tablefunc/tablefunc.sql.in,v
> retrieving revision 1.5
> diff -c -r1.5 tablefunc.sql.in
> *** contrib/tablefunc/tablefunc.sql.in 18 Oct 2002 18:41:21 -0000 1.5
> --- contrib/tablefunc/tablefunc.sql.in 2 Mar 2003 22:32:23 -0000
> ***************
> *** 52,57 ****
> --- 52,62 ----
> AS 'MODULE_PATHNAME','crosstab'
> LANGUAGE 'C' STABLE STRICT;
>
> + CREATE OR REPLACE FUNCTION crosstab(text,text)
> + RETURNS setof record
> + AS 'MODULE_PATHNAME','crosstab_hash'
> + LANGUAGE 'C' STABLE STRICT;
> +
> CREATE OR REPLACE FUNCTION connectby(text,text,text,text,int,text)
> RETURNS setof record
> AS 'MODULE_PATHNAME','connectby_text'
> Index: contrib/tablefunc/expected/tablefunc.out
> ===================================================================
> RCS file: /opt/src/cvs/pgsql-server/contrib/tablefunc/expected/tablefunc.out,v
> retrieving revision 1.5
> diff -c -r1.5 tablefunc.out
> *** contrib/tablefunc/expected/tablefunc.out 23 Nov 2002 01:54:09 -0000 1.5
> --- contrib/tablefunc/expected/tablefunc.out 3 Mar 2003 02:58:22 -0000
> ***************
> *** 123,128 ****
> --- 123,201 ----
> test2 | val5 | val6 | val7 | val8
> (2 rows)
>
> + --
> + -- hash based crosstab
> + --
> + create table cth(id serial, rowid text, rowdt timestamp, attribute text, val text);
> + NOTICE: CREATE TABLE will create implicit sequence 'cth_id_seq' for SERIAL column 'cth.id'
> + insert into cth values(DEFAULT,'test1','01 March 2003','temperature','42');
> + insert into cth values(DEFAULT,'test1','01 March 2003','test_result','PASS');
> + -- the next line is intentionally left commented and is therefore a "missing" attribute
> + -- insert into cth values(DEFAULT,'test1','01 March 2003','test_startdate','28 February 2003');
> + insert into cth values(DEFAULT,'test1','01 March 2003','volts','2.6987');
> + insert into cth values(DEFAULT,'test2','02 March 2003','temperature','53');
> + insert into cth values(DEFAULT,'test2','02 March 2003','test_result','FAIL');
> + insert into cth values(DEFAULT,'test2','02 March 2003','test_startdate','01 March 2003');
> + insert into cth values(DEFAULT,'test2','02 March 2003','volts','3.1234');
> + -- return attributes as plain text
> + SELECT * FROM crosstab(
> + 'SELECT rowid, rowdt, attribute, val FROM cth ORDER BY 1',
> + 'SELECT DISTINCT attribute FROM cth ORDER BY 1')
> + AS c(rowid text, rowdt timestamp, temperature text, test_result text, test_startdate text, volts text);
> + rowid | rowdt | temperature | test_result | test_startdate | volts
> + -------+--------------------------+-------------+-------------+----------------+--------
> + test1 | Sat Mar 01 00:00:00 2003 | 42 | PASS | | 2.6987
> + test2 | Sun Mar 02 00:00:00 2003 | 53 | FAIL | 01 March 2003 | 3.1234
> + (2 rows)
> +
> + -- this time without rowdt
> + SELECT * FROM crosstab(
> + 'SELECT rowid, attribute, val FROM cth ORDER BY 1',
> + 'SELECT DISTINCT attribute FROM cth ORDER BY 1')
> + AS c(rowid text, temperature text, test_result text, test_startdate text, volts text);
> + rowid | temperature | test_result | test_startdate | volts
> + -------+-------------+-------------+----------------+--------
> + test1 | 42 | PASS | | 2.6987
> + test2 | 53 | FAIL | 01 March 2003 | 3.1234
> + (2 rows)
> +
> + -- convert attributes to specific datatypes
> + SELECT * FROM crosstab(
> + 'SELECT rowid, rowdt, attribute, val FROM cth ORDER BY 1',
> + 'SELECT DISTINCT attribute FROM cth ORDER BY 1')
> + AS c(rowid text, rowdt timestamp, temperature int4, test_result text, test_startdate timestamp, volts float8);
> + rowid | rowdt | temperature | test_result | test_startdate | volts
> + -------+--------------------------+-------------+-------------+--------------------------+--------
> + test1 | Sat Mar 01 00:00:00 2003 | 42 | PASS | | 2.6987
> + test2 | Sun Mar 02 00:00:00 2003 | 53 | FAIL | Sat Mar 01 00:00:00 2003 | 3.1234
> + (2 rows)
> +
> + -- source query and category query out of sync
> + SELECT * FROM crosstab(
> + 'SELECT rowid, rowdt, attribute, val FROM cth ORDER BY 1',
> + 'SELECT DISTINCT attribute FROM cth WHERE attribute IN (''temperature'',''test_result'',''test_startdate'') ORDER BY 1')
> + AS c(rowid text, rowdt timestamp, temperature int4, test_result text, test_startdate timestamp);
> + rowid | rowdt | temperature | test_result | test_startdate
> + -------+--------------------------+-------------+-------------+--------------------------
> + test1 | Sat Mar 01 00:00:00 2003 | 42 | PASS |
> + test2 | Sun Mar 02 00:00:00 2003 | 53 | FAIL | Sat Mar 01 00:00:00 2003
> + (2 rows)
> +
> + -- if category query generates no rows, get expected error
> + SELECT * FROM crosstab(
> + 'SELECT rowid, rowdt, attribute, val FROM cth ORDER BY 1',
> + 'SELECT DISTINCT attribute FROM cth WHERE attribute = ''a'' ORDER BY 1')
> + AS c(rowid text, rowdt timestamp, temperature int4, test_result text, test_startdate timestamp, volts float8);
> + ERROR: load_categories_hash: provided categories SQL must return 1 column of at least one row
> + -- if category query generates more than one column, get expected error
> + SELECT * FROM crosstab(
> + 'SELECT rowid, rowdt, attribute, val FROM cth ORDER BY 1',
> + 'SELECT DISTINCT rowdt, attribute FROM cth ORDER BY 2')
> + AS c(rowid text, rowdt timestamp, temperature int4, test_result text, test_startdate timestamp, volts float8);
> + ERROR: load_categories_hash: provided categories SQL must return 1 column of at least one row
> + --
> + -- connectby
> + --
> -- test connectby with text based hierarchy
> CREATE TABLE connectby_text(keyid text, parent_keyid text);
> \copy connectby_text from 'data/connectby_text.data'
> Index: contrib/tablefunc/sql/tablefunc.sql
> ===================================================================
> RCS file: /opt/src/cvs/pgsql-server/contrib/tablefunc/sql/tablefunc.sql,v
> retrieving revision 1.6
> diff -c -r1.6 tablefunc.sql
> *** contrib/tablefunc/sql/tablefunc.sql 23 Nov 2002 01:54:09 -0000 1.6
> --- contrib/tablefunc/sql/tablefunc.sql 3 Mar 2003 02:51:45 -0000
> ***************
> *** 38,43 ****
> --- 38,98 ----
> SELECT * FROM crosstab('SELECT rowid, attribute, val FROM ct where rowclass = ''group1'' ORDER BY 1,2;', 3) AS c(rowid text, att1 text, att2 text, att3 text);
> SELECT * FROM crosstab('SELECT rowid, attribute, val FROM ct where rowclass = ''group1'' ORDER BY 1,2;', 4) AS c(rowid text, att1 text, att2 text, att3 text, att4 text);
>
> + --
> + -- hash based crosstab
> + --
> + create table cth(id serial, rowid text, rowdt timestamp, attribute text, val text);
> + insert into cth values(DEFAULT,'test1','01 March 2003','temperature','42');
> + insert into cth values(DEFAULT,'test1','01 March 2003','test_result','PASS');
> + -- the next line is intentionally left commented and is therefore a "missing" attribute
> + -- insert into cth values(DEFAULT,'test1','01 March 2003','test_startdate','28 February 2003');
> + insert into cth values(DEFAULT,'test1','01 March 2003','volts','2.6987');
> + insert into cth values(DEFAULT,'test2','02 March 2003','temperature','53');
> + insert into cth values(DEFAULT,'test2','02 March 2003','test_result','FAIL');
> + insert into cth values(DEFAULT,'test2','02 March 2003','test_startdate','01 March 2003');
> + insert into cth values(DEFAULT,'test2','02 March 2003','volts','3.1234');
> +
> + -- return attributes as plain text
> + SELECT * FROM crosstab(
> + 'SELECT rowid, rowdt, attribute, val FROM cth ORDER BY 1',
> + 'SELECT DISTINCT attribute FROM cth ORDER BY 1')
> + AS c(rowid text, rowdt timestamp, temperature text, test_result text, test_startdate text, volts text);
> +
> + -- this time without rowdt
> + SELECT * FROM crosstab(
> + 'SELECT rowid, attribute, val FROM cth ORDER BY 1',
> + 'SELECT DISTINCT attribute FROM cth ORDER BY 1')
> + AS c(rowid text, temperature text, test_result text, test_startdate text, volts text);
> +
> + -- convert attributes to specific datatypes
> + SELECT * FROM crosstab(
> + 'SELECT rowid, rowdt, attribute, val FROM cth ORDER BY 1',
> + 'SELECT DISTINCT attribute FROM cth ORDER BY 1')
> + AS c(rowid text, rowdt timestamp, temperature int4, test_result text, test_startdate timestamp, volts float8);
> +
> + -- source query and category query out of sync
> + SELECT * FROM crosstab(
> + 'SELECT rowid, rowdt, attribute, val FROM cth ORDER BY 1',
> + 'SELECT DISTINCT attribute FROM cth WHERE attribute IN (''temperature'',''test_result'',''test_startdate'') ORDER BY 1')
> + AS c(rowid text, rowdt timestamp, temperature int4, test_result text, test_startdate timestamp);
> +
> + -- if category query generates no rows, get expected error
> + SELECT * FROM crosstab(
> + 'SELECT rowid, rowdt, attribute, val FROM cth ORDER BY 1',
> + 'SELECT DISTINCT attribute FROM cth WHERE attribute = ''a'' ORDER BY 1')
> + AS c(rowid text, rowdt timestamp, temperature int4, test_result text, test_startdate timestamp, volts float8);
> +
> + -- if category query generates more than one column, get expected error
> + SELECT * FROM crosstab(
> + 'SELECT rowid, rowdt, attribute, val FROM cth ORDER BY 1',
> + 'SELECT DISTINCT rowdt, attribute FROM cth ORDER BY 2')
> + AS c(rowid text, rowdt timestamp, temperature int4, test_result text, test_startdate timestamp, volts float8);
> +
> +
> + --
> + -- connectby
> + --
> +
> -- test connectby with text based hierarchy
> CREATE TABLE connectby_text(keyid text, parent_keyid text);
> \copy connectby_text from 'data/connectby_text.data'

>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo(at)postgresql(dot)org

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

In response to

hashed crosstab at 2003-03-03 04:27:44 from Joe Conway

Browse pgsql-patches by date

	From	Date	Subject
Next Message	Dennis Björklund	2003-03-20 06:47:55	Forgotten?
Previous Message	Bruce Momjian	2003-03-20 06:45:31	Re: psql patch (2)