Re: Crosstab Confusion

From: Lee Hachadoorian <lee(dot)hachadoorian(at)gmail(dot)com>
To: Adam Sherman <adam(at)sherman(dot)ca>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: Crosstab Confusion
Date: 2010-02-01 16:34:44
Message-ID: 5ab13581002010834k11c1be7ci4b4aef1b6778be4f@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

I'm flying blind here since I have nothing that looks like the structure you
described to experiment on, but based on some crosstabs I have set up this
should get you started. Explanatory notes follow.

SELECT pivot.*
FROM crosstab('
--row header, column header, cell value
SELECT customer_id, extract(day FROM date), sum(amount)
WHERE extract(month FROM date) = 1 --desired month, 1=Jan, 2=Feb, etc.
FROM your_table
GROUP BY 1, 2
ORDER BY 1, 2
','
--list of column headers
SELECT 1, 2, 3, […] 31
')
--list of column names for final result set
--does not have to match column names from crosstab() section!
pivot(customer_id integer, day1 numeric(10,4), day2 numeric(10,4), day3
numeric(10,4), […] day31 numeric(10,4))
;

You basically have three parts:

1) SELECT query in the form (row header, column header, cell value). In this
case it is an aggregate query so that you can sum the transactions over a
given day.

2) List of column headers. If you want, this can SELECT from another table,
so you can have a table with rows 1, 2, 3, etc and use it to select the days
from the month instead of listing them manually.

3) List of output columns, which follows "pivot" in the text above. Note
that "pivot" is an arbitrary name. You can use foo, bar, or whatever, but
that will be the name of the table which must be used to reference the
columns in the top SELECT list.

Note that my somewhat bizarre indenting / end of line structure is designed
so that, when using pgAdmin, I can very quickly select and execute just the
value query or column header query, which lets me experiment as I build the
crosstab.

Note, finally, that the list of output columns (following "pivot") should be
one more than the number of columns in section 2 (technically, you can
circumvent this limitation, but it's complicated and seems unnecessary for
your use case). That could be problematic when running this query in the
middle of the month and the number of value rows per customer doesn't match
the number expected in the list of column headers and list of output
columns. I don't know of any way to make the list of output columns vary
dynamically, but perhaps someone else here does? Otherwise, you would have
to either (a) make 31 different crosstab queries and choose which one to run
based on the day of the month, or (b) come up with some way to "pad" the
value table created in part 1 so that it lists customer_id, day, 0 for all
days which have no customer transactions. (Perhaps create a dummy
customer_id that has no transactions for all days in the month, which should
be enough to trick the crosstab function into thinking it has something to
fill the last columns.

--Lee

--
Lee Hachadoorian
PhD Student, Geography
Program in Earth & Environmental Sciences
CUNY Graduate Center

In response to

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Adam Sherman 2010-02-01 16:52:16 Re: Crosstab Confusion
Previous Message Adam Sherman 2010-02-01 15:03:40 Crosstab Confusion