Re: Working with very large datasets

From: Wilkinson Charlie E <Charlie(dot)E(dot)Wilkinson(at)irs(dot)gov>
To: Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com>, Lex Berezhny <LBerezhny(at)DevIS(dot)com>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: Working with very large datasets
Date: 2003-02-12 21:14:30
Message-ID: 9DDDC002B736D711A7320004AC3699EB01C195D6@nct0010mb03.nc.no.irs.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

> -----Original Message-----
> From: Stephan Szabo [mailto:sszabo(at)megazone23(dot)bigpanda(dot)com]
[...]
> Was it psql that grew to 800Mb or a backend? If the former,
> how many rows
> do you expect that to return? You probably want to look into using
> cursors rather than returning the entire result set at once.

It was the front end, psql that got big. postmaster has been pretty
well behaved through all this. I'm looking to get the full 100M+
records back. I have to print it all out and fax them to someone. ;)
Seriously, I want to dump it all to a (big!) file and sift through
it with some familiar unix tools to check the data and generally
look it over.

Thanks to Stephan and Lex for the tips! I took your CURSOR and LIMIT
clues and ran with them. I looked at straight SQL>psql and Perl DBD:Pg
for solutions, but the end result was some "ecpg" C code:

----------------------------------
#include <stdio.h>
#define NO_MORE_TUPLES (sqlca.sqlcode == ECPG_NOT_FOUND)

int main(int argc, char* agv[]){

EXEC SQL BEGIN DECLARE SECTION;
int myuser;
char myname[121];
int mysize;
int mymtime;
EXEC SQL END DECLARE SECTION;

EXEC SQL INCLUDE sqlca;
EXEC SQL CONNECT TO userdb USER xxxxxxx;
EXEC SQL WHENEVER sqlerror sqlprint;

EXEC SQL DECLARE mycursor CURSOR FOR SELECT user,name,size,mtime
FROM users JOIN files USING (fileid) ORDER BY user;

EXEC SQL OPEN mycursor;
while(1){
EXEC SQL FETCH FROM mycursor
INTO :myuser,:myname,:mysize,:mymtime;
if(NO_MORE_TUPLES)
break;
myname[120] = '\0';
printf("%d\t%s\t%d\t%d\n", myuser, myname, mysize, mymtime);
}
EXEC SQL CLOSE mycursor;

exit(0);
}
-----------------------------------

No idea how long it's going to take to complete its run, but it
faithfully starts spitting out the records shortly after being
invoked. Thanks again guys!

For those interested, more helpful hints & examples on epgc,
libpq and related goodies here:
http://www.cse.ucsc.edu/classes/cmps180/Winter02/proj5.html
- thanks to UCSC and Google. :-)

-cw-

Browse pgsql-sql by date

  From Date Subject
Next Message Johnny Kristensen 2003-02-12 21:51:41 SQL Functions vs PL/PgSQL
Previous Message Dmitry Tkach 2003-02-12 20:59:48 Re: How do you select from a table until a condition is met?