Re: [SQL] pqReadData() error

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Imtiaz(dot) S(dot) M" <imtiaz_sm(at)yahoo(dot)com>
Cc: pgsql-sql(at)postgreSQL(dot)org
Subject: Re: [SQL] pqReadData() error
Date: 1999-12-06 16:01:26
Message-ID: 18704.944496086@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

"Imtiaz. S. M" <imtiaz_sm(at)yahoo(dot)com> writes:
> ... to delete the oldest data I use the following SQL queries.

> create table tempdates as select distinct dates from daydata;
> delete from daydata where dates in (select b.dates from tempdates a,
> tempdates b where a.dates-b.dates > '22 hours');

This is certainly the hard, hard way to do it. Why not

create table tempdates as
select max(dates) - '22 hours'::interval as cutoff from daydata;
delete from daydata where dates < (select cutoff from tempdates);

As you have it, the subselect must generate O(N^2) rows for an N-row
daydata table (assuming the distribution of dates is fairly even).
If I read you correctly, there are about 12,000 rows in daydata at all
times? If so the subselect will examine 144,000,000 pairs of a.dates
and b.dates, and probably output about 25,000 rows (500 a rows matching
each b row, and vice versa).

That's a lot of data, and then it has to be scanned over for each row of
the daydata table in order to implement the IN; the rows that you are
keeping will require a comparison against *every single one* of the
subselect output rows to verify that the IN fails. So about 11,500 *
25,000 row comparisons done by the IN just for the non-deleted rows,
and probably a few million more for the deletable rows.

Lots faster to calculate the cutoff time once and then do one compare
for each row.

You could probably simplify this even more, to

delete from daydata
where dates < (select max(dates) - '22 hours'::interval from daydata);

unless you have some other use for the temp table's contents.

> The query takes a long time and then gives me the
> following error

> "pqReadData() -- backend closed the channel unexpectedly.

Probably all that date arithmetic is overflowing memory with temporary
results ... which won't get reclaimed till end of query, at present.
But even if it didn't run out of memory, doing it this way is very slow.

regards, tom lane

Browse pgsql-sql by date

  From Date Subject
Next Message Tom Lane 1999-12-06 16:23:27 Re: [SQL] How to avoid "Out of memory" using aggregate functions?
Previous Message Nikolay Mijaylov 1999-12-06 15:17:59 Fw: Whats happen here?