From: | Jean-Luc Lachance <jllachan(at)nsd(dot)ca> |
---|---|
To: | Christopher Browne <cbbrowne(at)acm(dot)org> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: why does count take so long? |
Date: | 2003-09-10 15:16:59 |
Message-ID: | 3F5F406B.9E8B5138@nsd.ca |
Lists: | pgsql-general |
Christopher,
You did not quite understand.

The counts in question are the actual counts (deltas) for the
transactions.

The tuple should be something like:

xid, reltype, insert_count, update_count, delete_count

When a COUNT(*) is issued, the committed tuples are totaled up, the sum
is stored as xid 0 (or whatever), and the committed delta tuples are
deleted (all of this with the appropriate locks).

I am not sure whether the update count is needed.
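
The idea above can be sketched in a few lines. This is only an
illustration, using SQLite in place of a PostgreSQL system table; the
names (tx_counts, record_commit, count_star) are mine, not anything
PostgreSQL actually provides, and real locking is omitted:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE tx_counts (
        xid          INTEGER,  -- committing transaction id; 0 = rolled-up total
        reltype      INTEGER,  -- oid of the table the deltas belong to
        insert_count INTEGER,
        delete_count INTEGER
    )
""")

def record_commit(xid, reltype, inserted, deleted):
    # Each committing transaction writes one delta row per table it touched.
    conn.execute("INSERT INTO tx_counts VALUES (?, ?, ?, ?)",
                 (xid, reltype, inserted, deleted))

def count_star(reltype):
    # COUNT(*): total the committed deltas, store the sum as xid 0,
    # and delete the rows that were folded in.  In a real implementation
    # this whole step would run under the appropriate locks.
    total = conn.execute(
        "SELECT COALESCE(SUM(insert_count - delete_count), 0) "
        "FROM tx_counts WHERE reltype = ?", (reltype,)).fetchone()[0]
    conn.execute("DELETE FROM tx_counts WHERE reltype = ?", (reltype,))
    conn.execute("INSERT INTO tx_counts VALUES (0, ?, ?, 0)", (reltype, total))
    return total

record_commit(2134151, 12345, 5, 0)   # one xact inserted 5 rows into oid 12345
record_commit(2134152, 12345, 0, 4)   # another deleted 4
print(count_star(12345))              # prints 1; only one summary row remains
```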
JLL
Christopher Browne wrote:
>
> In the last exciting episode, jllachan(at)nsd(dot)ca (Jean-Luc Lachance) wrote:
> > How about keeping counts of inserts, deletes and updates per table per
> > transaction as part of the live statistics?
>
> Aye, there's the rub.
>
> That's exactly what _won't_ work, and that's exactly the case that is
> somewhat pathological under MVCC.
>
> With MVCC, data "appears as if by magic" when its transaction COMMITs,
> thereby revealing the rows to the world.
>
> Unfortunately, there's no simple way of making updates to counts
> "simply appear" when they take effect, not without turning the updates
> into a concurrency bottleneck.
>
> Here's a bit of a wild thought...
>
> Assume a table with schema as follows:
> create table pg_count (
> xid integer, --- Actually, it's not an integer, right, Andrew? :-(
> reltype oid,
> count integer
> );
>
> Every transaction, "xid," affects some number of tuples. So for
> transaction #2134151, which adds 5 rows to the table with oid 12345 and
> deletes 4 rows from the table with oid 45678, part of the transaction
> would include inserting these rows:
>
> insert into pg_count (xid, reltype, count) values (2134151, 12345, 5);
> insert into pg_count (xid, reltype, count) values (2134151, 45678, -4);
>
> In order to get the count for table 12345, you could then go to
> pg_count and request:
> select sum(count) from pg_count where reltype = 12345;
>
> The size of this table would increase every time a transaction gets
> committed, so presumably part of VACUUM TABLE would be a
> collection/summarization, thus...
>
> -- Collect existing stats into 1 row
> insert into pg_count (xid, reltype, count)
>   select currxid, currtable, sum(count)
>   from pg_count where reltype = currtable;
>
> -- Delete the old stats
> delete from pg_count where reltype = currtable and xid <> currxid;
>
> This will cope with concurrency reasonably well (modulo having to make
> sure that the "collect/delete" transaction is a serialized one).
>
> Unfortunately, if a table is updated frequently, the summary
> select sum(count) from pg_count where reltype = 12345;
> will have to collect together quite a large number of entries, which
> makes this "less cheap."
>
> That suggests an optimization; any time the COUNT is selected, the old
> stats can and should be collected into 1 row and the old data deleted.
> --
> wm(X,Y):-write(X),write('@'),write(Y). wm('aa454','freenet.carleton.ca').
> http://www3.sympatico.ca/cbbrowne/nonrdbms.html
> Sturgeon's Law: 90% of *EVERYTHING* is crud.
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster