From: Manfred Koizar <mkoi-pg(at)aon(dot)at>
To: Manfred Koizar <mkoi-pg(at)aon(dot)at>
Cc: "Mooney, Ryan" <ryan(dot)mooney(at)pnl(dot)gov>, pgsql-general(at)postgresql(dot)org
Subject: Re: Large DB
Date: 2004-04-02 15:17:50
Message-ID: 7sqq60po4q8k0l7ere02sja1kkevb877cu@email.aon.at
Lists: pgsql-general pgsql-hackers
On Thu, 01 Apr 2004 12:22:58 +0200, I wrote:
>BTW, ANALYSE is basically a constant time operation.
On closer inspection, this is not the whole truth. ANALY[SZ]E is a
two-stage process: first it collects a sample of rows, then it examines
these rows to produce various statistics.
The cost of the latter depends on the sample size, which itself depends
on the default or column-specific statistics targets and the number
(and types) of columns, but not on the table size; so it *should* take
more or less constant time. (The sample holds roughly 300 rows per
statistics target point, i.e. about 3000 rows at the default target of
10.)
The first step, however (acquire_sample_rows() in analyze.c), has to
read more rows than finally end up in the sample. It visits fewer than
O(nblocks) pages, but certainly more than O(1).
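For illustration, here is a minimal sketch of textbook reservoir
sampling (Vitter's Algorithm R), the technique underlying this kind of
row sampling. It is not the actual acquire_sample_rows() code; the
names and types are made up:

    /* Sketch of reservoir sampling (Algorithm R) over a stream of
     * row ids.  Hypothetical stand-in for acquire_sample_rows();
     * the real code walks heap pages and tuples, not an array. */
    #include <stdlib.h>

    static void
    sample_rows(const int *rowids, int nrows, int *sample, int k)
    {
        int t;

        /* The first k rows fill the reservoir. */
        for (t = 0; t < k && t < nrows; t++)
            sample[t] = rowids[t];

        /* Row t (0-based) enters the sample with probability
         * k/(t+1).  A smarter variant computes skip distances so
         * that only rows which actually enter the reservoir have
         * to be fetched at all. */
        for (; t < nrows; t++)
        {
            int j = rand() % (t + 1);

            if (j < k)
                sample[j] = rowids[t];
        }
    }

Even with skipping, every late replacement can hit a page that has not
been read yet, and that is where the more-than-O(1) page reads come
from.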
A vague feeling tries to tell me that the number of page reads is
somehow related to the harmonic numbers 1 + 1/2 + 1/3 + ... + 1/n, which
grow like O(ln(n)).
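To make that feeling concrete: with a reservoir of k rows, row number
t > k enters the sample with probability k/t, so the expected number of
rows fetched after the initial prefix is

    k/(k+1) + k/(k+2) + ... + k/n  =  k * (H_n - H_k)  ~  k * ln(n/k)

If each of those fetches can land on a not-yet-visited page, the page
reads grow like ln(n). This is back-of-the-envelope only; the real I/O
count depends on how the sampled rows cluster on pages.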
I have an idea of how this could be done with O(1) page reads. If I'm
able to translate it into C, I'll send a patch ...
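For what it's worth, one conceivable way to get there (a guess at the
general direction, not necessarily the idea above) is to sample pages
first and rows second, so that each sample slot costs exactly one page
read:

    /* Hypothetical block-first sampler: one page read per sample
     * slot.  Picks k random block numbers, then one random row
     * offset within each chosen block. */
    #include <stdlib.h>

    typedef struct
    {
        int blkno;   /* page to read */
        int offset;  /* row within that page */
    } SampleSlot;

    static void
    sample_blocks(int nblocks, int rows_per_block,
                  SampleSlot *slots, int k)
    {
        int i;

        for (i = 0; i < k; i++)
        {
            slots[i].blkno = rand() % nblocks;          /* one page */
            slots[i].offset = rand() % rows_per_block;  /* one row */
        }
    }

The obvious catch: real pages hold varying numbers of tuples (the fixed
rows_per_block is a simplification), and an unadjusted block-first
sample over-represents rows on sparsely filled pages, so a correct
version has to compensate for per-page row counts.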
Servus
Manfred