Collect frequency statistics for arrays

From:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To:	Nathan Boley <npboley(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Collect frequency statistics for arrays
Date:	2011-10-25 16:12:08
Message-ID:	CAPpHfdvTfDZ7OeFGUdv9s=2EKV9cDF3AjXznbNrp1xbzwF7kpA@mail.gmail.com
Lists:	pgsql-hackers

Hi!

Here is an updated version of the patch. Changes since the reviewed
version:
1) A distinct slot is used for the length histogram.
2) Standard statistics are collected for arrays.
3) Most common values and most common elements are mapped to distinct
columns of the pg_stats view, because both are calculated for arrays.
4) The description of the lossy counting algorithm was copied from
compute_tsvector_stats, with corresponding changes.
5) Comments about assumptions were added to the estimation functions.
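
For readers unfamiliar with the lossy counting algorithm mentioned in item 4, here is a minimal Python sketch of the textbook version (Manku and Motwani). It is not the C code from the patch or from compute_tsvector_stats. The function names, the `support` and `epsilon` parameters, and the choice to count each distinct element once per array are all assumptions made for illustration.

```python
from math import ceil


def lossy_count(stream, epsilon):
    """Approximate per-element counts over a stream (textbook Lossy Counting).

    Returns (counts, n), where counts maps element -> [f, delta] and n is the
    number of stream items seen. Each tracked count f undercounts the true
    count by at most epsilon * n.
    """
    width = ceil(1 / epsilon)      # bucket width w
    counts = {}                    # element -> [f, delta]
    n = 0
    for item in stream:
        n += 1
        bucket = ceil(n / width)   # id of the current bucket
        if item in counts:
            counts[item][0] += 1
        else:
            # delta is the maximum number of occurrences we may have missed
            counts[item] = [1, bucket - 1]
        if n % width == 0:
            # bucket boundary: drop entries that cannot be frequent
            stale = [k for k, (f, d) in counts.items() if f + d <= bucket]
            for key in stale:
                del counts[key]
    return counts, n


def frequent_elements(arrays, support, epsilon):
    """Hypothetical helper: elements whose tracked count passes the threshold.

    Assumption: each array contributes each distinct element once.
    """
    stream = (elem for arr in arrays for elem in dict.fromkeys(arr))
    counts, n = lossy_count(stream, epsilon)
    threshold = (support - epsilon) * n
    return {e: f for e, (f, d) in counts.items() if f >= threshold}
```

With `epsilon` well below `support`, every element whose true frequency is at least `support` is reported, and memory use stays bounded regardless of how many distinct elements appear.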

Accuracy testing

The following files are attached:
datasets.sql - SQL script which generates the test datasets
arrayanalyze.php - PHP script which does the accuracy testing
results.sql - dump of the table with the test results

As the test results show, the estimates are quite accurate in most test
cases. When the length of the constant array exceeds 30, the estimate of
"column <@ const" is very inaccurate for the array_test3 table. This is
because the length histogram is skipped in that case to avoid high CPU
usage during estimation (see array_sel.c:888).
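
To illustrate what an estimate of "column <@ const" has to compute, here is a simplified Python sketch. It is not the logic in array_sel.c: it assumes elements occur independently of each other, uses only per-element frequencies, and ignores the length histogram entirely. The function name and the input format are hypothetical.

```python
def contained_by_selectivity(elem_freqs, const_elems):
    """Estimate the fraction of rows where column <@ const.

    elem_freqs: dict mapping element -> fraction of rows containing it.
    const_elems: set of elements in the constant array.

    Assumption: elements occur independently, so a row satisfies
    column <@ const exactly when it contains none of the elements
    that are missing from the constant.
    """
    selectivity = 1.0
    for elem, freq in elem_freqs.items():
        if elem not in const_elems:
            selectivity *= 1.0 - freq
    return selectivity
```

For example, with frequencies `{"a": 0.5, "b": 0.2, "c": 0.1}` and the constant `{"a"}`, the sketch multiplies the chances of not containing "b" and not containing "c". A model this simple carries no information about array lengths, which is the kind of information a length histogram supplies.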

------
With best regards,
Alexander Korotkov.

Attachment	Content-Type	Size
arrayanalyze-0.6.patch.gz	application/x-gzip	19.8 KB
datasets.sql	text/x-sql	687 bytes
arrayanalyze.php	application/x-httpd-php	3.5 KB
results.sql.gz	application/x-gzip	58.3 KB

Responses

Re: Collect frequency statistics for arrays at 2011-11-09 16:49:35 from Alexander Korotkov
