| From: | Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com> | 
|---|---|
| To: | Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp> | 
| Cc: | pgsql-hackers(at)postgresql(dot)org | 
| Subject: | Re: WIP: Collecting statistics on CSV file data | 
| Date: | 2011-10-17 17:27:15 | 
| Message-ID: | 4E9C6573.8050507@gmail.com | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-hackers | 
(2011/10/07 18:09), Etsuro Fujita wrote:
> Thank you for the review and the helpful information.
> I rebased. Please find attached a patch. I'll add the patch to the next CF.
> 
> Changes:
> 
>    * cleanups and fixes
>    * addition of the following to ALTER FOREIGN TABLE
>        ALTER [COLUMN] column SET STATISTICS integer
>        ALTER [COLUMN] column SET ( n_distinct = val ) (n_distinct only)
>        ALTER [COLUMN] column RESET ( n_distinct )
>    * reflection of the force_not_null info in acquiring sample rows
>    * documentation
The new patch could be applied with some shifts.  Regression tests of
core and file_fdw have passed cleanly.  Though I've tested only simple
tests, ANALYZE works for foreign tables for file_fdw, and estimation of
costs and selectivity seem appropriate.
New API AnalyzeForeignTable
===========================
In your design, new handler function is called instead of
do_analylze_rel.  IMO this hook point would be good for FDWs which can
provide statistics in optimized way.  For instance, pgsql_fdw can simply
copy statistics from remote PostgreSQL server if they are compatible.
Possible another idea is to replace acquire_sample_rows so that other
FDWs can reuse most part of fileAnalyzeForeignTable and
file_fdw_do_analyze_rel.
And I think that AnalyzeForeignTable should be optional, and it would be
very useful if a default handler is provided.  Probably a default
handler can use basic FDW APIs to acquire sample rows from the result of
"SELECT * FROM foreign_table" with skipping periodically.  It won't be
efficient but I think it's not so unreasonable.
Other issues
============
There are some other comments about non-critical issues.
- When there is no analyzable column, vac_update_relstats is not called.
 Is this behavior intentional?
- psql can't complete foreign table name after ANALYZE.
- A new parameter has been added to vac_update_relstats in a recent
commit.  Perhaps 0 is OK for that parameter.
- ANALYZE without relation name ignores foreign tables because
get_rel_oids doesn't list foreign tables.
- IMO logging "analyzing foo.bar" should not be done in
AnalyzeForeignTable handler of each FDW because some FDW might forget to
do it.  Maybe it should be pulled up to analyze_rel or somewhere in core.
- It should be mentioned in a document that foreign tables are not
analyzed automatically because they are read-only.
Regards,
-- 
Shigeru Hanada
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Alvaro Herrera | 2011-10-17 18:33:36 | HeapTupleHeaderAdvanceLatestRemovedXid doing the wrong thing with multixacts | 
| Previous Message | Tom Lane | 2011-10-17 15:37:56 | Re: Underspecified window queries in regression tests |