From: | Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com> |
---|---|
To: | Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: WIP: Collecting statistics on CSV file data |
Date: | 2011-10-17 17:27:15 |
Message-ID: | 4E9C6573.8050507@gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
(2011/10/07 18:09), Etsuro Fujita wrote:
> Thank you for the review and the helpful information.
> I rebased. Please find attached a patch. I'll add the patch to the next CF.
>
> Changes:
>
> * cleanups and fixes
> * addition of the following to ALTER FOREIGN TABLE
> ALTER [COLUMN] column SET STATISTICS integer
> ALTER [COLUMN] column SET ( n_distinct = val ) (n_distinct only)
> ALTER [COLUMN] column RESET ( n_distinct )
> * reflection of the force_not_null info in acquiring sample rows
> * documentation
The new patch could be applied with some shifts. Regression tests of
core and file_fdw have passed cleanly. Though I've tested only simple
tests, ANALYZE works for foreign tables for file_fdw, and estimation of
costs and selectivity seem appropriate.
New API AnalyzeForeignTable
===========================
In your design, new handler function is called instead of
do_analylze_rel. IMO this hook point would be good for FDWs which can
provide statistics in optimized way. For instance, pgsql_fdw can simply
copy statistics from remote PostgreSQL server if they are compatible.
Possible another idea is to replace acquire_sample_rows so that other
FDWs can reuse most part of fileAnalyzeForeignTable and
file_fdw_do_analyze_rel.
And I think that AnalyzeForeignTable should be optional, and it would be
very useful if a default handler is provided. Probably a default
handler can use basic FDW APIs to acquire sample rows from the result of
"SELECT * FROM foreign_table" with skipping periodically. It won't be
efficient but I think it's not so unreasonable.
Other issues
============
There are some other comments about non-critical issues.
- When there is no analyzable column, vac_update_relstats is not called.
Is this behavior intentional?
- psql can't complete foreign table name after ANALYZE.
- A new parameter has been added to vac_update_relstats in a recent
commit. Perhaps 0 is OK for that parameter.
- ANALYZE without relation name ignores foreign tables because
get_rel_oids doesn't list foreign tables.
- IMO logging "analyzing foo.bar" should not be done in
AnalyzeForeignTable handler of each FDW because some FDW might forget to
do it. Maybe it should be pulled up to analyze_rel or somewhere in core.
- It should be mentioned in a document that foreign tables are not
analyzed automatically because they are read-only.
Regards,
--
Shigeru Hanada
From | Date | Subject | |
---|---|---|---|
Next Message | Alvaro Herrera | 2011-10-17 18:33:36 | HeapTupleHeaderAdvanceLatestRemovedXid doing the wrong thing with multixacts |
Previous Message | Tom Lane | 2011-10-17 15:37:56 | Re: Underspecified window queries in regression tests |