From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> |
Cc: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Greg S <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Freeze avoidance of very large table. |
Date: | 2016-02-11 14:49:06 |
Message-ID: | CAD21AoA5ed0E1abvC8eiswSEandDweXBrGvHCmFjXTHquOE83w@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Thank you for reviewing this patch!
On Wed, Feb 10, 2016 at 4:39 PM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> Hello,
>
> At Thu, 4 Feb 2016 02:32:29 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoB1HnZ7thWYjqKve78gQ5+PyedbbkjAPbc5zLV3oA-CuA(at)mail(dot)gmail(dot)com>
>> On Tue, Feb 2, 2016 at 7:22 PM, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote:
>> > Masahiko Sawada wrote:
>> >> I think we have two options.
>> >>
>> >> 1. Change page layout(PG_PAGE_LAYOUT_VERSION) to 5. pg_upgrade detects
>> >> it and then converts only VM files.
>> >> 2. Change pg_upgrade plugin mechanism so that it can handle other name
>> >> conversion plugins (e.g., convertLayout_vm_to_vfm)
>> >>
>> >> I think #2 is better. Thought?
>> >
>> > My vote is for #2 as well. Maybe we just didn't have forks when this
>> > functionality was invented; maybe the author just didn't think hard
>> > enough about what would be the right interface to do it.
>>
>> I've almost wrote up the very rough patch. (it can pass regression test)
>> Windows supporting is not yet, and Makefile is not correct.
>>
>> I've divided the main patch into two patches; add frozen bit patch and
>> pg_upgrade support patch.
>> 000 patch is almost same as previous code. (includes small fix)
>> 001 patch provides rewriting visibility map as a pageConverter routine.
>> 002 patch is for enhancement debug message in visibilitymap.c
>
> Thanks, it becomes easy to read.
>
>> In order to support pageConvert plugin, I made the following changes.
>> * Main changes
>> - Remove PAGE_CONVERSION
>> - pg_upgrade plugin is located to 'src/bin/pg_upgrade/plugins' directory.
>> - Move directory having plugins from '$(bin)/plugins' to '$(lib)/plugins'.
>
> These seem fair.
>
>> - Add new page-converter plugin function for visibility map.
>> - Current code doesn't allow us to use link mode (-k) in the case
>> where page-converter is required.
>>
>> But I changed it so that if page-converter for fork file is
>> specified, we convert it actually even when link mode.
>>
>> * Interface designe
>> convertFile() and convertPage() are plugin function for main relation
>> file, and these functions are dynamically loaded by
>> loadConvertPlugin().
>
> Though I haven't looked this so closer, loadConverterPlugin looks
> to continue deciding what plugin to load using old and new page
> layout versions. Currently the acutually possible versions are 4
> and if we increment it now, 5.
>
> On the other hand, _vm came at the *catalog version* 201107031
> (9.1 release) and _fsm came at 8.4 release. Both of them are of
> page layout version 4. Are we allowed to increment page layout
> version for this reason? And is this framework under
> reconstruction is flexible enough for this kind of changes in
> future? I don't think so.
Yeah, I also think that page layout version should not be increased by
this layout change of vm.
This patch checks catalog version at first, and then decides what
plugin to load.
In this case, only the format of VM has been changed, so pg_upgrade
loads a plugin for VM and convert them.
pg_upgrade doesn't load for other plugin file, and other files are just copied.
>
>
> We have added _vm and _fsm so far so we must use a version number
> that can determine when _vm, _fsm and _vfm are introduced. I'm
> afraid that it is out of purpose, catalog version seems to be
> most usable, it is already used to know when the crash safe VM
> has been introduced.
>
> Using the catalog version, the plugin we provide first would be
> converteLayout_201105231_201602071.so which has only a converter
> from _vm to _vfm. This plugin is loaded for the combination of
> the source cluster with the catalog version of 201105231(when VM
> has been introduced) or later and the destination cluster with
> that *before* 201602071 (this version).
>
> If we change the format of fsm (vm no longer exists), we would
> have a new plugin maybe named
> converteLayout_200904091_2017xxxxx.so which has an, maybe,
> inplace file converter for fsm. It will be loaded when a source
> database is of the catalog version of 200904091(when FSM has been
> introduced) or later and a destination before 2017xxxxx(the
> version). Catalog version seems to work fine.
I think that it's not good idea to use catalog version for plugin name.
Because, even if catalog version is used for plugin file name as you
suggested, pg_upgrade still needs to decide what plugin name to load
by itself.
Also, the plugin file having catalog version would not easy to
understand what plugin does actually. It's not developer friendly.
The advanteage of using page layout version as plugin name is that
pg_upgrade can decide automatically which plugin should be loaded.
>
> So far, I assumed that we can safely assume that the name of
> files to be converted is <oid>[fork_name] so the possible types
> of conversions would be the following.
>
> - per-page conversion
> - per-file conversion between the files with the same fork name.
> - per-file conversion between the files with different fork names.
>
> Since the plugin filename doesn't tell such things, they should
> be told by the plugin itself. So a plugin is to provide the
> following interface,
>
> typedef struct ConverterTable
> {
> char *src_fork_name;
> char *dst_fork_name;
> FileConverterFunc file_conveter;
> PageConverterFunc page_conveter;
> } ConverterTable[];
>
> Following such name convention of plugins, we may load multiple
> plugins at once, so we collect all entries of the table of all
> loaded plugins and check if any src_fork_name from them don't
> duplicate.
>
> Here, we have sufficient information to choose what conveter to
> invoke and execute conversion like this.
>
> for (fork_name in all_fork_names_including_"" )
> {
> find a converter comparing fork_name with src_fork_name.
> check dst_fork_name and rename the target file if needed.
> invoke the converter.
> }
>
> If we need to convert clogs or similars and need to prepare for
> such events, the ConverterTable might have an additional member
> and change the meaning of some of existing members.
>
> typedef struct ConverterTable
> {
> enum target_type; /* FILE_NAME or FORK_NAME */
> char *src_name;
> char *dst_name;
> FileConverterFunc file_conveter;
> PageConverterFunc page_conveter;
> } ConverterTable[];
>
> when target_type == FILE_NAME, src_name and dst_name represents
> the target file names relatively to $PGDATA.
>
> # Yeah, I know it is too complicated.
>
I agree with having ConverterTable.
Since we have three kind of fiel suffix types; "", "_vm", "_fsm",
pg_upgrade will have three elements of ConverterTable[].
>> I added a new pageConvert plugin function converVMFile() for
>> visibility map (fork file).
>> If layout of CLOG, FSM or etc will be changed in the future, we could
>> expand some new pageConvert plugin functions like convertCLOGFile() or
>> convertFSMFile(), and these functions are dynamically loaded by
>> loadAdditionalConvertPlugin().
>> It means that main file and other fork file conversion are executed
>> independently, and conversion for fork file are executed even if link
>> mode is specified.
>> Each conversion plugin is loaded and used only when it's required.
>
> As I asked upthread, It is one of the most important design point
> of plugin mechanism that what characteristics of src and/or dest
> cluster to trigger loading of a plugin. And if page layout format
> is it, are we allowed to increment for such irrelevant events? Or
> using another characteristics like catalog version?
>
>> I still agree with this plugin approach, but I felt it's still
>> complicated a bit, and I'm concerned that patch size has been
>> increased.
>> Please give me feedbacks.
>
> Yeah, I feel the same. What make it worse, the plugin mechanism
> will get further complex if we make it more flexible for possible
> usage as I proposed above. It is apparently too complicated for
> deciding whether to load *just one*, for now, converter
> function. And no additional converter is in sight.
There will be case where layout of other type relation file is
changed, so pg_upgrade will need to convert several types of relation
file at the same time.
I'm thinking that we need to support to load multiple plugin function at least.
> I incline to pull out all the plugin stuff of pg_upgrade. We are
> so prudent to make changes of file formats so this kind of events
> will happen with several-years intervals. The plugin mechanism
> would be valuable if we are encouriged to change file formats
> more frequently and freely by providing it, but such situation
> absolutely introduces more untoward things..
Yes, I think so too.
In fact, such a layout change is for the first time since pg_upgrade
has been introduced at 9.0.
Regards,
--
Masahiko Sawada
From | Date | Subject | |
---|---|---|---|
Next Message | Robert Haas | 2016-02-11 14:51:26 | Re: GinPageIs* don't actually return a boolean |
Previous Message | Robert Haas | 2016-02-11 14:48:39 | Re: [COMMITTERS] pgsql: Add some isolation tests for deadlock detection and resolution. |