From: | Hari Babu <haribabu(dot)kommi(at)huawei(dot)com> |
---|---|
To: | "'Robert Haas'" <robertmhaas(at)gmail(dot)com> |
Cc: | "'Josh Berkus'" <josh(at)agliodbs(dot)com>, "'Amit kapila'" <amit(dot)kapila(at)huawei(dot)com>, "'Andres Freund'" <andres(at)2ndquadrant(dot)com>, <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Review: Patch to compute Max LSN of Data Pages |
Date: | 2013-07-04 08:01:52 |
Message-ID: | 008b01ce788c$bee694a0$3cb3bde0$@kommi@huawei.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wednesday, July 03, 2013 1:26 AM Robert Haas Wrote:
>On Tue, Jun 25, 2013 at 1:42 PM, Andres Freund <andres(at)2ndquadrant(dot)com>
wrote:
>> I think the usecase for this utility isn't big enough to be included in
>> postgres since it really can only help in a very limited
>> circumstances. And I think it's too likely to be misused for stuff it's
>> not useable for (e.g. remastering).
>>
>> The only scenario I see is that somebody deleted/corrupted
>> pg_controldata. In that scenario the tool is supposed to be used to find
>> the biggest lsn used so far so the user then can use pg_resetxlog to set
>> that as the wal starting point.
>> But that can be way much easier solved by just setting the LSN to
>> something very, very high. The database cannot be used for anything
>> reliable afterwards anyway.
>I guess this is true, but I think I'm mildly in favor of including
>this anyway. I think I would have used it once or twice, if it had
>been there - maybe not even to feed into pg_resetxlog, but just to
>check for future LSNs. We don't have anything like a suite of
>diagnostic tools in bin or contrib today, for use by database
>professionals in fixing things that fall strictly in the realm of
>"don't try this at home", and I think we should have such a thing.
>Unfortunately this covers about 0.1% of the space I'd like to see
>covered, which might be a reason to reject it or at least insist that
>it be enhanced first.
>At any rate, I don't think this is anywhere near committable as it
>stands today. Some random review comments:
Thanks for the detailed review.
>remove_parent_refernces() is spelled wrong.
>Why does this patch need all of this fancy path-handling logic -
>specifically, remove_parent_refernces() and make_absolute_path()?
>Surely its needs are not that different from pg_ctl or pg_resetxlog or
>pg_controldata. If they all need it and it's duplicated, we should
>fix that. Otherwise, why the special logic here?
>I don't think we need getLinkPath() either. The server has no trouble
>finding its files by just using a pathname that follows the symlink.
>Why do we need anything different here? The whole point of symlinks
>is that you can traverse them transparently, without knowing where
>they point.
Removed the special handling of path functions.
>Duplicating PageHeaderIsValid doesn't seem acceptable. Moreover,
>since the point of this is to be able to use it on a damaged cluster,
>why is that logic even desirable? I think we'd be better off assuming
>the pages to be valid.
Corrected.
>The calling convention for this utility seems quite baroque. There's
>no real need for all of this -p/-P stuff. I think the syntax should
>just be:
>pg_computemaxlsn file_or_directory...
>For each argument, we determine whether it's a file or a directory.
>If it's a file, we assume it's a PostgreSQL data file and scan it. If
>it's a directory, we check whether it looks like a data directory. If
>it does, we recurse through the whole tree structure and find the data
>files, and process them. If it doesn't look like a data directory, we
>scan each plain file in that directory whose name looks like a
>PostgreSQL data file name. With this approach, there's no need to
>limit ourselves to a single input argument and no need to specify what
>kind of argument it is; the tool just figures it out.
Changed to accept file or directory without of options.
>I think it would be a good idea to have a mode that prints out the max
>LSN found in *each data file* scanned, and then prints out the overall
>max found at the end. In fact, I think that should perhaps be the
>default mode, with -q, --quiet to disable it. When printing out the
>per-file data, I think it would be useful to track and print the block
>number where the highest LSN in that file was found. I have
>definitely had cases where I suspected, but was not certain of,
>corruption. One could use a tool like this to hunt for problems, and
>then use something like pg_filedump to dump the offending blocks.
>That would be a lot easier than running pg_filedump on the whole file
>and grepping through the output.
Corrected.
>Similarly, I see no reason for the restriction imposed by
>check_path_belongs_to_pgdata(). I've had people mail me one data
>file; why shouldn't I be able to run this tool on it? It's a
>read-only utility.
>- if (0 != FindMaxLSNinDir(pathbuf, maxlsn, false)) and similar is not
>PostgreSQL style.
>+ printf(_("%s compute the maximum LSN in PostgreSQL data
>pages.\n\n"), progname);
Fixed.
>+ /*
>+ * Don't allow pg_computemaxlsn to be run as root, to avoid
overwriting
>+ * the ownership of files in the data directory. We need only check
for
>+ * root -- any other user won't have sufficient permissions to
modify
>+ * files in the data directory.
>+ */
>+ #ifndef WIN32
>+ if (geteuid() == 0)
>+ {
>+ fprintf(stderr, _("%s: cannot be executed by \"root\".\n"),
>+ progname);
>+ fprintf(stderr, _("You must run %s as the PostgreSQL
>superuser.\n"),
>+ progname);
>+ exit(1);
>+ }
>+ #endif
>This utility only reads files; it does not modify them. So this seems
>unnecessary. I assume it's blindly copied from somewhere else.
>+ fprintf(stderr, _("%s: \"%s\" not a valid data
directory.\n"),
>Extra space.
>+ /*
>+ * Check for a postmaster lock file --- if there is one, refuse to
>+ * proceed, on grounds we might be interfering with a live
installation.
>+ */
>+ snprintf(path, MAXPGPATH, "%s/postmaster.pid", DataDir);
>Again, this might be appropriate for pg_resetxlog, but I see no reason
>for the restriction here. The output might be inaccurate in that
>case, but if you're using this tool you're required to know what
>you're doing.
Fixed.
>+ For safety reasons, you must specify the data directory on the
>command line.
>+ <command>pg_computemaxlsn</command> does not use the environment
variable
>+ <envar>PGDATA</>.
>Same thing here. I think using PGDATA would be quite appropriate for
>this utility.
Fixed.
>+ <para>
>+ This command must not be used when the server is
>+ running. <command>pg_computemaxlsn</command> will refuse to start up
if
>+ it finds a server lock file in the data directory. If the
>+ server crashed then a lock file might have been left
>+ behind; in that case you can remove the lock file to allow
>+ <command>pg_computemaxlsn</command> to run. But before you do
>+ so, make doubly certain that there is no server process still alive.
>+ </para>
>More of the same paranoia.
>Overall my feeling is that this can be simplified quite a lot.
>There's a lot of stuff in here that's really not needed.
Corrected.
Please find the updated patch attached.
Regards,
Hari babu.
Attachment | Content-Type | Size |
---|---|---|
pg_computemaxlsn_v7.patch | application/octet-stream | 15.1 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Mark Kirkwood | 2013-07-04 08:08:57 | Re: [9.4 CF 1] The Commitfest Slacker List |
Previous Message | Michael Paquier | 2013-07-04 07:59:24 | Re: request a new feature in fuzzystrmatch |