Re: git: uh-oh

From: Max Bowsher <maxb(at)f2s(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-25 11:03:58
Message-ID: 4C74F89E.8040002@f2s.com
Lists: pgsql-hackers

On 25/08/10 09:18, Magnus Hagander wrote:
> On Wed, Aug 25, 2010 at 07:11, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Robert Haas <robertmhaas(at)gmail(dot)com> writes:

>>> 2. Any non-ASCII characters in, for example, contributor's names show
>>> up differently in the two repos. Generally, the original repo is OK
>>> and the new repo is garbled; although I found one very old example
>>> that went the other way.
>>
>> What it looks like to me is that a Latin1->UTF8 conversion has been
>> applied to the log text. Which might be a good idea if it all *was*
>> Latin1, but a fair-sized percentage isn't. Applying this conversion to
>> UTF8 entries results in garbage, of course. Even if this could be done
>> reliably, I think this counts as editorializing on the historical
>> record, and should be switched off if possible.
>
> I think the problem is that we have a mix of them :( git requires it to be utf8.
>
> cvs2git is configured to try, in order, latin1, utf8 and ascii, and
> use whichever first returns a correct result. In this case it seems it
> does return saying things are right, because the result is valid utf8
> - just not the utf8 we expected.
>
> I can give it a try the other way around - trying utf8 *before*
> latin1, to see if that makes it better - utf8 tends to be more strict.

*Every* byte sequence is valid latin1, therefore if you try latin1,
utf8, ascii in that order, latin1 will always be used.

You most likely want utf8, latin1 (no point also including ascii since
it's a strict subset of latin1).
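
To illustrate the ordering point, here is a minimal Python sketch of a "first encoding that decodes cleanly wins" fallback. It is not cvs2git code or configuration; the helper name decode_with_fallback and the sample name are made up for the example.

```python
def decode_with_fallback(raw, encodings):
    """Return (text, encoding) for the first encoding that decodes raw cleanly.

    Hypothetical helper for illustration only; not part of cvs2git.
    """
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the encodings could decode the input")


# latin1 assigns a character to every byte value, so decoding never fails.
assert len(bytes(range(256)).decode("latin-1")) == 256

utf8_name = "José".encode("utf-8")      # b'Jos\xc3\xa9'
latin1_name = "José".encode("latin-1")  # b'Jos\xe9'

# latin1 first: the UTF-8 bytes are accepted as latin1 and come out garbled.
print(decode_with_fallback(utf8_name, ["latin-1", "utf-8", "ascii"]))
# ('JosÃ©', 'latin-1')

# utf8 first: genuine UTF-8 is recognised, and latin1 input falls through,
# because a lone 0xE9 byte is not valid UTF-8.
print(decode_with_fallback(utf8_name, ["utf-8", "latin-1"]))
# ('José', 'utf-8')
print(decode_with_fallback(latin1_name, ["utf-8", "latin-1"]))
# ('José', 'latin-1')
```

With latin1 at the front of the list, the later entries are never reached, which matches the garbling seen in the converted repo.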

>>> There are also a number of commits that differ in order between the
>>> two repos, and an even larger number where commits are duplicated or
>>> merged in one repository relative to the other.
>>
>> I suspect that this is an artifact of the converter trying to merge
>> nearby commits into one commit, which it more or less *has* to do for
>> sanity since CVS commits aren't atomic. I don't have a problem with
>> the concept, but I notice cases where the converted commit has a
>> timestamp some minutes later than what the cvs2cl output claims.
>> I suspect this is what the converter was using as a cutoff time.
>> Would it be possible to make sure that the converted commit is always
>> timestamped with the latest individual file update timestamp from the
>> included CVS commits?
>
> I can't comment on this part - Michael or Max?

cvs2git will try to use the timestamps from the commits, but sometimes
the ordering of how revisions and tags relate to each other will
actually disagree with the timestamps. In such a case, cvs2git nudges
commit timestamps forward in time, to force the defined temporal
ordering into consistency with the topological ordering of events.

In other words, no, you can't make cvs2git *always* use the timestamp
from a CVS commit, but when it deviates from that, it should have a good
reason for doing so.
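
As a rough illustration of the nudging idea (not cvs2git's actual algorithm, which also has to deal with branches and tags), here is a small Python sketch for a linear history. The function name nudge_timestamps and the one-second minimum step are assumptions made for the example.

```python
def nudge_timestamps(timestamps, min_step=1):
    """Make timestamps strictly increasing along an already-ordered history.

    `timestamps` are seconds since the epoch, listed in topological
    (parent-before-child) order. Any timestamp that is not later than its
    predecessor is pushed forward to predecessor + min_step.
    Hypothetical helper for illustration only; not part of cvs2git.
    """
    adjusted = []
    previous = None
    for ts in timestamps:
        if previous is not None and ts < previous + min_step:
            ts = previous + min_step  # nudge forward to respect the ordering
        adjusted.append(ts)
        previous = ts
    return adjusted


# The second and third commits claim to be older than the first,
# so they are nudged forward; the fourth is already consistent.
print(nudge_timestamps([100, 90, 95, 200]))
# [100, 101, 102, 200]
```

Timestamps that already agree with the ordering are left untouched, so only the conflicting commits end up later than what the CVS log claims.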

Max.
