From: Gerhard Heift <ml-postgresql-20081012-3518(at)gheift(dot)de>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Using database to find file doublettes in my computer
Date: 2008-11-18 12:42:28
Message-ID: 20081118124228.GA9802@toaster.kawo1.rwth-aachen.de
Lists: pgsql-general
On Tue, Nov 18, 2008 at 12:36:42PM +0000, Sam Mason wrote:
> On Mon, Nov 17, 2008 at 11:22:47AM -0800, Lothar Behrens wrote:
> > I have a problem to find as fast as possible files that are double or
> > in other words, identical.
> > Also identifying those files that are not identical.
>
> I'd probably just take a simple Unix command line approach, something
> like:
>
> find /base/dir -type f -exec md5sum {} \; | sort | uniq -Dw 32
You can save a little time by using

find /base/dir -type f -print0 | xargs -0 md5sum | sort | uniq -Dw 32

since xargs batches many files into each md5sum invocation instead of
forking one process per file, and -print0/-0 keeps filenames with
spaces or newlines intact.
> this will give you a list of files whose contents are identical
> (according to an MD5 hash). An alternative would be to put the hashes
> into a database and run the matching up there.
>
>
> Sam
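The database alternative Sam mentions can be sketched roughly like
this. SQLite stands in for PostgreSQL so the snippet is self-contained
(the GROUP BY .. HAVING matching would look the same in either), and
the helper names md5_of and find_duplicates are made up for the
illustration:

```python
import hashlib
import os
import sqlite3

def md5_of(path, chunk_size=65536):
    """Hash a file in chunks so large files never sit fully in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def find_duplicates(base_dir):
    """Return groups of paths whose contents share an MD5 hash."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE files (hash TEXT, path TEXT)")
    for root, _dirs, names in os.walk(base_dir):
        for name in names:
            path = os.path.join(root, name)
            db.execute("INSERT INTO files VALUES (?, ?)",
                       (md5_of(path), path))
    # The matching happens in SQL: any hash occurring more than once
    # identifies a group of identical files.
    rows = db.execute(
        "SELECT hash, group_concat(path, '|') FROM files "
        "GROUP BY hash HAVING count(*) > 1"
    ).fetchall()
    return [group.split("|") for _hash, group in rows]
```

Storing the hashes in a real table also lets you keep them around and
re-scan only files whose size or mtime changed.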
Gerhard