From: | Craig Ringer <craig(at)postnewspapers(dot)com(dot)au> |
---|---|
To: | Lothar Behrens <lothar(dot)behrens(at)lollisoft(dot)de> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Using database to find file doublettes in my computer |
Date: | 2008-11-20 11:01:46 |
Message-ID: | 4925439A.2010601@postnewspapers.com.au |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Lothar Behrens wrote:
> I have expected a smaller amount of records due to the fact that for 4
> files each available 2 times (sum = 8) I
> have 8 records in ECADFiles, but must have 4 in the above result.
>
> So for an average of 2 doubles I expected half the files from
> ECADFiles, because one is exactly right and the other
> is on the left.
It's a combinatorial problem. If you have 3 duplicate files, eg:
INSERT INTO paths (path) VALUES
(E'C:\\path\\file1.txt'),
(E'C:\\path2\\file1.txt'),
(E'/path/file1.txt');
then the query process I described above will output the matches:
C:\path\file1.txt | C:\path2\file1.txt
/path/file1.txt | C:\path2\file1.txt
/path/file1.txt | C:\path\file1.txt
because while it avoids showing both (A,B) and (B,A) pairs, for any A, B
and C it'll show:
(A,B)
(A,C)
(B,C)
I've attached test SQL that does the above.
Presumably, you want to only show, say:
(A,B)
(A,C)
or maybe:
(filename, A)
(filename, B)
(filename, C)
If that's what you want, you need to work a little differently. The
attached SQL in dups_test2.sql shows one way you might do it, by
generating a list of files with duplicates then listing all the
locations each appears in. Again, you can do it without the temp table,
it'll probably just be slower. None of what I've written is particularly
fast anyway - it evaluates those regular expressions many more times
than should be required, for example.
--
Craig Ringer
Attachment | Content-Type | Size |
---|---|---|
dups_setup.sql | text/x-sql | 810 bytes |
dups_test.sql | text/x-sql | 619 bytes |
dups_test2.sql | text/x-sql | 266 bytes |
From | Date | Subject | |
---|---|---|---|
Next Message | Craig Ringer | 2008-11-20 11:09:19 | Re: Res: Res: Res: Archive files growth!!! |
Previous Message | Richard Huxton | 2008-11-20 10:36:18 | Re: Serial - last value |