Merge/detect duplicates (with different file/folder name)??

Windows specific questions, problems.
Utini

Post by Utini »

After loading 700 torrents from my previous uTorrent into qBittorrent, I noticed that about 300 torrents were shown as "0% Done".
I figured out that qBittorrent doesn't recognize a torrent when the file name or folder name is different, even if the content is the same.
E.g.: 
Torrent-2017 
Torrent 2017 
Torrent.2017 

Even if all three folders have exactly the same content (or the names of the files inside differ only in the way described above), qBittorrent will not recognize them.
I would need to manually rename it in qBittorrent when adding the torrent.
uTorrent and also autodl-irssi seem to be able to detect such duplicates automatically and handle them without re-downloading.
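
As a rough illustration of the naming problem, the three variants above compare equal once the separators are normalized. This is only a sketch of the idea, not something qBittorrent is known to do; `normalize_name` is a hypothetical helper, and the set of characters treated as separators (space, dot, underscore, hyphen) is an assumption.

```python
import re

def normalize_name(name: str) -> str:
    """Lower-case a name and collapse runs of separators into one space.

    Hypothetical helper: the separator set (whitespace . _ -) is an assumption.
    """
    return re.sub(r"[\s._\-]+", " ", name).strip().lower()

variants = ["Torrent-2017", "Torrent 2017", "Torrent.2017"]
# All three variants collapse to the same comparison key.
keys = {normalize_name(v) for v in variants}
```

A real implementation would probably need a fuzzier comparison as well, since a key like this only catches differences in separators and letter case.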

Can qBittorrent do the same somehow?
Peter
Administrator
Posts: 2693
Joined: Wed Jul 07, 2010 6:14 pm

Re: Merge/detect duplicates (with different file/folder name)??

Post by Peter »

Well, this would require adding some kind of regex detection when adding a torrent... but it's not really a bad idea per se.
There are two things I can think of:
- How much overhead would this add on Windows? (Linux is usually much faster with I/O and file operations. As much as I use Windows on almost everything, that's just how it is.*)
- How many "hits" would this have? That is the other question. It would be great if we could make a build with this feature, add some kind of telemetry to it (a telemetry mechanism is work in progress), and then gather data. I mean, in my ~8+ years of using qBittorrent I never ran into this. Differently named folders, sub-folders, that kind of thing - yeah. But ./;/- and such? Not really. Then again, that's just me.
Utini

Re: Merge/detect duplicates (with different file/folder name)??

Post by Utini »

Wow that was a quick answer.

I am actually on Linux myself, but that section of the forum seems to be dead. That's why I posted here, hoping it would also be implemented in the Linux version of qBittorrent.

Tbh I have no idea how uTorrent has this implemented, but I have been using uTorrent 1.6.1 for ages (8 years or so?) on all kinds of slow hardware and it has never really caused problems.
The way I could imagine it to work is:

1. Torrent gets added
2. Check if there are folders with the same size as the torrent (+/- 20MB?)
3. Check those folders for similar folder names
4. Check the content of the found folder for similar file names / file sizes
5. Adapt/merge the .torrent with the just detected files.
6. Do the "Force Re-Check"
7. Start torrent
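
Steps 2 to 4 above could be sketched roughly like this in Python. It is not how uTorrent or qBittorrent work internally (I don't know that); `find_candidates`, `folder_files`, `normalize`, and the 0.8 similarity threshold are all made up for illustration. The size tolerance defaults to 0 bytes here; the +/- 20MB idea would just be a larger `size_tolerance`. Steps 5 to 7 depend on the client and are left out, and the Force Re-Check still has the final word on whether the data really matches.

```python
import difflib
import os
import re

def normalize(name):
    """Lower-case and collapse separator runs (space . _ -) into one space."""
    return re.sub(r"[\s._\-]+", " ", name).strip().lower()

def folder_files(path):
    """Map relative file path -> size in bytes for everything under path."""
    sizes = {}
    for root, _dirs, files in os.walk(path):
        for fname in files:
            full = os.path.join(root, fname)
            sizes[os.path.relpath(full, path)] = os.path.getsize(full)
    return sizes

def find_candidates(torrent_name, torrent_files, search_root,
                    size_tolerance=0, min_similarity=0.8):
    """Steps 2-4: return (folder, mapping) pairs that plausibly hold the data.

    torrent_files maps each relative path in the torrent to its size.
    mapping maps each torrent path to the on-disk path chosen for it.
    """
    wanted_total = sum(torrent_files.values())
    results = []
    for entry in sorted(os.listdir(search_root)):
        folder = os.path.join(search_root, entry)
        if not os.path.isdir(folder):
            continue
        on_disk = folder_files(folder)
        # Step 2: total size must be close to the torrent's size.
        if abs(sum(on_disk.values()) - wanted_total) > size_tolerance:
            continue
        # Step 3: folder name must be similar after normalization.
        ratio = difflib.SequenceMatcher(
            None, normalize(torrent_name), normalize(entry)).ratio()
        if ratio < min_similarity:
            continue
        # Step 4: pair each torrent file with an unused file of equal size,
        # preferring the most similar name.
        mapping, unused = {}, dict(on_disk)
        for t_path, t_size in torrent_files.items():
            same_size = [p for p, s in unused.items() if s == t_size]
            if not same_size:
                break  # a file has no size match, so reject this folder
            best = max(same_size, key=lambda p: difflib.SequenceMatcher(
                None, normalize(t_path), normalize(p)).ratio())
            mapping[t_path] = best
            del unused[best]
        else:
            results.append((folder, mapping))
    return results
```

The returned mapping is what step 5 would need: for every path in the .torrent, the differently named file on disk that should be used instead.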

I had to add my ~700 torrents to uTorrent about 3-5 times. The force check of the files would obviously take a long time (4TB of data that needs to be read), but in the end all my torrents were at 100% and ready to start seeding.
On qBittorrent I tried adding them twice (same folder structure, same .torrent files) and, as already said, 300 out of the 700 are at "0% Done" although they are just duplicates of other torrents.

You cannot rename them in qBittorrent afterwards (at least on Linux). qBittorrent just won't recognize them with a Force Re-Check. You can only rename them within qBittorrent in the dialog window when adding the torrent to qBittorrent.
So I have to delete each of those 300 torrents one by one, then re-add them one by one and carefully rename every folder + every single file.. one by one...
That would cost me 7800 hours.
And if qBittorrent decides to crash at some point and loses its torrent list, then I will have to re-add all torrents again and end up with 300 out of 700 torrents not being detected :(
Last edited by Utini on Fri Oct 13, 2017 9:19 pm, edited 1 time in total.
there

Re: Merge/detect duplicates (with different file/folder name)??

Post by there »

[quote="Utini"]

The way I could imagine it to work is:

1. Torrent gets added
2. Check if there are folders with the same size as the torrent (+/- 20MB?)
3. Check those folders for similar folder names
4. Check the content of the found folder for similar file names / file sizes
5. Adapt/merge the .torrent with the just detected files.
6. Do the "Force Re-Check"
7. Start Torrent
[/quote]
I know where you are coming from, but I feel points 2 and 4 are not good. A 1-bit difference in a film may be OK, but in software it could be devastating. There are also the checksums that have to be calculated, possibly every time you start qBittorrent.

If it were to be implemented, some idiot like me would ask for the next step: downloading just the changed blocks, and sourcing the missing blocks from the net out of any other files with the same checksum, while using the existing blocks that you already have. There are only so many compilers out there, so there are going to be blocks of code in other applications and libraries that are the same - a dedupe RAID system for filesharing.
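
To make the checksum point concrete: a BitTorrent v1 .torrent stores a SHA-1 hash for each fixed-size piece, so a single flipped bit fails exactly the piece that contains it, and only that piece would need to be fetched again. A minimal sketch, assuming the whole payload fits in memory; `piece_hashes` and `failed_pieces` are hypothetical names, not any client's API.

```python
import hashlib

def piece_hashes(data: bytes, piece_length: int):
    """SHA-1 digest of each consecutive piece_length-sized slice of data."""
    return [hashlib.sha1(data[i:i + piece_length]).digest()
            for i in range(0, len(data), piece_length)]

def failed_pieces(data: bytes, piece_length: int, expected):
    """Indices of pieces whose hash differs from (or is missing in) data."""
    actual = piece_hashes(data, piece_length)
    return [i for i, want in enumerate(expected)
            if i >= len(actual) or actual[i] != want]

# Example: flip one bit inside the second 256-byte piece.
original = bytes(range(256)) * 4
expected = piece_hashes(original, 256)
damaged = bytearray(original)
damaged[300] ^= 0x01
bad = failed_pieces(bytes(damaged), 256, expected)
```

This is also why a size-and-name match alone cannot be trusted: the hash check has to read all of the data, which is where the cost comes from.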

Some existing software already detects duplicate files when compressing, like 7zip - just tick the box.

Have you ever tried the free SearchMyFiles from http://www.nirsoft.net? It can search for duplicate content, duplicate names, and non-duplicate content, among other things.
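
For anyone who wants to script a duplicate-content search instead, the usual approach is to group files by size first and only hash the ones that share a size. A rough Python sketch follows; I am not claiming this is how SearchMyFiles does it, and `find_duplicate_files` is a made-up name.

```python
import hashlib
import os
from collections import defaultdict

def find_duplicate_files(root):
    """Return sorted groups of paths under root with identical content."""
    # Pass 1: group by size, since files of different size cannot be equal.
    by_size = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for fname in files:
            path = os.path.join(dirpath, fname)
            by_size[os.path.getsize(path)].append(path)

    # Pass 2: hash only files that share their size with another file.
    by_hash = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue
        for path in paths:
            digest = hashlib.sha256()
            with open(path, "rb") as handle:
                # Read in 1 MiB chunks so large files are not loaded whole.
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    digest.update(chunk)
            by_hash[(size, digest.hexdigest())].append(path)

    return sorted(sorted(group) for group in by_hash.values() if len(group) > 1)
```

File names play no part in the comparison, so two identical .nfo files in differently named folders end up in the same group.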
Havokdan
Member
Posts: 20
Joined: Mon Aug 27, 2012 9:57 am

Re: Merge/detect duplicates (with different file/folder name)??

Post by Havokdan »

there

Re: Merge/detect duplicates (with different file/folder name)??

Post by there »

Havokdan, you are a little devil - thanks :)

I did find this request from 2 years ago
http://vote.vuze.com/forums/170588-gene ... es-via-che

I left Vuze a number of years ago because I could not tie a VPN NIC to it unless I implemented a firewall. I think I will revisit it.

Thanks to SearchMyFiles I have already found that the majority of .txt and .nfo files are duplicates. If one of these is missing, the download never completes :(
Last edited by there on Sat Oct 14, 2017 3:28 pm, edited 1 time in total.
Utini

Re: Merge/detect duplicates (with different file/folder name)??

Post by Utini »

Swarm Merging is a little different from what I meant. But I guess if swarm merging gets implemented then my problem is solved as well.