Finding duplicates

ve7tcc
Starting out
Posts: 17
Joined: Mon Sep 06, 2021 1:32 am

Finding duplicates

Post by ve7tcc »

Is there a way to use QNAP's deduplication feature in QuTS hero to show me which files have duplicates on the same share? Perhaps in File Station?

Thanks!
Model: TS-h973AX
FW: h4.5.4.1800
ve7tcc
Starting out
Posts: 17
Joined: Mon Sep 06, 2021 1:32 am

Re: Finding duplicates

Post by ve7tcc »

For images, I found I can use QuMagie. It has an album for similar photos. Beneath each photo is the path to the file. It'd be great if File Station would flag duplicate files, with a link…
ve7tcc
Starting out
Posts: 17
Joined: Mon Sep 06, 2021 1:32 am

Re: Finding duplicates

Post by ve7tcc »

I found QuMagie doesn't reindex when you delete photos, which is a problem. If you reindex manually, you lose all the facial recognition data.
gottabjoaken
New here
Posts: 5
Joined: Fri Dec 03, 2021 8:57 am

Re: Finding duplicates

Post by gottabjoaken »

QuMagie: Similar Photos: MacOS Web.

When browsing the Similar Photo album, the folder location of the photograph is truncated:

When you hover the cursor over the location, the full path is shown in a pop-up.
However, doing that for each image in a set of similar photos is very time consuming. Identifying the images at a glance and deciding which, if any, to remove would be much easier if the full location path were displayed under each image.
With perhaps thousands of similar sets, this will save a huge amount of time.

Ticket raised.
If this affects you please also raise a ticket.
gottabjoaken
New here
Posts: 5
Joined: Fri Dec 03, 2021 8:57 am

Re: Finding duplicates

Post by gottabjoaken »

Regarding duplicate photos, I have also raised a ticket for an enhancement request as follows:

I have many, many, many similar photo files reported.

Some are not similar: Head turned from profile to full face

Many are stored in multiple Folders in the manner of Albums, so I require these photos where they have been placed.

It becomes essential to have the ability to mark reported files to be ignored and therefore not to be listed.


Please advise if this can be provided as an enhancement, and when that might be available.


An even better enhancement would be the ability to create an alias for files and folders within File Station, so that a photo file does not have to be duplicated in order to appear in several virtual albums. Listing the aliases in a folder would be enough to make that folder a virtual album.

I appreciate that albums can be created in QuMagie, but a more physical accessibility is wished for in File Station.

If these issues might improve your experience please increase the chance of success by raising a ticket.
Moogle Stiltzkin
Guru
Posts: 11448
Joined: Thu Dec 04, 2008 12:21 am
Location: Around the world....
Contact:

Re: Finding duplicates

Post by Moogle Stiltzkin »

to find duplicate files e.g. pictures, i use dupeguru
https://dupeguru.voltaicideas.net/
https://github.com/jlesage/docker-dupeguru

you can either run this app from your windows pc and browse to the nas location, OR you can install it as a docker app and run it directly on your nas. either option works, i tested both.

what's nice about this app is that it has a gui. it puts the suspected duplicate images side by side, so you can manually check whether each one is a legit duplicate or a false positive before deciding what needs to be deleted.

i do this kind of checking every couple of years because i may have saved duplicates without realizing it. you don't want to use precious storage space on duplicates, it's a waste of space :S
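
For byte-identical copies (not "similar" photos, which need a tool like dupeGuru), a rough check can also be scripted. The following is only a minimal sketch, not how dupeGuru or QNAP's deduplication works: it groups files by size and then by SHA-256 hash. The function names and the `/share/Photos` path are made up for illustration.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root):
    """Return sorted lists of paths under root with byte-identical content."""
    by_size = defaultdict(list)
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have an exact copy
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return sorted(groups)


if __name__ == "__main__":
    # Hypothetical share path; replace with the folder you want to check.
    for group in find_exact_duplicates("/share/Photos"):
        print(" == ".join(group))
```

The script only reports groups and never deletes anything, so the results can still be reviewed by hand.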
NAS
[Main Server] QNAP TS-877 (QTS) w. 4tb [ 3x HGST Deskstar NAS & 1x WD RED NAS ] EXT4 Raid5 & 2 x m.2 SATA Samsung 850 Evo raid1 +16gb ddr4 Crucial+ QWA-AC2600 wireless+QXP PCIE
[Backup] QNAP TS-653A (Truenas Core) w. 4x 2TB Samsung F3 (HD203WI) RaidZ1 ZFS + 8gb ddr3 Crucial
[^] QNAP TL-D400S 2x 4TB WD Red Nas (WD40EFRX) 2x 4TB Seagate Ironwolf, Raid5
[^] QNAP TS-509 Pro w. 4x 1TB WD RE3 (WD1002FBYS) EXT4 Raid5
[^] QNAP TS-253D (Truenas Scale)
[Mobile NAS] TBS-453DX w. 2x Crucial MX500 500gb EXT4 raid1

Network
Qotom Pfsense|100mbps FTTH | Win11, Ryzen 5600X Desktop (1x2tb Crucial P50 Plus M.2 SSD, 1x 8tb seagate Ironwolf,1x 4tb HGST Ultrastar 7K4000)


Resources
[Review] Moogle's QNAP experience
[Review] Moogle's TS-877 review
https://www.patreon.com/mooglestiltzkin
efarmun
First post
Posts: 1
Joined: Sun Aug 30, 2020 2:33 pm

Re: Finding duplicates

Post by efarmun »

Moogle Stiltzkin wrote: Fri Feb 11, 2022 1:02 pm to find duplicate files e.g. pictures, i use dupeguru
Thanks for the suggestion. Just installed docker and started scan.
Moogle Stiltzkin
Guru
Posts: 11448
Joined: Thu Dec 04, 2008 12:21 am
Location: Around the world....
Contact:

Re: Finding duplicates

Post by Moogle Stiltzkin »

efarmun wrote: Sat Apr 23, 2022 11:08 am Thanks for the suggestion. Just installed docker and started scan.
ur welcome.

also i wouldn't 100% trust the result, but thats why they have a side by side comparison, so u can double check yourself what the app detected was considered a duplicate.

the false positives i found, was some sort of valid variation of a pic.

other times i notice that i unknowingly kept a duplicate of a file, and those are the ones i then proceed to delete. i only bother to do this every few years, because it's not worth the effort to me when the wasted space tends to be only a few gigabytes. your mileage may vary.


another way to manage storage space: i use Qsirch and sort files by file size, then check the biggest files first to see what they are and whether they are justified in using so much space :D sometimes when storing stuff, you don't realize just how big some of these things are. over time i've been replacing some of my x264 videos with x265 versions, which sometimes need less than half the storage space.
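
If Qsirch is not available, the same "biggest files first" idea can be sketched in a few lines of Python. This is just an illustration under assumptions, not what Qsirch does internally; `largest_files` is a made-up helper name.

```python
import heapq
import os


def largest_files(root, count=20):
    """Return (size_in_bytes, path) pairs for the biggest files under root."""
    sizes = []
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            try:
                if os.path.isfile(path) and not os.path.islink(path):
                    sizes.append((os.path.getsize(path), path))
            except OSError:
                continue  # skip files that vanish or cannot be read
    # nlargest returns the entries sorted from biggest to smallest
    return heapq.nlargest(count, sizes)
```

Calling `largest_files("/share/Multimedia", 10)` (the path is only an example) would list the ten largest files in that folder tree.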
Last edited by Moogle Stiltzkin on Fri Dec 02, 2022 6:54 am, edited 1 time in total.
nagarenb
New here
Posts: 2
Joined: Fri Dec 02, 2022 6:11 am

Re: Finding duplicates

Post by nagarenb »

Moogle Stiltzkin wrote: Fri Feb 11, 2022 1:02 pm to find duplicate files e.g. pictures, i use dupeguru
Guys, thanks for the suggestion. Could you please help me install Docker on the NAS, and also suggest how I can install this utility on top of it?
dolbyman
Guru
Posts: 35020
Joined: Sat Feb 12, 2011 2:11 am
Location: Vancouver BC , Canada

Re: Finding duplicates

Post by dolbyman »

Install Container Station and then install the container

https://www.qnap.com/en/software/container-station

If your unknown NAS with unknown firmware can do that
Moogle Stiltzkin
Guru
Posts: 11448
Joined: Thu Dec 04, 2008 12:21 am
Location: Around the world....
Contact:

Re: Finding duplicates

Post by Moogle Stiltzkin »

nagarenb wrote: Fri Dec 02, 2022 6:17 am Guys Thanks for the suggestion. Could you please help me to install docker on NAS, also suggest how I can install this utility on top of this?
quick install (can't claim everything is done correctly, but this is what i tried to get it working straight away. if you want to double check the configuration, read the github instructions carefully like i did :) )
https://github.com/jlesage/docker-dupeguru


step1:
install container station if you haven't already (i tested with cs v3 beta)

step2:
in cs, go to explore and type the following and install that app

Code:

jlesage/dupeguru
select latest when offered a choice for what version

step3:

during the install configuration

to make your config persist after updating, best use a different folder location outside the default

for example i used this

Code:

\\my nas\Container\jlesage-dupeguru\config
so in cs storage, u add the location path for the corresponding label. so set one for the config and the other for trash.

as for the label storage, i believe this should point to the share where you plan to have dupeguru inspect the contents (just be aware the app will have read/write, so if u delete using app, it will delete ur data from the share as well, so be careful when using)

by default the storage locations are all containerized, so i replaced those with bind mount paths with r/w for both the config and trash. as for the storage, i set the location to the share i want dupeguru to scan, but i set it read only (i assume this will work. i only want it to scan, but i don't want it to write, in case the app makes a mistake and tries to delete something. i'll do that manually myself within file explorer. i'm only using dupeguru to detect dupes)

container path | permissions | description
/config | rw | This is where the application stores its configuration, states, log and any files needing persistency.
/storage | rw | This location contains files from your host that need to be accessible to the application.
/trash | rw | This is where duplicated files are moved when they are sent to trash.
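
For anyone doing this from the command line instead of the Container Station UI, the mount layout described above (config and trash read/write, storage read-only) could be sketched roughly like this. It only builds and prints a `docker run` command and does not run it. The host paths are hypothetical examples, and the helper name is made up; the image name and port 5800 are the ones used in this thread.

```python
import shlex


def dupeguru_run_command(config_dir, trash_dir, scan_dir, port=5800):
    """Build a `docker run` argument list with /storage mounted read-only."""
    return [
        "docker", "run", "-d",
        "--name=dupeguru",
        "-p", f"{port}:5800",
        "-v", f"{config_dir}:/config:rw",   # settings persist across updates
        "-v", f"{trash_dir}:/trash:rw",     # where the app moves trashed files
        "-v", f"{scan_dir}:/storage:ro",    # share to scan, read-only
        "jlesage/dupeguru",
    ]


if __name__ == "__main__":
    # Hypothetical host paths; adjust them to your own shares.
    command = dupeguru_run_command(
        "/share/Container/jlesage-dupeguru/config",
        "/share/Container/jlesage-dupeguru/trash",
        "/share/Photos",
    )
    print(shlex.join(command))
```

With `:ro` on the storage mount the container cannot modify or delete anything in the scanned share, which matches the "scan only" intent described above.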


notes: as part of the configuration for the initial setup, you can opt to change the network settings if you wish. the standard practice is to switch to bridge mode and set a static ip. i skipped this part and just used the default, which uses the nas ip with a custom port.

another good practice is to set a limit on the cpu and ram resources used by the app. personally i didn't bother, but you can if you want to. and for the record, after i finish using the app i just click stop. i only start the app again when i need to run it, otherwise i leave it stopped. no need to bother removing the app, you can just leave it and start it when needed (though probably update it before using)

step4:

once the app is running, open the url to access the app that is now running natively on your QNAP NAS. to find the url look at the app in cs, the url u can copy there then insert to web browser.

step5:

within the app, click the + sign, then add the share you want to scan (this is where that previous storage location matters. you can only select based on that selection). Once done, click the scan button bottom right.



so i tested it and it works. but since the share is mounted read-only, the app is unable to delete anything in my important shares. it can merely scan to detect duplicates. if you want to be able to delete files via the dupeguru UI, you merely have to give it read/write (R/W) instead of read-only (R/O).


this is just an example of a false dupe. see the difference is the mouth. this is why we have to spot check the results and not simply believe it's a dupe because it might not be. Or you can decide the differences are not worth the storage space and delete if you wish regardless if it's a dupe or not. dupeguru just makes it convenient/easier to detect the dupes for you to save you a lot of time :D

[Image: example of a false duplicate]



*update

to update the app, simply go to container station, select the app in your container list, click the scroll wheel edit, then select recreate directly and pull the latest image (if you want to amend some installation settings, select recreate).

because earlier we set a bind mount path to a custom folder instead of the default, any saved settings persist after updating the app. it seems this app has a trash folder as well, so that should survive an update too.
Last edited by Moogle Stiltzkin on Fri Dec 02, 2022 7:53 pm, edited 2 times in total.
nagarenb
New here
Posts: 2
Joined: Fri Dec 02, 2022 6:11 am

Re: Finding duplicates

Post by nagarenb »

Many thanks Moogle, I will check and revert in case of an issue.
Moogle Stiltzkin
Guru
Posts: 11448
Joined: Thu Dec 04, 2008 12:21 am
Location: Around the world....
Contact:

Re: Finding duplicates

Post by Moogle Stiltzkin »

nagarenb wrote: Fri Dec 02, 2022 4:53 pm Many thanks Moogle, I will check and revert in case of an issue.
np ur welcome. feel free to ask if u get stuck :D

but i already tested myself and posted the steps :wink:
mediterrano
New here
Posts: 8
Joined: Sun Jan 16, 2011 11:33 am

Re: Finding duplicates

Post by mediterrano »

Moogle Stiltzkin wrote: Fri Dec 02, 2022 7:10 am
step3:

Code:

\\my nas\Container\jlesage-dupeguru\config
In my case, Container Station version 3.0.5.623 (2023/08/30) does not accept the UNC syntax below:

Code:

\\my nas\Container\jlesage-dupeguru\config

It also does not accept backslashes.
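
What the container side needs is a POSIX-style host path. As a minimal sketch, and assuming the NAS exposes each shared folder under `/share/<share name>` (an assumption here; check the real path over SSH on your own model), a UNC path could be translated like this. The function name is made up for illustration.

```python
import posixpath


def unc_to_share_path(unc_path, share_root="/share"):
    """Map a UNC path (\\\\host\\Share\\sub) to <share_root>/Share/sub.

    Assumes shared folders are reachable under share_root on the NAS.
    """
    if not unc_path.startswith("\\\\"):
        raise ValueError("not a UNC path: " + unc_path)
    parts = [part for part in unc_path.split("\\") if part]
    if len(parts) < 2:
        raise ValueError("UNC path needs a host and a share: " + unc_path)
    _host, *share_and_subdirs = parts  # the host name is dropped
    return posixpath.join(share_root, *share_and_subdirs)


if __name__ == "__main__":
    print(unc_to_share_path(r"\\my nas\Container\jlesage-dupeguru\config"))
```

For the path quoted above this gives `/share/Container/jlesage-dupeguru/config`, which uses forward slashes only.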
mediterrano
New here
Posts: 8
Joined: Sun Jan 16, 2011 11:33 am

Re: Finding duplicates

Post by mediterrano »

The fact that the CS interface doesn't accept UNC syntax when adding a shared folder is also mentioned in this GitHub issue:
https://github.com/jlesage/docker-dupeguru/issues/1

The solution suggested there is to run one of the command lines below in an SSH terminal such as PuTTY (or the terminal in WinSCP):

Code:

docker run -d --name=dupeguru -p 5800:5800 -v /docker/appdata/dupeguru:/config:rw -v $HOME:/share/CACHEDEV1_DATA/Download:rw jlesage/dupeguru
OR

Code:

docker run -d --name=dupeguru -p 5800:5800 -v /docker/appdata/dupeguru:/config:rw -v /share/CACHEDEV1_DATA/Download:/storage:rw jlesage/dupeguru
You also need to add the correct USER_ID and create a user for the CS.


Here are some very useful configuration instructions I found on https://www.reddit.com/r/qnap/comments/ ... _qnap_qts/
Click on Advanced Settings

Under Environment, change the USER_ID and GROUP_ID values to 0 (zero, which means root privileges in Linux; use at your own risk)

Under Network (left menu), you can leave NAT but i like to keep my services in different IPs so i put Bridge, Static IP and I set my IP. You do you.

Moving to Shared Folders (left menu), add the Volumes from Host - click Add, select the Shared Folder from the dropdown menu (it populates when you click the Host Path)

Choose your Mount Point, example: "/mysharedfolder" (without the "") and give it a name.

Click Create and wait till it's done

Now that we waited and everything is up and running, go to the QNAP's IP (if NAT) or to the IP you set up in step 8 (if BRIDGE) through port 5800. (http://NAS_IP:5800 or http://IP_YOU_CHOSE:5800)

Dupeguru dashboard should appear
BTW the menu "Shared Folders" in CS2 now corresponds to the menu "Storage" in CS3, so don't let that confuse you.
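
As an alternative to setting USER_ID and GROUP_ID to 0, one could look up who owns the folder to be scanned and pass those IDs instead. This is only a sketch under the assumption that the container just needs the same access as the folder owner; `owner_env_flags` is a made-up helper, while `-e` is the regular `docker run` option for environment variables.

```python
import os


def owner_env_flags(path):
    """Return docker `-e` flags matching the owner and group of path."""
    info = os.stat(path)
    return [
        "-e", f"USER_ID={info.st_uid}",
        "-e", f"GROUP_ID={info.st_gid}",
    ]


if __name__ == "__main__":
    # Hypothetical share path; run this on the NAS itself over SSH.
    share = "/share/CACHEDEV1_DATA/Download"
    if os.path.exists(share):
        print(" ".join(owner_env_flags(share)))
```

The printed flags can be added to a `docker run` command, or the two numbers can be typed into the Environment section in Container Station.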