Is dedupe a realistic requirement?

Tell us your most wanted features from QNAP products.
ppcrobcole
First post
Posts: 1
Joined: Sat Nov 02, 2013 4:34 pm

Re: Is dedupe a realistic requirement?

Post by ppcrobcole »

I understand this is an old thread, but FYI: I've been able to use dedupe with my QNAP by creating an iSCSI LUN that I map onto a 2012 Server, either via the native 2012 iSCSI initiator or via VMware mapping.
Then I add it as a volume on the 2012 Server and, from the 2012 Server, enable dedupe on that volume.

It is great for storing backups and general-use data.
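
For anyone wanting to reproduce this, the Windows-side steps can be scripted. A minimal sketch in PowerShell on Server 2012, assuming the iSCSI volume came up as E: (the drive letter is just an example):

```powershell
# Install the Data Deduplication feature (part of File and Storage Services)
Import-Module ServerManager
Add-WindowsFeature -Name FS-Data-Deduplication

# Enable dedupe on the volume backed by the QNAP iSCSI LUN
Import-Module Deduplication
Enable-DedupVolume -Volume "E:"

# Optionally kick off an optimization job now instead of waiting
# for the background schedule
Start-DedupJob -Volume "E:" -Type Optimization

# Check the space savings later
Get-DedupStatus -Volume "E:"
```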
Don
Guru
Posts: 12289
Joined: Thu Jan 03, 2008 4:56 am
Location: Long Island, New York

Re: Is dedupe a realistic requirement?

Post by Don »

Well that is not really using dedupe on the NAS. Dedupe is running on a 2012 server and not on the NAS. The NAS is only the storage medium.
Use the forum search feature before posting.

Use RAID and external backups. RAID protects you from disk failure, keeping your system running and your data accessible while the disk is replaced and the RAID is rebuilt. Backups allow you to recover data that is lost or corrupted, or to recover from a system failure. One does not replace the other.

NAS: TVS-882BR | F/W: 5.0.1.2346 | 40GB | 2 x 1TB M.2 SATA RAID 1 (System/VMs) | 3 x 1TB M.2 NVMe QM2-4P-384A RAID 5 (cache) | 5 x 14TB Exos HDD RAID 6 (Data) | 1 x Blu-ray
NAS: TVS-h674 | F/W: 5.0.1.2376 | 16GB | 3 x 18TB RAID 5
Apps: DNSMasq, PLEX, iDrive, QVPN, QLMS, MP3fs, HBS3, Entware, DLstation, VS, +
sWORDs
Know my way around
Posts: 153
Joined: Wed Dec 15, 2010 3:17 am
Location: The Netherlands

Re: Is dedupe a realistic requirement?

Post by sWORDs »

+1
TS-870 upgraded with i7 3770t and 16GB F3-1600C10D-16GSQ kit, replaced LAN-1G2T-D with Intel I350-T4.
6x 2TB Samsung HD203WI in RAID 5 with Samsung 480GB MZ7WD480HAGM-00003 as SSD Cache.
NAS firmware 4.2.0 Build 20150716.
cbeerse
Starting out
Posts: 42
Joined: Wed Jun 01, 2011 4:43 am

Re: Is dedupe a realistic requirement?

Post by cbeerse »

Thinking about deduplication on disks (as in spindles, not SSDs): blocks are compared, and duplicate blocks are stored only once. Hence the first file containing a given block is stored contiguously on disk; no problem so far. The next file containing the same block is stored with a pointer to that first copy, so this new file is fragmented by design. In the end, files sharing random duplicate blocks can be stored in spaghetti fragmentation caused by the deduplication. So where usability matters, I might not like this kind of fragmentation-by-deduplication.
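
To make that concrete, here is a toy sketch of block-level dedupe in Python - a hash table mapping block hashes to each block's first location. This is an illustration of the mechanism, not any vendor's actual implementation:

```python
import hashlib

BLOCK_SIZE = 4096  # a typical dedupe block size, for illustration

disk = []  # pretend disk: list of blocks, index = "physical" location
seen = {}  # block hash -> location of the first copy of that block

def write_file(data: bytes) -> list:
    """Store a file, returning the list of block locations (its pointers)."""
    pointers = []
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        if digest not in seen:
            disk.append(block)          # new block: append to the disk
            seen[digest] = len(disk) - 1
        pointers.append(seen[digest])   # duplicates point at the old copy
    return pointers

# The first file lands contiguously at locations 0, 1, 2.
a = write_file(b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE + b"C" * BLOCK_SIZE)
# The second file shares one block with the first, so its pointers jump
# backwards into file A: fragmented by design.
b = write_file(b"X" * BLOCK_SIZE + b"B" * BLOCK_SIZE + b"Y" * BLOCK_SIZE)
print(a)  # [0, 1, 2]
print(b)  # [3, 1, 4] <- the shared "B" block points back into file A
```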

On the other hand, the kind of file-based deduplication Apple's Time Machine does has been possible on QNAPs for ages (since they use rsync and a Unix-style filesystem): the first backup creates a fresh tree of files. Successive backups only add new or changed files; files that have not been touched are hard-linked from the previous backup using rsync's '--link-dest=dir' option.
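
For reference, a typical invocation of that scheme looks something like this (the share and directory names are just an example):

```sh
# Unchanged files are hard-linked against yesterday's tree; only new or
# changed files are actually transferred and stored.
rsync -a --link-dest=/share/backups/2013-11-01 \
    /home/user/ /share/backups/2013-11-02/
```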
Backup is preparation for restore or recovery.
schumaku
Guru
Posts: 43579
Joined: Mon Jan 21, 2008 4:41 pm
Location: Kloten (Zurich), Switzerland -- Skype: schumaku

Re: Is dedupe a realistic requirement?

Post by schumaku »

cbeerse wrote:On the other hand, the kind of file-based deduplication Apple's Time Machine does has been possible on QNAPs for ages (since they use rsync and a Unix-style filesystem): the first backup creates a fresh tree of files.
When I think about how regularly Time Machine insists on creating a fresh full backup (informing the user that they will lose their history) on _both_ "native" Apple Time Capsules and NAS Time Machine destinations - I'm not sure that's a feature I want on my regular storage :evil:
cbeerse
Starting out
Posts: 42
Joined: Wed Jun 01, 2011 4:43 am

Re: Is dedupe a realistic requirement?

Post by cbeerse »

schumaku wrote:
cbeerse wrote:On the other hand, the kind of file-based deduplication Apple's Time Machine does has been possible on QNAPs for ages (since they use rsync and a Unix-style filesystem): the first backup creates a fresh tree of files.
When I think about how regularly Time Machine insists on creating a fresh full backup (informing the user that they will lose their history) on _both_ "native" Apple Time Capsules and NAS Time Machine destinations - I'm not sure that's a feature I want on my regular storage :evil:
I have to admit I do not know the details of Apple's Time Capsules and such. The rsync scheme I'm using at home creates a complete tree structure for every backup. Files that have not changed are effectively the same file, while changed and new files get fresh copies. That is all based on the Unix filesystem feature of hard links. If an old backup is removed, its tree structure is removed and all the hard links in it to the files are cleared; only files with no hard links left are effectively removed. So there is no need to do a fresh full backup, since every backup is effectively a full backup - only the unchanged data is not transferred during incremental runs.

One detail I found out with the above: if a backup is stopped or crashes halfway, that backup is incomplete, as you'd expect. The next backup will find this half-finished backup and use it as its base, so more data is transferred and the deduplication has a glitch there.
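
The hard-link bookkeeping described above is easy to demonstrate in Python (the file and directory names are made up; st_nlink is the filesystem's link count):

```python
import os

# First backup: the file is stored normally.
os.makedirs("backup.0", exist_ok=True)
with open("backup.0/data.txt", "w") as f:
    f.write("unchanged file contents\n")

# Incremental backup: the unchanged file is hard-linked, not copied.
os.makedirs("backup.1", exist_ok=True)
os.link("backup.0/data.txt", "backup.1/data.txt")
print(os.stat("backup.1/data.txt").st_nlink)  # 2: one name per backup tree

# Deleting the old backup tree removes one name only; the data survives
# as long as any backup still links to it.
os.remove("backup.0/data.txt")
os.rmdir("backup.0")
print(os.stat("backup.1/data.txt").st_nlink)  # 1: file still fully intact
```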
Backup is preparation for restore or recovery.
schumaku
Guru
Posts: 43579
Joined: Mon Jan 21, 2008 4:41 pm
Location: Kloten (Zurich), Switzerland -- Skype: schumaku

Re: Is dedupe a realistic requirement?

Post by schumaku »

Neither Time Machine nor these applications based on rsync and hard links are anywhere near universal de-duplication, I'm afraid.
cbeerse
Starting out
Posts: 42
Joined: Wed Jun 01, 2011 4:43 am

Re: Is dedupe a realistic requirement?

Post by cbeerse »

schumaku wrote:Neither Time Machine nor these applications based on rsync and hard links are anywhere near universal de-duplication, I'm afraid.
True, in the sense that a violin is a musical instrument, but a musical instrument is not necessarily a violin; it can also be a piano. Universal de-duplication is not an implementation, it is a method for storing a lot of data in less space by recognizing duplicates and storing them less often. Which kind of deduplication do you fancy? Each one has its advantages and disadvantages, in writing, in reading, and in maintenance. For disk storage I see block deduplication and file deduplication as the reasonable implementations.

Since it is fairly easy to recognize duplicate files in successive backups, that is my preferred way on low-power systems like my QNAP 419 NAS. The MS Exchange mail server at the office does a similar per-message deduplication. The big SAN in the server room does its block-level deduplication at night, when it has the memory buffers and CPU power to spare; during the day those memory buffers are used for file reconstruction, to overcome the massive file fragmentation.
Backup is preparation for restore or recovery.
schumaku
Guru
Posts: 43579
Joined: Mon Jan 21, 2008 4:41 pm
Location: Kloten (Zurich), Switzerland -- Skype: schumaku

Re: Is dedupe a realistic requirement?

Post by schumaku »

To be correct, that's an application that avoids duplication. What if, say, Mac users, Windows users, and Linux users write the same file n times over Samba, AFP, and NFS...? I don't want to fight with you - OK, agreed: it's de-duplication by an rsync-based application.
cbeerse
Starting out
Posts: 42
Joined: Wed Jun 01, 2011 4:43 am

Re: Is dedupe a realistic requirement?

Post by cbeerse »

schumaku wrote:To be correct, that's an application that avoids duplication. What if, say, Mac users, Windows users, and Linux users write the same file n times over Samba, AFP, and NFS...? I don't want to fight with you - OK, agreed: it's de-duplication by an rsync-based application.
Well, if these users write the same file, it might be the same file in your view, but on most network filesystems each copy is a different file, as it has a different timestamp. With file deduplication, file properties such as name, size, and (write) timestamps are the first items to check.

How do you think block-based deduplication will find such files to be the same? It does not, because it does not look at the file metadata at all; it only looks at the blocks that make up the file. It compares every block with the other blocks, maybe by hash, maybe bit by bit (likely both). Then it drops the new block and changes the pointers to use the old one, and it does this on every write. So in your scenario, it accepts three different files, generates hashes for every block (maybe the same ones as for RAID 5, maybe different ones) and starts comparing those. Then you ask yourself why writing to such a device is slow... And next, when you read those files back, you wonder why the disks are crunching: that's because the files are fragmented block by block.
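
The "maybe by hash, maybe bit by bit (likely both)" part usually means: look up the hash first because it is cheap, then verify byte for byte before dropping a block, since hashes can in principle collide. A minimal sketch:

```python
import hashlib

def is_duplicate(new_block: bytes, stored: dict) -> bool:
    """Cheap hash lookup first, then a byte compare to rule out collisions."""
    digest = hashlib.sha256(new_block).hexdigest()
    candidate = stored.get(digest)
    return candidate is not None and candidate == new_block

store = {}
blk = b"\x00" * 4096
print(is_duplicate(blk, store))  # False: block not seen yet
store[hashlib.sha256(blk).hexdigest()] = blk
print(is_duplicate(blk, store))  # True: hash and bytes both match
```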
Backup is preparation for restore or recovery.
Anata mo
Starting out
Posts: 14
Joined: Fri Oct 23, 2009 6:10 pm
Location: Denmark

Re: Is dedupe a realistic requirement?

Post by Anata mo »

Well, as you write:
Don wrote:...The NAS is only the storage medium.
then that fits just fine: since he's storing the files on the iSCSI target, which is on the NAS, the data is only written once with dedupe... So the description "dedupe on the NAS" is fine; it's just not being performed by the NAS itself ;)

I'm also using Windows Server 2012 R2 with dedupe, and it saves upwards of 80% of my usage on the NAS this way... so I would agree with ppcrobcole that it's a great way of doing it... at least until QNAP supports it natively, with e.g. Permabit or the like :)

So:
+1
from me too!
_____________________________________________________________________________
TVS-673e 2x SSD, 4x 6TB HGST NAS HDD (2018-)
✝TVS-663 w. 16G RAM, 2x SSD, 3x 6TB HGST NAS HDD
✝TS-469 Pro w. 4x 3TB WD RED
✝TS-239 Pro w. 2x 1/2/4TB (2007 -> 2019-05 after +12 years of 24/7 service)
storageman
Ask me anything
Posts: 5507
Joined: Thu Sep 22, 2011 10:57 pm

Re: Is dedupe a realistic requirement?

Post by storageman »

All forms of dedupe put considerable load on the processor (as much as 30%).
Therefore dedupe is more relevant to backup targets than to day-to-day live storage.
Only storage with Xeon-class processors can cope with the kind of load it imposes.
forkless
Experience counts
Posts: 1907
Joined: Mon Nov 23, 2009 6:52 am
Location: The Netherlands

Re: Is dedupe a realistic requirement?

Post by forkless »

You have to look at the environment in which you want to apply a de-duplication mechanism. For most of QNAP's users it's not really hugely beneficial. In theory it sounds nice to have no duplicates on your filesystem, but most of us here are in single-user (or few-user) environments, where duplication will be practically non-existent.

If you are running a QNAP for business purposes in a production environment... well, then shame on you. I would use a QNAP at best for lab/proof-of-concept set-ups.
Briain
Experience counts
Posts: 1749
Joined: Tue Apr 20, 2010 11:56 pm
Location: Edinburgh (Scotland)

Re: Is dedupe a realistic requirement?

Post by Briain »

forkless wrote:...In theory it sounds nice to have no duplicates on your filesystem, but most of us here are in single-user (or few-user) environments, where duplication will be practically non-existent.
Ah, if only I were that organised/competent! :roll: :lol:

Actually, being serious for a moment - unusual for me - I'm just wondering if there's a known utility to seek out and log exact duplicates and their paths (such that I could manually investigate/delete them)? It isn't something I'd even considered until reading this thread (I doubt I have many, but it would be mildly interesting to find that out; I'd bet there will be a few).

Bri :D
TS-119, 1 X Seagate ~~ TS-219, 2 X Seagate (R1) ~~ TS-453A, 2 X 3 TB WD Red (R1) ~~ TS-659, 5 X 1 TB Hitachi Enterprise (R6)
APC Smart-UPS 750
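
For what it's worth, utilities of that kind typically apply cheap filters first (file size, then a hash) and only report files as exact duplicates when both match. A minimal Python sketch along those lines (SHARE is a hypothetical mount point of the NAS share):

```python
import hashlib
import os
from collections import defaultdict

SHARE = "/path/to/share"  # hypothetical: wherever the NAS share is mounted

def sha256(path, bufsize=1 << 20):
    """Hash a file in chunks so large files do not exhaust memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(bufsize):
            h.update(chunk)
    return h.hexdigest()

# Pass 1: group every file by size (different size = never a duplicate).
by_size = defaultdict(list)
for root, _dirs, files in os.walk(SHARE):
    for name in files:
        path = os.path.join(root, name)
        try:
            by_size[os.path.getsize(path)].append(path)
        except OSError:
            pass  # skip files we cannot read

# Pass 2: hash only the files that share a size with at least one other.
by_hash = defaultdict(list)
for size, paths in by_size.items():
    if len(paths) > 1:
        for path in paths:
            by_hash[sha256(path)].append(path)

# Report the duplicate groups with their full paths.
for digest, paths in by_hash.items():
    if len(paths) > 1:
        print("Duplicates:", *paths, sep="\n  ")
```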
lleone
New here
Posts: 4
Joined: Tue Oct 28, 2014 9:34 pm

Re: Is dedupe a realistic requirement?

Post by lleone »

I'm running Windows Home Server v1 (based on Win2003R2) in a single-core, 512 MB RAM virtual machine, just to back up 7 family clients (personal/work).
It does single-instance storage (block-based dedupe), bare-metal restore, and backup rotation (it keeps the last 54 backups for every single client, one per week), and it copes very well using standard NTFS-formatted disks (unlike W2012R2, as far as I know).

I have not yet found similar (low-cost) backup software, and I plan to run it as a virtual machine on a TS-453 Pro.

I'd love it if QNAP came out with native dedupe and backup software like this
(or if someone could point me to alternatives).

Thanks,
L.
Luca

---
TS-453Pro 4.2.1 build 20160201, 16GBram
3xWD60EFRX (raid 5) firmware 82.00A82 -> Storage Pool 1 10.90TB -> DataVol1 (System) 10.80TB
trunk1 = eth1+eth2 / alb (static via dhcp reservation)
trunk2 = eth3+eth4 / tlb - VMs bridge, Virtual Switch 1