Private Cloud Storage: 1 + 1 = 27 backup solutions
Why would a convinced “pro-cloudian” invest into a geo-redundant backup and restore solution for private (cloud) storage? The reasons for this were fairly simple:
- I store quite a bit of music (imagery and audio all-the-same); storing that solely in the cloud is (a) expensive and (b) slow when streamed (Austrian downstream is not yet really that fast)
- In addition to that, I store quite a lot of important projects data meanwhile (in different public clouds, of course, but also on my office NAS); at one point I needed a second location to further secure this data
- I wanted a home media streaming solution close by my hifi
My current NAS used to be a Synology DS411 (4 2TB discs, Synology Hybrid Raid – SHR – which essentially is RAID5). My new one is now a DS416 (same configuration; I just upgraded discs in a way that both NASs now run 2 2TB and 2 3TB discs – mainly disc lifetime considerations were leading to this, and the fact that I didn’t wanna throw away still-good harddiscs (if you’re interested in the upgrade process, just post a quick comment and I’ll come back to that gladly – but with Synology that’s pretty straight forward).
Bored already and not keen on learning all the nitty-gritty details: You can jump to the end, if you really need to 😉
More than 1 backup
Of course, it’s not 27 options – as in the headline – but it’s a fair lot of possibilities to move data between to essentially identical NASs for the benefit of data resilience. Besides that, a few additional constraints come into account when setting it up for geo-redundancy:
- Is one of the 2 passively taking backup data only or are both actively offering services? (in my case: the latter, as one of the 2 would be the projects’ storage residing in the office and the other would be storage for media mainly – but not only – used at home)
- How much upstream/downstream can I get for which amount of data to be synced? (ease of thought for me: both locations are identical in that respect, so it boiled down to data volume considerations)
- Which of the data is really needed actively and where?
- Which of the data is actively accessed but not changed (I do have quite a few archive folder trees stored on my NAS which I infrequently need)
Conclusion: For some of the data incremental geo-backup suffices fully; other data needs to be replicated to the respective other location but kept read-only; for some data I wanted to have readable replications on both locations.
First things first: Options
The above screenshot shows available backup packages that can be installed on any Synology disc station:
- Time Backup is a Synology owned solution that offers incremental/differential backup; I recently heard of incompatibilities with certain disc stations and/or harddiscs, hence this wasn’t my first option (whoever has experiences with this, please leave a comment; thanx)
- Of all the public cloud backup clients (ElephantDrive, HiDrive, Symform and Glacier) AWS Glacier seemed the most attractive as I’m constantly working within AWS anyway and I wasn’t keen on diving into extended analysis of the others. However, Glacier costs for an estimate of 3 TB would be $36 in Frankfurt and $21 in the US. Per month. Still quite a bit when already running 2 disc stations anyway which both are far from being over-consumed – yet.
- Symform offers an interesting concept: In turn to contribution to a peer-to-peer network one gets ever more free cloud storage for backup; still I was more keen on finding an alternative without ongoing effort and cost
BTW: Overall CAPEX for the new NAS was around EUR 800,- (or less than 2 years of AWS Glacier storage costs for not even the full capacity of the new NAS). No option, if elasticity and flexibility aren't key that much ...
The NAS-to-NAS way of backup and restore
For the benefit of completeness:
- Synology “Cloud Sync” (see screen shot above) isn’t really backup: It’s a way of replicating files and folders from your NAS to some public cloud file service like GoogleDrive or Dropbox. I can confirm, it works flawlessly, but is no more than a bit of a playground if one intends to have some files available publicly – for whatever reason (I use it to easily mirror and share my collection of papers with others without granting them access to my NAS).
- Synology Cloud Station – mind(!) – is IMHO one of the best tools that Synology did so far (besides DSM itself). It’s pretty reliable – in my case – and even offers NAS-2-NAS synchronization of files and folders; hence, we’ll get back to this piece a little later.
- Finally – and that’s key for what’s to come – there’s the DSM built-in “Backup & Replication” options to be found in the App Launcher. And this is mainly what I bothered with in the first few days of running two of these beasts.
“Backup and Replication” offers:
- The activation and configuration of a backup server
- Backup and Restore (either iSCSI LUN backup, if used, or data backup, the latter with either a multi-version data volume or “readable” option)
- Shared Folder Sync (the utter Synology anachronism – see a bit further below)
So, eventually, there’s
- 4 Cloud backup apps
- 4 Synology owned backup options (Time Backup, iSCSI LUN backup, data volume backup and “readable” data backup) and
- 3 Synology sync options (Cloud Sync, Cloud Station and Shared Folder Sync)
Not 27, but still enough to struggle hard to find the right one …
So what’s wrong with syncing?
Cloud Station is one of the best private cloud file synchronization solutions I ever experienced; dropbox has a comparable user experience (and is still the service caring least about data privacy). So – anyway, I could just have setup the whole of the two NASs to sync using Cloud Station. Make one station the master and connect all my devices to it and make the other the backup station and connect it to the master, either.
However, the thought of awaiting the initial sync for that amount of data – especially as quite a bit of it was vastly static – let me disregard this option in the first place.
Shared Folder Sync sounded like a convenient idea to try. It’s configuration is pretty straight forward.
1: Enable Backup Services
The destination station needs to have the backup service running; so that is the first thing to go for. Launching the backup service is essentially kicking off an rsync server which can accept rsync requests from any source (this would even enable your disc station to accept workstation backups from pc, notebook, etc., if they’re capable of running rsync).
To configure the backup service, one needs to launch the “Backup and Replication” App and go to “Backup Service”:
NOTE: I do always consider to change the standard ports (22 in this case) to something unfamiliar - for security reasons (see this post: that habit saved me once)!
Other than that, one just enables the service and decides on possible data transfer speed limits (which can even be scheduled). The “Time Backup” tab allows enabling the service for accepting time backups; (update) and third tab makes volume backups possible by just ticking a checkbox. But that’s essentially it.
2: Shared Folder Sync Server
In order to accept sync client linkage, the target disc station needs to have the shared folder sync server enabled, additionally to the backup service. As the screenshot suggests, this is no big deal, really. Mind, though, that it is also here, where you check and release any linked shared folders (a button would appear under server status, where this can be done).
Once “Apply” is hit, the disc station is ready to accept shared folder sync requests.
3: Initiate “Shared Folder Sync”
This is were it gets weird for the first time:
- In the source station, go to the same page as shown above, but stay at the “Client” tab
- Launch the wizzard with a click to “Create”
- It asks for a name
- And next it asks to select the folders to sync
- In this very page it says: “I understand that if the selected destination contains folders with identical names as source folders, the folders at destination will be renamed. If they don’t exist at destination they will be created.” – You can’t proceed without explicitly accepting this by checking the box.
- Next page asks for the server connection (mind: it uses the same port as specified in your destination’s backup service configuration setup previously (see (1) above))
- Finally, a confirmation page allows verification – or, by going back, correction – of settings and when “Apply” is hit, the service commences its work.
Now, what’s it doing?
Shared Folder Sync essentially copies contents of selected shared folders to shared folders on the destination disc station. As mentioned above, it initially needs to explicitly create its link folder on the destination, so don’t create any folders in advance when using this service.
When investigating the destination in-depth, though, things instantly collapse into agony:
- All destination shared folders created by shared folder sync have no user/group rights set except for “read-only” for administrators
- Consequentially, the attempt to create or push a file to any of the destination shared folders goes void
- And altering shared folder permissions on one of these folders results in a disturbing message
“Changing its privilege settings may cause sync errors.” – WTF! Any IT guy knows, that “may” in essence means “will”. So, hands off!
- It did not allow me to create more than two different sync tasks
- I randomly experienced failures being reported during execution which I couldn’t track down to their root cause via the log. It just said “sync failed”.
Eventually, a closer look into Synology’s online documentation reveals: “Shared Folder Sync is a one way sync solution, meaning that the files at the source will be synced to the destination, but not the other way around. If you are looking for a 2-way sync solution, please use Cloud Station.” – Synology! Hey! Something like this isn’t called “synchronization”, that’s a copy!
While writing these lines, I still cannot think of any real advantage of this over
- Cloud Station (2-way sync)
- Data backup (readable 1-way copy)
- Volume backup (non-readable, incremental, 1-way “copy”)
As of the moment, I’ve given up with that piece … (can anyone tell me where I would really use this?)
BUR: Backup & Restore
The essential objective of a successful BUR strategy is to get back to life with sufficiently recent data (RPO – recovery point objective) in sufficiently quick time (RTO – recovery time objective). For the small scale of a private storage solution, Synology already offers quite compelling data security by its RAID implementation. When adding geo redundancy, the backup options in the “Backup & Replication” App would be a logical thing to try …
1: Destination first
As was previously mentioned, the destination station needs to have the backup service running; this also creates a new – administrable, in this case – shared folder “NetBackup” which could (but doesn’t need to) be the target for all backups.
Targets (called “Backup Destination” here), which are to be used for backups, still must be configured at the source station in addition to that. This is done in the “Backup & Replication” App at “Backup Destination”:
Even at this place – besides “Local” (which would e.g. be another volume or some USB attached harddisc) and “Network”- it is still possible to push backups to AWS S3 or other public cloud services by chosing “Public Cloud Backup Destination” (see following screenshots for S3).
NOTE, that the Wizzard even allows for bucket selection in China (pretty useless outside China, but obviously they sell there and do not differentiate anywhere else in the system ;))
As we’re still keen on getting data replicated between two privately owned NASs, let’s now skip that option and go for the Network Backup Destination:
- Firstly, chose and enter the settings for the “Synology server” target station (mind, using the customized SSH port from above – Backup Service Configuration)
- Secondly, decide on which kind of target backup data format to use. The screenshot below is self-explaining: Either go for a multi-version solution or a readable one (there we go!). All backup sets relying on this very destination configuration will produce target backup data according to this very selection.
2: And now: For the backup set
Unsurprinsingly, backup sets are created in the section “Backup” of the “Backup and Replication” App:
- First choice – prior to the wizzard even starting – is either to create a “Data Backup Task” or an iSCSI “LUN Backup Task” (details on iSCSI LUN can be found in the online documentation; however, if your Storage App isn’t mentioning any LUNs used, forget about that option – it obviously wouldn’t have anything to backup)
- Next, chose the backup destination (ideally configured beforehand)
- After that, all shared folders are presented and the ones to be included in the backup can be checkmarked
- In addition, the wizzard allows to include app data into the backup (Surveillance Station is the only example I had running)
- Finally some pretty important detail settings can be done:
- Encryption, compression and/or block-level backup
- Preserve files on destination, even when source is deleted (note the ambiguous wording here!)
- Backup metadata of files as well as adjacent thumbnails (obviously more target storage consumed)
- Enable backup of configurations along with this task
- Schedule the backup task to run regularly
- And last not least: bandwidth limitations! It is highly recommended to consider that carefully. While testing the whole stuff, I ran into serious bandwidth decrease within my local area network as both disc stations where running locally for the tests. So, a running backup task does indeed consume quite a bit of performance!
Once the settings are applied, the task is created and stored in the App – waiting to be triggered by a scheduler event or a click to “Backup Now”
So, what is this one doing?
It shovels data from (a) to (b). Period. When having selected “readable” at the beginning, you can even see folders and files being created or updated step by step in the destination directory. One nice advantage (especially for first-time backups) is, that the execution visibly shows its progress in the App:
Also, when done, it pushes a notification (by eMail, too, if configured) to inform about successful completion (or any failure happened).
Below screenshot eventually shows what folders look like at the destination:
And when a new or updated file appears in the source, the next run would update it on the destination in the same folder (tested and confirmed, whatever others claim)!
So, in essence this method is pretty useable and useful for bringing data across to another location, plus: maintaining it readable there. However, there’s still some disadvantages which I’ll discuss in a moment …
So, what about Cloud Station?
Well, I’ve been using Cloud Station for years now. Without any ado; without any serious fault; with
- around 100.000 pictures
- several 1000 business data files, various sizes, types, formats, …
- a nice collection of MP3 music – around 10.000 files
- and some really large music recording folders (some with uncut raw recordings in WAV format)
Cloud Station works flawlessly under these conditions. For the benefit of Mr. Adam Armstrong of storagereview.com, I’ve skipped a detailed explanation of Cloud Station and will just refer to his – IMHO – very good article!
Why did I look into that, though data backup (explained before) did a pretty good job? Well – one major disadvantage with backup sets in Synology is that even if you chose “readable” as the desired destination format, there is still not really a way of producing destination results which resemble the source in a sufficiently close way, meaning, that with backup tasks, the backed-up data goes into some subdir within the backup destination folder – thereby making permission management on destination data an utter nightmare (no useful permission inheritance from source shared folder, different permissions intended on different sub-sub-folders for the data, etc.).
Cloud Station solves this, but in turn has the disadvantage that initial sync runs are always tremendously tedious and consume loads of transfer resources (though, when using Cloud Station between 2 NASs this disadvantage is more or less reduced to a significantly higher CPU and network usage during the sync process). So, actually we’d be best to go with Cloud Station and just Cloud-sync the two NASs.
BUT: There’s one more thing with this – and any other sync – solution: Files are kept in line on both endpoints, meaning: When a file is deleted on one, its mirror on the other side is deleted, too. This risk can be mitigated by setting up recycle bin function for shared folders and versioning for Cloud Station, but still it’s no real backup solution suitable for full disaster recovery.
What the hell did I do then?
Neither of the options tested was fully perfect for me, so: I took all of them (well: not fully in the end; as said, I can’t get my head around that shared folder sync, so at the moment I am going without it).
Let’s once more have a quick glance at the key capabilities of each of the discussed options:
- Shared Folder Sync is no sync; and it leaves the target essentially unusable. Further: A file deleted in the source would – by the sync process – instantly be deleted in the destination as well.
- Data Backup (if chosen “readable”) just shifts data 1:1 into the destination – into a sub-folder structure; the multi-version volume option would create a backup package. IMHO great to use if you don’t need instant access to data managed equally to the source.
- Cloud Station: Tedious initial sync but after that the perfect way of keeping two folder trees (shared folders plus sub-items) in sync; mind: “in sync” means, that destroying a file destroys it at both locations (can be mitigated to a certain extent by using versioning).
I did it may way:
- Business projects are “Cloud Station” synced from the office NAS (source and master) to the home NAS; all devices using business projects connect to the office NAS folders of that category.
- Media files (photos, videos, MP3 and other music, recordings, …) have been 1:1 replicated to the new NAS by a one-time data backup task. At the moment, Cloud Station is building up its database for these shared folders and will maybe become the final solution for these categories. Master and source is the home NAS (also serving UPnP, of course); the office NAS (for syncing) and all devices, which want to stream media or manage photos, connect to this one.
- Archive shared folders (with rare data change) have been replicated to the new NAS and are not synced at the moment. I may go back to a pure incremental backup solution or even set some of these folders to read-only by permission and just leave them as they are.
Will that be final? Probably not … we’ll see.
Do you have a better plan? Please share … I’m curious!