
TrueNAS ZFS Replication: Offsite Disaster Recovery Between Sites


A backup is only as good as the last successful restore. And a restore only works if the data is not sitting in the same building that is currently on fire, flooded, or encrypted by ransomware. ZFS replication with TrueNAS is the technically superior approach for offsite disaster recovery: atomic, consistent, incremental — and without additional backup software.

This article explains how ZFS replication works internally, how to configure it in TrueNAS, and how to build a solid DR concept with measurable RPO and RTO targets.

Why ZFS Replication Is the Gold Standard for DR

Other backup methods copy files. ZFS replicates states. A ZFS snapshot is a consistent, atomic image of a dataset at a precise point in time — all write operations that had not yet completed at that moment are excluded. That is the core of the difference:

  • Rsync copies files incrementally, but without transaction guarantees. A database actively writing during an rsync run ends up in an inconsistent state in the backup.
  • Cloud Sync transfers objects or files to a cloud bucket — good for archiving, but without native snapshot semantics.
  • Veeam and similar agent-based solutions are powerful but expensive, proprietary, and require agents on the protected systems.
  • ZFS send/receive operates at the block level: it transfers only the blocks that changed since the last shared snapshot — fast, consistent, no agents required.

A further advantage: the replica on the target system is a fully usable ZFS dataset. No unpacking, no conversion — in an emergency, you simply import it and it is immediately operational.

How ZFS send/receive Works

The first replication run transfers a full snapshot. All subsequent runs use incremental snapshots:

# Full initial transfer (source → destination)
zfs send tank/data@2026-03-29-00:00 | ssh backup@dr-site.example.com zfs receive backup/data

# Incremental transfer: only changes since the last snapshot
zfs send -i tank/data@2026-03-28-00:00 tank/data@2026-03-29-00:00 \
  | ssh backup@dr-site.example.com zfs receive backup/data

The -i flag specifies the base snapshot. ZFS then calculates only the changed blocks between the two snapshots — on a 10 TB dataset with 50 GB of daily changes, the incremental run transfers only those 50 GB, not 10 TB.
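An incremental send only works if the base snapshot still exists on both sides. A small sketch of how that shared base can be determined automatically, assuming date-stamped snapshot names (so lexical order equals chronological order); the zfs/ssh queries are shown as comments because they require live systems:

```shell
# Print the newest snapshot name present in BOTH lists. Relies on
# date-stamped names (e.g. 2026-03-29-00:00), where lexical sort
# matches chronological sort; each list must contain unique names.
newest_common() {
  printf '%s\n%s\n' "$1" "$2" | sort | uniq -d | tail -n 1
}

# Usage on live systems (hostnames and dataset names illustrative):
# src=$(zfs list -H -t snapshot -o name tank/data | sed 's/.*@//')
# dst=$(ssh backup@dr-site.example.com \
#       zfs list -H -t snapshot -o name backup/data | sed 's/.*@//')
# base=$(newest_common "$src" "$dst")
```

If `newest_common` prints nothing, no shared base exists and a full transfer is unavoidable.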

For production environments, the -R flag is recommended, as it recursively replicates all child datasets and snapshots:

zfs send -Ri tank/data@2026-03-28-00:00 tank/data@2026-03-29-00:00 \
  | ssh backup@dr-site.example.com zfs receive -F backup/data

Setting Up Replication in the TrueNAS GUI

TrueNAS wraps zfs send/receive in a convenient GUI. The setup is done under Data Protection > Replication Tasks.

Step 1: Define Source and Destination

Data Protection > Replication Tasks > Add
  Source Location:      On this System
  Source Dataset:       tank/data
  Recursive:            Yes (recommended, replicates child datasets)
  Destination Location: On a Different System
  SSH Connection:       dr-truenas (create first under System > SSH Connections)
  Destination Dataset:  backup/data

Step 2: Snapshot Strategy

Replication without a snapshot schedule is meaningless. Under Data Protection > Periodic Snapshot Tasks, define how frequently snapshots are created:

Dataset:    tank/data
Recursive:  Yes
Schedule:   Hourly (recommended for RPO < 1 hour)
Keep:       24 hourly + 7 daily + 4 weekly snapshots

The replication task references these snapshots and transfers only the new ones each time.
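Behind the scenes, a snapshot task is a create-plus-prune cycle. A sketch of the retention side, with the ZFS calls commented out because they need a live pool; the `KEEP` count and dataset name are illustrative:

```shell
KEEP=24   # retention window: number of hourly snapshots to keep

# Given a newline-separated snapshot list sorted oldest-first,
# print the names that fall outside the retention window.
expired() {
  total=$(printf '%s\n' "$1" | wc -l)
  excess=$((total - KEEP))
  if [ "$excess" -gt 0 ]; then
    printf '%s\n' "$1" | head -n "$excess"
  fi
}

# On a live system:
# zfs snapshot -r "tank/data@auto-$(date +%Y-%m-%d-%H:%M)"
# snaps=$(zfs list -H -t snapshot -o name -s creation tank/data | sed 's/.*@//')
# expired "$snaps" | while read -r s; do zfs destroy -r "tank/data@$s"; done
```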

Step 3: Schedule and Options

Schedule:               Hourly (after the snapshot task)
Replication from Scratch: No (only needed on the first run)
Encryption:             Yes (uses SSH transport encryption)
Limit (Bandwidth):      50 MiB/s (optional, prevents WAN saturation)

Step 4: SSH Connection to the Target TrueNAS

Under System > SSH Connections > Add, configure the connection to the DR system:

Name:               dr-truenas
Setup Method:       Semi-automatic (recommended between TrueNAS systems)
TrueNAS URL:        https://10.20.30.1
Username:           replication
Password:           [needed once for key exchange only]

TrueNAS automatically exchanges SSH keys during setup. After the initial configuration, no password authentication is required — the replication task runs fully automatically using key-based authentication.

The first full replication run of a large dataset can saturate a WAN link for days. TrueNAS provides two mechanisms to control this:

Option 1: Limit in the Replication Task — directly in the GUI under Limit (bytes/second). The value is specified in bytes: 50 MiB/s = 52428800.
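To avoid unit mistakes when filling in the Limit field, the conversion can be done on the shell:

```shell
# Convert MiB/s to the bytes-per-second value the Limit field expects.
mib_to_bytes() { echo $(( $1 * 1024 * 1024 )); }

mib_to_bytes 50    # prints 52428800
```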

Option 2: pv (Pipe Viewer) on the CLI for fine-grained control during manual transfers:

zfs send -Ri tank/data@snap-old tank/data@snap-new \
  | pv --rate-limit 52428800 \
  | ssh backup@dr-site.example.com zfs receive backup/data

For the initial seed transfer of a very large dataset, a different approach is advisable: prepare the destination dataset locally on an external drive, physically transport it to the DR site, and import it there. From that point on, regular incremental replication over the WAN takes over — transferring only the changes since the seed.
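The seed procedure can be sketched as a small pipeline; recording a checksum makes the physical transport verifiable before the import. The zfs send/receive ends require live pools and are shown as comments; paths and snapshot names are illustrative:

```shell
# Compress a stream from stdin to a file and record its SHA-256,
# so the drive can be verified at the DR site before importing.
seed_to_file() {   # usage: <producer> | seed_to_file /path/to/seed.zfs.gz
  gzip -c > "$1"
  sha256sum "$1" > "$1.sha256"
}

# On the source system (requires ZFS):
# zfs send -R tank/data@seed | seed_to_file /mnt/external/data-seed.zfs.gz

# On the DR system, after transporting the drive:
# sha256sum -c /mnt/external/data-seed.zfs.gz.sha256
# gunzip -c /mnt/external/data-seed.zfs.gz | zfs receive backup/data
```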

Replication to a VPS or Cloud Instance

Organizations without a second TrueNAS site can set up ZFS replication to a VPS running ZFS (e.g., Ubuntu 22.04 + OpenZFS):

# On the VPS: create a ZFS pool on a block volume
zpool create backup /dev/vdb

# The VPS then receives replicas via SSH:
# (Configure the VPS as an SSH Connection in TrueNAS)

For purely cloud-native storage (S3, Backblaze B2), rclone in combination with ZFS snapshots replaces the direct zfs receive approach:

# Export snapshot as a stream, compress, and upload to S3
zfs send -Ri tank/data@snap-old tank/data@snap-new \
  | gzip \
  | rclone rcat s3:my-backup-bucket/data-$(date +%F).zfs.gz

This approach is more cost-effective than a VPS with large block storage, but sacrifices the immediate importability of the replica — for a restore, the stream must be retrieved and imported using zfs receive.
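The restore path is the upload in reverse. Because a truncated upload would abort zfs receive midway through the import, it is worth verifying archive integrity first; a sketch, with the rclone/ZFS steps commented out (bucket and dataset names are the assumptions from above):

```shell
# Verify that a compressed stream archive is intact before importing it.
check_archive() {   # usage: check_archive /path/to/stream.zfs.gz
  gzip -t "$1"
}

# Real restore (requires rclone and ZFS):
# rclone copy s3:my-backup-bucket/data-2026-03-29.zfs.gz /tmp/
# check_archive /tmp/data-2026-03-29.zfs.gz || exit 1
# gunzip -c /tmp/data-2026-03-29.zfs.gz | zfs receive -F restore/data
```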

RPO and RTO: What Actually Matters

RPO (Recovery Point Objective) defines the maximum acceptable data loss in a disaster scenario — measured in time. With hourly replication, the RPO is one hour: in the worst case, up to an hour of changes made since the last replicated snapshot are lost.

RTO (Recovery Time Objective) defines how quickly the system is operational again after a failure. With ZFS replication, the RTO is very low because no restore process is required — the replica is imported directly and made available.

Replication interval   RPO (worst case)   Typical use
Hourly                 60 minutes         Production data, databases
Every 4 hours          4 hours            Secondary systems
Daily                  24 hours           Archive data, development environments
Continuous (sync)      Near zero          Critical financial or medical data

Continuous replication (synchronous ZFS mirroring across two sites) is technically possible, but requires a network connection with very low and stable latency between the sites.

Testing Failover: Importing a Replicated Dataset

A DR plan that has never been tested is not a DR plan. Importing a replicated ZFS dataset is intentionally straightforward:

# Make the dataset on the DR system writable (remove read-only flag)
zfs set readonly=off backup/data

# Alternatively: clone from a snapshot for a non-destructive test
zfs clone backup/data@2026-03-29-00:00 restore/data-test

# Enable NFS or SMB share (via TrueNAS GUI on the DR system)

Recommendation: perform the failover test at least quarterly — not just as a technical check, but as a complete process walkthrough: Who decides to trigger the failover? Who activates the shares? Who notifies users?

Comparison: DR Methods at a Glance

Method                   Consistency          RPO               RTO                      Complexity   Cost
ZFS Replication          Very high (atomic)   Minutes to hours  Very low                 Medium       Low
Cloud Sync (rclone/S3)   Medium               Hours             Medium (restore needed)  Low          Variable
Rsync                    Low (no snapshot)    Hours             Medium                   Low          Very low
Veeam / Agent-based      Very high            Minutes to hours  Low                      High         High

ZFS replication combines the consistency of an agent-based solution with the low cost of rsync — at a significantly better RTO than cloud sync.

Monitoring with DATAZONE Control

Replication that silently fails is worse than no replication — because it creates a false sense of security. DATAZONE Control continuously monitors all TrueNAS replication tasks:

  • Task status: Last successful run, error messages, transfer duration
  • Dataset freshness: Comparison of the latest snapshot on source and destination — a deviation exceeding the RPO threshold triggers an alert
  • Network throughput: Detection of bandwidth bottlenecks that push replication windows outside acceptable limits
  • Snapshot consumption: Detection of accumulated, uncleaned snapshots that fill the pool
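The dataset-freshness check in particular is easy to reason about: compare the creation time of the newest destination snapshot against the RPO threshold. A minimal sketch, with the ZFS query commented out because it needs a live pool; the threshold, dataset, and alert command are illustrative:

```shell
RPO_SECONDS=3600   # alert threshold: one hour

# Returns success (0) if the replica is older than the RPO threshold.
stale() {   # usage: stale <last_snapshot_epoch> <now_epoch>
  [ $(( $2 - $1 )) -gt "$RPO_SECONDS" ]
}

# On the DR system ('-p' prints the creation time as a Unix epoch):
# last=$(zfs list -Hp -t snapshot -o creation -s creation backup/data | tail -n 1)
# if stale "$last" "$(date +%s)"; then
#   echo "ALERT: backup/data replica exceeds RPO"
# fi
```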

On every replication failure, the responsible team receives an immediate notification — well before the next DR window is missed.

Conclusion

ZFS replication with TrueNAS is the technically cleanest method for offsite disaster recovery in SMB environments: no proprietary agents, no complex backup formats, no manual restore process. The replica is always a fully usable dataset — import, share, operational.

The key is consistency: regular snapshots, tested failover, and continuous monitoring. Replication that is neither monitored nor tested is nothing more than an illusion of safety.


Looking to set up ZFS replication for your TrueNAS environment and put your disaster recovery concept on a solid technical foundation? Contact us — we plan RPO, RTO, and replication architecture to fit your infrastructure.
