Hard drives rarely die without warning; many announce their failure well in advance. The problem: without active monitoring, you notice the warning signs only when it is too late. ZFS scrub and SMART monitoring are the two tools that make silent data corruption and impending hardware failures visible before production data is affected.
This article shows how to properly configure both mechanisms on TrueNAS, interpret their output, and combine them into a proactive disk replacement strategy.
What Is Bit Rot and Why Does It Matter?
Bit rot refers to the gradual corruption of stored data on hard drives or SSDs — without error messages, without warnings. Causes include magnetic degradation, cosmic radiation, or firmware bugs. The result: a single flipped bit can render a file unusable, a database backup unreadable, or a VM image corrupt.
Conventional filesystems like ext4 or NTFS do not checksum file data and therefore cannot detect bit rot. The data sits on the disk, the filesystem reports “all good”, and during the next restore you discover that your backup has been defective for months.
ZFS Checksumming: Every Block Is Verified
ZFS solves this problem at a fundamental level. Every block receives a checksum (fletcher4 by default; SHA-256 and other algorithms can be selected through the checksum property) that is stored in the parent block pointer, separate from the data it describes. When reading a block, ZFS recomputes the checksum and compares it against the stored one. If they do not match, the block is flagged as corrupt.
In a redundant pool (mirror or RAIDZ), ZFS can automatically repair the corrupted block from an intact copy — completely transparent to the user. This is self-healing built into the filesystem layer.
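The read-path check described above can be illustrated with a toy sketch. This is not how ZFS stores or verifies checksums internally; it only shows the principle of keeping a checksum apart from the data and recomputing it on read. The file name and the `verify_block` helper are hypothetical, and `sha256sum` (GNU coreutils) stands in for whichever checksum algorithm the pool uses; on FreeBSD, `sha256 -q` would be the equivalent.

```shell
#!/bin/sh
# Toy illustration of checksum-on-read; NOT the ZFS on-disk mechanism.
workdir=$(mktemp -d)
block="$workdir/block.bin"

# Write a "block" and keep its checksum separately from the data.
printf 'important payload' > "$block"
stored_sum=$(sha256sum "$block" | cut -d' ' -f1)

# verify_block FILE EXPECTED_SUM: succeeds only if the recomputed checksum matches.
verify_block() {
    actual_sum=$(sha256sum "$1" | cut -d' ' -f1)
    [ "$actual_sum" = "$2" ]
}

verify_block "$block" "$stored_sum" && echo "block intact"

# Simulate silent corruption by overwriting the first byte in place.
printf 'X' | dd of="$block" bs=1 count=1 conv=notrunc 2>/dev/null

verify_block "$block" "$stored_sum" || echo "checksum mismatch detected"
```

In a real pool, the mismatch branch is where ZFS would fetch the intact copy from the mirror or reconstruct it from parity.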
What ZFS Scrub Does
A scrub is the systematic verification of every data block in the pool. ZFS reads each block, compares the checksum, and automatically repairs errors from redundancy copies.
The critical difference from normal operation: during daily use, ZFS only verifies blocks that are actually read. Blocks that remain untouched for months stay unchecked. A scrub ensures that even those blocks are intact.
Setting Up Scrub in TrueNAS
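TrueNAS creates a scrub task for every new pool by default: it is scheduled weekly but gated by a 35-day threshold, so the pool is scrubbed roughly once a month. The threshold is the minimum number of days that must have passed since the last scrub before the task runs again. For production environments, we recommend a shorter interval: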
Data Protection > Scrub Tasks > Add
Pool: tank
Threshold: 14 (days)
Schedule: Every Sunday, 02:00 AM
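Alternatively via cron on the TrueNAS shell. Files edited by hand under /etc may not survive a TrueNAS update, so a cron job created in the web UI is the more durable option: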
# Scrub every Sunday at 02:00 AM (this cron expression is weekly, not every 2 weeks)
echo "0 2 * * 0 root zpool scrub tank" >> /etc/cron.d/zfs-scrub
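Plain cron syntax has no field for "every second week". One possible workaround, sketched here under assumptions, is to let cron fire weekly and have a small wrapper decide whether this is a scrub week. The `is_scrub_week` helper is hypothetical, and the even-week rule is just one convention.

```shell
#!/bin/sh
# is_scrub_week WEEK: succeeds when the ISO week number (01..53) is even.
is_scrub_week() {
    week=${1#0}    # strip one leading zero so "08" is not parsed as octal
    [ $((week % 2)) -eq 0 ]
}

# Hypothetical use from a weekly cron job (pool name "tank" as in the article):
#   is_scrub_week "$(date +%V)" && zpool scrub tank
```

Around the turn of the year the gap can stretch to three weeks when a year has 53 ISO weeks, because weeks 53 and 01 are both odd. The Threshold setting of the TrueNAS scrub task avoids this edge case.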
Interpreting Scrub Results
After a scrub completes, check the status with zpool status:
zpool status tank
A healthy pool shows:
scan: scrub repaired 0B in 04:32:15 with 0 errors on Sun Mar 22 06:32:15 2026
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
da0 ONLINE 0 0 0
da1 ONLINE 0 0 0
The critical columns are READ, WRITE, and CKSUM. Any value greater than 0 requires attention:
| Column | Meaning | Action |
|---|---|---|
| READ | Read errors on the device | Check disk SMART, replace if recurring |
| WRITE | Write errors on the device | Investigate immediately — possible controller or cable defect |
| CKSUM | Checksum errors (bit rot, but also faulty cables, controllers, or RAM) | With redundancy ZFS repairs the data automatically, but the root cause must still be found |
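Checking the three counters by eye does not scale past a few pools. The following is a minimal sketch of a filter that reads `zpool status` output on standard input and prints every row with a non-zero counter. The function name `zpool_error_lines` is hypothetical, and the parsing assumes the column layout shown in the sample output above.

```shell
#!/bin/sh
# Print each pool, vdev, or device row whose READ, WRITE, or CKSUM counter is not "0".
# Reads `zpool status` output on stdin.
zpool_error_lines() {
    awk '
        # The device table starts after its header row.
        $1 == "NAME" && $2 == "STATE" { in_config = 1; next }
        # A blank line ends the device table.
        in_config && NF == 0 { in_config = 0 }
        # Compare as strings so abbreviated counters such as "1.2K" are caught too.
        in_config && NF >= 5 && ($3 != "0" || $4 != "0" || $5 != "0") {
            printf "%s read=%s write=%s cksum=%s\n", $1, $3, $4, $5
        }
    '
}

# Typical use on the TrueNAS shell:
#   zpool status tank | zpool_error_lines
```

An empty result means all counters are zero. Any output is a reason to look at the SMART data of the affected device.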
SMART Monitoring: Watching the Hardware
While ZFS secures data integrity at the logical level, SMART (Self-Monitoring, Analysis and Reporting Technology) monitors the physical condition of hard drives. SMART values reveal mechanical wear, defective sectors, and temperature issues — often weeks before a drive fails completely.
Critical SMART Attributes
| Attribute | ID | Meaning | Threshold (raw value) |
|---|---|---|---|
| Reallocated_Sector_Ct | 5 | Replaced defective sectors | > 0 monitor, > 10 critical |
| Current_Pending_Sector | 197 | Unstable sectors awaiting reallocation | > 0 investigate immediately |
| Offline_Uncorrectable | 198 | Uncorrectable sectors | > 0 plan disk replacement |
| UDMA_CRC_Error_Count | 199 | Transfer errors (cable/controller) | > 0 check cables |
| Temperature_Celsius | 194 | Operating temperature | > 45 °C improve cooling, > 55 °C critical |
| Power_On_Hours | 9 | Total operating hours | Context for wear assessment |
Setting Up SMART Tests in TrueNAS
TrueNAS offers several SMART test types; the two that matter for routine monitoring are SHORT and LONG:
Data Protection > S.M.A.R.T. Tests > Add
Type: SHORT (usually a few minutes, weekly)
Disks: All Disks
Schedule: Every Monday, 03:00 AM
Data Protection > S.M.A.R.T. Tests > Add
Type: LONG (several hours depending on disk capacity, monthly)
Disks: All Disks
Schedule: First Saturday of the month, 01:00 AM
Short tests verify basic functionality and read the error log. Long tests scan the entire disk surface and find errors that short tests miss.
smartctl on the Command Line
Get detailed SMART information directly via CLI:
# Retrieve full SMART status
smartctl -a /dev/da0
# Show only critical attributes
smartctl -A /dev/da0 | grep -E "Reallocated|Pending|Uncorrectable|CRC|Temperature"
# Start a long test manually
smartctl -t long /dev/da0
# Retrieve test results
smartctl -l selftest /dev/da0
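The grep above only shows the rows; it does not judge them. As a sketch, the raw-value thresholds from the attribute table earlier in this article can be applied with a small awk filter. The function name `smart_flag_attributes` and the WARN/CRITICAL labels are assumptions made for illustration. The parsing assumes the usual ATA layout of `smartctl -A`, where the raw value is the tenth column; some vendors pack extra data into raw values, so treat the result as a hint rather than a verdict.

```shell
#!/bin/sh
# Reads `smartctl -A` output on stdin and prints "LEVEL ATTRIBUTE RAW"
# for every attribute that crosses the thresholds used in this article.
smart_flag_attributes() {
    awk '
        # Attribute rows start with a numeric ID and have at least 10 columns.
        $1 !~ /^[0-9]+$/ || NF < 10 { next }
        { id = $1 + 0; raw = $10 + 0; level = "" }
        id == 5   && raw > 0  { level = (raw > 10) ? "CRITICAL" : "WARN" }
        id == 197 && raw > 0  { level = "WARN" }
        id == 198 && raw > 0  { level = "CRITICAL" }
        id == 199 && raw > 0  { level = "WARN" }
        id == 194 && raw > 45 { level = (raw > 55) ? "CRITICAL" : "WARN" }
        level != "" { print level, $2, raw }
    '
}

# Typical use on the TrueNAS shell:
#   smartctl -A /dev/da0 | smart_flag_attributes
```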
Alerting on SMART Failures
TrueNAS raises alerts on SMART warnings by default, but they only reach you by email if an email alert service is configured. Make sure the alert configuration is active:
System > Alert Settings > Email
Recipient: admin@example.com
SMART: Warning + Critical
Combining Scrub and SMART: Proactive Disk Replacement
The real strength lies in combining both mechanisms. ZFS scrub detects logical errors (bit rot, checksum failures), SMART detects physical degradation (defective sectors, mechanical wear). Together, they form an early warning system.
When to Replace a Disk
| Situation | Urgency | Action |
|---|---|---|
| CKSUM errors in scrub, SMART OK | Medium | Check cables, controller, and RAM; wait for the next scrub and replace the disk if errors recur |
| Reallocated_Sector_Ct rising | High | Order replacement disk, swap within 1-2 weeks |
| Current_Pending_Sector > 0 | High | Monitor disk closely, ensure resilver capacity |
| CKSUM errors + rising SMART values | Critical | Replace immediately — drive will fail soon |
| Offline_Uncorrectable > 0 | Critical | Replace immediately — data loss risk with further degradation |
| SMART self-test failed | Critical | Replace immediately |
Rule of thumb: A single checksum error in a scrub is worth monitoring. Rising SMART values combined with scrub errors are a clear signal for timely disk replacement.
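The decision table above can be expressed as a small function, which makes the precedence of the rules explicit. This is a sketch under assumptions: the function name `replacement_urgency` and its argument order are hypothetical, and a non-zero pending-sector count is treated as a "rising SMART value" for the combined rule.

```shell
#!/bin/sh
# replacement_urgency CKSUM_ERRORS REALLOC_DELTA PENDING UNCORRECTABLE SELFTEST_FAILED
#   CKSUM_ERRORS     checksum errors reported by the last scrub
#   REALLOC_DELTA    increase of Reallocated_Sector_Ct since the last check
#   PENDING          raw value of Current_Pending_Sector
#   UNCORRECTABLE    raw value of Offline_Uncorrectable
#   SELFTEST_FAILED  1 if the last SMART self-test failed, otherwise 0
replacement_urgency() {
    cksum=$1
    realloc_delta=$2
    pending=$3
    uncorrectable=$4
    selftest_failed=$5

    smart_rising=0
    if [ "$realloc_delta" -gt 0 ] || [ "$pending" -gt 0 ]; then
        smart_rising=1
    fi

    if [ "$selftest_failed" -eq 1 ] || [ "$uncorrectable" -gt 0 ]; then
        echo critical
    elif [ "$cksum" -gt 0 ] && [ "$smart_rising" -eq 1 ]; then
        echo critical
    elif [ "$smart_rising" -eq 1 ]; then
        echo high
    elif [ "$cksum" -gt 0 ]; then
        echo medium
    else
        echo ok
    fi
}
```

For example, `replacement_urgency 2 0 0 0 0` prints `medium` (checksum errors with clean SMART data), while `replacement_urgency 2 4 0 0 0` prints `critical`.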
Monitoring with DATAZONE Control
In a production TrueNAS environment, manual checks are not sufficient. With DATAZONE Control, we monitor scrub and SMART status automatically around the clock:
- Scrub monitoring: Last scrub time, duration, error count, overdue scrubs
- SMART trends: Historical development of critical attributes over weeks and months
- Threshold alerts: Automatic notification when reallocated sectors or checksum errors increase
- Disk lifecycle tracking: Operating hours and wear trends for predictive replacement planning
- Pool health: Overall status of all ZFS pools at a glance
Through trend analysis, we detect degradation not when the drive fails, but weeks in advance — and replace disks proactively before a rebuild under load becomes necessary.
Conclusion
Data integrity does not happen by itself. ZFS scrubs find silent data errors that a filesystem without data checksums would never detect. SMART monitoring reveals physical wear before it leads to failure. Both mechanisms together form the foundation for a proactive storage strategy that prevents data loss rather than reacting to it.
The effort to set this up is minimal — the protection you gain is substantial.
Want to secure your TrueNAS environment with professional scrub and SMART monitoring? Contact us — we set up proactive disk health monitoring and ensure that disk failures never catch you off guard again.