Should TrueNAS use hardware RAID, or should ZFS manage the disks directly?

ZFS needs direct visibility of each disk, its SMART data, and its error state. An HBA, or a controller in JBOD mode, is usually preferred; vdev design then follows from performance, capacity, and rebuild-window requirements.

This guide applies to enterprise environments deciding whether TrueNAS should sit on hardware RAID or whether ZFS should manage the disks directly. Confirm scope and reproducibility first, then work from low-risk checks to controlled changes. Do not make broad production changes without a backup, a rollback point, and a pilot system.

1. Conclusion and scope

Prepare the client and server versions, domain membership, DNS and gateway settings, network location, full error text, event timestamps, and recent changes. The reserved example domain corp.example is used throughout; no customer domain, IP address, account, or device identifier is included.

This issue falls under Backup, NAS and business continuity. Logs and configuration can often be collected remotely first. Bulk permission changes, switch-path work, production cutovers, and recovery drills should use a controlled implementation window.

2. Symptoms and environment

  • Capture the complete error text, event-log timestamp, and failed action rather than relying on a verbal description.
  • Record the affected scope, first occurrence, reproducibility, and whether the result changes on another subnet.
  • A successful backup job only means the job completed without a reported error; it does not prove restore-point integrity, application consistency, repository health, or bootability.

3. Troubleshooting sequence

  1. Confirm that the operating system sees each physical disk directly. Behind hardware RAID, ZFS sees only a virtual disk, and per-disk SMART and error information is hidden.
  2. If the controller currently presents RAID virtual disks, prefer an HBA or the controller's JBOD mode, and let ZFS vdevs provide the redundancy.
  3. Choose mirror or RAIDZ vdevs from capacity, IOPS, rebuild window, and fault-tolerance requirements; pool topology cannot be changed as freely as conventional RAID.
  4. Before production, record SMART baselines, serial numbers, slot mapping, and replacement procedure so an alert identifies the physical drive, not merely a device name.
  5. Snapshots depend on the original storage and are suitable for short-term rollback; independent backups must cross devices or failure domains and be recovery-tested.
  6. Change one variable at a time and export the current configuration before making changes.
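
The capacity side of the mirror-versus-RAIDZ choice in step 3 can be estimated with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, not part of ZFS or TrueNAS, and the result ignores metadata, RAIDZ padding, and free-space headroom, so real usable space will be lower.

```shell
#!/bin/sh
# Rough usable capacity of ONE vdev, in the same unit as the disk size.
# Hypothetical helper for planning; ignores ZFS overhead and headroom.
vdev_usable() {
    layout=$1 disks=$2 size=$3
    case "$layout" in
        mirror) echo "$size" ;;                     # every disk holds the same data
        raidz1) echo $(( (disks - 1) * size )) ;;   # one disk's worth of parity
        raidz2) echo $(( (disks - 2) * size )) ;;   # two disks' worth of parity
        raidz3) echo $(( (disks - 3) * size )) ;;   # three disks' worth of parity
        *) echo "unknown layout: $layout" >&2; return 1 ;;
    esac
}

# Number of disk failures one vdev can survive without data loss.
vdev_tolerance() {
    layout=$1 disks=$2
    case "$layout" in
        mirror) echo $(( disks - 1 )) ;;
        raidz1) echo 1 ;;
        raidz2) echo 2 ;;
        raidz3) echo 3 ;;
        *) echo "unknown layout: $layout" >&2; return 1 ;;
    esac
}

# Eight 12 TB disks, two candidate layouts:
echo "4 x 2-way mirror:  $(( 4 * $(vdev_usable mirror 2 12) )) TB usable"
echo "1 x 8-wide raidz2: $(vdev_usable raidz2 8 12) TB usable"
```

With these inputs the mirror layout yields 48 TB and the RAIDZ2 layout 72 TB. The mirror layout survives one failure per mirror pair, while the RAIDZ2 vdev survives any two failures; weigh that against IOPS and rebuild window before fixing the topology.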
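
Read-only check examples

zpool status            # pool health, vdev layout, per-device read/write/checksum error counts
zpool list              # size, allocated and free space, capacity, and health per pool
smartctl -a /dev/sdX    # full SMART report for one disk; replace sdX with the actual device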

Replace server names, domains, and paths with values verified for your environment. Do not copy real IP addresses, domains, or accounts from an unrelated environment.
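
The serial-number and slot inventory from step 4 can start from the same read-only SMART query. This is a minimal sketch, assuming smartctl prints a "Serial Number:" or "Serial number:" line in its `-i` output; `extract_serial` and `inventory_line` are hypothetical helper names, and the slot label is whatever you record by hand at the chassis.

```shell
#!/bin/sh
# Pull the serial number out of `smartctl -i` output read on stdin.
# Case-insensitive match on the field name; prints the first hit only.
extract_serial() {
    awk -F': *' 'tolower($1) == "serial number" { print $2; exit }'
}

# inventory_line DEVICE SLOT
# Emits one tab-separated record: device, slot label, serial number.
inventory_line() {
    printf '%s\t%s\t%s\n' "$1" "$2" "$(extract_serial)"
}

# Real use (read-only), one disk at a time:
#   smartctl -i /dev/sdX | inventory_line /dev/sdX "bay 3"

# Demonstration with made-up sample text instead of a real disk:
printf 'Device Model:     EXAMPLE-DISK\nSerial Number:    EXAMPLE123\n' |
    inventory_line /dev/sdX "bay 3"
```

Store the resulting records with the replacement procedure so that an alert naming a device can be traced to a serial number and a physical bay.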

4. Safe remediation and rollout

Start with read-only queries, configuration exports, and one-system validation. Once the root cause is confirmed, define the target scope, change window, and rollback method. Include recovery testing in monthly or quarterly operations, rotating full-machine, file, database, and critical-application tests while recording recovery time.

Remote troubleshooting or on-site work?

A single endpoint or a small group of systems can usually be assessed remotely when configuration and logs are available. Switch links, cabling, multi-subnet changes, production cutovers, and recovery drills are better handled in a controlled on-site window. On-site service is available in Zhejiang, Shanghai, and Jiangsu; other regions can be supported remotely.

5. Validation, rollback and common mistakes

Do not stop when the service works once. Revalidate with the user workflow, logs, a restart or fresh sign-in, another network location where relevant, and the next policy or backup cycle.

Validation and rollback checks

  • Test full-machine, file, database, and application recovery separately and record RTO, RPO, credentials, network isolation, and acceptance results.
  • Check repository capacity, file-system health, integrity checks, retention chains, synthetic operations, and immutable or offline copies.
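
The capacity and health checks above can be scripted against scripted-mode `zpool list` output. This is a sketch under stated assumptions: `check_pools` is a hypothetical helper, the input is the tab-separated output of `zpool list -H -o name,capacity,health`, and the 80% limit is a placeholder threshold rather than a vendor requirement.

```shell
#!/bin/sh
# check_pools [LIMIT]
# Reads "name<TAB>capacity<TAB>health" lines on stdin and reports any pool
# that is not ONLINE or whose used capacity is at or above LIMIT percent.
# Returns 0 when nothing was reported, 1 otherwise.
check_pools() {
    threshold=${1:-80}
    awk -F'\t' -v limit="$threshold" '
        {
            used = $2
            sub(/%$/, "", used)          # accept "45%" as well as "45"
            if ($3 != "ONLINE") {
                printf "%s: health is %s\n", $1, $3
                bad = 1
            }
            if (used + 0 >= limit + 0) {
                printf "%s: %s%% used (limit %s%%)\n", $1, used, limit
                bad = 1
            }
        }
        END { exit bad + 0 }
    '
}

# Real use (read-only):
#   zpool list -H -o name,capacity,health | check_pools 80

# Demonstration with made-up sample lines:
printf 'tank\t45%%\tONLINE\nbackup\t91%%\tDEGRADED\n' |
    check_pools 80 || echo "attention needed"
```

A clean result here only covers pool state and fill level. Integrity checks, retention chains, and restore tests still need their own validation.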

Common mistakes to avoid

  • Treating a successful job or an existing snapshot as proof of recoverability.
  • Running recovery tests on the production network and causing identity conflicts.
  • Keeping every copy on the same appliance without an independent or offline copy.
