Recovering a ZFS pool after a catastrophic power failure
what the damned hell? #
This sucked. so… Pine died. with the help of Some posts on freebsd forms1, Mailing lists2, and some proxmox forums posts3 I got stuff mostly kinda functional enough to recover…
zpool import -fFX
" The documentation states the extreme rewind import (import -X) performs a full traversal of all blocks in the pool for verification."
zfsspy #
This freebsd thread4 pointed me at ZfsSpy5 - and its’ code repo6
Enabling debug messages #
The openzfs docs7 pointed me to the zfsdbgmsg_enable
and zfs_recover
8 tuning parameters:
- zfs_dbgmsg_enable
- Internally ZFS keeps a small log to facilitate debugging. The contents of the log are in the /proc/spl/kstat/zfs/dbgmsg file. Writing 0 to /proc/spl/kstat/zfs/dbgmsg file clears the log.
as well as some other notes on importing9 and this tool kstat-analyzer10
zfs_recover #
pine# echo 0 > /proc/spl/kstat/zfs/dbgmsg
pine# echo 1 > /sys/module/zfs/parameters/zfs_dbgmsg_enable
pine# echo 1 > /sys/module/zfs/parameters/zfs_recover
pine# zpool import
pool: Wolfspyre id: 16715582643789691940
state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:
Wolfspyre ONLINE
raidz1-0 ONLINE
c114947f-2aa3-4111-85c8-93807d36e292 ONLINE
451ad62b-72a5-45f0-a6e1-347fda108334 ONLINE
7856ae71-5548-43ba-ae4f-ffd1bf6c9845 ONLINE
2c34ade3-3265-45b1-81b0-3b2ec6131aa9 ONLINE
raidz1-1 ONLINE
8f00fa1f-1a70-4767-98a4-bf59f626e3c1 ONLINE
cf69d566-a2ba-403c-ab85-8019d6498eeb ONLINE
1dc37743-5587-48ed-8636-613a79d5de63 ONLINE
407b271a-45a8-4240-9d93-d4207921cf1b ONLINE
raidz1-2 ONLINE
954508b2-e5ec-4028-994f-022dc014c38d ONLINE
9338c1e3-0be0-4412-b662-0584d81de55f ONLINE
0637d1dc-434e-492e-a09c-66d94d924b4a ONLINE
a2034c98-0311-458b-9084-43864833cd27 ONLINE
cache
nvme3n1p2
nvme2n1p2
nvme1n1p2
nvme0n1p2
logs
mirror-4 ONLINE
nvme0n1p1 ONLINE
nvme1n1p1 ONLINE
mirror-5 ONLINE
nvme2n1p1 ONLINE
nvme3n1p1 ONLINE
pine# tail -f /proc/spl/kstat/zfs/dbgmsg
1636590873 spa_misc.c:418:spa_load_note(): spa_load($import, config trusted): LOADED
1636590873 spa_misc.c:418:spa_load_note(): spa_load($import, config trusted): UNLOADING
1636590873 spa.c:6084:spa_import(): spa_import: importing Wolfspyre
1636590873 spa_misc.c:418:spa_load_note(): spa_load(Wolfspyre, config trusted): LOADING
1636590873 vdev.c:152:vdev_dbgmsg(): disk vdev '/dev/disk/by-partuuid/c114947f-2aa3-4111-85c8-93807d36e292': best uberblock found for spa Wolfspyre. txg 10512767
1636590873 spa_misc.c:418:spa_load_note(): spa_load(Wolfspyre, config untrusted): using uberblock with txg=10512767
1636590873 spa.c:8337:spa_async_request(): spa=Wolfspyre async request task=4096
1636590873 spa.c:8337:spa_async_request(): spa=Wolfspyre async request task=4096
1636590873 spa.c:8337:spa_async_request(): spa=Wolfspyre async request task=4096
1636590873 spa.c:8337:spa_async_request(): spa=Wolfspyre async request task=4096
tail: /proc/spl/kstat/zfs/dbgmsg: file truncated
timestamp message
1636590392 metaslab.c:3648:metaslab_condense(): condensing: txg 2019513, msp[38] ffff890e626cc800, vdev id 0, spa boot-pool, smp size 66032, segments 890, forcing condense=FALSE
1636590832 spa.c:6232:spa_tryimport(): spa_tryimport: importing Wolfspyre
1636590832 spa_misc.c:418:spa_load_note(): spa_load($import, config trusted): LOADING
1636590832 vdev.c:152:vdev_dbgmsg(): disk vdev '/dev/disk/by-partuuid/c114947f-2aa3-4111-85c8-93807d36e292': best uberblock found for spa $import. txg 10512767
1636590832 spa_misc.c:418:spa_load_note(): spa_load($import, config untrusted): using uberblock with txg=10512767
1636590832 spa.c:8337:spa_async_request(): spa=$import async request task=4096
1636590832 spa.c:8337:spa_async_request(): spa=$import async request task=4096
1636590832 spa.c:8337:spa_async_request(): spa=$import async request task=4096
1636590832 spa.c:8337:spa_async_request(): spa=$import async request task=4096
1636590838 spa.c:8337:spa_async_request(): spa=$import async request task=2048
1636590838 spa_misc.c:418:spa_load_note(): spa_load($import, config trusted): LOADED
1636590838 spa_misc.c:418:spa_load_note(): spa_load($import, config trusted): UNLOADING
1636590867 spa.c:6232:spa_tryimport(): spa_tryimport: importing Wolfspyre
1636590867 spa_misc.c:418:spa_load_note(): spa_load($import, config trusted): LOADING
1636590867 vdev.c:152:vdev_dbgmsg(): disk vdev '/dev/disk/by-partuuid/c114947f-2aa3-4111-85c8-93807d36e292': best uberblock found for spa $import. txg 10512767
1636590867 spa_misc.c:418:spa_load_note(): spa_load($import, config untrusted): using uberblock with txg=10512767
1636590867 spa.c:8337:spa_async_request(): spa=$import async request task=4096
1636590867 spa.c:8337:spa_async_request(): spa=$import async request task=4096
1636590867 spa.c:8337:spa_async_request(): spa=$import async request task=4096
1636590867 spa.c:8337:spa_async_request(): spa=$import async request task=4096
1636590873 spa.c:8337:spa_async_request(): spa=$import async request task=2048
1636590873 spa_misc.c:418:spa_load_note(): spa_load($import, config trusted): LOADED
1636590873 spa_misc.c:418:spa_load_note(): spa_load($import, config trusted): UNLOADING
1636590873 spa.c:6084:spa_import(): spa_import: importing Wolfspyre
1636590873 spa_misc.c:418:spa_load_note(): spa_load(Wolfspyre, config trusted): LOADING
1636590873 vdev.c:152:vdev_dbgmsg(): disk vdev '/dev/disk/by-partuuid/c114947f-2aa3-4111-85c8-93807d36e292': best uberblock found for spa Wolfspyre. txg 10512767
1636590873 spa_misc.c:418:spa_load_note(): spa_load(Wolfspyre, config untrusted): using uberblock with txg=10512767
1636590873 spa.c:8337:spa_async_request(): spa=Wolfspyre async request task=4096
1636590873 spa.c:8337:spa_async_request(): spa=Wolfspyre async request task=4096
1636590873 spa.c:8337:spa_async_request(): spa=Wolfspyre async request task=4096
1636590873 spa.c:8337:spa_async_request(): spa=Wolfspyre async request task=4096VERIFY3(msp->ms_allocated_space >= range_tree_space(msp->ms_freeing)) failed
VERIFY3(msp->ms_allocated_space >= range_tree_space(msp->ms_freeing)) failed (135168 >= 761856)
1636591243 spa_misc.c:418:spa_load_note(): spa_load(Wolfspyre, config trusted): read 5648 log space maps (5648 total blocks - blksz = 131072 bytes) in 57270 ms
1636591243 metaslab.c:2438:metaslab_load_impl(): metaslab_load: txg 0, spa Wolfspyre, vdev_id 5, ms_id 1, smp_length 7360, unflushed_allocs 262144, unflushed_frees 0, freed 0, defer 0 + 0, unloaded time 5084731 ms, loading_time 64 ms, ms_max_size 304893952, max size error 304893952, old_weight 700000000000001, new_weight 700000000000001
1636591243 metaslab.c:2438:metaslab_load_impl(): metaslab_load: txg 0, spa Wolfspyre, vdev_id 5, ms_id 28, smp_length 3184, unflushed_allocs 135168, unflushed_frees 131072, freed 0, defer 0 + 0, unloaded time 5084734 ms, loading_time 62 ms, ms_max_size 534667264, max size error 534536192, old_weight 700000000000001, new_weight 700000000000001
1636591243 metaslab.c:2438:metaslab_load_impl(): metaslab_load: txg 0, spa Wolfspyre, vdev_id 4, ms_id 0, smp_length 11952, unflushed_allocs 430080, unflushed_frees 0, freed 0, defer 0 + 0, unloaded time 5084796 ms, loading_time 95 ms, ms_max_size 354832384, max size error 354832384, old_weight 700000000000001, new_weight 700000000000001
1636591244 spa_misc.c:418:spa_load_note(): spa_load(Wolfspyre, config trusted): spa_load_verify found 1 metadata errors and 1 data errors
1636591244 spa_misc.c:403:spa_load_failed(): spa_load(Wolfspyre, config trusted): FAILED: spa_load_verify failed [error=5]
1636591244 spa_misc.c:418:spa_load_note(): spa_load(Wolfspyre, config trusted): UNLOADING
1636591244 spa_misc.c:418:spa_load_note(): spa_load(Wolfspyre, config trusted): spa_load_retry: rewind, max txg: 10512766
1636591244 spa_misc.c:418:spa_load_note(): spa_load(Wolfspyre, config trusted): LOADING
1636591244 vdev.c:152:vdev_dbgmsg(): disk vdev '/dev/disk/by-partuuid/2c34ade3-3265-45b1-81b0-3b2ec6131aa9': best uberblock found for spa Wolfspyre. txg 10512766
1636591244 spa_misc.c:418:spa_load_note(): spa_load(Wolfspyre, config untrusted): using uberblock with txg=10512766
1636591245 spa.c:8337:spa_async_request(): spa=Wolfspyre async request task=4096
1636591245 spa.c:8337:spa_async_request(): spa=Wolfspyre async request task=4096
1636591245 spa.c:8337:spa_async_request(): spa=Wolfspyre async request task=4096
1636591245 spa.c:8337:spa_async_request(): spa=Wolfspyre async request task=4096
1636591303 metaslab.c:3648:metaslab_condense(): condensing: txg 2019691, msp[19] ffff8915e2237000, vdev id 0, spa boot-pool, smp size 65560, segments 444, forcing condense=FALSE
1636591307 spa_misc.c:418:spa_load_note(): spa_load(Wolfspyre, config trusted): read 5648 log space maps (5648 total blocks - blksz = 131072 bytes) in 57280 ms
1636591308 spa_misc.c:418:spa_load_note(): spa_load(Wolfspyre, config trusted): spa_load_verify found 1 metadata errors and 1 data errors
1636591308 spa_misc.c:403:spa_load_failed(): spa_load(Wolfspyre, config trusted): FAILED: spa_load_verify failed [error=5]
1636591308 spa_misc.c:418:spa_load_note(): spa_load(Wolfspyre, config trusted): UNLOADING
1636591308 spa_misc.c:418:spa_load_note(): spa_load(Wolfspyre, config trusted): spa_load_retry: rewind, max txg: 10512765
1636591308 spa_misc.c:418:spa_load_note(): spa_load(Wolfspyre, config trusted): LOADING
1636591309 vdev.c:152:vdev_dbgmsg(): disk vdev '/dev/disk/by-partuuid/c114947f-2aa3-4111-85c8-93807d36e292': best uberblock found for spa Wolfspyre. txg 10512765
1636591309 spa_misc.c:418:spa_load_note(): spa_load(Wolfspyre, config untrusted): using uberblock with txg=10512765
1636591309 spa.c:8337:spa_async_request(): spa=Wolfspyre async request task=4096
1636591309 spa.c:8337:spa_async_request(): spa=Wolfspyre async request task=4096
1636591309 spa.c:8337:spa_async_request(): spa=Wolfspyre async request task=4096
1636591309 spa.c:8337:spa_async_request(): spa=Wolfspyre async request task=4096
1636591371 spa_misc.c:418:spa_load_note(): spa_load(Wolfspyre, config trusted): read 5648 log space maps (5648 total blocks - blksz = 131072 bytes) in 56806 ms
1636591372 spa_misc.c:418:spa_load_note(): spa_load(Wolfspyre, config trusted): spa_load_verify found 1 metadata errors and 1 data errors
1636591372 spa_misc.c:403:spa_load_failed(): spa_load(Wolfspyre, config trusted): FAILED: spa_load_verify failed [error=5]
1636591372 spa_misc.c:418:spa_load_note(): spa_load(Wolfspyre, config trusted): UNLOADING
1636591372 spa_misc.c:418:spa_load_note(): spa_load(Wolfspyre, config trusted): spa_load_retry: rewind, max txg: 10512764
1636591372 spa_misc.c:418:spa_load_note(): spa_load(Wolfspyre, config trusted): LOADING
1636591372 vdev.c:152:vdev_dbgmsg(): disk vdev '/dev/disk/by-partuuid/c114947f-2aa3-4111-85c8-93807d36e292': best uberblock found for spa Wolfspyre. txg 10512764
1636591372 spa_misc.c:418:spa_load_note(): spa_load(Wolfspyre, config untrusted): using uberblock with txg=10512764
1636591373 spa.c:8337:spa_async_request(): spa=Wolfspyre async request task=4096
1636591373 spa.c:8337:spa_async_request(): spa=Wolfspyre async request task=4096
1636591373 spa.c:8337:spa_async_request(): spa=Wolfspyre async request task=4096
1636591373 spa.c:8337:spa_async_request(): spa=Wolfspyre async request task=4096
1636591434 spa_misc.c:418:spa_load_note(): spa_load(Wolfspyre, config trusted): read 5648 log space maps (5648 total blocks - blksz = 131072 bytes) in 56899 ms
1636591437 spa_misc.c:418:spa_load_note(): spa_load(Wolfspyre, config trusted): spa_load_verify found 0 metadata errors and 5 data errors
1636591437 mmp.c:240:mmp_thread_start(): MMP thread started pool 'Wolfspyre' gethrtime 5278287819390
1636591437 metaslab.c:2438:metaslab_load_impl(): metaslab_load: txg 10512767, spa Wolfspyre, vdev_id 2, ms_id 15, smp_length 275960, unflushed_allocs 0, unflushed_frees 0, freed 0, defer 0 + 0, unloaded time 5278288 ms, loading_time 59 ms, ms_max_size 2154479616, max size error 2154479616, old_weight 7c0000000000001, new_weight 7c0000000000001
1636591437 metaslab.c:2438:metaslab_load_impl(): metaslab_load: txg 10512767, spa Wolfspyre, vdev_id 0, ms_id 546, smp_length 356648, unflushed_allocs 296476672, unflushed_frees 74809344, freed 0, defer 0 + 0, unloaded time 5278288 ms, loading_time 60 ms, ms_max_size 2153127936, max size error 2138218496, old_weight 7c0000000000001, new_weight 7c0000000000001
1636591437 metaslab.c:2438:metaslab_load_impl(): metaslab_load: txg 10512767, spa Wolfspyre, vdev_id 0, ms_id 713, smp_length 221920, unflushed_allocs 340606976, unflushed_frees 130342912, freed 0, defer 0 + 0, unloaded time 5278349 ms, loading_time 72 ms, ms_max_size 2896936960, max size error 2883379200, old_weight 7c0000000000001, new_weight 7c0000000000001
1636591437 metaslab.c:2438:metaslab_load_impl(): metaslab_load: txg 10512767, spa Wolfspyre, vdev_id 1, ms_id 325, smp_length 504960, unflushed_allocs 550789120, unflushed_frees 54206464, freed 0, defer 0 + 0, unloaded time 5278288 ms, loading_time 139 ms, ms_max_size 2290089984, max size error 2277597184, old_weight 7c0000000000001, new_weight 7c0000000000001
1636591437 metaslab.c:2438:metaslab_load_impl(): metaslab_load: txg 10512767, spa Wolfspyre, vdev_id 1, ms_id 468, smp_length 281000, unflushed_allocs 428285952, unflushed_frees 171515904, freed 0, defer 0 + 0, unloaded time 5278427 ms, loading_time 85 ms, ms_max_size 2262106112, max size error 2258239488, old_weight 7c0000000000001, new_weight 7c0000000000001
1636591437 metaslab.c:2438:metaslab_load_impl(): metaslab_load: txg 10512767, spa Wolfspyre, vdev_id 2, ms_id 141, smp_length 507560, unflushed_allocs 888864768, unflushed_frees 201637888, freed 0, defer 0 + 0, unloaded time 5278427 ms, loading_time 97 ms, ms_max_size 2688352256, max size error 2655731712, old_weight 7c0000000000001, new_weight 7c0000000000001
1636591437 metaslab.c:2438:metaslab_load_impl(): metaslab_load: txg 10512767, spa Wolfspyre, vdev_id 2, ms_id 173, smp_length 162656, unflushed_allocs 312688640, unflushed_frees 91111424, freed 0, defer 0 + 0, unloaded time 5278525 ms, loading_time 45 ms, ms_max_size 2758590464, max size error 2750849024, old_weight 7c0000000000001, new_weight 7c0000000000001
1636591437 metaslab.c:2438:metaslab_load_impl(): metaslab_load: txg 10512767, spa Wolfspyre, vdev_id 1, ms_id 540, smp_length 345384, unflushed_allocs 0, unflushed_frees 0, freed 0, defer 0 + 0, unloaded time 5278512 ms, loading_time 68 ms, ms_max_size 2333810688, max size error 2333810688, old_weight 7c0000000000001, new_weight 7c0000000000001
1636591439 metaslab.c:2438:metaslab_load_impl(): metaslab_load: txg 10512767, spa Wolfspyre, vdev_id 1, ms_id 590, smp_length 61000, unflushed_allocs 0, unflushed_frees 0, freed 0, defer 0 + 0, unloaded time 5280581 ms, loading_time 42 ms, ms_max_size 2179080192, max size error 2179080192, old_weight 7c0000000000001, new_weight 7c0000000000001
1636591439 metaslab.c:2438:metaslab_load_impl(): metaslab_load: txg 10512767, spa Wolfspyre, vdev_id 0, ms_id 727, smp_length 55928, unflushed_allocs 129941504, unflushed_frees 132620288, freed 0, defer 0 + 0, unloaded time 5280581 ms, loading_time 44 ms, ms_max_size 3126763520, max size error 3096371200, old_weight 7c0000000000001, new_weight 7c0000000000001
1636591439 metaslab.c:2438:metaslab_load_impl(): metaslab_load: txg 10512767, spa Wolfspyre, vdev_id 2, ms_id 278, smp_length 38424, unflushed_allocs 542277632, unflushed_frees 0, freed 0, defer 0 + 0, unloaded time 5280581 ms, loading_time 59 ms, ms_max_size 2413092864, max size error 2413092864, old_weight 7c0000000000001, new_weight 7c0000000000001
1636591439 metaslab.c:2438:metaslab_load_impl(): metaslab_load: txg 10512767, spa Wolfspyre, vdev_id 0, ms_id 783, smp_length 166968, unflushed_allocs 525230080, unflushed_frees 46481408, freed 0, defer 0 + 0, unloaded time 5280626 ms, loading_time 50 ms, ms_max_size 2185601024, max size error 2182414336, old_weight 7c0000000000001, new_weight 7c0000000000001
1636591439 metaslab.c:2438:metaslab_load_impl(): metaslab_load: txg 10512767, spa Wolfspyre, vdev_id 1, ms_id 642, smp_length 87504, unflushed_allocs 1131610112, unflushed_frees 260440064, freed 0, defer 0 + 0, unloaded time 5280678 ms, loading_time 41 ms, ms_max_size 2319802368, max size error 2296479744, old_weight 7c0000000000001, new_weight 7c0000000000001
1636591439 metaslab.c:3896:metaslab_flush(): flushing: txg 10512767, spa Wolfspyre, vdev_id 1, ms_id 439, unflushed_allocs 0, unflushed_frees 0, appended 0 bytes
1636591569 metaslab.c:3648:metaslab_condense(): condensing: txg 2019743, msp[51] ffff890d81a0c800, vdev id 0, spa boot-pool, smp size 65920, segments 404, forcing condense=FALSE
1636591575 metaslab.c:3648:metaslab_condense(): condensing: txg 2019744, msp[18] ffff8915e2232800, vdev id 0, spa boot-pool, smp size 65904, segments 1219, forcing condense=FALSE
pine#
pine# echo 0 > /sys/module/zfs/parameters/spa_load_verify_data
pine# echo 0 > /sys/module/zfs/parameters/spa_load_verify_metadata
pine# echo 1 > /sys/module/zfs/parameters/spa_load_verify_data
pine# echo 1 > /sys/module/zfs/parameters/spa_load_verify_metadata
pine# zpool import Wolfspyre
pine# echo 1 > /sys/module/zfs/parameters/zfs_recover
pine# echo 1 > /sys/module/zfs/parameters/zfs_dbgmsg_enable
pine# tail -f /proc/spl/kstat/zfs/dbgmsg
OMG I HAVE DATA! COPY IT OFF FUCKING NOW!!!!—-
-
https://forums.freebsd.org/threads/zfs-problem-need-help-recovering.71242/ ↩︎
-
https://groups.google.com/g/mailing.freebsd.hackers/c/QzQA32BmaqQ/m/AoLEZrfkElMJ ↩︎
-
https://forum.proxmox.com/threads/zfs-pool-failure-after-power-outage.90400/ ↩︎
-
https://forums.freebsd.org/threads/introduce-new-zfs-exploration-and-data-recovery-tool-zfs-spy.67522/#post-400824 ↩︎
-
https://openzfs.github.io/openzfs-docs/Performance%20and%20Tuning/Module%20Parameters.html ↩︎
-
https://openzfs.github.io/openzfs-docs/Performance%20and%20Tuning/Module%20Parameters.html#zfs-recover ↩︎
-
https://openzfs.github.io/openzfs-docs/Performance%20and%20Tuning/Module%20Parameters.html#import ↩︎
-
https://raw.githubusercontent.com/richardelling/zfs-linux-tools/master/kstat-analyzer ↩︎