Skip to main content
  1. 2021 /
  2. November /

Recovering a ZFS pool after a catastrophic power failure

what the damned hell? #

This sucked. so… Pine died. with the help of Some posts on freebsd forms1, Mailing lists2, and some proxmox forums posts3 I got stuff mostly kinda functional enough to recover…

zpool import
zpool import -fFX

" The documentation states the extreme rewind import (import -X) performs a full traversal of all blocks in the pool for verification."

zfsspy #

This freebsd thread4 pointed me at ZfsSpy5 - and its’ code repo6

Enabling debug messages #

The openzfs docs7 pointed me to the zfsdbgmsg_enable and zfs_recover8 tuning parameters:

zfs_dbgmsg_enable
Internally ZFS keeps a small log to facilitate debugging. The contents of the log are in the /proc/spl/kstat/zfs/dbgmsg file. Writing 0 to /proc/spl/kstat/zfs/dbgmsg file clears the log.

as well as some other notes on importing9 and this tool kstat-analyzer10

zfs_recover #

go go gadget zfs_recover
pine# echo 0 > /proc/spl/kstat/zfs/dbgmsg
pine# echo 1 > /sys/module/zfs/parameters/zfs_dbgmsg_enable
pine# echo 1 > /sys/module/zfs/parameters/zfs_recover

ouch
pine# zpool import
   pool: Wolfspyre     id: 16715582643789691940
  state: ONLINE
 action: The pool can be imported using its name or numeric identifier.
 config:

	Wolfspyre                                 ONLINE
	  raidz1-0                                ONLINE
	    c114947f-2aa3-4111-85c8-93807d36e292  ONLINE
	    451ad62b-72a5-45f0-a6e1-347fda108334  ONLINE
	    7856ae71-5548-43ba-ae4f-ffd1bf6c9845  ONLINE
	    2c34ade3-3265-45b1-81b0-3b2ec6131aa9  ONLINE
	  raidz1-1                                ONLINE
	    8f00fa1f-1a70-4767-98a4-bf59f626e3c1  ONLINE
	    cf69d566-a2ba-403c-ab85-8019d6498eeb  ONLINE
	    1dc37743-5587-48ed-8636-613a79d5de63  ONLINE
	    407b271a-45a8-4240-9d93-d4207921cf1b  ONLINE
	  raidz1-2                                ONLINE
	    954508b2-e5ec-4028-994f-022dc014c38d  ONLINE
	    9338c1e3-0be0-4412-b662-0584d81de55f  ONLINE
	    0637d1dc-434e-492e-a09c-66d94d924b4a  ONLINE
	    a2034c98-0311-458b-9084-43864833cd27  ONLINE
	cache
	  nvme3n1p2
	  nvme2n1p2
	  nvme1n1p2
	  nvme0n1p2
	logs
	  mirror-4                                ONLINE
	    nvme0n1p1                             ONLINE
	    nvme1n1p1                             ONLINE
	  mirror-5                                ONLINE
	    nvme2n1p1                             ONLINE
	    nvme3n1p1                             ONLINE
fuck fuck fuck
pine# tail -f /proc/spl/kstat/zfs/dbgmsg
1636590873   spa_misc.c:418:spa_load_note(): spa_load($import, config trusted): LOADED
1636590873   spa_misc.c:418:spa_load_note(): spa_load($import, config trusted): UNLOADING
1636590873   spa.c:6084:spa_import(): spa_import: importing Wolfspyre
1636590873   spa_misc.c:418:spa_load_note(): spa_load(Wolfspyre, config trusted): LOADING
1636590873   vdev.c:152:vdev_dbgmsg(): disk vdev '/dev/disk/by-partuuid/c114947f-2aa3-4111-85c8-93807d36e292': best uberblock found for spa Wolfspyre. txg 10512767
1636590873   spa_misc.c:418:spa_load_note(): spa_load(Wolfspyre, config untrusted): using uberblock with txg=10512767
1636590873   spa.c:8337:spa_async_request(): spa=Wolfspyre async request task=4096
1636590873   spa.c:8337:spa_async_request(): spa=Wolfspyre async request task=4096
1636590873   spa.c:8337:spa_async_request(): spa=Wolfspyre async request task=4096
1636590873   spa.c:8337:spa_async_request(): spa=Wolfspyre async request task=4096
tail: /proc/spl/kstat/zfs/dbgmsg: file truncated
timestamp    message
1636590392   metaslab.c:3648:metaslab_condense(): condensing: txg 2019513, msp[38] ffff890e626cc800, vdev id 0, spa boot-pool, smp size 66032, segments 890, forcing condense=FALSE
1636590832   spa.c:6232:spa_tryimport(): spa_tryimport: importing Wolfspyre
1636590832   spa_misc.c:418:spa_load_note(): spa_load($import, config trusted): LOADING
1636590832   vdev.c:152:vdev_dbgmsg(): disk vdev '/dev/disk/by-partuuid/c114947f-2aa3-4111-85c8-93807d36e292': best uberblock found for spa $import. txg 10512767
1636590832   spa_misc.c:418:spa_load_note(): spa_load($import, config untrusted): using uberblock with txg=10512767
1636590832   spa.c:8337:spa_async_request(): spa=$import async request task=4096
1636590832   spa.c:8337:spa_async_request(): spa=$import async request task=4096
1636590832   spa.c:8337:spa_async_request(): spa=$import async request task=4096
1636590832   spa.c:8337:spa_async_request(): spa=$import async request task=4096
1636590838   spa.c:8337:spa_async_request(): spa=$import async request task=2048
1636590838   spa_misc.c:418:spa_load_note(): spa_load($import, config trusted): LOADED
1636590838   spa_misc.c:418:spa_load_note(): spa_load($import, config trusted): UNLOADING
1636590867   spa.c:6232:spa_tryimport(): spa_tryimport: importing Wolfspyre
1636590867   spa_misc.c:418:spa_load_note(): spa_load($import, config trusted): LOADING
1636590867   vdev.c:152:vdev_dbgmsg(): disk vdev '/dev/disk/by-partuuid/c114947f-2aa3-4111-85c8-93807d36e292': best uberblock found for spa $import. txg 10512767
1636590867   spa_misc.c:418:spa_load_note(): spa_load($import, config untrusted): using uberblock with txg=10512767
1636590867   spa.c:8337:spa_async_request(): spa=$import async request task=4096
1636590867   spa.c:8337:spa_async_request(): spa=$import async request task=4096
1636590867   spa.c:8337:spa_async_request(): spa=$import async request task=4096
1636590867   spa.c:8337:spa_async_request(): spa=$import async request task=4096
1636590873   spa.c:8337:spa_async_request(): spa=$import async request task=2048
1636590873   spa_misc.c:418:spa_load_note(): spa_load($import, config trusted): LOADED
1636590873   spa_misc.c:418:spa_load_note(): spa_load($import, config trusted): UNLOADING
1636590873   spa.c:6084:spa_import(): spa_import: importing Wolfspyre
1636590873   spa_misc.c:418:spa_load_note(): spa_load(Wolfspyre, config trusted): LOADING
1636590873   vdev.c:152:vdev_dbgmsg(): disk vdev '/dev/disk/by-partuuid/c114947f-2aa3-4111-85c8-93807d36e292': best uberblock found for spa Wolfspyre. txg 10512767
1636590873   spa_misc.c:418:spa_load_note(): spa_load(Wolfspyre, config untrusted): using uberblock with txg=10512767
1636590873   spa.c:8337:spa_async_request(): spa=Wolfspyre async request task=4096
1636590873   spa.c:8337:spa_async_request(): spa=Wolfspyre async request task=4096
1636590873   spa.c:8337:spa_async_request(): spa=Wolfspyre async request task=4096
1636590873   spa.c:8337:spa_async_request(): spa=Wolfspyre async request task=4096VERIFY3(msp->ms_allocated_space >= range_tree_space(msp->ms_freeing)) failed

VERIFY3(msp->ms_allocated_space >= range_tree_space(msp->ms_freeing)) failed (135168 >= 761856)
1636591243   spa_misc.c:418:spa_load_note(): spa_load(Wolfspyre, config trusted): read 5648 log space maps (5648 total blocks - blksz = 131072 bytes) in 57270 ms
1636591243   metaslab.c:2438:metaslab_load_impl(): metaslab_load: txg 0, spa Wolfspyre, vdev_id 5, ms_id 1, smp_length 7360, unflushed_allocs 262144, unflushed_frees 0, freed 0, defer 0 + 0, unloaded time 5084731 ms, loading_time 64 ms, ms_max_size 304893952, max size error 304893952, old_weight 700000000000001, new_weight 700000000000001
1636591243   metaslab.c:2438:metaslab_load_impl(): metaslab_load: txg 0, spa Wolfspyre, vdev_id 5, ms_id 28, smp_length 3184, unflushed_allocs 135168, unflushed_frees 131072, freed 0, defer 0 + 0, unloaded time 5084734 ms, loading_time 62 ms, ms_max_size 534667264, max size error 534536192, old_weight 700000000000001, new_weight 700000000000001
1636591243   metaslab.c:2438:metaslab_load_impl(): metaslab_load: txg 0, spa Wolfspyre, vdev_id 4, ms_id 0, smp_length 11952, unflushed_allocs 430080, unflushed_frees 0, freed 0, defer 0 + 0, unloaded time 5084796 ms, loading_time 95 ms, ms_max_size 354832384, max size error 354832384, old_weight 700000000000001, new_weight 700000000000001
1636591244   spa_misc.c:418:spa_load_note(): spa_load(Wolfspyre, config trusted): spa_load_verify found 1 metadata errors and 1 data errors
1636591244   spa_misc.c:403:spa_load_failed(): spa_load(Wolfspyre, config trusted): FAILED: spa_load_verify failed [error=5]
1636591244   spa_misc.c:418:spa_load_note(): spa_load(Wolfspyre, config trusted): UNLOADING
1636591244   spa_misc.c:418:spa_load_note(): spa_load(Wolfspyre, config trusted): spa_load_retry: rewind, max txg: 10512766
1636591244   spa_misc.c:418:spa_load_note(): spa_load(Wolfspyre, config trusted): LOADING
1636591244   vdev.c:152:vdev_dbgmsg(): disk vdev '/dev/disk/by-partuuid/2c34ade3-3265-45b1-81b0-3b2ec6131aa9': best uberblock found for spa Wolfspyre. txg 10512766
1636591244   spa_misc.c:418:spa_load_note(): spa_load(Wolfspyre, config untrusted): using uberblock with txg=10512766
1636591245   spa.c:8337:spa_async_request(): spa=Wolfspyre async request task=4096
1636591245   spa.c:8337:spa_async_request(): spa=Wolfspyre async request task=4096
1636591245   spa.c:8337:spa_async_request(): spa=Wolfspyre async request task=4096
1636591245   spa.c:8337:spa_async_request(): spa=Wolfspyre async request task=4096
1636591303   metaslab.c:3648:metaslab_condense(): condensing: txg 2019691, msp[19] ffff8915e2237000, vdev id 0, spa boot-pool, smp size 65560, segments 444, forcing condense=FALSE
1636591307   spa_misc.c:418:spa_load_note(): spa_load(Wolfspyre, config trusted): read 5648 log space maps (5648 total blocks - blksz = 131072 bytes) in 57280 ms
1636591308   spa_misc.c:418:spa_load_note(): spa_load(Wolfspyre, config trusted): spa_load_verify found 1 metadata errors and 1 data errors
1636591308   spa_misc.c:403:spa_load_failed(): spa_load(Wolfspyre, config trusted): FAILED: spa_load_verify failed [error=5]
1636591308   spa_misc.c:418:spa_load_note(): spa_load(Wolfspyre, config trusted): UNLOADING
1636591308   spa_misc.c:418:spa_load_note(): spa_load(Wolfspyre, config trusted): spa_load_retry: rewind, max txg: 10512765
1636591308   spa_misc.c:418:spa_load_note(): spa_load(Wolfspyre, config trusted): LOADING
1636591309   vdev.c:152:vdev_dbgmsg(): disk vdev '/dev/disk/by-partuuid/c114947f-2aa3-4111-85c8-93807d36e292': best uberblock found for spa Wolfspyre. txg 10512765
1636591309   spa_misc.c:418:spa_load_note(): spa_load(Wolfspyre, config untrusted): using uberblock with txg=10512765
1636591309   spa.c:8337:spa_async_request(): spa=Wolfspyre async request task=4096
1636591309   spa.c:8337:spa_async_request(): spa=Wolfspyre async request task=4096
1636591309   spa.c:8337:spa_async_request(): spa=Wolfspyre async request task=4096
1636591309   spa.c:8337:spa_async_request(): spa=Wolfspyre async request task=4096
1636591371   spa_misc.c:418:spa_load_note(): spa_load(Wolfspyre, config trusted): read 5648 log space maps (5648 total blocks - blksz = 131072 bytes) in 56806 ms
1636591372   spa_misc.c:418:spa_load_note(): spa_load(Wolfspyre, config trusted): spa_load_verify found 1 metadata errors and 1 data errors
1636591372   spa_misc.c:403:spa_load_failed(): spa_load(Wolfspyre, config trusted): FAILED: spa_load_verify failed [error=5]
1636591372   spa_misc.c:418:spa_load_note(): spa_load(Wolfspyre, config trusted): UNLOADING
1636591372   spa_misc.c:418:spa_load_note(): spa_load(Wolfspyre, config trusted): spa_load_retry: rewind, max txg: 10512764
1636591372   spa_misc.c:418:spa_load_note(): spa_load(Wolfspyre, config trusted): LOADING
1636591372   vdev.c:152:vdev_dbgmsg(): disk vdev '/dev/disk/by-partuuid/c114947f-2aa3-4111-85c8-93807d36e292': best uberblock found for spa Wolfspyre. txg 10512764
1636591372   spa_misc.c:418:spa_load_note(): spa_load(Wolfspyre, config untrusted): using uberblock with txg=10512764
1636591373   spa.c:8337:spa_async_request(): spa=Wolfspyre async request task=4096
1636591373   spa.c:8337:spa_async_request(): spa=Wolfspyre async request task=4096
1636591373   spa.c:8337:spa_async_request(): spa=Wolfspyre async request task=4096
1636591373   spa.c:8337:spa_async_request(): spa=Wolfspyre async request task=4096
1636591434   spa_misc.c:418:spa_load_note(): spa_load(Wolfspyre, config trusted): read 5648 log space maps (5648 total blocks - blksz = 131072 bytes) in 56899 ms
1636591437   spa_misc.c:418:spa_load_note(): spa_load(Wolfspyre, config trusted): spa_load_verify found 0 metadata errors and 5 data errors
1636591437   mmp.c:240:mmp_thread_start(): MMP thread started pool 'Wolfspyre' gethrtime 5278287819390
1636591437   metaslab.c:2438:metaslab_load_impl(): metaslab_load: txg 10512767, spa Wolfspyre, vdev_id 2, ms_id 15, smp_length 275960, unflushed_allocs 0, unflushed_frees 0, freed 0, defer 0 + 0, unloaded time 5278288 ms, loading_time 59 ms, ms_max_size 2154479616, max size error 2154479616, old_weight 7c0000000000001, new_weight 7c0000000000001
1636591437   metaslab.c:2438:metaslab_load_impl(): metaslab_load: txg 10512767, spa Wolfspyre, vdev_id 0, ms_id 546, smp_length 356648, unflushed_allocs 296476672, unflushed_frees 74809344, freed 0, defer 0 + 0, unloaded time 5278288 ms, loading_time 60 ms, ms_max_size 2153127936, max size error 2138218496, old_weight 7c0000000000001, new_weight 7c0000000000001
1636591437   metaslab.c:2438:metaslab_load_impl(): metaslab_load: txg 10512767, spa Wolfspyre, vdev_id 0, ms_id 713, smp_length 221920, unflushed_allocs 340606976, unflushed_frees 130342912, freed 0, defer 0 + 0, unloaded time 5278349 ms, loading_time 72 ms, ms_max_size 2896936960, max size error 2883379200, old_weight 7c0000000000001, new_weight 7c0000000000001
1636591437   metaslab.c:2438:metaslab_load_impl(): metaslab_load: txg 10512767, spa Wolfspyre, vdev_id 1, ms_id 325, smp_length 504960, unflushed_allocs 550789120, unflushed_frees 54206464, freed 0, defer 0 + 0, unloaded time 5278288 ms, loading_time 139 ms, ms_max_size 2290089984, max size error 2277597184, old_weight 7c0000000000001, new_weight 7c0000000000001
1636591437   metaslab.c:2438:metaslab_load_impl(): metaslab_load: txg 10512767, spa Wolfspyre, vdev_id 1, ms_id 468, smp_length 281000, unflushed_allocs 428285952, unflushed_frees 171515904, freed 0, defer 0 + 0, unloaded time 5278427 ms, loading_time 85 ms, ms_max_size 2262106112, max size error 2258239488, old_weight 7c0000000000001, new_weight 7c0000000000001
1636591437   metaslab.c:2438:metaslab_load_impl(): metaslab_load: txg 10512767, spa Wolfspyre, vdev_id 2, ms_id 141, smp_length 507560, unflushed_allocs 888864768, unflushed_frees 201637888, freed 0, defer 0 + 0, unloaded time 5278427 ms, loading_time 97 ms, ms_max_size 2688352256, max size error 2655731712, old_weight 7c0000000000001, new_weight 7c0000000000001
1636591437   metaslab.c:2438:metaslab_load_impl(): metaslab_load: txg 10512767, spa Wolfspyre, vdev_id 2, ms_id 173, smp_length 162656, unflushed_allocs 312688640, unflushed_frees 91111424, freed 0, defer 0 + 0, unloaded time 5278525 ms, loading_time 45 ms, ms_max_size 2758590464, max size error 2750849024, old_weight 7c0000000000001, new_weight 7c0000000000001
1636591437   metaslab.c:2438:metaslab_load_impl(): metaslab_load: txg 10512767, spa Wolfspyre, vdev_id 1, ms_id 540, smp_length 345384, unflushed_allocs 0, unflushed_frees 0, freed 0, defer 0 + 0, unloaded time 5278512 ms, loading_time 68 ms, ms_max_size 2333810688, max size error 2333810688, old_weight 7c0000000000001, new_weight 7c0000000000001
1636591439   metaslab.c:2438:metaslab_load_impl(): metaslab_load: txg 10512767, spa Wolfspyre, vdev_id 1, ms_id 590, smp_length 61000, unflushed_allocs 0, unflushed_frees 0, freed 0, defer 0 + 0, unloaded time 5280581 ms, loading_time 42 ms, ms_max_size 2179080192, max size error 2179080192, old_weight 7c0000000000001, new_weight 7c0000000000001
1636591439   metaslab.c:2438:metaslab_load_impl(): metaslab_load: txg 10512767, spa Wolfspyre, vdev_id 0, ms_id 727, smp_length 55928, unflushed_allocs 129941504, unflushed_frees 132620288, freed 0, defer 0 + 0, unloaded time 5280581 ms, loading_time 44 ms, ms_max_size 3126763520, max size error 3096371200, old_weight 7c0000000000001, new_weight 7c0000000000001
1636591439   metaslab.c:2438:metaslab_load_impl(): metaslab_load: txg 10512767, spa Wolfspyre, vdev_id 2, ms_id 278, smp_length 38424, unflushed_allocs 542277632, unflushed_frees 0, freed 0, defer 0 + 0, unloaded time 5280581 ms, loading_time 59 ms, ms_max_size 2413092864, max size error 2413092864, old_weight 7c0000000000001, new_weight 7c0000000000001
1636591439   metaslab.c:2438:metaslab_load_impl(): metaslab_load: txg 10512767, spa Wolfspyre, vdev_id 0, ms_id 783, smp_length 166968, unflushed_allocs 525230080, unflushed_frees 46481408, freed 0, defer 0 + 0, unloaded time 5280626 ms, loading_time 50 ms, ms_max_size 2185601024, max size error 2182414336, old_weight 7c0000000000001, new_weight 7c0000000000001
1636591439   metaslab.c:2438:metaslab_load_impl(): metaslab_load: txg 10512767, spa Wolfspyre, vdev_id 1, ms_id 642, smp_length 87504, unflushed_allocs 1131610112, unflushed_frees 260440064, freed 0, defer 0 + 0, unloaded time 5280678 ms, loading_time 41 ms, ms_max_size 2319802368, max size error 2296479744, old_weight 7c0000000000001, new_weight 7c0000000000001
1636591439   metaslab.c:3896:metaslab_flush(): flushing: txg 10512767, spa Wolfspyre, vdev_id 1, ms_id 439, unflushed_allocs 0, unflushed_frees 0, appended 0 bytes
1636591569   metaslab.c:3648:metaslab_condense(): condensing: txg 2019743, msp[51] ffff890d81a0c800, vdev id 0, spa boot-pool, smp size 65920, segments 404, forcing condense=FALSE
1636591575   metaslab.c:3648:metaslab_condense(): condensing: txg 2019744, msp[18] ffff8915e2232800, vdev id 0, spa boot-pool, smp size 65904, segments 1219, forcing condense=FALSE

zpool import
pine# 
pine#  echo 0 > /sys/module/zfs/parameters/spa_load_verify_data
pine# echo 0 > /sys/module/zfs/parameters/spa_load_verify_metadata
pine# echo 1 > /sys/module/zfs/parameters/spa_load_verify_data
pine# echo 1 > /sys/module/zfs/parameters/spa_load_verify_metadata
pine# zpool import Wolfspyre
pine# echo 1 > /sys/module/zfs/parameters/zfs_recover
pine# echo 1 > /sys/module/zfs/parameters/zfs_dbgmsg_enable
pine# tail -f /proc/spl/kstat/zfs/dbgmsg

OMG I HAVE DATA! COPY IT OFF FUCKING NOW!!!!—-