Just a quick post today because I ran into (and fixed) an issue that’s apparently not well documented on the internet at large.
I realized today that my Ubuntu system wasn’t storing zfs snapshots for its users. This was a bit alarming, but fortunately I also keep daily backups in two other locations.
Still, if I’m out, and my portable backup storage fails, I’m out of luck. Let’s see what’s going on here.
First, some basic information.
Zsys is a bit of software used in Ubuntu to manage ZFS volumes and integrate them with automatic snapshots. It can even boot directly into an older snapshot of the system if the current one doesn’t function properly. It does this by taking a snapshot of the entire system any time a user asks, or automatically when packages are installed, removed, or upgraded.
The system stores system, user, and boot data in separate ZFS datasets and uses the filesystem metadata tags to keep them correlated.
Handily enough, each dataset’s snapshots are available under the root directory of the dataset. In my case, $HOME/.zfs/snapshot should contain a list of snapshot names, but the directory was empty! I felt a bit like I was stuck in a fairy tale about living in a shoe.
Next, the issue
I ran a test to see if I could save a new snapshot with zsysctl, and it reported my user didn’t exist. This is quite alarming.
Off to Google. For a couple hours. For nothing. This has apparently happened a fair bit before, but no clear resolution has been posted. So, I’ll post my solution.
As mentioned, zsys uses metadata tags in the zfs pool to correlate datasets. The most important one is a tag named ‘com.ubuntu.zsys:bootfs-datasets’. In my case, my user dataset didn’t have this set anymore. I haven’t had time to find out what caused the tag to get cleared, but setting it back to what it should be seems to have fixed it.
And now, the fix!
You’ll need a few things before checking the tag:
- The dataset for your user directory (use df $HOME to get this)
- The dataset for the currently running system (use df / to get this)
- Root access to your system
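The two df lookups in the list above can be wrapped in a small helper. This is only a sketch: dataset_for is a made-up name, and it assumes the path lives on a ZFS filesystem, so that the first column of the df output is the dataset name.

```shell
# dataset_for PATH: print the filesystem (here, the ZFS dataset) backing PATH.
# Hypothetical helper name. It relies on POSIX `df -P`, which prints one
# header line followed by one line per filesystem, source name first.
dataset_for() {
    df -P "$1" | awk 'NR == 2 { print $1 }'
}
```

Calling dataset_for "$HOME" should print the user dataset, and dataset_for / the dataset of the running system.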
Okay, let’s get the value for com.ubuntu.zsys:bootfs-datasets for the home directory:
# zfs get com.ubuntu.zsys:bootfs-datasets rpool/USERDATA/<user_dataset>
zfs will give the value for the bootfs-datasets entry, which is empty!
NAME                           PROPERTY                         VALUE  SOURCE
rpool/USERDATA/<user_dataset>  com.ubuntu.zsys:bootfs-datasets         local
Because this property isn’t set, zsysctl can’t correlate your dataset with the running system, and won’t snapshot it. Instead it says the user doesn’t exist.
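If you’d rather script the check than eyeball the table, something like the following should work. It’s a minimal sketch: bootfs_tag_missing is a hypothetical name, and it treats both an empty value and "-" (how zfs reports a user property that was never set) as missing.

```shell
# bootfs_tag_missing DATASET: exit 0 when the zsys bootfs-datasets tag is
# empty or unset on DATASET, 1 when it has a value, 2 when zfs itself fails.
# Hypothetical helper name.
bootfs_tag_missing() {
    local value
    # -H drops the header line, -o value prints only the VALUE column.
    value="$(zfs get -H -o value com.ubuntu.zsys:bootfs-datasets "$1")" || return 2
    # A cleared tag comes back empty; a never-set tag comes back as "-".
    [ -z "$value" ] || [ "$value" = "-" ]
}
```

For example, bootfs_tag_missing "rpool/USERDATA/<user_dataset>" && echo "tag is missing" flags the broken state described above.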
Let’s fix this by adding a bit of taxonomy to the pool:
# zfs set com.ubuntu.zsys:bootfs-datasets=rpool/ROOT/<root_dataset> rpool/USERDATA/<user_dataset>
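Since an empty value is exactly what caused the problem, it may be worth guarding the zfs set call against empty arguments. A rough sketch, run as root; set_bootfs_tag is a hypothetical wrapper name, not part of zfs or zsys:

```shell
# set_bootfs_tag USER_DATASET ROOT_DATASET
# Hypothetical wrapper: refuses to run unless both arguments are non-empty,
# so a stray empty variable cannot blank the tag again.
set_bootfs_tag() {
    if [ -z "$1" ] || [ -z "$2" ]; then
        echo "usage: set_bootfs_tag USER_DATASET ROOT_DATASET" >&2
        return 1
    fi
    zfs set "com.ubuntu.zsys:bootfs-datasets=$2" "$1" || return 1
    # Read the value back so the result is visible right away.
    zfs get -H -o value com.ubuntu.zsys:bootfs-datasets "$1"
}
```

The first argument is the user dataset and the second is the root dataset, the reverse of how they appear in the property assignment itself.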
…and test the ability to save a snapshot. First we verify .zfs/snapshot is empty:
$ cd $HOME/.zfs/snapshot
$ ls
And now try a snapshot:
$ zsysctl save
Successfully saved as "autozsys_bgvl86"
Success! Now is the snapshot accessible?
And it looks like things are fixed. The periodic and triggered snapshots should start up once again.
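To check for a specific snapshot without browsing the directory by hand, a one-line test is enough. This is a sketch with a made-up helper name, snapshot_visible; the optional second argument only exists so the function can be pointed at a different snapshot directory.

```shell
# snapshot_visible NAME [SNAPDIR]: exit 0 when a snapshot directory called
# NAME exists under SNAPDIR (default: $HOME/.zfs/snapshot).
# Hypothetical helper name.
snapshot_visible() {
    [ -d "${2:-$HOME/.zfs/snapshot}/$1" ]
}
```

With the name reported by zsysctl save, that would be snapshot_visible autozsys_bgvl86 && echo "snapshot is there".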
Soapbox time 🙂
There are a few things to learn here. First, and most important, keep a backup of your backups. On this laptop I’m blessed with the ability to have multiple drives in it. I use the secondary drive as a file-based incremental backup that’s literally a copy of the root filesystem. The same process also copies data to an encrypted volume on my NAS when I’m able to access it. Zsys’s snapshots are incredibly handy when they work, but they failed me today. I was able to do what I needed thanks to the alternate backups. 🙂
Second, don’t trust your automated backups. Zsys is still in beta in Ubuntu, and as such it’s not fully vetted for bugs. This one has already been found, as it’s reported in a few places on the internet.
Third, when you can’t find help, find a working system and compare it to the non-working system. Fortunately I use zsys on two machines, one of which didn’t have an issue with snapshots. I was able to find the metadata information in the two zfs volumes, and compare attributes to see if there were significant differences. It turns out the bootfs-datasets entry had been emptied on the malfunctioning system.
In many cases, the general consensus for fixing things is a reboot or a re-install. These both throw the issue out along with the cause (refer to the old adage, ‘throw the baby out with the bathwater’) rather than resolving the issue. They can also cost you important information and, even more important, time.
This is a quick and dirty post that I’ll clean up later, probably. I hope it was useful or at least a bit educational to read.
[edits: Apparently I have spent enough time between posts that WordPress’s editor has changed a bit. Pasting from vim as I used to do doesn’t work quite the same, so I had to fix it. That’ll show me for not reading before I publish!]