Btrfs is a modern CoW file system
A modern Copy on Write file system for Linux aimed at implementing advanced features while also focusing on fault tolerance, repair and easy administration. Btrfs is not only a file system, but also acts in part as a volume manager, software RAID and backup tool, and it is flash-friendly.
Because Btrfs is different, some things may seem unfamiliar and strange. If you want to learn the details and the newest developments, btrfs.wiki.kernel.org is the place to go. Development of Btrfs started in 2007, and Btrfs has since become part of the Linux kernel and is under active development. The Btrfs code base is stable, but new features are still being developed. Its main features and benefits are:
- Snapshots that do not make a full copy of the files
- RAID: support for software-based RAID 0, RAID 1 and RAID 10
- Self-healing: checksums for data and metadata, automatic detection of silent data corruption (see btrfs@kernel.org, Btrfs@ARC-wiki, Btrfs@wikipedia)
Familiar with Btrfs slang?
Because Btrfs is different, you will find some words that have a special meaning when used for Btrfs. This may be a source of confusion.
It is possible to make a writeable (rw) subvolume out of a read-only (ro) snapshot. This is how rollback works.
- Without RAID it is possible to correct some faults caused by a power outage (while the filesystem is mounted).
- With RAID it is possible to repair parts of files that were damaged by small faults on one device (when the file is read).
Btrfs Volume
A pool of raw storage consisting of one or more devices. The size of the volume is the sum of all included devices, unless you use RAID.
If you use more than one device, please also read the section about RAID. You can add or remove devices at any time to increase or decrease the size of the volume. By adding and removing devices it is also possible to move a volume from one device to another (without changing the UUID).
Usually you do not mount the Btrfs volume itself, but its subvolumes. There may be times when it is practical to mount the Btrfs volume root itself; then you are able to change the volume layout. All (writeable) subvolumes inside a volume can be moved within the volume with mv. Moving subvolumes does not touch the data, but changes the volume layout in an instant.
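A minimal sketch of how this can look, assuming the volume lives on /dev/sdz2 and the subvolume names follow the @ convention (device name, mount point and names are examples; run as root):
mount -o subvolid=5 /dev/sdz2 /mnt    # subvolid=5 always addresses the volume root
mv /mnt/@home /mnt/@home_old          # renames the subvolume instantly, no data is copied
umount /mnt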
When not otherwise specified, additional devices are handled as Just a Bunch of Disks (JBOD).
move a volume to another disk
There are many ways to move a "normal" filesystem from one disk to another. But there are dangers with moving Btrfs volumes that do not exist with other filesystems! Never move a Btrfs volume with a tool that does not state it is 100% Btrfs-proof. Whenever two partitions in one computer have the same filesystem UUID, one or both filesystems may be destroyed. Under the topic tips you will find an easy way to move a volume without any danger.
subvolume
A subvolume is an independently mountable POSIX file tree, not a block device. It is the part of a volume that will be mounted writeable into your Linux system. If you don't care about snapshots and you don't care about backups, it would be possible to use only one subvolume for everything. But then you would not be able to use the powers of Btrfs. Let's assume you do care.
All subvolumes share the space of the Btrfs volume. You may create subvolumes at will. (You may think of subvolumes as a sort of "dynamic partitions" inside a Btrfs volume.)
When making snapshots (or using send/receive), every subvolume is handled separately. For example, when you have two subvolumes (@, @home) and make a snapshot of one of them (@), this snapshot will contain every bit of data of all files in this subvolume (@), but none of the data from the other subvolume (@home). So if you create a few subvolumes, you can follow different snapshot strategies for each of them, and you can restore each of them separately.
By convention the names of subvolumes start with @ (@home, @snapshots ...).
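A minimal sketch of creating subvolumes, assuming the volume root is mounted at /mnt (run as root; the names simply follow the usual convention):
btrfs subvolume create /mnt/@         # system subvolume
btrfs subvolume create /mnt/@home     # user data
btrfs subvolume list /mnt             # shows all subvolumes with their ids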
subvolume @
This is the subvolume where your complete Manjaro system resides. It is mounted at "/" in your filesystem. You may take snapshots of this subvolume (or backups with send/receive) to secure a running Manjaro system. When something bad happens, you are able to roll back to one of the snapshots, or to restore one of the backups of this subvolume, without losing your data at /home.
In order to make a rollback possible, this subvolume has to contain each and every piece of data that is needed for your Manjaro to work properly! This includes:
- config of your bootloader (/boot/grub/grub.cfg)
- initial ramdisk (/boot/initramfs-5.10-x86_64.img)
- kernel (/boot/vmlinuz-5.10-x86_64)
- kernel-modules (/usr/lib/modules/5.10.59-1-MANJARO/*)
- programs (/usr/bin/*)
- configs (/etc/*)
- libraries (/usr/lib/*)
- your root account (/root/*)
- rest of system files (/usr/*)
subvolume @home
This is the subvolume where all user data is stored. When you roll back your "@", this will not change at all. You may take snapshots of /home at a different rate and for different reasons. While snapshots of "@" are good for rollback, snapshots of @home are good for recovering files that users accidentally deleted (or overwrote).
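A minimal sketch of how @ and @home are typically mounted, assuming the volume lives on /dev/sdz2 and the target tree is /mnt/newroot (on an installed system this is normally handled by /etc/fstab; all names are examples):
mount -o subvol=@ /dev/sdz2 /mnt/newroot          # becomes / of the installed system
mount -o subvol=@home /dev/sdz2 /mnt/newroot/home # user data, untouched by a rollback of @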
subvolume @snapshots, @home.snapshots
It is wise to store snapshots NOT inside the subvolume they were taken from. So this may be the right place to store your snapshots of @ or @home.
subvolume @...
Sometimes it is desirable to have other special snapshot strategies (or no snapshots at all) for some parts of the filesystem. If you need this, create another subvolume.
snapshot
A snapshot looks nearly the same as a subvolume, but snapshots really are "read-only photographs of a subvolume". While the subvolume changes with time, the snapshot is frozen in the state the subvolume had at the time you made it. A snapshot is read-only and therefore guaranteed not to change. In a snapshot you will find all files of the subvolume frozen in time.
Taking a snapshot is very fast and costs nearly nothing. After the snapshot is taken, all future writes proceed as usual with CoW. But none of the space occupied by files in the snapshot becomes reusable. As you write more and more new files, the used space will grow, because the filesystem cannot reuse the space held by the files in the snapshot. Each new snapshot additionally freezes all files created or modified since the last snapshot, and so on. If you don't release (delete) any snapshots, you will eventually run out of space (disk full).
Deleting a snapshot does not delete any files that are still in use by other snapshots or by the subvolume they were taken from. But to free some space, Btrfs has to test, for every file in the snapshot, whether it is still in use or not. If it is not, the space of this file version is freed. (This is greatly simplified.) Therefore it is costly to remove snapshots, and Btrfs does this work in the background. You may notice this because, when you delete a snapshot, there is no immediate gain in free space. Only after a while will you notice that some space was freed.
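A minimal sketch of taking and releasing a snapshot, assuming the volume root is mounted at /mnt and @snapshots already exists (names and date are examples; run as root):
btrfs subvolume snapshot -r /mnt/@ /mnt/@snapshots/@-2021-10-10   # read-only snapshot, taken instantly
btrfs subvolume delete /mnt/@snapshots/@-2021-10-10               # release it again; space is freed in the background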
Snapshots (if regularly made) may be used for:
- comparing config files from different "times"
- merging config files
- recovering accidentally deleted/overwritten files
- system roll back
- anchor for a backup with send/receive
- basis for a seed
- What do you use snapshots for?
Making and deleting snapshots is best done automatically:
- snapper
- timeshift
If you need to roll back to a snapshot, you have to replace the current subvolume with the chosen snapshot (a sketch of the commands follows the list below):
- Make a snapshot of the current subvolume (for later reference)
- Move the subvolume out of its place
- Create a new subvolume from the snapshot chosen for the rollback
- Make the new subvolume the default
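A minimal sketch of such a rollback, assuming the volume lives on /dev/sdz2, the snapshot to roll back to is @snapshots/@-2021-10-10, and the new subvolume gets id 260 (all names and the id are examples; run as root):
mount -o subvolid=5 /dev/sdz2 /mnt                                     # mount the volume root
btrfs subvolume snapshot -r /mnt/@ /mnt/@snapshots/@-before-rollback   # keep the current state for later reference
mv /mnt/@ /mnt/@broken                                                 # move the current subvolume out of its place
btrfs subvolume snapshot /mnt/@snapshots/@-2021-10-10 /mnt/@           # writeable subvolume from the chosen snapshot
btrfs subvolume list /mnt                                              # look up the id of the new /mnt/@
btrfs subvolume set-default 260 /mnt                                   # make the new subvolume the default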
Btrfs RAID
With Btrfs you no longer need to use mdadm to create mirrored volumes or RAIDs. This is already included in Btrfs and very easy to use. There are even advanced features built in:
- Add devices to the volume. This will integrate a device into the mounted volume: root # btrfs device add /dev/sdz7 /
- Remove devices from the volume. This will not delete any data, but remove the device from the volume. Beforehand, all data will be copied to the remaining devices of the volume: root # btrfs device delete /dev/sdz8 /
- Use devices with different sizes in one volume
- Switch the volume between RAID levels
- Convert data to different RAID levels (see the balance example below)
- Do this while the volume is mounted and being used
see RAID@wikipedia
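A minimal sketch of converting a mounted, in-use volume to a different RAID level, assuming it is mounted at / and has at least two devices (the target profiles are examples):
btrfs balance start -dconvert=raid1 -mconvert=raid1 /   # convert data and metadata to RAID 1 while the volume stays in use
btrfs filesystem usage /                                # shows the profiles and usage per device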
RAID 0 (not Just a Bunch of Disks)
Using one or more devices to build a volume. This volume has the capacity of all the used devices together (1+2+3+4...). This is a very easy way to expand your volume when you need more space; you can even add two or three devices at a time. When you want to replace a device, you can add the new device and then remove the old one. Btrfs will move all data as necessary. To distribute the existing data to all devices you may want to balance the volume (see the example below). Btrfs will stripe the data across all devices.
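A minimal sketch of expanding a volume mounted at / and spreading the existing data (the device name is a placeholder; run as root):
btrfs device add /dev/sdz9 /   # the new space is available immediately
btrfs balance start /          # optional: redistribute the existing data over all devices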
1 device
In most setups you will start a volume with 1 device. If only one device is present, metadata will be duplicated on that device. Even with this simple setup you benefit from most features of Btrfs.
2 or more devices
By default, metadata will be mirrored across two devices and data will be striped across all of the devices present. But if you have two or more devices in your volume, you should consider using RAID 1 (see the example below).
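A hedged sketch of creating a new two-device volume with mirrored data and metadata (device names are placeholders; this wipes any existing data on both partitions):
mkfs.btrfs -d raid1 -m raid1 /dev/sdx1 /dev/sdy1   # data (-d) and metadata (-m) both use the RAID 1 profile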
RAID 1 (mirrored), 1C3, 1C4
automatic repair
In order to preserve the integrity of the volume, Btrfs keeps separate CRC checksums for metadata blocks and data blocks. Every time a data block is read, the checksum is verified. When the checksum shows that the data is not good, Btrfs tries to get a good copy from the mirrored block. The bad block is then rewritten with the good data from the mirror. This happens in the background. The filesystem has been repaired, and this is logged to syslog. A check of the whole volume can be forced with btrfs scrub.
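A minimal sketch of forcing such a check, assuming the volume is mounted at /:
btrfs scrub start /    # reads all data and metadata, verifies checksums, repairs from mirrors where possible
btrfs scrub status /   # shows progress and the number of corrected errors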
RAID 10 (automatic)
When using enough devices with RAID 1, Btrfs will distribute all data so that it is not only mirrored but also striped.
RAID 5
RAID 6
Btrfs maintenance
balance
scrub
Btrfs options
compression
encryption
send⇒receive = backup
quotas
Quota support in Btrfs is implemented at the subvolume level.
For more info see Quota_support@btrfs.kernel.org
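A minimal sketch of enabling quotas and inspecting per-subvolume usage, assuming the volume is mounted at /:
btrfs quota enable /   # turns on quota (qgroup) accounting
btrfs qgroup show /    # shows referenced and exclusive space per subvolume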
tips
move a volume
There is an easy and secure way to move a volume to another disk/device. If you use Btrfs itself to move the volume, there is no danger. You can even do this while the volume is in use.
- Create the partition you want to use as destination without formatting it, or remove the filesystem if one is present
- Add the destination device to your volume: btrfs device add /dev/nvme9n1p3 /
- Remove the source device from your volume: btrfs device remove /dev/sdz3 /
Btrfs will notice that with this setup it is necessary to move all data from the source device to the destination device, and it will immediately start to move data in the background. Meanwhile you can use your PC as you want.
- Empty blocks will not be moved
- Compressed data will remain compressed
- All snapshots will remain
- The UUID of the filesystem will remain the same, but Btrfs will be aware of the device change
- If you used the UUID to identify your volume, you won't even need to edit /boot/grub/grub.cfg and /etc/fstab
- Just don't shut down before the move of the volume is complete
If you want to watch the volume move, you can do so from a terminal.
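One possibility (a suggestion, assuming the volume is mounted at /) is to repeatedly print the per-device usage and watch the data drain off the old device:
watch -n 30 btrfs filesystem usage /   # refreshes the usage overview every 30 seconds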
Btrfs Tools
Btrfs
btrfsck
this is not what you think it is 😜
Recommendations
Suggested layout for a UEFI system:
Partition | Filesystem | Size | Partition type
---|---|---|---
/dev/sda1 | FAT32 | 1 GiB | EFI system partition
/dev/sda2 | Btrfs | 1 GiB - 8 EiB | Btrfs volume
/dev/sda3 | swap | 4 GiB, at least your RAM size | Swap partition (optional)
Suggested layout for a BIOS system:
Partition | Filesystem | Size | Partition type
---|---|---|---
/dev/sda1 | (bootloader) | 4 MiB | BIOS boot partition
/dev/sda2 | Btrfs | 1 GiB - 8 EiB | Btrfs volume
/dev/sda3 | swap | 4 GiB, at least your RAM size | Swap partition (optional)
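A minimal sketch of creating such a volume and the basic subvolumes, assuming the Btrfs partition is /dev/sda2 as in the tables above (run as root and adapt the names to your system; an installer normally does this for you):
mkfs.btrfs -L manjaro /dev/sda2         # create the Btrfs volume
mount /dev/sda2 /mnt                    # mount the volume root
btrfs subvolume create /mnt/@           # system
btrfs subvolume create /mnt/@home       # user data
btrfs subvolume create /mnt/@snapshots  # place for snapshots
umount /mnt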
Please be aware that the information on this page is a simplified version of reality. It is written to help the reader understand a little of these complex things. To gain an in-depth understanding it will be necessary to read further at btrfs.wiki.kernel.org or other places.
Additional Information
Why not Btrfs?
A lot of people say: "I don't use Btrfs because it is experimental and not stable. You can't use it in production. It is not safe!"
Not stable?
The status of Btrfs was experimental for a long time, but the core functionality is considered good enough for daily use. (from kernel.org)
If you see statements declaring Btrfs not stable, please check their date; some seem to date from 10 years ago. So if you want to give Btrfs a chance, you have to look for newer statements. Maybe even look at the Btrfs Kernel Wiki, as that surely is the best source of information regarding Btrfs.
Experimental?
Btrfs is feature-rich! There are new features being implemented, and these should be considered experimental for a few releases while the bugs get ironed out and a number of brave users help stabilize them. (from kernel.org)
Some features are not implemented yet, others are only partly implemented, and some are experimental and not suggested for production use. As is always the case in Linux-land, you decide what to use, and so you are responsible for your own decisions.
Not usable in production?
- Distro support for Btrfs as the main filesystem
- Some companies use Btrfs in production@wiki.btrfs.kernel.org
- Some manufacturers deploy devices with Btrfs installed by default.
Difficult to repair?
Indeed, when you search for the usual ways to repair a file system like FAT or Ext4, you won't find good information. But this is not because Btrfs is difficult to repair; it is because repairing Btrfs works very differently.
What's this "Copy on Write"?
When you want to get the most out of using Btrfs, you need to know some things about this file system. Then you are able to use it properly and to your advantage. Btrfs is not difficult, but different to some extent.
Write in place (FAT32)
Most older file systems write "in place". This means that some data or metadata is written "over" the previous data at the same place.
For example, this is the case for FAT32 file systems. The File Allocation Table is at a fixed place in this file system. When the FAT changes (because a file got bigger and needs more blocks), the new FAT must be written to the same place as before. If the disk is ejected before (or while) this data is written, the file system will be corrupted. And the FAT changes a lot.
The danger of corruption is especially big while metadata (like filenames, permissions, usage of disk space ...) is being written.
Write to a metadata-log (Ext4)
Newer file systems like Ext4 offer a solution to this. Instead of writing metadata "in place", metadata is written into an "endless" log (the journal). Then it cannot be corrupted by being overwritten. This is possible because metadata is only a very small part of the data in a file system.
There has to be an additional mechanism to make this safe. Sometimes this is called "barriers", and there have to be checksums that tell when a part of the log is corrupted.
This protects the file system itself, but not the files in it, because a file may still be overwritten in place; then the old file is lost and the new one may not have been written completely.
Copy on Write! (Btrfs)
Copy on Write is a "new" concept. It means the file system will try to never write over existing data. How is this even possible? (A small user-side demonstration follows the list below.)
- Files are appended at the end of a "data page"
- Metadata is appended at a "metadata page"
- Inside a page nothing is ever overwritten
- When a page is full the file system will use the next free page
- Deleting a file does not write/clean its data, but writes metadata that marks this file as deleted
- Overwriting a file first appends the new data to the "data page", then writes the metadata for this file
- Changing small parts of a file will write only the new parts, then link the rest to the old file
- There are checksums for data and metadata
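Copy on Write can also be observed from the user side. The following small demonstration uses a reflink copy (a standard Btrfs feature, not described above): the copy is created instantly and shares all data blocks with the original, and only changed blocks ever get new space. File names are examples; run this inside a Btrfs file system.
dd if=/dev/zero of=big.file bs=1M count=1024   # write a 1 GiB test file
cp --reflink=always big.file clone.file        # instant copy, no data blocks are duplicated
echo "small change" >> clone.file              # only the newly written blocks use new space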
Downsides
- Management of space is complex
- There are 2 sorts of pages
- There has to be a clean-up process that makes the space of deleted files reusable, so that the disk does not run out of free pages
- Writing data unnecessarily must be avoided, because that would also make the clean-up very expensive
(Dis)advantages
- It is possible to detect nearly any corruption because of the checksums
- When the power is lost, or the disk is disconnected, all old data is safe. WHY?
- Every bit of "old" data from before the power loss or the disconnection is present because it is NOT overwritten
- Only the newly written data may be partly damaged
- The metadata may also be partly damaged
- When mounting the volume, it is possible, by analysing checksums and metadata, to find the point in the file system where everything was still good
- Btrfs will automatically roll back to this point; then it can mount the file system writeable
- CoW is a sound foundation to build upon
- Snapshots
- RAID
- Volume management
- Compression
- Encryption (maybe some time in the future)
Use the Forum!
It is a good idea to search the forum for posts related to Btrfs.
Btrfs is fast moving!
See Also: