#340 QCOW images in recent Fedora-Cloud-Base-Vagrant libvirt boxes for Rawhide are not sparse
Closed: fixed 2 years ago by mrc0mmand. Opened 2 years ago by mrc0mmand.

Hello,

In the past ~week I noticed that our CI jobs (which are using Rawhide Vagrant libvirt boxes from https://dl.fedoraproject.org/pub/fedora/linux/development/rawhide/Cloud/x86_64/images/) take an unusually long time (the difference is 30 - 60 minutes). Upon closer inspection it's apparent the images take a really long time to get processed by Vagrant and then by libvirt:

02:50:06 ==> rawhide_selinux: Adding box 'fedora-rawhide-cloud' (v0) for provider: libvirt
02:50:06     rawhide_selinux: Downloading: https://dl.fedoraproject.org/pub/fedora/linux/development/rawhide/Cloud/x86_64/images/Fedora-Cloud-Base-Vagrant-Rawhide-20210623.n.0.x86_64.vagrant-libvirt.box
02:50:19 
03:00:14 ==> rawhide_selinux: Successfully added box 'fedora-rawhide-cloud' (v0) for 'libvirt'!
03:00:14 ==> rawhide_selinux: Uploading base box image as volume into Libvirt storage...
03:22:05 ==> rawhide_selinux: Creating image (snapshot of base box volume).

The download itself takes a couple of seconds, but the bsdtar, which Vagrant calls afterwards, takes the remaining ~10 minutes in this case:

# time /opt/vagrant/embedded/bin/bsdtar --no-same-owner --no-same-permissions -v -x -m  -f Fedora-Cloud-Base-Vagrant-Rawhide-20210626.n.0.x86_64.vagrant-libvirt.box 
x box.img
x Vagrantfile
x metadata.json

real    9m28.711s
user    1m32.817s
sys 0m6.098s

Also, after unpacking, the image is clearly not sparse:

# file box.img 
box.img: QEMU QCOW Image (v2), 42949672960 bytes
# ll -h box.img 
-rwxr-xr-x. 1 root root 38G Jun 26 19:55 box.img

This explains why it takes so long for the image to get imported into the libvirt storage (22 minutes in this very case).

This has to change quite recently. Just to confirm, here's a F34 Vagrant-libvirt image:

# wget https://dl.fedoraproject.org/pub/fedora/linux/releases/34/Cloud/x86_64/images/Fedora-Cloud-Base-Vagrant-34-1.2.x86_64.vagrant-libvirt.box
...
# time /opt/vagrant/embedded/bin/bsdtar --no-same-owner --no-same-permissions -v -x -m  -f Fedora-Cloud-Base-Vagrant-34-1.2.x86_64.vagrant-libvirt.box 
x box.img
x Vagrantfile
x metadata.json

real    0m29.522s
user    0m6.796s
sys 0m0.310s

# file box.img 
box.img: QEMU QCOW Image (v2), 42949672960 bytes
# ll -h box.img 
-rwxr-xr-x. 1 root root 1.8G Jun 26 19:56 box.img

Is this a feature or a bug? I suspect the latter.


Metadata Update from @davdunc:
- Issue tagged with: meeting

2 years ago

Related #335

BUG 1

Fedora-Cloud-Base-Vagrant-34-1.2.x86_64.vagrant-libvirt.box

fdisk (partial)

Device      Boot Start      End  Sectors Size Id Type
/dev/nbd0p1 *     2048 83886079 83884032  40G 83 Linux

40G is too big for an image with < 400M content.


BUG 2
Fedora-Cloud-Base-Vagrant-Rawhide-20210626.n.0.x86_64.vagrant-libvirt.box

1841344 -rwxr-xr-x. 1 chris chris 1885536256 Jun 26 19:58 box.img
# fstrim -v /mnt
/mnt: 38.5 GiB (41369563136 bytes) trimmed
760256 -rwxr-xr-x. 1 chris chris 1885536256 Jun 26 19:58 box.img

There's a gig of junk in this file. I'm not sure what or why. I guess it could depend on the method of issuing discards, if maybe the original image had e.g. e2fsck -E discard applied.


BUG 3

Fedora-Cloud-Base-Vagrant-Rawhide-20210626.n.0.x86_64.vagrant-libvirt.box

39538048 -rwxr-xr-x. 1 root root 40512126976 Jun 26 20:24 box.img
# fstrim -v /mnt
/mnt: 38.7 GiB (41511047168 bytes) trimmed
491968 -rwxr-xr-x. 1 root root 40512126976 Jun 26 20:30 box.img

So almost 39G is turned into holes, and if I merely tar -S -acf this sparse file, I get

310M -rw-r--r--. 1 root root 310M Jun 26 20:36 box.img.tar.gz
661M -rw-r--r--. 1 root root 661M Jun 26 20:16 Fedora-Cloud-Base-Vagrant-Rawhide-20210626.n.0.x86_64.vagrant-libvirt.box

But why is there 38+G to trim in the Rawhide (btrfs) case? Fully 37G more than in the ext4 case?

I suspect that https://pagure.io/fedora-kickstarts/blob/main/f/fedora-cloud-base-vagrant.ks is includes https://pagure.io/fedora-kickstarts/blob/main/f/fedora-cloud-base.ks#_109 which means this entire btrfs file system is being filled with zeros. But these get compressed in 128KiB increments by btrfs transparent compression. Whereby it turns into a pattern compressible by gzip instead of actually being a bunch of cheap holes that a sparse tar file could quickly unpack.

Bug 1 would be moot, and bug 2 and 3 would be fixed, by enhancing lorax per https://bugzilla.redhat.com/show_bug.cgi?id=1971186

Metadata Update from @davdunc:
- Issue untagged with: meeting

2 years ago

Following commit bc7e8ce we can see from build logs that fstrim runs within the same millisecond as zeroing out empty space.

08:51:30,882 INFO anaconda:program: Zeroing out empty space.
...
08:51:30,883 INFO anaconda:program: /: 4 GiB (4282814464 bytes) trimmed on /dev/vda4
  1. Writing zeros is no longer needed, due to the fstrim in bc7e8ce.
  2. Further, since there's not enough time for the rm -rf of the "zeros" file to be committed prior to the fstrim, the fstrim is ineffective.

https://pagure.io/fedora-kickstarts/pull-request/823

This made the situation worse, and now the resulting images are nearly ~900 MB. I'm putting back the filesystem zeroing, since it clearly helps. But I'm also adding btrfs fi sync calls to enforce disk commit before fstrim.

https://pagure.io/fedora-kickstarts/pull-request/824

334232420 Jul 12 09:17 Fedora-Cloud-Base-Rawhide-20210712.n.0.x86_64.raw.xz
901985396 Jul 15 11:09 Fedora-Cloud-Base-Rawhide-20210715.n.0.x86_64.raw.xz

however

4629504 -rw-r--r--. 1 chris chris 5368709120 Jul 15 12:42 Fedora-Cloud-Base-Rawhide-20210712.n.0.x86_64.raw
1431928 -rw-r--r--. 1 chris chris 5368709120 Jul 15 12:42 Fedora-Cloud-Base-Rawhide-20210715.n.0.x86_64.raw

Dropping the zeros made the image smaller, but less compressible. It's less compressible because the fstrim was ineffective. It was ineffective due to btrfs long commit time (30s by default) and lack of sync. If all I do is mount the existing images and fstrim them, they become smaller, proves this.

# losetup /dev/loop0 Fedora-Cloud-Base-Rawhide-20210712.n.0.x86_64.raw
# losetup /dev/loop1 Fedora-Cloud-Base-Rawhide-20210715.n.0.x86_64.raw
# kpartx -a /dev/loop0
# kpartx -a /dev/loop1
# mount /dev/mapper/loop0p4 /mnt/0
# mount /dev/mapper/loop1p4 /mnt/1
# fstrim -v /mnt/0
/mnt/0: 3.6 GiB (3896537088 bytes) trimmed
# fstrim -v /mnt/1
/mnt/1: 4.1 GiB (4424015872 bytes) trimmed
# ls -ls
898368 -rw-r--r--. 1 chris chris 5368709120 Jul 15 12:49 Fedora-Cloud-Base-Rawhide-20210712.n.0.x86_64.raw
405036 -rw-r--r--. 1 chris chris 5368709120 Jul 15 12:48 Fedora-Cloud-Base-Rawhide-20210715.n.0.x86_64.raw

Both of the images had substantial garbage in them, it's just that the zeros compress better.

If I use the same xz -z6T0 I get:

325220 -rw-r--r--. 1 chris chris 333023324 Jul 15 12:49 Fedora-Cloud-Base-Rawhide-20210712.n.0.x86_64.raw.xz
274676 -rw-r--r--. 1 chris chris 281265940 Jul 15 12:48 Fedora-Cloud-Base-Rawhide-20210715.n.0.x86_64.raw.xz

We needed just a sync to make the fstrim effective, which is not made more effective by zeros because FITRIM() just punches holes wherever the file system says blocks are unused. Zeros don't trim better than sync'd garbage removal.

I think this is fixed now, @mrc0mmand can you verify?

The issue indeed seems to be fixed, with Fedora-Cloud-Base-Vagrant-Rawhide-20210727.n.0.x86_64.vagrant-libvirt.box the import to the libvirt storage takes ~4 seconds.

Thank you!

Metadata Update from @mrc0mmand:
- Issue close_status updated to: fixed
- Issue status updated to: Closed (was: Open)

2 years ago

Login to comment on this ticket.

Metadata
Related Pull Requests
  • #823 Merged 2 years ago
  • #822 Merged 2 years ago