#266 atomichost: Update partitioning for new model
Merged 6 years ago by maxamillion. Opened 6 years ago by walters.
walters/fedora-kickstarts atomichost-newpart  into  master

file modified
+12 -2
@@ -24,11 +24,21 @@
 
 zerombr
 clearpart --all
-# Atomic differs from cloud - we want LVM
+# Implement: https://pagure.io/atomic-wg/issue/281
+# The bare metal layout default is in http://pkgs.fedoraproject.org/cgit/rpms/fedora-productimg-atomic.git
+# However, the disk size is currently just 6GB for the cloud image (defined in pungi-fedora).  So the
+# "15GB, rest unallocated" model doesn't make sense.  The Vagrant box is 40GB (apparently a number of
+# Vagrant boxes come big and rely on thin provisioning).
+# In both cases, it's simplest to just fill all the disk space.
+#
+# For /boot, we currently diverge from the Fedora default of 1GB here since it really feels like a huge
+# waste of space with the 6GB layout.  At some point we could investigate dropping the /boot partition, see
+# https://github.com/ostreedev/ostree/pull/215 and https://github.com/ostreedev/ostree/pull/268
 part /boot --size=300 --fstype="ext4"
 part pv.01 --grow
 volgroup atomicos pv.01
-logvol / --size=3000 --fstype="xfs" --name=root --vgname=atomicos
+# Start from 3GB as we did before, since we just need a size.  But we do --grow to fill all space.
+logvol / --size=3000 --grow --fstype="xfs" --name=root --vgname=atomicos
 
 # Equivalent of %include fedora-repo.ks
 # Pull from the ostree repo that was created during the compose

See https://pagure.io/atomic-wg/issue/281

This causes us to match the productimg setup. At some point hopefully we can use
autopart and not duplicate it.

(Not tested locally yet)


A few comments:

  • I don't know if we actually want to make the size of the partition
    be 15G for cloud images. It is possible to start a cloud image with
    < 15G of disk and this would probably fall on its face in that case.
    Is the more appropriate approach here to just configure the system
    to auto-extend the root LV and filesystem to fill to the size of
    the disk that was provided to it? If we used partitions and not LVs
    this is what would happen by default (don't know if I prefer one
    over the other, just mentioning it).

  • I think using autopart would be beneficial for https://pagure.io/atomic-wg/issue/299

rebased

6 years ago

You're right, after testing it does fail with our current cloud image default of 6GB. Updated to just use --grow and it will also DTRT with the Vagrant box.

Updated :arrow_up:

in order to get the behavior we want (full disk being used on a cloud instance) we have to do this along with some settings in docker-storage-setup to make it use the full disk for the root filesystem.

So we either need to modify our settings for docker-storage-setup, ORRR we could consider not using LVM and allowing the root partition (and filesystem) extended by cloud-init on instance boot.
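For concreteness, the kind of settings being discussed look something like this (a sketch only: GROWPART and ROOT_SIZE are documented container-storage-setup options, but the values shown are illustrative, not what the image currently ships):

```
# /etc/sysconfig/docker-storage-setup (illustrative values, not the shipped default)
# Grow the root partition / physical volume to fill the disk at boot
GROWPART=true
# Then extend the root LV and its filesystem; takes lvextend-style sizes
ROOT_SIZE=40G
```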

Is there anything I'm missing?

Are we using container-storage-setup ROOT_SIZE or not to grow the root LV and filesystem?

Also, if you use the full disk for the rootfs, that means users can't go back to the devicemapper graph driver easily if they want to?

@dustymabe If no LVM is used, how could we extend the file system in the future if we needed more space? We occasionally receive questions about extending the / filesystem. Will this still be possible on boot/reboot of the machine? Or am I missing the whole point here?

NOTE: this discussion is specifically about the cloud images and booting them in a cloud environment. It is also (currently) targeted at Fedora 27/Rawhide.

@vgoyal
Are we using container-storage-setup ROOT_SIZE or not to grow the root LV and filesystem?

Currently we are not. That is one of the approaches I meant when I said "we either need to modify our settings for docker-storage-setup".

@vgoyal
Also, if you use the full disk for the rootfs, that means users can't go back to the devicemapper graph driver easily if they want to?

correct, unless they add a new disk, which is easy to do in a cloud environment

@valentinb
If no LVM is used, how could we extend the file system in the future if we needed more space? We occasionally receive questions about extending the / filesystem. Will this still be possible on boot/reboot of the machine? Or am I missing the whole point here?

How is this handled on non-atomic cloud instances today (i.e. ones that only use partitioning and not LVM)? I think some clouds allow for resizing the disk dynamically and then it's easy to use growpart and resize your FS. The discussion here is really "do we want to look more like a traditional cloud image" or not?

Remember that when booted, the cloud image will still be 6GB. Hence actually we are still relying on container-storage-setup to extend the rootfs beyond that, and it should still be possible to set up a separate LV for devicemapper in cloud-init.

On cloud base today I think cloud-init uses cloud-utils-growpart to handle resizing of the rootfs. But that script doesn't know about LVM, and we wanted the storage configurable in concert with containers, so for AH having container-storage-setup preinstalled is important.

@walters
Remember that when booted, the cloud image will still be 6GB. Hence actually we are still relying on container-storage-setup to extend the rootfs beyond that, and it should still be possible to set up a separate LV for devicemapper in cloud-init.

Right, but the default configuration (which can be altered by the sysadmin on bringup), would be to extend that LV and fill all space with the root filesystem, correct?

@walters
On cloud base today I think cloud-init uses cloud-utils-growpart to handle resizing of the rootfs. But that script doesn't know about LVM, and we wanted the storage configurable in concert with containers, so for AH having container-storage-setup preinstalled is important.

Right, still worth having the discussion on whether or not we should act more like a traditional cloud image (partition based), or not (stick with LVM based approach). We needed the LVM based approach before mainly because we needed a separate device for devicemapper. devicemapper is now optional and not default, which means that if someone wants devicemapper or any other storage setup then they can configure it, along with an extra disk on their cloud instance to put it on. I honestly prefer LVM over pure partition based systems because of obvious benefits, but in a cloud environment there are so many assumptions that have been made by hosting companies in the past that make it harder to use (took us a long time to get into DigitalOcean while we waited for them being able to support it, etc).

Right, but the default configuration (which can be altered by the sysadmin on bringup), would be to extend that LV and fill all space with the root filesystem, correct?

Right. At least, that's the idea. I didn't yet test cloud-init+c-s-s though.

As far as dropping LVM...at this point Atomic Host has been around in cloud image form for over 2 years...it feels like talking about "traditional" as being before that becomes a bit weird :smile: It's certainly true that (AFAIK) Fedora/Ubuntu/etc. cloud images tended not to use LVM before.

A problem I see here (and this is a perennial topic) is that making the cloud image fundamentally different from the baremetal path greatly increases risk for our testing of the latter. Given the risks/benefits, plus the fact that we've plowed through most of the issues now, would seem to argue for keeping LVM. This is definitely a good time to discuss it though.

If anything the testing and risk of problems is less without LVM. However, two consequences of dropping it:

  • Atomic host images need a separate /boot volume [1]. Without LVM (or Btrfs), the
    root fs must be on the last partition in the image to be resizeable, and I'm not
    certain what logic the installer has to guarantee this is true. The installer has
    logic to trigger creation of a separate /home volume when the disk is above a
    certain size, and that /home partition is usually after the root fs partition.

  • The kernel still does not update its idea of a device's partition table if that
    device contains an active root fs. At least one reboot is required before the
    rootfs can be resized. Whereas with LVM/Btrfs, you can just add a new block
    device (hotplug, no reboot needed) to the existing pool; partition maps don't
    need modification, so no rebooting.

[1] libostree installations appear to still need /boot on a separate fs volume, I still run into this bug with Fedora-Workstation-ostree-x86_64-Rawhide-20170714.n.0.iso if /boot is a directory.
https://bugzilla.redhat.com/show_bug.cgi?id=1395910
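The LVM growth path described above is short; as a sketch (the /dev/sdb device name is an assumption, the atomicos VG name comes from the kickstart above, and these commands need root on a host where the new disk actually exists):

```
# Attach a new disk (here assumed to appear as /dev/sdb), then:
pvcreate /dev/sdb                       # initialize it as an LVM physical volume
vgextend atomicos /dev/sdb              # add it to the existing volume group
lvextend -r -l +100%FREE atomicos/root  # grow the root LV and (-r) its filesystem
```

No reboot or partition-table rewrite is involved, which is the point being made.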

I'd like to keep LVM. Is there any downside to keeping it?


@chrismurphy
If anything the testing and risk of problems is less without LVM. However, two consequences of dropping it:

risk of problems is less when integrating with cloud providers because
this is what they expect. I think @walters point was that we would
have less problems by keeping LVM because we test a common path
for both our atomic cloud image and atomic ISO install (i.e. LVM)

@chrismurphy
Atomic host images need a separate /boot volume [1]. Without LVM (or Btrfs), root fs must be on the last partition in the image to be resizeable, and I'm not certain what logic the installer has to guarantee this is true. The installer has logic to trigger creation of a separate /home volume when the disk is above a certain size, and that /home partition is usually after root fs partition.

I'm not worried about this for the atomic cloud image. We already
build cloud base images with non-LVM partitions today that handle
this just fine, and the partitions get resized by cloud-init
on boot.

@chrismurphy
The kernel still does not update its idea of a device's partition table if that device contains an active root fs. At least one reboot is required before rootfs can be resized.
Whereas with LVM/Btrfs, you can just add a new block device (hotplug, no reboot needed) to the existing pool; partition maps don't need modification, so no rebooting.

Not sure about this problem. My experience booting cloud base images
over time and having it get resized by cloud-init on boot would seem
to go against that statement. Am I misunderstanding something?

@chrismurphy
[1] libostree installations appear to still need /boot on a separate fs volume, I still run into this bug with Fedora-Workstation-ostree-x86_64-Rawhide 20170714.n.0.iso if /boot is a directory.
https://bugzilla.redhat.com/show_bug.cgi?id=1395910

Yeah, still an open bug. @walters, mind adding some information to the BZ?

@jasonbrooks
I'd like to keep LVM. Is there any downside to keeping it?

A positive to keeping it is that we align more closely with what is
done on bare metal install. A negative is that we don't fit the mold
for cloud providers that they have come to expect. Some cloud
providers can handle us just fine, others have hiccups as a result.
That's the simple +/- game we are playing here. We used to require LVM
because we needed a separate device for DM storage. Now that we don't
I wanted to revisit the subject.

@dustymabe
Not sure about this problem. My experience booting cloud base images
over time and having it get resized by cloud-init on boot would seem
to go against that statement. Am I misunderstanding something?

Disregard.

Explanation: I'm uncertain when cloud-init can do resizes, I suspect it's doing them during startup before rootfs is mounted. But I see cloudinit/config/cc_disk_setup.py refers to partprobe which will update the kernel's idea of a partition map on a device containing rootfs.

As to the central question of cloud images using LVM or not, I'd say it brings nothing to the table. As for downsides, is there ever a need for an admin to interact with LVM, e.g. resizing the root fs? Or do they use cloud-init or ssm, which handles the LVM stuff for them? I would want to burden the admin as little as possible.

Hmm, we never merged this...

rebased onto ca4d6f4

6 years ago

OK turned out I had tested this locally before, and it worked then and still does now against f27. I did an ISO+kickstart install with a 20GB root disk and the whole thing was used.

LGTM - let's merge this

I know this is not the best place to ask since it is fedora-kickstarts, but..

How can we achieve the same with container-storage-setup?

In our project, we are trying to grow the space, but not in a very nice way [1]: we do the lvextend ourselves.
[1] http://git.openstack.org/cgit/openstack/magnum/tree/magnum/drivers/common/templates/fragments/configure_docker_storage_driver_atomic.sh

@strigazi Allow me to comment on the script you pointed to. Although it's not a direct answer to your question I want to point out a few things:

Line 34 local lvname=$(lvdisplay | grep "LV\ Path" | awk '{print $3}')
could and probably should be replaced by local lvname=$(lvdisplay | awk '/LV Path/{print $NF}')
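As a quick sanity check, the two pipelines can be compared on a canned lvdisplay-style line (sample text, not output from a real system):

```shell
# A line mimicking `lvdisplay` output; the LV path is made up for the demo
sample='  LV Path                /dev/atomicos/root'
# Original pipeline: grep then awk
a=$(printf '%s\n' "$sample" | grep "LV\ Path" | awk '{print $3}')
# Suggested pipeline: awk alone, printing the last field
b=$(printf '%s\n' "$sample" | awk '/LV Path/{print $NF}')
echo "$a"
echo "$b"
```

Both print /dev/atomicos/root; the awk-only form just skips the extra grep process.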

This will not require an extra process like grep. Furthermore, I've created a script for online file system resize for normal images (Fedora, CentOS, Ubuntu, etc.).

#!/usr/bin/env bash
# chkconfig: 2345 20 80

currentDiskSize=$(< /sys/block/sda/size)
echo "3 1 4 3" >/proc/sys/kernel/printk
echo 1 > /sys/block/sda/device/rescan
shopt -s nullglob
scan_loc=(/sys/class/scsi_device/*/device/rescan)
for file in "${scan_loc[@]}"; do
  # Ask every SCSI device to rescan; ignore errors from devices that refuse
  echo 1 > "$file" 2>/dev/null
done
shopt -u nullglob
newDiskSize=$(< /sys/block/sda/size)

# Nothing to do if the disk did not grow
((currentDiskSize == newDiskSize)) && exit 1

sleep 1 # Wait 1 second for the kernel to notify userspace.
# Recreate partition 2 so it ends at the last sector, then refresh the kernel's view
stsector=$(parted /dev/sda unit s print | awk '/lvm/{print $2}')
parted /dev/sda --script rm 2 2>/dev/null
parted /dev/sda --script "mkpart primary ext4 ${stsector} -1s"
parted /dev/sda --script set 2 lvm on
partx -u /dev/sda2
partprobe
# Grow the PV, then the root LV and (-r) its filesystem to all free space
pvresize /dev/sda2 >/dev/null 2>&1
lvresize -r -l +100%FREE /dev/vg00/lv_root -f >/dev/null 2>&1

I am searching for a solid solution where an online file system resize is possible on the atomic image as well. I've been unable to run this on an atomic host.

Pull-Request has been merged by maxamillion

6 years ago

@maxamillion thanks!

@strigazi @valentinb - can you send a mail to atomic-devel@projectatomic.io asking that same question and we can discuss there?