[Buildroot] [PATCH 6/6 v3] system: add option to use an overlayfs on /var on a r/o root w/ systemd

Norbert Lange nolange79 at gmail.com
Sun Nov 6 16:13:33 UTC 2022


On Tue, 25 Oct 2022 at 14:12, Norbert Lange <nolange79 at gmail.com> wrote:
>
> On Tue, 25 Oct 2022 at 10:08, <yann.morin at orange.com> wrote:
> >
> > Norbert, All,
> >
> > Thank you for your feedback! :-)
> >
> > On 2022-10-23 23:47 +0200, Norbert Lange spake thusly:
> > > On Tue, 18 Oct 2022 at 21:43, <yann.morin at orange.com> wrote:
> > > > While the /var factory seems to be working in most cases, there have
> > > > been suggestions that it may be slightly and subtly broken in some
> > > > (rare? edge?) cases, especially about symlinks.
> > >
> > > Had to dig up an old post of mine (not the only one touching on that),
> > > some issues are:
> > >
> > > -   it kills previous files in /usr/share/factory/var
> > > -   it doesn't handle symlinks (just think of /var already containing
> > >     a symlink into that factory), especially relative ones.
> > > -   it has side effects with tmpfiles .conf files that are ordered
> > >     before and touch /var
> > > -   it has side effects with other PRE_CMD_HOOKS touching /var
> > >
> > > (Post is from mid 2020, so forgive me if my memory is fuzzy,
> > > but I already had practical problems with at least the last two of those).
> >
> > Forgive me if my memory is fuzzy, but I don't recall seeing any patch to
> > fix those issues with the factory... ;-)
>
> I am not sure a complete fix for the factory would be a computable problem:
> once you know the implementation, you can "break it".
> There are way simpler solutions for "make a copy of that stuff".
>
> >
> > > To me this is just not a robust solution
> >
> > Yet, there are some people for whom the factory does work just fine
> > (first-hand experience here, and besides your comments, we have had
> > no one reporting actual issues in the 5+ years since we implemented the
> > factory, AFAICR). So, we do not want to break the situation for them.
> >
> > Once the overlayfs scheme has been in place for some time and has been
> > exercised, we can consider switching the default, and eventually we can
> > get rid of the factory if it proves to be unfixable (again, without
> > concrete examples that do break it, we cannot devise a fix).
>
> That's a sunk cost fallacy ;)
>
> >
> > If we can't yet agree on how to integrate the overlayfs based scheme, we
> > need a way to sort out the conflict between the factory and running
> > tmpfiles at build time, which is what patches 1-4 are for, since they do
> > not change the current behaviour, but clarify the current situation.
> >
> > So, those are the patches where we should concentrate for now.
> >
> > Patches 5-6 introduce the new overlayfs scheme as an alternative to the
> > factory, a new feature, so they can go in later...
> >
> > And yes, I did test the overlayfs scheme in our use-case here, and yes
> > it does work as advertised, so yes, this is a good feature!
>
> Glad to hear, I am going to clean it up a bit
>
> >
> > > > Another solution is to pre-populate /var at build time, by way of
> > > > calling systemd-tmpfiles, and mounting an overlayfs on-top of it at
> > > > runtime.
> > > >
> > > > This is slightly acrobatic, though, and requires a few hoops:
> > > >   - first, we create a tmpfs
> > > >   - there, we create three directories:
> > > >     - the first to bind-mount /var as it is, i.e. read-only
> > > >     - the second as the read-write upper for the overlayfs
> > > >     - the third as the "working area" for the overlayfs
> > > ..and we depend on overlayfs
> >
> > I just had to enable overlayfs in the kernel, and it worked without any
> > other package.
>
> I meant it's not an option if the kernel does not provide overlayfs,
> which would be an argument against it.
>
> >
> > FTR, I have added new runtime tests to validate both the factory and
> > the overlayfs scenarii. I will post them in the coming days when I've
> > cleaned them up (have to run locally, as my GitLab free minutes are
> > exhausted):
> >
> >     https://gitlab.com/ymorin/buildroot/-/tree/systemdify-var
> >
> > [--SNIP--]
> > > > Systemd units courtesy Norbert, with slight tweaks and cleanups.
> > >
> > > Yeah, I'm not fine with the tweaks that drop the symlink
> > > /usr/lib/systemd/system/var.mount -> ../var.mount
> > > (and the added install section)
> > >
> > > First, in the same local-fs "target" you could mount /etc,
> > > making this a complicated hidden issue. I don't know
> > > when systemd reloads; I believe only after that target.
> > >
> > > Second, this should be enabled by default, and
> > > in a way that works even when /etc is broken or not ready.
> >
> > So, currently, Buildroot does not work (does nothing to officially work)
> > seamlessly with an empty /etc, because we explicitly run "systemctl
> > preset-all" at the end of the build (as a prefs_cmd), and that fills in
> > /etc/systemd/system/.
> >
> > As you said, supporting an empty /etc will require *way* more explicit
> > support in Buildroot
>
> It helps if there aren't any additional blocks in the way; systemd
> envisions your rootfs "master" to live under /usr.
> Key services should be statically linked (e.g. dbus.socket).
>
> > > If a user really wants to disable the mount, he can mask it.
> >
> > That is true if using either the symlink or the install section, no?
> > I.e. they'd just provide a preset that reads:
> >
> >     disable rootfs-bindmount-var.service
> >     disable var.mount
>
> Yeah, it should be a "hard" default, not affected by the usual
> enable/disable/preset operations.
> The mask/unmask operations are the "hard" stuff.
>
> Let's turn it around: what arguments can you muster
> for using the install functionality?
>
> >
> > > > Signed-off-by: Yann E. MORIN <yann.morin at orange.com>
> > > > Cc: Norbert Lange <nolange79 at gmail.com>
> > > > Cc: Romain Naour <romain.naour at smile.fr>
> > > > Cc: Jérémy Rosen <jeremy.rosen at smile.fr>
> > [--SNIP--]
> > > > --- /dev/null
> > > > +++ b/package/skeleton-init-systemd/overlayfs/rootfs-bindmount-var.service
> > > > @@ -0,0 +1,21 @@
> > > > +[Unit]
> > > > +Description=Bind-mount variable storage (/var)
> > > > +Documentation=man:file-hierarchy(7)
> > > > +ConditionPathIsSymbolicLink=!/var
> > > > +# ConditionPathIsReadWrite=!/var
> > > > +DefaultDependencies=no
> > > > +Conflicts=umount.target
> > > > +Before=local-fs.target umount.target
> > > > +After=local-fs-pre.target
> > >
> > > I am actually considering changing that to:
> > >
> > > Before=local-fs-pre.target umount.target
> > > # After=local-fs-pre.target
> >
> > No reason to keep comment in units.
>
> That was only to display the change; expect a series from me later.
>
> >
> > > It does not depend on anything,
> >
> > It does depend on /run being mounted.
>
> Which is a *given invariant* with systemd, one
> of the first things that happen.
>
> >
> > > so no technical reason to order
> > > it after anything. And it is technically a preparation for the
> > > actual local-fs.target.
> >
> > This.
> >
> > We do not have a vision of the grand scheme of how systemd organises
> > stuff, what each .target means and how they depend on each other. Maybe
> > my search skills are getting rusty as time passes, but I could never
> > find such a design doc, and it is sorely lacking. Manpages are only good
> > for explaining each detail, but they do not provide a global overview...
>
> Like this one?
> https://www.freedesktop.org/software/systemd/man/bootup.html
>
> >
> > [--SNIP--]
> > > > +config BR2_INIT_SYSTEMD_VAR_OVERLAYFS
> > > > +       bool "mount an overlayfs backed by a tmpfs"
> > > > +       help
> > > > +         Mount an overlayfs on /var, with the upper as a tmpfs.
> > > > +
> > > > +         To use a persistent storage, provide your own systemd unit(s)
> > > > +         that eventually mount that persistent storage on
> > > > +         /run/varoverlay/upper/
> > > perhaps pull in or depend on overlayfs here
> >
> > I did not need anything beside enabling overlayfs in the kernel (see the
> > runtime tests branch I pointed to above).
> >
> > > Generally the unit and directory names could be more logical,
> >
> > Yes, I agree that we should have a kind of naming scheme for this. But I
> > have no good idea...
> >
> > What I was thinking, though, is that we maybe should make dotted
> > directories, i.e. /run/.varoverlay/{lower,upper,work}
>
> I'd namespace it like /run/.buildroot/overlay_var_{lower,upper,work}.
>
> >
> > > and for allowing the user to specify a custom mount
> > > by reading an EnvironmentFile in the rootfs-bindmount-var unit.
> >
> > It was my understanding that users could provide their own unit(s),
> > something that would ultimately end up with a mount unit like:
> >
> >     # cat run_varoverlay_upper.mount
> >     [Unit]
> >     After=rootfs-bindmount-var.service
> >     BindsTo=rootfs-bindmount-var.service
> >
> >     [Mount]
> >     What=/dev/something
> >     Where=/run/varoverlay/upper
> >     Type=ext4-or-whatever
> >
> >     [Install]
> >     BoundBy=var.mount
> >     WantedBy=var.mount
> >
> > That way, they do not have to override any of our units; they would just
> > intersperse their unit in the existing dependency graph, between the
> > rootfs-bindmount-var.service and the var.mount.
> >
> > And the content of the filesystem on /dev/something would only get the
> > content of /var, not the {lower,upper,work} directories, which could be
> > a bit confusing.
>
> Yeah, I haven't thought about customization. Not sure if it wouldn't be
> better to provide a simple one and a customizable one.
>
> I.e. the simple one doesn't have to wait for udev, kernel modules
> for block devices, or network connections to do its job.
>
> >
> > > Planning to add some more comments next week, should find some time here,
> > > huge commit msgs to get through.
> >
> > Commit messages are important, because they do provide all the rationale
> > and reasoning behind a change. They will be there forever, and in the
> > future, we can refer to them to understand why the code ended up like
> > it is, and with new insight then, we can understand where our reasoning
> > was flawed, or if it was correct, how the environment around has
> > changed.
> >
> > I consider a good commit log more important than the actual change.
>
> Wasn't meant as a complaint.
>
> Regards, Norbert

I'm still tinkering with this one, please keep this open for now.

The basic idea would be to designate a few Buildroot-specific directories
that can be used by multiple units.

/run/.br - small filesystem stuff fitting the /run mount
/tmp/.br - filesystem stuff too fat for /run

/run/.br/bnd/* - bind mounts for evil trickery, replicating the
original path (e.g. /var/hugo is bind-mounted to /run/.br/bnd/var/hugo)
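
To illustrate the path replication, here is a minimal shell sketch. The
helper name br_bind_path and the variable BR_BND_ROOT are hypothetical,
made up for this example only; nothing like them exists in Buildroot.
It just prefixes an absolute path with the proposed bind-mount root.

```shell
#!/bin/sh
# Hypothetical helper (not part of Buildroot): map an absolute path to
# the location where it would be bind-mounted under /run/.br/bnd.
BR_BND_ROOT=/run/.br/bnd

br_bind_path() {
    case "$1" in
        # Absolute path: replicate it below the bind-mount root.
        /*) printf '%s%s\n' "$BR_BND_ROOT" "$1" ;;
        # Anything else cannot be replicated unambiguously.
        *)  echo "br_bind_path: need an absolute path" >&2
            return 1 ;;
    esac
}

br_bind_path /var/hugo
```

The last line prints the bind-mount location for the /var/hugo example.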

Any feedback on where to document this? Does anyone else need to look at it?

The /var overlay would end up in /tmp/.br/ovl_var - no additional
tmpfs required.

E.g. the mount options would be:
lowerdir=/run/.br/bnd/var,upperdir=/tmp/.br/ovl_var/up,workdir=/tmp/.br/ovl_var/wd
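
As a rough sketch of how that option string could be derived from the
directory scheme, assuming the layout proposed here (the function name
br_overlay_opts is hypothetical, not an existing Buildroot helper):

```shell
#!/bin/sh
# Hypothetical helper: build the overlayfs option string for a top-level
# directory name (e.g. "var"), following the /run/.br and /tmp/.br
# layout sketched in this mail.
br_overlay_opts() {
    name=$1
    printf 'lowerdir=/run/.br/bnd/%s,upperdir=/tmp/.br/ovl_%s/up,workdir=/tmp/.br/ovl_%s/wd\n' \
        "$name" "$name" "$name"
}

br_overlay_opts var

# The mount itself needs root and a kernel with overlayfs enabled, so it
# is only shown as a comment:
#   mount -t overlay overlay -o "$(br_overlay_opts var)" /var
```

Note that overlayfs requires upperdir and workdir to be on the same
filesystem, which holds here since both live under /tmp/.br/ovl_var.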

Regards, Norbert


