
Re: Cannot cache bootstrap stage of live-build - cp error reading chroot/proc/1/task/1/* - invalid argument



Hi. I'm glad you've fixed it.

If the problem occurs with a completely clean build directory, failing
during the `lb bootstrap_cache save` stage, then this would indicate
that the failure lies in debootstrap itself. Up to that point
live-build hasn't really done much at all, and certainly has not itself
done anything with mount points. The debootstrap tool will have been
run, however, and it does do things with mount points, as I've
previously mentioned.

I would thus expect that your modified wgetrc did indeed interfere with
debootstrap in some way, such that it exited (1) with mount points like
/proc still mounted and (2) with a clean exit code instead of one
indicating an error. Combined, this meant that live-build was unable to
recognise that debootstrap had encountered a problem, so it carried on;
and since things like /proc were still mounted, it then tried to copy
their contents into the cache as part of backing up the bootstrap
filesystem that debootstrap had created for us.
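For illustration only, here is a hypothetical sketch (not live-build's actual code) of the kind of safety net that was missing: a wrapper that treats a command as failed if it exits non-zero or if it leaves anything mounted at or below the target directory. It assumes a Linux host where /proc/mounts is available; the function names mounts_under and run_and_check_mounts are made up for this example.

```sh
# Hypothetical helper: print every mount point at or below directory $1,
# read from /proc/mounts (field 2 is the mount point). Mount points whose
# paths contain spaces appear there escaped as \040 and are not handled here.
mounts_under() {
    local dir
    dir=$(readlink -f "$1") || return 2
    awk -v d="$dir" '$2 == d || index($2, d "/") == 1 { print $2 }' /proc/mounts
}

# Hypothetical wrapper: run "$@" (e.g. a debootstrap command line), then
# report failure if the command itself failed OR if anything is still
# mounted under the target directory afterwards.
run_and_check_mounts() {
    local target rc leftover
    target=$1
    shift
    # The && / || form captures the exit status without tripping 'set -e'.
    "$@" && rc=0 || rc=$?
    leftover=$(mounts_under "$target")
    if [ -n "$leftover" ]; then
        printf 'error: still mounted under %s after %s (exit status %s):\n%s\n' \
            "$target" "$1" "$rc" "$leftover" >&2
        # Turn a misleading "success" into a failure.
        [ "$rc" -ne 0 ] || rc=1
    fi
    return "$rc"
}
```

For example, in a root shell with these functions defined, `run_and_check_mounts /tmp/bootstrap-test debootstrap buster /tmp/bootstrap-test` would report leftover mounts even if debootstrap itself exited with status 0 (the directory name is only an example).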

I think that it would certainly be of value to try to gain some
understanding of what went wrong in debootstrap: what failed, why it
failed to unmount things, and why it failed to report that things had
gone wrong.

As a first step, it would be helpful to run debootstrap directly, to
demonstrate to the debootstrap folks that the problem lies within their
tool. Could you please do the following:
 1. Create a new empty directory.
 2. Restore your modified wgetrc file (I don't expect that you'd need to
reboot for it to take effect, since wget reads its configuration each
time it runs).
 3. Run debootstrap in a terminal as follows:
sudo debootstrap buster <your-target-directory>
 4. Check whether the 'proc' and 'sys' directories within your target
directory are non-empty after it has completed. (The 'dev' directory
may legitimately contain a few basic device nodes that debootstrap
creates itself.)

Normally 'proc' and 'sys' should be empty once it completes. If either
of them is not empty, then there's our proof that the problem lies in
debootstrap.
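A minimal sketch of the step 4 check, assuming a POSIX-style shell on the machine where debootstrap was run. check_target_dirs is a hypothetical helper name; it only looks at 'proc' and 'sys', since debootstrap normally creates a few device nodes in 'dev', so that one is better inspected by eye.

```sh
# Hypothetical helper for step 4: report whether 'proc' or 'sys' inside the
# debootstrap target directory $1 still has any contents.
# Returns 0 if both are empty (or missing), 1 otherwise.
check_target_dirs() {
    local target d status
    target=$1
    status=0
    for d in proc sys; do
        # 'ls -A' lists everything except '.' and '..', so any output means
        # the directory is not empty (e.g. a live /proc is still mounted there).
        if [ -n "$(ls -A "$target/$d" 2>/dev/null)" ]; then
            echo "NOT EMPTY: $target/$d"
            status=1
        fi
    done
    return "$status"
}
```

Run it as, for example, `check_target_dirs <your-target-directory> && echo 'proc and sys are empty'`; for 'dev', a plain `ls -A <your-target-directory>/dev` shows what is there.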

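To clean up after the test, first make sure nothing is still mounted
within the temporary directory you created (rebooting will clear any
stray mounts), and only then delete it. Deleting a directory does not
unmount anything inside it, and any host directory that is still bind
mounted inside it would have its contents deleted as well.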
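A more cautious version of this cleanup, as a hypothetical sketch (cleanup_target and list_mounts_under are made-up names, and a Linux host with /proc/mounts is assumed): unmount anything still mounted at or below the directory, deepest first, and delete the directory only once nothing remains mounted inside it.

```sh
# Hypothetical helper: list mount points at or below directory $1 using
# /proc/mounts (field 2). Paths with spaces (escaped as \040) are not handled.
list_mounts_under() {
    awk -v d="$1" '$2 == d || index($2, d "/") == 1 { print $2 }' /proc/mounts
}

# Hypothetical cleanup helper: unmount everything at or below $1 (root is
# needed if anything is actually mounted), and delete the directory only
# once nothing is mounted inside it any more.
cleanup_target() {
    local dir mp
    dir=$(readlink -f "$1") || return 1
    # A reverse byte-order sort puts nested mounts (e.g. .../proc/sys/fs/...)
    # before their parents, so children are unmounted first.
    list_mounts_under "$dir" | LC_ALL=C sort -r | while read -r mp; do
        umount "$mp" || echo "warning: could not unmount $mp" >&2
    done
    if [ -n "$(list_mounts_under "$dir")" ]; then
        echo "refusing to delete $dir: something is still mounted inside it" >&2
        return 1
    fi
    rm -rf -- "$dir"
}
```

Because it refuses to delete anything while a mount remains, a failed unmount leaves the directory in place for a reboot to sort out, rather than risking the host's files.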

We can then go to the debootstrap folks to get them to look into the
failure and consider fixes.

Would it be possible to send me a copy of the modified wgetrc file so
that I can have some idea of the modifications that have caused this?

On Wed, 2020-04-15 at 18:23 +0000, dbgr wrote:
> Hello again.
> 
> Thank you very much for the attention and for the clarifications.
> 
> I reinstalled live-build and created a new clean build directory, but
> got the same problem as described before.
> 
> Then I remembered some modifications that I had made some time ago to
> my 'wgetrc'. I reverted the configuration file back to the original
> defaults and the error in the bootstrap_cache stage stopped :)
> 
> So I believe it really was something on my end that was causing the
> problem (the other machine/system that I tested also had the modified
> wgetrc), but I am not sure what exactly was going wrong :( Maybe
> debootstrap was failing to get some file? In any case, live-build was
> not throwing any errors besides the ones that I pointed out (even with
> the --verbose and --debug flags active).
> 
> Do you believe there is something that I can do to further diagnose
> this issue (maybe try running debootstrap more verbosely)? Do you
> believe there is any need to?
> 
> Nothing else regarding this specific error is occurring for me anymore,
> and now the bootstrap stage is being cached and the build is finishing
> without any errors...
> 
> 
> 
> On 2020-04-14 23:16, jnqnfe@gmail.com wrote:
> > On Tue, 2020-04-14 at 22:37 +0000, dbgr wrote:
> > > Thank you for your prompt response!
> > > 
> > > Sorry for the delay in answering, but each build takes a long time
> > > on my machine and I was making sure nothing went wrong again before
> > > writing here.
> > 
> > That's fine.
> > 
> > > So, the strangest thing to me was exactly the fact that the
> > > bootstrap stage was trying to make a cache by copying things that
> > > should not be there yet =P
> > 
> > Clarification on "there yet": the contents of these directories
> > (chroot/proc, chroot/sys, etc.) should never be cached; the directories
> > should be empty. On your host system the /proc, /sys, etc. directories
> > are populated with system runtime information, which is not saved to
> > disk, just held in memory! During the build stage the chroot
> > equivalents need to be temporarily bind mounted to the host
> > directories at certain points of the build process so that this stuff
> > is available to any programs that get run within the chroot.
> > 
> > The cached copies of the base filesystem constructed in the bootstrap
> > stage should always have these directories empty. Additionally, the
> > final copies of these directories built into the final image will be
> > empty, to be temporarily populated at runtime when you're running the
> > live disk on a system.
> > 
> > The fact that somehow these directories are not empty at the time of
> > the bootstrap filesystem being saved to the cache, or after being
> > restored from the cache, is an indication that something went wrong,
> > i.e. the directories were still bind mounted to the host system
> > directories when they should not have been.
> > 
> > > I was using the lb_clean --purge option to remove the whole cache,
> > > but followed your advice and removed just the cache/bootstrap
> > > directory. It was a 'no go' - same error/result :(
> > 
> > `sudo lb clean --purge` is exactly right and deletes cache/bootstrap
> > along with other stuff.
> > 
> > You'd not mentioned previously that you'd done anything that would
> > have cleared the cached bootstrap.
> > 
> > Since this has not fixed things, the next thing I might suggest, if
> > you've not already done so, is to try rebooting to ensure any stray
> > bind mounts are cleared, and then try `sudo lb clean --purge` before
> > reattempting the build.
> > 
> > Finally, if that is not enough, avoid `lb clean` and just delete the
> > build directory entirely and start from scratch.
> > 
> > If these actions are not enough then I'd suggest that something is
> > indeed wrong with your system somehow.
> > 
> > > As I said, I tried the same process on other machines/installations
> > > and got the same outcome (and I did not copy the build directory or
> > > anything). So I am now hoping that the problem is somehow related to
> > > my setups. I will try to purge my installation of live-build and
> > > install it again to start from a clean slate. Will report back,
> > > probably tomorrow.
> > 
> > Very strange.
> > 
> > > On 2020-04-14 20:50, jnqnfe@gmail.com wrote:
> > > > chroot/proc should be empty at the point where it tries to cache
> > > > the bootstrapped filesystem it has built. The fact that it is not
> > > > suggests something has clearly gone very wrong.
> > > > 
> > > > As you perhaps already know, running programs within a chroot
> > > > environment typically requires first mounting /proc, /sys, and
> > > > /dev/pts within the chroot. live-build makes use of a chroot during
> > > > the build process, and these (and others) get mounted and
> > > > unmounted at various points throughout the build. debootstrap is
> > > > used early in the bootstrapping stage to construct the
> > > > bootstrapped filesystem, and it itself mounts and unmounts these.
> > > > 
> > > > live-build has a clean-up mechanism set up to try to catch failure
> > > > and clean things up (such as unmounting those mount points) when
> > > > things go wrong. Things can sometimes fail in such a way, though,
> > > > that it cannot clean up and/or recover if you re-run it, in which
> > > > case you need to scrap the build and try again afresh (with a clean
> > > > directory). It is currently a good idea to always start afresh after
> > > > a failure or cancellation to be certain that nothing is wrong
> > > > (making things more robust is part of the work I'm contributing).
> > > > 
> > > > It would appear that somehow you have ended up in a situation
> > > > where the chroot filesystem is getting saved as the cached
> > > > bootstrap while the mount points remain mounted, and so it is
> > > > trying to copy a whole bunch of stuff that should not be copied
> > > > (the contents of these mounts).
> > > > 
> > > > With an essentially "corrupt" bootstrap filesystem then in your
> > > > cache, all subsequent builds are going to be corrupt too, if they
> > > > can even complete without error (unless you bypass cache use, as
> > > > you found).
> > > > 
> > > > To fix this you need to clean things up for a fresh start: delete
> > > > the cached bootstrap! (sudo rm -rf cache/bootstrap).
> > > > 
> > > > As for why on earth you're experiencing this on multiple
> > > > machines... are you copying the build directory to them, including
> > > > the corrupt cached bootstrap? Are you performing a common set of
> > > > actions on them that could explain ending up with a corrupt
> > > > bootstrap?
> > > > 
> > > > On Tue, 2020-04-14 at 16:36 +0000, dbgr wrote:
> > > > > Hello.
> > > > > 
> > > > > I am using live-build version 20191221 (the one in testing) on a
> > > > > Debian stable/buster system.
> > > > > 
> > > > > Whenever I try to build an image (even the most basic one, with
> > > > > all the default options), live-build hangs trying to save the
> > > > > bootstrap stage to cache with the following messages:
> > > > > 
> > > > >    I: Base system installed successfully.
> > > > >    [2020-04-11 11:17:29] lb bootstrap_cache save
> > > > >    P: Saving bootstrap stage to cache...
> > > > >    cp: error reading 'chroot/proc/1/task/1/attr/prev': Invalid argument
> > > > >    cp: error reading 'chroot/proc/1/task/1/attr/exec': Invalid argument
> > > > >    cp: error reading 'chroot/proc/1/task/1/attr/fscreate': Invalid argument
> > > > >    cp: error reading 'chroot/proc/1/task/1/attr/keycreate': Invalid argument
> > > > >    cp: error reading 'chroot/proc/1/task/1/attr/sockcreate': Invalid argument
> > > > >    cp: error reading 'chroot/proc/1/task/1/mem': I/O error
> > > > >    cp: error reading 'chroot/proc/1/task/1/clear_refs': Invalid argument
> > > > > 
> > > > > When this happens, the ../cache/bootstrap/proc/1/task/1/pagemap
> > > > > file starts to fill up with garbage data until my storage runs out
> > > > > of space or until I stop the process with ctrl+c.
> > > > > 
> > > > > I believe this is related to the saving of the bootstrap stage to
> > > > > cache (besides the message saying so =P) because as soon as I set
> > > > > the --cache-stages flag to deliberately exclude this stage from
> > > > > caching (with an empty value or by listing other stages) the issue
> > > > > does not occur and the build finishes without a problem.
> > > > > 
> > > > > This also happens on other machines running Debian stable/buster,
> > > > > both with this version of live-build and with the one in stable
> > > > > (20190311).
> > > > > 
> > > > > Can anyone help me diagnose this problem? The ability to cache the
> > > > > bootstrap stage would be very helpful for me :)
> > > > > 
> > > > > Thanks for your attention.
> > > > > 
> > > > > --
> > > > > 
> > > > > dbgr
> > > > > 

