#201 kojid unable to remove deep subdir in expired buildroot
Closed: Fixed 5 years ago. Opened 5 years ago by kevin.

We are seeing this in kojid.log:

2016-10-26 02:16:01,912 [INFO] koji.build.buildroot: /usr/bin/mock -r koji/f26-build-6712292-659284 --init
2016-10-26 02:17:11,909 [INFO] koji.build.buildroot: /usr/bin/mock -r koji/f26-build-6712292-659284 --no-clean --target i686 --rebuild /var/tmp/koji/tasks/1290/16201290/local/work/../packages/valgrind/3.12.0/2.fc26/src/valgrind-3.12.0-2.fc26.src.rpm
2016-10-26 02:33:26,633 [WARNING] koji.build: dir removal failed (code 31488) for /var/lib/mock/f26-build-6712292-659284/root/tmp

And in the journal for kojid (tons and tons and tons of these):

kojid[536]: rmdir: failed to remove '/var/lib/mock/f26-build-6712292-659284/root/tmp/wd_test_9401jH/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/sub
dir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subd
kojid[536]: ir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir/subdir': File name too long

These appear with varying numbers of subdir levels. It looks like the valgrind tests create a tree that is 585 subdirs deep there.
Likely the test suite should clean up after itself, but kojid should also be able to remove its buildroot.
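The "code 31488" in the kojid.log warning looks like a raw wait status rather than a plain exit code. That is an assumption, based on the guess that kojid ran the removal through something like os.system. A small sketch (decode_wait_status is a hypothetical helper, Unix only) unpacks it:

```python
import os


def decode_wait_status(status):
    """Split a raw wait status (e.g. from os.system) into a readable form."""
    if os.WIFEXITED(status):
        return ("exited", os.WEXITSTATUS(status))
    if os.WIFSIGNALED(status):
        return ("killed by signal", os.WTERMSIG(status))
    return ("other", status)


# "code 31488" from the kojid.log warning above
print(decode_wait_status(31488))  # ('exited', 123)
```

An exit status of 123 is what GNU xargs reports when any invocation of its command exits with a status from 1 to 125, which would be consistent with a find | xargs rmdir pipeline like the one used to reproduce this later in the thread.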

Just leaving a comment on this bug so I (hopefully) get added to the CC, since there seems to be no CC list.

In this case, the code is actually calling out to rmdir. If coreutils can't remove this directory, I'm not sure that we should expect koji to.

Looks like I can replicate this locally without koji or python:
mkdir -p subdir/.....
find subdir -xdev -depth -type d -print0 |xargs -0 rmdir

Heck, python can't even os.lstat such a path
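The same thing can be reproduced from Python. This is a minimal sketch for Unix: the helper names make_deep_tree and lstat_errno are hypothetical, and any depth that pushes the full path past PATH_MAX will do. It builds the tree one component at a time relative to an open directory fd, then shows that os.lstat on the resulting full path is refused by the kernel.

```python
import errno
import os

DIR_FLAGS = os.O_RDONLY | os.O_DIRECTORY


def make_deep_tree(root, depth, name="subdir"):
    """Create root/name/name/... `depth` levels deep and return the deepest path.

    Each mkdir is issued relative to an open fd for the parent directory, so
    no single syscall ever sees more than one path component.
    """
    fd = os.open(root, DIR_FLAGS)
    try:
        for _ in range(depth):
            os.mkdir(name, dir_fd=fd)
            child = os.open(name, DIR_FLAGS, dir_fd=fd)
            os.close(fd)
            fd = child
    finally:
        os.close(fd)
    # The returned string names a real directory, but it may be too long
    # for the kernel to accept as a single path argument.
    return os.path.join(root, *[name] * depth)


def lstat_errno(path):
    """Return the errno name (e.g. 'ENAMETOOLONG') if os.lstat(path) fails, else None."""
    try:
        os.lstat(path)
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    return None
```

On Linux, make_deep_tree(tempfile.mkdtemp(), 600) succeeds, while lstat_errno() on the path it returns reports 'ENAMETOOLONG', since 600 levels of "subdir/" alone are 4200 characters.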

I probably should replace that removal code with a call to koji.util.rmtree, but that appears to currently fail with a similar error.

Ah, this appears to be the PATH_MAX limit (4096).

>>> len(os.path.join(*['subdir']*585))
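Spelling out the arithmetic as a sketch: the prefix is copied from the journal message above, the depth of 585 is from the report, and falling back to 4096 when pathconf is unavailable is an assumption. The relative path alone still fits once the terminating NUL is counted, but the absolute path that rmdir is given at full depth does not.

```python
import os

# Prefix copied from the journal message above; depth taken from the report.
PREFIX = "/var/lib/mock/f26-build-6712292-659284/root/tmp/wd_test_9401jH"
DEPTH = 585


def path_max(where="/"):
    """PATH_MAX for the filesystem holding `where`, including the trailing NUL."""
    try:
        return os.pathconf(where, "PC_PATH_MAX")
    except (AttributeError, OSError, ValueError):
        return 4096  # assumption: the usual Linux value


relative = os.path.join(*["subdir"] * DEPTH)
absolute = os.path.join(PREFIX, relative)

print(len(relative))  # 4094 = 585 * 6 letters + 584 slashes
print(len(absolute))  # 4157 = 62-character prefix + 1 slash + 4094
# A path fits only if its length plus the NUL terminator is <= PATH_MAX.
print(len(relative) + 1 <= path_max(), len(absolute) + 1 <= path_max())
```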

I'm somewhat inclined to consider this NOTABUG. If you feel differently, please speak up.

Well, the practical effect here was that kojid was effectively a denial of service against our central log host, as it was trying to remove that dir every few seconds on N builders and logging it. This resulted in a few days of ~20GB log files.

Perhaps it could try what it does now and then fall back to an 'rm -rf' of the buildroot if that fails?

rm -rf is not an option. There is no way to prevent it from crossing fs boundaries (and you really, really don't want kojid to ever do that, especially since runroot is in play).

Also, I guess kojid should probably give up after a number of removal failures, or at least put a wait in place or something.
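A minimal sketch of the "give up, or at least wait" idea, assuming a hypothetical RemovalBackoff helper that kojid would consult before each removal attempt. It backs off exponentially per path and stops trying after max_failures. The delays and limits are illustrative, not values taken from koji.

```python
import time


class RemovalBackoff:
    """Hypothetical per-path tracker: exponential backoff, then give up."""

    def __init__(self, base_delay=60.0, max_delay=3600.0, max_failures=5):
        self.base_delay = base_delay
        self.max_delay = max_delay
        self.max_failures = max_failures
        self._failures = {}  # path -> (failure count, time of last failure)

    def should_attempt(self, path, now=None):
        now = time.monotonic() if now is None else now
        count, last = self._failures.get(path, (0, 0.0))
        if count == 0:
            return True
        if count >= self.max_failures:
            return False  # given up; leave it for an admin to look at
        # Wait base_delay, then 2x, 4x, ... capped at max_delay.
        delay = min(self.base_delay * 2 ** (count - 1), self.max_delay)
        return now - last >= delay

    def record_failure(self, path, now=None):
        now = time.monotonic() if now is None else now
        count, _ = self._failures.get(path, (0, 0.0))
        self._failures[path] = (count + 1, now)
        return count + 1

    def record_success(self, path):
        self._failures.pop(path, None)
```

Since record_failure() returns the running count, a caller could log only the first failure and the final one, which would also keep a stuck removal from flooding the central log host.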

Oh hey, coreutils finally added an option to rm for that (--one-file-system).
Still, that appears to be fairly recent and I can't rely on it in koji yet.

Another option might be to make kojid just error out and exit? It would be annoying, but at least then you could see the problem and fix it, and not DoS your logs. That might be a bit extreme for some people, however. ;(

Can't we test whether rm supports --one-file-system and use it if it does? If not, use the rmdir solution plus a potential kojid shutdown?

How should we check this? Parse the output of rm --version? That seems clunky, though I guess it would work. I am hesitant to rely on simply trying rm -rf --one-file-system FOO and checking for failure, for several reasons.

Also, it just seems wrong to exec out to rm to handle file operations that we should be able to handle ourselves. Granted, working around this particular issue in python looks to be very tedious.

Yep, I don't see a better way to test: either parse 'rm --version', or check the output of 'rm --one-file-system xyz' for 'unrecognized option'.
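A sketch of the second suggestion (rm_supports_one_file_system is a hypothetical helper): instead of parsing rm --version, run rm -f --one-file-system on a name inside a fresh, empty temporary directory, so an rm that supports the option has nothing to remove and exits 0. LC_ALL=C keeps the error text in English for the 'unrecognized option' check, and any nonzero exit is also treated as unsupported, since other rm implementations word the error differently.

```python
import os
import shutil
import subprocess
import tempfile


def rm_supports_one_file_system(rm="rm"):
    """Probe whether `rm` accepts --one-file-system (hypothetical helper)."""
    rm_path = shutil.which(rm)
    if rm_path is None:
        return False
    with tempfile.TemporaryDirectory() as tmp:
        # This name never exists, so a supporting rm -f has nothing to do.
        target = os.path.join(tmp, "does-not-exist")
        proc = subprocess.run(
            [rm_path, "-f", "--one-file-system", target],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            text=True,
            env=dict(os.environ, LC_ALL="C"),
        )
    return proc.returncode == 0 and "unrecognized option" not in proc.stderr
```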

What about using the find command with the -xdev option to find the files and directories, and then removing them with rm? The find in RHEL5 has this option, too.

That will fail as well. The rm command is also limited by PATH_MAX. It's actually rather interesting that rm -rf manages. It appears that it walks down the tree with chdir and thus avoids these long paths.

I've got a rewrite of rmtree that does the same. I'll post that once I do a little more testing.
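The rewrite itself isn't shown here, so the following is only a sketch of the same idea, not koji.util.rmtree. Rather than chdir, it swaps in fd-relative calls (os.open, os.unlink and os.rmdir with dir_fd, plus os.scandir on an fd), which leaves the process cwd alone. It climbs back up by opening "..", so only one directory fd is open at a time, and it refuses to descend into another filesystem by comparing st_dev. Note that st_dev will not detect a bind mount of the same filesystem. The name rmtree_deep is hypothetical, and the code is Unix only.

```python
import errno
import os

# Open directories read-only, fail if the target is not a directory, and never
# follow a symlink that was swapped in after the scan.
_DIR_FLAGS = os.O_RDONLY | os.O_DIRECTORY | os.O_NOFOLLOW


def rmtree_deep(path):
    """Remove `path` and everything under it (hypothetical helper, Unix only).

    Every syscall gets a single name relative to an open directory fd, so the
    full depth of the tree never has to fit in PATH_MAX.
    """
    fd = os.open(path, _DIR_FLAGS)
    root_dev = os.fstat(fd).st_dev
    names = []  # names of the directories between `path` and the current one
    try:
        while True:
            with os.scandir(fd) as it:
                entries = list(it)
            child = None
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    child = entry.name
                    break
                # Regular files, symlinks, sockets, etc. are unlinked in place.
                os.unlink(entry.name, dir_fd=fd)
            if child is not None:
                child_fd = os.open(child, _DIR_FLAGS, dir_fd=fd)
                # A different st_dev means a mount point: refuse to descend.
                if os.fstat(child_fd).st_dev != root_dev:
                    os.close(child_fd)
                    raise OSError(errno.EXDEV, "refusing to cross a filesystem boundary", child)
                os.close(fd)
                fd = child_fd
                names.append(child)
                continue
            # The current directory is empty now.
            if not names:
                break
            # Climb up via "..", then remove the directory we just emptied.
            parent_fd = os.open("..", _DIR_FLAGS, dir_fd=fd)
            os.close(fd)
            fd = parent_fd
            os.rmdir(names.pop(), dir_fd=fd)
    finally:
        os.close(fd)
    os.rmdir(path)
```

On a tree like the 585-deep valgrind one, no path longer than a single component (or "..") is ever passed to the kernel. One trade-off of climbing back via "..": a directory with many subdirectories is rescanned once per subdirectory. It also raises on a mount point instead of skipping it, on the view that anything still mounted in a buildroot should be unmounted first.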

@mikem I mean, perhaps we could do the following:

find path_to_remove -xdev -type d -printf '%d %h/%f\0' | sort -z -n -r | cut -z -d ' ' -f 2- | xargs -0 rm -rf

@xning Yes, I know. I've used find like this before. However, you are still passing the full paths to rm, and rm will fail because the full path is too long.

Ha, it seems we need a more careful recursive solution.

Merged #245, which fixes this

@mikem changed the status to Closed

5 years ago

Metadata Update from @tkopecek:
- Issue set to the milestone: 1.12

3 years ago