When I last left off, I expressed my concerns about building software from nixpkgs’s stable 22.05 release on Glitch. Here’s a report on what happened with those issues I was concerned about and more.
They were actually very friendly about this. When you try to run one of these commands, you get a helpful message:
error: experimental Nix feature 'nix-command' is disabled; use '--extra-experimental-features nix-command' to override
I went and put those darned
--extra-experimental-features nix-command flags everywhere, and things mostly went back to working.
But in one place they turned out to be right about these commands being unstable.
nix copy on a derivation now tries to copy the derivation’s outputs instead of the derivation itself. Luckily, there’s also a new
--derivation flag to copy the derivation itself.
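As a rough sketch of what the adjusted invocation looks like when scripted, the helper below only assembles the argument list and runs nothing. The function name `nix_copy_derivation_argv`, the destination URL, and the store path are hypothetical placeholders; the flags are the two mentioned above plus `--to`.

```python
def nix_copy_derivation_argv(drv_path, destination):
    """Build the argv for copying a .drv file itself, not its outputs."""
    return [
        "nix",
        # Needed while the `nix` subcommands are still marked experimental.
        "--extra-experimental-features", "nix-command",
        "copy",
        # Without this flag, newer nix copies the derivation's outputs.
        "--derivation",
        "--to", destination,
        drv_path,
    ]


# Hypothetical store path and destination, for illustration only.
argv = nix_copy_derivation_argv(
    "/tmp/nix/store/example.drv",
    "file:///tmp/example-cache",
)
print(" ".join(argv))
```

Keeping the flags in one helper means a future change in these unstable commands only has to be fixed in one place.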
This indeed didn’t work on Glitch, as predicted. It broke all sorts of threading-related stuff.
I couldn’t find any patches for this from a web search. I think by the time glibc started using clone3, various container engines had already corrected themselves to use the proper ENOSYS error code instead of EPERM. So I ended up writing up my own patch very similar to the faccessat2 patch from Red Hat.
diff --git a/sysdeps/unix/sysv/linux/clone-internal.c b/sysdeps/unix/sysv/linux/clone-internal.c
index 979f788..e4cd5c9 100644
--- a/sysdeps/unix/sysv/linux/clone-internal.c
+++ b/sysdeps/unix/sysv/linux/clone-internal.c
@@ -52,7 +52,7 @@ __clone_internal (struct clone_args *cl_args,
   /* Try clone3 first. */
   int saved_errno = errno;
   ret = __clone3 (cl_args, sizeof (*cl_args), func, arg);
-  if (ret != -1 || errno != ENOSYS)
+  if (ret != -1 || !(errno == ENOSYS || errno == EPERM))
     return ret;
 
   /* NB: Restore errno since errno may be checked against non-zero
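The one-line change in that patch boils down to "treat EPERM like ENOSYS". Here is a minimal sketch of the same decision outside of C. `call_with_fallback`, `blocked_clone3`, and `legacy_clone` are made-up names standing in for the real syscall wrappers; nothing here actually creates a thread or process.

```python
import errno

# Error codes that mean "the new syscall is unusable here, take the old path".
# ENOSYS is the proper signal; EPERM is what an overly strict sandbox returns.
FALLBACK_ERRNOS = (errno.ENOSYS, errno.EPERM)


def call_with_fallback(new_call, old_call):
    """Try new_call(); on ENOSYS or EPERM, return old_call() instead."""
    try:
        return new_call()
    except OSError as exc:
        if exc.errno not in FALLBACK_ERRNOS:
            raise  # a genuine failure, not a missing or filtered syscall
        return old_call()


def blocked_clone3():
    # Stand-in for clone3 being rejected by the container's syscall filter.
    raise OSError(errno.EPERM, "Operation not permitted")


def legacy_clone():
    # Stand-in for the older clone path.
    return "used legacy clone"


print(call_with_fallback(blocked_clone3, legacy_clone))
```

One tradeoff worth keeping in mind: widening the check like this also sends any genuine EPERM down the fallback path, so it is a workaround for a misbehaving sandbox rather than a general improvement.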
We encountered this problem running nixpkgs stable 21.11 on Glitch, and nixpkgs stable 22.05 on Glitch continues to exhibit this problem.
I continued to include that glibc patch by Red Hat, which makes glibc use its fallback code on EPERM in addition to ENOSYS.
These are old problems, but I might as well record that they still happen. The tests for these libraries try to set a thread’s CPU affinity, which isn’t allowed on Glitch. The tests fail, and, as a result, the packages fail to build.
I’ve compiled these outside of Glitch instead.
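To check ahead of time whether an environment will trip over this, something like the probe below might help. It is only a sketch: `can_set_cpu_affinity` is a made-up helper, and it assumes the restriction shows up as an `OSError` from the affinity call, which is not confirmed above.

```python
import os


def can_set_cpu_affinity():
    """Return True if this process may set its own CPU affinity."""
    if not hasattr(os, "sched_setaffinity"):
        return False  # platform does not expose the call at all
    try:
        # Re-apply the current mask: a no-op where the call is permitted.
        os.sched_setaffinity(0, os.sched_getaffinity(0))
        return True
    except OSError:
        # Assumption: a sandbox that forbids the call raises OSError here.
        return False


print("affinity allowed:", can_set_cpu_affinity())
```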
Nix uses the AWS S3 SDK for certain “upload packages to the internet” sort of features, so we’ll have to avoid using those. This project uses a custom solution which only relies on Nix’s plain HTTP cache, and things seem to be working.
make check-TESTS
make: Entering directory '/tmp/nix-build-boehm-gc-8.0.6.drv-0/gc-8.0.6'
make: Entering directory '/tmp/nix-build-boehm-gc-8.0.6.drv-0/gc-8.0.6'
PASS: cordtest
PASS: middletest
PASS: smashtest
PASS: hugetest
PASS: realloc_test
PASS: staticrootstest
PASS: test_atomic_ops
PASS: threadleaktest
PASS: threadkey_test
make: /tmp/nix/store/91gza0zcl96wz8pkhjqdqlxvlwnlf51g-bash-5.1-p16/bin/bash: Operation not permitted
make: *** [Makefile:2150: subthreadcreate_test.log] Error 127
make: Leaving directory '/tmp/nix-build-boehm-gc-8.0.6.drv-0/gc-8.0.6'
make: *** [Makefile:2048: check-TESTS] Error 2
make: Leaving directory '/tmp/nix-build-boehm-gc-8.0.6.drv-0/gc-8.0.6'
make: *** [Makefile:2396: check-am] Error 2
make: Leaving directory '/tmp/nix-build-boehm-gc-8.0.6.drv-0/gc-8.0.6'
make: *** [Makefile:1825: check-recursive] Error 1
builder for '/tmp/nix/store/qkjjynmwsig6ax1r8ldw55njkpd6w78f-boehm-gc-8.0.6.drv' failed with exit code 2
That happened, but I couldn’t reproduce it reliably outside of the build process. And without being able to reproduce it, I couldn’t figure out what exactly was experiencing this “Operation not permitted” error.
I worked around this by building this package outside of Glitch.
FAIL: run-backtrace-dwarf.sh
============================
PTRACE_TRACEME failed: Operation not permitted
That’s right, you can’t ptrace on Glitch. But if we were allowed to use ptrace, we could just use PRoot and not even worry about compiling nix for a different store directory.
Anyway, I compiled this outside of Glitch.
file (the program that describes what kind of file a given file is)
Apparently for some time nix couldn’t download things from FTP. This doesn’t often come up, because most packages describe how to use
curl to download their sources. The “built in” downloader is only used in a few select packages early on in the dependency graph.
file happens to be such a package which specifies an FTP URL.
Files downloaded from the internet are stored in a content-addressed way, so you can download them from anywhere and nix will always put them in the right place. I worked around this by downloading a copy of the same file over HTTP instead.
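The reason the HTTP copy is an acceptable substitute can be sketched in a few lines: what matters is the hash of the bytes, not the URL they came from. This is only an illustration of the principle. nix’s real hashing and store path computation are more involved, and `matches_expected` and the payload are made up.

```python
import hashlib


def matches_expected(data, expected_sha256):
    """Check downloaded bytes against the hash pinned by the package."""
    return hashlib.sha256(data).hexdigest() == expected_sha256


# Made-up payload standing in for a source tarball.
tarball = b"example source archive"
pinned = hashlib.sha256(tarball).hexdigest()

via_ftp = tarball          # pretend this came from the FTP URL
via_http = bytes(tarball)  # pretend this came from an HTTP mirror

print(matches_expected(via_ftp, pinned), matches_expected(via_http, pinned))
```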
Later versions of nix, including the version built after the challenges described in this post, no longer experience this issue with FTP downloads, so we shouldn’t have this problem in the future.
nixpkgs’s distribution of git doesn’t sufficiently rewrite some tests’ dependencies into nix store paths, resulting in some tests trying to use the host system’s libraries. And the host system’s libraries on Glitch are very old, which causes those tests to fail.
I reported this problem: git 2.36 installCheck uses LD_PRELOAD · Issue #176999 · NixOS/nixpkgs · GitHub
I worked around this by building these packages outside of Glitch.
Note: Some issues, including this one, aren’t involved in building
nix itself. Instead, they came up in building other packages. I’ll mark these based on my recollection.
(This one came up when trying to build emacs-nox.)
It turns out there’s more in this version of glibc (yeah, glibc, not glib as in the heading; unfortunate about the similar naming) that breaks. This version of glibc now exposes
close_range, which glib uses. And as with all these newfangled syscalls,
close_range is giving EPERM on Glitch. But one thing we’re quite lucky about is that this gets caught in glib’s own test suite.
** (/tmp/nix-build-glib-2.72.2.drv-0/glib-2.72.2/build/glib/gtester:16380): WARNING **: 02:11:36.692: Failed to execute test binary: /tmp/nix-build-glib-2.72.2.drv-0/glib-2.72.2/build/glib/gtester: Failed to close file descriptor for child process (Operation not permitted)
ninja: build stopped: subcommand failed.
error: builder for '/tmp/nix/store/akrf00idnfzmbrhk3w7qyiydp6xxss6q-glib-2.72.2.drv' failed with exit code 1
Searching for that “Failed to close file descriptor for child process” message takes you almost directly to a
close_range call. Very nice design on glib’s part.
Unlike what glibc has done with, for example,
faccessat internally attempting to use
faccessat2 and falling back to jankier logic if it’s not available, all glibc does is expose
close_range directly. It’s up to the code calling
close_range to handle ENOSYS. And thus the check for ENOSYS and the fallback code are in glib.
I worked around this by adding a patch to glib, again similar to Red Hat’s faccessat2 patch.
diff --git a/glib/gspawn.c b/glib/gspawn.c
index 0a2cbe5..c9fda00 100644
--- a/glib/gspawn.c
+++ b/glib/gspawn.c
@@ -1544,7 +1544,7 @@ safe_fdwalk_set_cloexec (int lowfd)
    * fall back to safe_fdwalk(). Handle EINVAL in case `CLOSE_RANGE_CLOEXEC`
    * is not supported. */
   int ret = close_range (lowfd, G_MAXUINT, CLOSE_RANGE_CLOEXEC);
-  if (ret == 0 || !(errno == ENOSYS || errno == EINVAL))
+  if (ret == 0 || !(errno == ENOSYS || errno == EPERM || errno == EINVAL))
     return ret;
 #endif /* HAVE_CLOSE_RANGE */
   return safe_fdwalk (set_cloexec, GINT_TO_POINTER (lowfd));
@@ -1597,7 +1597,7 @@ safe_closefrom (int lowfd)
    * Handle ENOSYS in case it’s supported in libc but not the kernel; if so,
    * fall back to safe_fdwalk(). */
   int ret = close_range (lowfd, G_MAXUINT, 0);
-  if (ret == 0 || errno != ENOSYS)
+  if (ret == 0 || !(errno == ENOSYS || errno == EPERM))
     return ret;
 #endif /* HAVE_CLOSE_RANGE */
   return safe_fdwalk (close_func, GINT_TO_POINTER (lowfd));
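For a feel of what the fallback path does once the patched check lets EPERM through, here is a rough picture of "walk the open descriptors and close them one by one". It is not glib’s code: `close_fds_from` is a made-up helper, it assumes a Linux `/proc` filesystem, and it ignores the async-signal-safety concerns that the real thing has to deal with.

```python
import os


def close_fds_from(lowfd):
    """Close every open descriptor numbered lowfd or higher."""
    closed = []
    # Linux-specific: each entry in this directory names an open descriptor.
    for name in os.listdir("/proc/self/fd"):
        fd = int(name)
        if fd < lowfd:
            continue
        try:
            os.close(fd)
            closed.append(fd)
        except OSError:
            # Already gone, e.g. the descriptor used to read the directory.
            pass
    return sorted(closed)


# Demo: park a duplicate at a high descriptor number, then sweep it away.
r, w = os.pipe()
os.dup2(r, 200)
print(close_fds_from(200))
os.close(r)
os.close(w)
```

This is slower than a single `close_range` call, which is presumably why it is only the fallback.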
I saw someone else expressing the sentiment that nix after 2.3.x suddenly had a much bigger build dependency closure, but now I can’t find the link. It’s possibly related to this thread: “Any consensus on documentation generation tool for nixpkgs manual? - Documentation - NixOS Discourse.” Anyway, that’s indeed the case with nix 2.8.1, which ships in nixpkgs stable 22.05. For one thing, it now needs LLVM.
By the way, it needs LLVM for Rust, in order to build mdBook, in order to generate nix’s documentation files. It’s not even needed at runtime.
Building the llvm package actually works. But it takes about 9 hours 38 minutes (unboosted,
/sys/fs/cgroup/cpu/cpu.cfs_quota_us:100000). That’s a tense time, making sure you always come back to wiggle the mouse every half hour. If Glitch could please “Warn us if our editor session will go idle.”
Oh, there’s also this really scary time during the tests where there’s no output and no CPU usage for several minutes. I have no idea what it’s doing during that time.
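For context on the quota figure quoted for the LLVM build: under cgroup v1, the CFS quota is read against `cpu.cfs_period_us`. The period isn’t quoted here, so the sketch below assumes the kernel default of 100000 µs, which would make the unboosted limit one full core. `cpu_limit` is a made-up helper.

```python
def cpu_limit(quota_us, period_us=100_000):
    """Return the CPU allowance in cores, or None if unlimited."""
    if quota_us < 0:
        return None  # cgroup v1 uses -1 to mean "no limit"
    return quota_us / period_us


# Quota from the post; the period is assumed to be the kernel default.
print(cpu_limit(100_000))
```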
I found an issue in a package called “mailcap,” which is more or less a file containing information about MIME types. The only building involved is to download a release archive and extract it. Due to that being so simple, the exact output is known, and changes to the script that nixpkgs uses to build this package won’t invalidate any caches of the finished package.
It thus went unnoticed when some changes to the build script really did cause the build to break. We only know because we’re building for a different nix store path, which causes nix not to use the cache of the normal “/nix/store” version that’s readily available online.
I reported this problem: mailcap hash change infinite recursion · Issue #176286 · NixOS/nixpkgs · GitHub (A maintainer later renamed the issue when the same problem uncovered a more pressing problem.)
Because the package has exactly known contents, I was able to work around this by downloading the “/nix/store” version from a public cache, extracting it, and importing it into our “/tmp/nix/store” store.
I believe the nixpkgs maintainers have since fixed the build issue, so we should no longer experience this in future nixpkgs releases.
(This one came up when trying to build graphviz-nox.)
___________________________ test_unused_port_fixture ___________________________
tests/test_simple.py:47: in test_unused_port_fixture
    server1 = await asyncio.start_server(closer, host="localhost", port=unused_tcp_port)
/tmp/nix/store/pmcprpmmhn5crv44f2rwrfwirgis86yy-python3-3.9.13/lib/python3.9/asyncio/streams.py:94: in start_server
    return await loop.create_server(factory, host, port, **kwds)
/tmp/nix/store/pmcprpmmhn5crv44f2rwrfwirgis86yy-python3-3.9.13/lib/python3.9/asyncio/base_events.py:1506: in create_server
    raise OSError(err.errno, 'error while attempting '
E   OSError: [Errno 99] error while attempting to bind on address ('::1', 49314, 0, 0): cannot assign requested address
It looks like some of the tests need IPv6, and we don’t have that on Glitch.
I worked around this by building this package outside of Glitch.
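A quick way to confirm the IPv6 diagnosis in a given environment is to attempt the same kind of bind the test does. This is a minimal sketch; `ipv6_loopback_available` is a made-up helper, and it only checks the loopback address `::1` seen in the traceback.

```python
import socket


def ipv6_loopback_available():
    """Return True if a TCP socket can be bound to the IPv6 loopback."""
    if not socket.has_ipv6:
        return False  # this Python build has no IPv6 support at all
    try:
        with socket.socket(socket.AF_INET6, socket.SOCK_STREAM) as sock:
            sock.bind(("::1", 0))  # port 0 lets the OS pick a free port
        return True
    except OSError:
        # Includes errno 99, "cannot assign requested address".
        return False


print("IPv6 loopback usable:", ipv6_loopback_available())
```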
Imagine if software made people happy.