Things encountered while building nixpkgs 22.05

Last thread: Building Nix packages and saving them in Glitch assets

When I last left off, I expressed my concerns about building software from nixpkgs’s stable 22.05 release on Glitch. Here’s a report on what happened with those issues I was concerned about and more.

nix copy and nix show-derivation harder to use

They were actually very friendly about this. When you try to run one of these commands, you get a helpful message:

error: experimental Nix feature 'nix-command' is disabled; use '--extra-experimental-features nix-command' to override

I went and put those darned --extra-experimental-features nix-command flags everywhere, and things mostly went back to working.

But in one place they turned out to be right about these commands being unstable. Running nix copy on a derivation now copies the derivation’s outputs rather than the derivation itself. Luckily, there’s also a new --derivation flag to copy the derivation itself.

glibc clone3

This indeed didn’t work on Glitch, as predicted. It broke all sorts of threading related stuff.

I couldn’t find any patches for this from a web search. I think by the time glibc started using clone3, various container engines had already corrected themselves to use the proper ENOSYS error code instead of EPERM. So I ended up writing my own patch, very similar to the faccessat2 patch from Red Hat.

diff --git a/sysdeps/unix/sysv/linux/clone-internal.c b/sysdeps/unix/sysv/linux/clone-internal.c
index 979f788..e4cd5c9 100644
--- a/sysdeps/unix/sysv/linux/clone-internal.c
+++ b/sysdeps/unix/sysv/linux/clone-internal.c
@@ -52,7 +52,7 @@ __clone_internal (struct clone_args *cl_args,
   /* Try clone3 first.  */
   int saved_errno = errno;
   ret = __clone3 (cl_args, sizeof (*cl_args), func, arg);
-  if (ret != -1 || errno != ENOSYS)
+  if (ret != -1 || !(errno == ENOSYS || errno == EPERM))
     return ret;
 
   /* NB: Restore errno since errno may be checked against non-zero

glibc faccessat2

We encountered this problem running nixpkgs stable 21.11 on Glitch, and nixpkgs stable 22.05 on Glitch continues to exhibit this problem.

I continued to include that glibc patch by Red Hat, which makes glibc use its fallback code on EPERM in addition to ENOSYS.

aws-c-common and aws-c-sdkutils

These are old problems, but I might as well record that they still happen. The tests for these libraries try to set a thread’s CPU affinity, which isn’t allowed on Glitch. The tests fail, and, as a result, the packages fail to build.

I’ve compiled these outside of Glitch instead.

Nix uses the AWS S3 SDK for certain “upload packages to the internet” sort of features, so we’ll have to avoid using those. This project uses a custom solution which only relies on Nix’s plain HTTP cache, and things seem to be working.

boehm-gc test failure

make  check-TESTS
make[2]: Entering directory '/tmp/nix-build-boehm-gc-8.0.6.drv-0/gc-8.0.6'
make[3]: Entering directory '/tmp/nix-build-boehm-gc-8.0.6.drv-0/gc-8.0.6'
PASS: cordtest
PASS: middletest
PASS: smashtest
PASS: hugetest
PASS: realloc_test
PASS: staticrootstest
PASS: test_atomic_ops
PASS: threadleaktest
PASS: threadkey_test
make[3]: /tmp/nix/store/91gza0zcl96wz8pkhjqdqlxvlwnlf51g-bash-5.1-p16/bin/bash: Operation not permitted
make[3]: *** [Makefile:2150: subthreadcreate_test.log] Error 127
make[3]: Leaving directory '/tmp/nix-build-boehm-gc-8.0.6.drv-0/gc-8.0.6'
make[2]: *** [Makefile:2048: check-TESTS] Error 2
make[2]: Leaving directory '/tmp/nix-build-boehm-gc-8.0.6.drv-0/gc-8.0.6'
make[1]: *** [Makefile:2396: check-am] Error 2
make[1]: Leaving directory '/tmp/nix-build-boehm-gc-8.0.6.drv-0/gc-8.0.6'
make: *** [Makefile:1825: check-recursive] Error 1
builder for '/tmp/nix/store/qkjjynmwsig6ax1r8ldw55njkpd6w78f-boehm-gc-8.0.6.drv' failed with exit code 2

That happened, but I couldn’t reproduce it reliably outside of the build process. And without being able to reproduce it, I couldn’t figure out what exactly was experiencing this “Operation not permitted” error.

I worked around this by building this package outside of Glitch.

elfutils test failure

FAIL: run-backtrace-dwarf.sh
============================

PTRACE_TRACEME failed: Operation not permitted

That’s right, you can’t ptrace on Glitch. But if we were allowed to use ptrace we could just use PRoot and not even worry about compiling nix for a different store directory. Nevertheless, I am once again asking for ptrace on Glitch.

Anyway, I compiled this outside of Glitch.
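
A small probe along these lines can check whether an environment has the same restriction. This is only a sketch and not code from elfutils: probe_ptrace_traceme is a hypothetical name, and the probe runs in a forked child so the calling process never becomes a tracee itself.

```c
#include <errno.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical probe: returns 0 if a child process may call
   PTRACE_TRACEME, the errno value (for example EPERM) if the call is
   refused, or -1 if the probe itself could not run.  */
static int
probe_ptrace_traceme (void)
{
  pid_t pid = fork ();
  if (pid < 0)
    return -1;
  if (pid == 0)
    {
      /* Child: ask to be traced by the parent, report via exit status.  */
      if (ptrace (PTRACE_TRACEME, 0, NULL, NULL) == -1)
        _exit (errno);
      _exit (0);
    }
  /* Parent: collect the child's exit status.  A child that was killed
     (for example by a seccomp filter) counts as "could not run".  */
  int status;
  if (waitpid (pid, &status, 0) < 0 || !WIFEXITED (status))
    return -1;
  return WEXITSTATUS (status);
}
```

On a system that behaves like Glitch, the expectation is that this returns EPERM, matching the “Operation not permitted” line in the test log.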

couldn’t download source archive for file

(file is the program that describes what kind of file a given file is.)

Apparently for some time nix couldn’t download things from FTP. This doesn’t often come up, because most packages describe how to use curl to download their sources. The “built in” downloader is only used in some select packages early in the dependency graph. file happens to be such a package, and it specifies an FTP URL.

Files downloaded from the internet are stored in a content-addressed way, so you can download them from anywhere and nix will always put them in the right place. I worked around this by downloading a copy of the same file over HTTP instead.

Later versions of nix, including the version built after the challenges described in this post, no longer experience this issue with FTP downloads, so we shouldn’t have this problem in the future.

git and git-minimal test failure

nixpkgs’s distribution of git doesn’t sufficiently rewrite some tests’ dependencies into nix store paths, resulting in some tests trying to use the host system’s libraries. And the host system’s libraries on Glitch are very old, which causes those tests to fail.

I reported this problem: “git 2.36 installCheck uses LD_PRELOAD” (Issue #176999, NixOS/nixpkgs on GitHub).

I worked around this by building these packages outside of Glitch.

glib

Note: Some issues, including this one, aren’t involved in building nix itself. Instead, they came up in building other packages. I’ll mark these based on my recollection.

(This one came up when trying to build emacs-nox.)

It turns out there’s more in this version of glibc (Yeah, glibc not glib as in the heading. Unfortunate about the similar naming :woman_shrugging:.) that breaks. This version of glibc now exposes close_range, which glib uses. And as with all these newfangled syscalls, close_range is giving EPERM on Glitch. But one thing we’re quite lucky about is that this gets caught in glib’s own test suite.

** (/tmp/nix-build-glib-2.72.2.drv-0/glib-2.72.2/build/glib/gtester:16380): WARNING **: 02:11:36.692: Failed to execute test binary: /tmp/nix-build-glib-2.72.2.drv-0/glib-2.72.2/build/glib/gtester: Failed to close file descriptor for child process (Operation not permitted)
ninja: build stopped: subcommand failed.
error: builder for '/tmp/nix/store/akrf00idnfzmbrhk3w7qyiydp6xxss6q-glib-2.72.2.drv' failed with exit code 1

Searching for that “Failed to close file descriptor for child process” message takes you almost directly to a close_range call. Very nice design on glib’s part :clap:.

Unlike what glibc has done with, for example, faccessat internally attempting to use faccessat2 and falling back to jankier logic if it’s not available, all glibc does is expose close_range directly. It’s up to the code calling close_range to handle ENOSYS. And thus the check for ENOSYS and the fallback code are in glib.

I worked around this by adding a patch to glib, again similar to Red Hat’s faccessat2 patch.

diff --git a/glib/gspawn.c b/glib/gspawn.c
index 0a2cbe5..c9fda00 100644
--- a/glib/gspawn.c
+++ b/glib/gspawn.c
@@ -1544,7 +1544,7 @@ safe_fdwalk_set_cloexec (int lowfd)
    * fall back to safe_fdwalk(). Handle EINVAL in case `CLOSE_RANGE_CLOEXEC`
    * is not supported. */
   int ret = close_range (lowfd, G_MAXUINT, CLOSE_RANGE_CLOEXEC);
-  if (ret == 0 || !(errno == ENOSYS || errno == EINVAL))
+  if (ret == 0 || !(errno == ENOSYS || errno == EPERM || errno == EINVAL))
     return ret;
 #endif  /* HAVE_CLOSE_RANGE */
   return safe_fdwalk (set_cloexec, GINT_TO_POINTER (lowfd));
@@ -1597,7 +1597,7 @@ safe_closefrom (int lowfd)
   * Handle ENOSYS in case it's supported in libc but not the kernel; if so,
    * fall back to safe_fdwalk(). */
   int ret = close_range (lowfd, G_MAXUINT, 0);
-  if (ret == 0 || errno != ENOSYS)
+  if (ret == 0 || !(errno == ENOSYS || errno == EPERM))
     return ret;
 #endif  /* HAVE_CLOSE_RANGE */
   return safe_fdwalk (close_func, GINT_TO_POINTER (lowfd));

llvm working, but being a tense day

I saw someone else expressing the sentiment that nix after 2.3.x suddenly had a much bigger build dependency closure, but now I can’t find the link. It’s possibly related to the NixOS Discourse thread “Any consensus on documentation generation tool for nixpkgs manual?” Anyway, that’s indeed the case with nix 2.8.1, which ships in nixpkgs stable 22.05. For one thing, it now needs LLVM.

By the way, it needs LLVM for Rust, in order to build mdBook, in order to generate nix’s documentation files. It’s not even needed at runtime :laughing::sob:.

Building the llvm package actually works. But it takes about 9 hours 38 minutes (unboosted, /sys/fs/cgroup/cpu/cpu.cfs_quota_us:100000). That’s a tense time making sure you always come back to wiggle the mouse every half hour. If Glitch could please “Warn us if our editor session will go idle,” that would be great.

Oh there’s also this really scary time during the tests where there’s no output and no CPU usage for several minutes. I have no idea what it’s doing during that time.

mailcap unpacking bug

I found an issue in a package called “mailcap,” which is more or less a file containing information about MIME types. The only building involved is to download a release archive and extract it. Due to that being so simple, the exact output is known, and changes to the script that nixpkgs uses to build this package won’t invalidate any caches of the finished package.

It thus went unnoticed when some changes to the build script really did cause the build to break. We only know because we’re building for a different nix store path, which causes nix not to use the cache of the normal “/nix/store” version that’s readily available online.

I reported this problem: “mailcap hash change infinite recursion” (Issue #176286, NixOS/nixpkgs on GitHub). A maintainer later renamed the issue when the same problem uncovered a more pressing problem.

Because the package has exactly known contents, I was able to work around this by downloading the “/nix/store” version from a public cache, extracting it, and importing it into our “/tmp/nix/store” store.

I believe the nixpkgs maintainers have since fixed the build issue, so we should no longer experience this in future nixpkgs releases.

python3.9-pytest-asyncio test failure

(This one came up when trying to build graphviz-nox.)

___________________________ test_unused_port_fixture ___________________________
tests/test_simple.py:47: in test_unused_port_fixture
    server1 = await asyncio.start_server(closer, host="localhost", port=unused_tcp_port)
/tmp/nix/store/pmcprpmmhn5crv44f2rwrfwirgis86yy-python3-3.9.13/lib/python3.9/asyncio/streams.py:94: in start_server
    return await loop.create_server(factory, host, port, **kwds)
/tmp/nix/store/pmcprpmmhn5crv44f2rwrfwirgis86yy-python3-3.9.13/lib/python3.9/asyncio/base_events.py:1506: in create_server
    raise OSError(err.errno, 'error while attempting '
E   OSError: [Errno 99] error while attempting to bind on address ('::1', 49314, 0, 0): cannot assign requested address

It looks like some of the tests need IPv6, and we don’t have that on Glitch.

I worked around this by building this package outside of Glitch.
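
To check for the same limitation elsewhere, a probe like the following tries the same kind of bind that the failing test attempts. It is a sketch under assumptions, not code from pytest-asyncio: probe_ipv6_loopback is a hypothetical name, and it only tests the IPv6 loopback address ::1. The “Errno 99” in the log above is EADDRNOTAVAIL on most Linux architectures, so that is the value I would expect it to return on Glitch.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical probe: tries to bind a TCP socket to [::1] on a port
   chosen by the kernel.  Returns 0 on success, otherwise the errno
   value reported by socket() or bind().  */
static int
probe_ipv6_loopback (void)
{
  int fd = socket (AF_INET6, SOCK_STREAM, 0);
  if (fd < 0)
    return errno;

  struct sockaddr_in6 addr;
  memset (&addr, 0, sizeof addr);
  addr.sin6_family = AF_INET6;
  addr.sin6_addr = in6addr_loopback;  /* ::1 */
  addr.sin6_port = htons (0);         /* 0 = let the kernel pick a port */

  /* Save errno before close(), which could overwrite it.  */
  int result = 0;
  if (bind (fd, (struct sockaddr *) &addr, sizeof addr) < 0)
    result = errno;
  close (fd);
  return result;
}
```

A return value of 0 means IPv6 loopback is usable; anything else is the errno explaining why it is not.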

Conclusion

Imagine if software made people happy.


Wow.
