last thread: Explore the insides of Nix packages from the comfort of your browser
I’m quite late on the release of NixOS 22.11, but I’ve built the packages in the “small” distribution for use on Glitch. A really cool thing about this time is that I actually built it on Glitch itself. This post is about that and a few other neat things along the way.
Actually building on Glitch
There had been a few things that didn’t compile on Glitch. Previously I had built those outside of Glitch, and they would run fine on Glitch (mostly). But now that I had some experience configuring custom patches, I went ahead and made it so that I could build these on Glitch as well.
AWS SDKs
Background. These are used in Nix, because Nix supports uploading to AWS S3.
Problem. These had, as part of the build process, some tests that would try to set a thread’s CPU affinity. The calls to do that would fail with EPERM on Glitch.
Resolution. I added a patch to disable the CPU affinity setting in those tests. These packages now build. You still can’t use them to set CPU affinities, but Nix doesn’t do that, so we’re fine.
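To make the failure mode concrete, here is a minimal Python sketch of treating a refused affinity call as non-fatal. It is not the actual patch, which disables the calls in the SDKs' own tests. The helper name try_pin_to_cpus, its setter parameter, and refusing_setter are hypothetical names made up for illustration; os.sched_setaffinity is the real (Linux-only) call.

```python
import errno
import os


def try_pin_to_cpus(cpus, setter=None):
    """Try to pin the caller to `cpus`; return False if that is not possible.

    `setter` is a hypothetical injection point so the EPERM path can be
    exercised without needing a restricted container.
    """
    if setter is None:
        # os.sched_setaffinity only exists on Linux.
        setter = getattr(os, "sched_setaffinity", None)
        if setter is None:
            return False
    try:
        setter(0, cpus)  # 0 means "the caller"
        return True
    except PermissionError:
        # EPERM: the sandbox refuses affinity changes, so carry on unpinned.
        return False


def refusing_setter(pid, cpus):
    # Stand-in for a container that rejects the call with EPERM.
    raise PermissionError(errno.EPERM, "Operation not permitted")


print(try_pin_to_cpus({0}, setter=refusing_setter))  # False
```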
libredirect
Background. This is Nixpkgs’s internal LD_PRELOAD shim for simulating remapping parts of the filesystem without needing root privileges. It’s needed as a build dependency of OpenSSH.
Problem. libredirect is meant to support the system() function, which internally runs /bin/sh. However, on Glitch, /bin/sh is the old one from our Ubuntu 16.04 containers, which uses a glibc incompatible with the one that libredirect itself is built with. The result is that our container’s loader tries to put this newfangled library with the older glibc in the /bin/sh process, and that fails. I searched around, and there don’t seem to be many uses of libredirect, and among those, I’m not using any software that uses it with system(), so it seems okay to go without it. However, the tests (again, part of the build process) try to exercise it, causing the build to fail.
Resolution. I’ve added a patch to skip the tests that use system(). It now builds, but we can’t use system() when using libredirect. Additionally, running basically any other “host” program (i.e. a program that came with the project container, rather than the newer software managed by Nix) under libredirect will cause the same problem. The usual culprit is /bin/sh. The Nixpkgs maintainers have been pretty good about trying not to use host software. But /bin/sh continues to be a pain point: see “The hermeticity of /bin/sh” (Issue #6081 · NixOS/nix · GitHub).
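The reason the host shell gets dragged into this is that child processes inherit the parent's environment, and the dynamic loader of each new process reads LD_PRELOAD from that environment. A small Python sketch of the inheritance, under the assumption that a harmless made-up variable (DEMO_PRELOAD, with a made-up path) is a fair stand-in for LD_PRELOAD:

```python
import os
import subprocess

# DEMO_PRELOAD is a made-up variable standing in for LD_PRELOAD, so that
# running this sketch does not actually inject a library anywhere.
env = dict(os.environ, DEMO_PRELOAD="/path/to/libredirect.so")

# On POSIX, shell=True runs the command as `/bin/sh -c ...`, much like
# the C system() function does.
result = subprocess.run(
    'printf %s "$DEMO_PRELOAD"',
    shell=True,
    env=env,
    capture_output=True,
    text=True,
    check=True,
)
seen_by_sh = result.stdout
print(seen_by_sh)  # /path/to/libredirect.so
```

With the real LD_PRELOAD in place of the stand-in, the host /bin/sh would be asked to load a library built against a newer glibc than its own, which is the failure described above.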
OpenSSH
Problem. The build process for this package uses libredirect. Lots of the tests in the test suite (again, tests are part of the build process) use /bin/sh as the interpreter. Nixpkgs usually fixes up executable scripts by rewriting the “shebang” line at the top to use a specific Nix store path. However, these are just in tests, so they don’t get rewritten. This causes the sh from the project container to try to run under libredirect, which crashes.
Resolution. I added an overlay to append a step to the preCheck script in the derivation to rewrite the test scripts to use the right shell from the build dependencies. If someone happens to know what they’re doing, it would be great if you could check whether I’m doing it right.
preCheck = old.preCheck + ''
  # Point the regression test scripts at the shell from the build dependencies.
  substituteInPlace regress/*.sh \
    --replace '#!/bin/sh' '#!${self.stdenv.shell}'
'';
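For readers who don't speak Nix, the effect of that step can be sketched in Python. This is only an illustration: the function name rewrite_shebangs and the /example/store/bin/sh path are made up, and unlike substituteInPlace (which replaces every occurrence of the string) this version only touches a shebang at the very start of the file.

```python
import tempfile
from pathlib import Path


def rewrite_shebangs(directory, new_shell, old_shell="/bin/sh"):
    """Rewrite a leading `#!old_shell` in every *.sh file in `directory`.

    Returns the names of the files that were changed.
    """
    old_line = "#!" + old_shell
    changed = []
    for script in sorted(Path(directory).glob("*.sh")):
        text = script.read_text()
        if text.startswith(old_line):
            script.write_text("#!" + new_shell + text[len(old_line):])
            changed.append(script.name)
    return changed


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.sh").write_text("#!/bin/sh\necho hi\n")
    Path(tmp, "b.sh").write_text("#!/usr/bin/env bash\necho hi\n")
    changed = rewrite_shebangs(tmp, "/example/store/bin/sh")
    rewritten = Path(tmp, "a.sh").read_text()

print(changed)                    # ['a.sh']
print(rewritten.splitlines()[0])  # #!/example/store/bin/sh
```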
Problem. Unfortunately, in some deep corners of the test suite, they use egrep and fgrep. In GNU grep, these are shell scripts that exec grep -E and grep -F respectively. And in Nixpkgs’s distribution, they’re rewritten to use specifically /bin/sh as the interpreter (gnugrep: Fix bootstrap-tools reference · NixOS/nixpkgs@a1d9c56 · GitHub). (Seems to have something to do with grep being needed early in the toolchain? I think it’s used to build the shell itself. So the only interpreter it could legitimately reference is an earlier phase’s shell. And I suppose people don’t want to keep that early-phase shell around in an installed system.) That /bin/sh from the project container can’t handle libredirect. As a result, trying to run egrep or fgrep under libredirect crashes.
Resolution. I added a patch to use grep -E and grep -F instead, respectively. With that, we avoid those crashes. And we only use libredirect during the test, so we shouldn’t encounter any weirdness when using the built package.
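The substitution itself is mechanical. Here is a rough Python sketch of that kind of rewrite; it is not how the actual patch was produced, and modernize_grep is a hypothetical name. The word-boundary match is a simplification that could misfire on a hyphenated name containing egrep or fgrep.

```python
import re

_REPLACEMENTS = {"egrep": "grep -E", "fgrep": "grep -F"}
_PATTERN = re.compile(r"\b(egrep|fgrep)\b")


def modernize_grep(line):
    """Replace standalone `egrep`/`fgrep` words with `grep -E`/`grep -F`."""
    return _PATTERN.sub(lambda m: _REPLACEMENTS[m.group(1)], line)


print(modernize_grep("egrep -q 'a|b' log && fgrep -c x.y log"))
# grep -E -q 'a|b' log && grep -F -c x.y log
```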
Problem. There’s this long-standing issue with trying to run the SSH server as a non-root user, where OpenSSH tries to chown a pseudoterminal file that it creates. That’s why, by the way, Snail’s SSH proxy (“snail ssh proxy”, Snail) can’t start a tty session. I recall there was a test for it too, and that fails.
Resolution. It turns out OpenSSH has some exceptions where it’s willing to give up on chown-ing/chmod-ing the /dev/pts/xx file (openssh-portable/sshpty.c at V_9_1_P1 · openssh/openssh-portable · GitHub): the call failed because the filesystem is read-only, and the current ownership is reasonably sane. It also turns out that we already met that “current ownership is reasonably sane” criterion. I consider having no permission to be similar enough to the read-only filesystem scenario, so I added a patch to expand the exception to include when we don’t have permission to do the chown-ing/chmod-ing. With that, tty allocation succeeds. I wouldn’t mind seeing this patch upstreamed! And with this second patch, OpenSSH now builds okay on Glitch.
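The shape of that widened exception can be sketched in Python. This is loosely modeled on the description above, not a transcription of sshpty.c or of the patch. Assumptions: "no permission" means errno EPERM, "reasonably sane" means the file is already owned by the target user or by root, and the names set_tty_owner, denied_chown, and fake_stat are made up for illustration.

```python
import errno
import os
from types import SimpleNamespace


def set_tty_owner(path, uid, gid, chown=os.chown, stat=os.stat):
    """Try to chown `path`; tolerate some failures if ownership is sane.

    Returns True if ownership is (now) exactly as requested, False if a
    failure was tolerated. Any other failure is re-raised.
    """
    st = stat(path)
    if st.st_uid == uid and st.st_gid == gid:
        return True
    try:
        chown(path, uid, gid)
        return True
    except OSError as exc:
        # Assumption: "sane" means owned by the target user or by root.
        owner_is_sane = st.st_uid in (uid, 0)
        # EROFS is the existing exception; EPERM is the assumed widening.
        tolerable = exc.errno in (errno.EROFS, errno.EPERM)
        if tolerable and owner_is_sane:
            return False
        raise


def denied_chown(path, uid, gid):
    # Stand-in for a container where chown is not permitted.
    raise PermissionError(errno.EPERM, "Operation not permitted")


def fake_stat(path):
    # Pretend the pty already belongs to the user but has another group.
    return SimpleNamespace(st_uid=1000, st_gid=5)


print(set_tty_owner("/dev/pts/0", 1000, 1000, chown=denied_chown, stat=fake_stat))  # False
```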
elfutils
Problem. elfutils includes some utilities for examining running processes. These rely on ptrace, which we aren’t allowed to use on Glitch. By the way,
There are tests for these utilities (and again, tests are included in the build process). So the build fails.
Resolution. I’ve added a patch to skip these tests. Oddly memorable: you do that by adding an exit 77 line in the tests’ shell scripts. With that, elfutils builds on Glitch. Obviously you won’t be able to use those tools though. But we don’t, so yay.
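The 77 is not arbitrary: in Automake's test harness convention, an exit status of 77 marks a test as skipped and 99 marks a hard error. Assuming the elfutils test suite follows that convention, a tiny Python sketch of how a harness might interpret statuses looks like this (classify_exit_status is a hypothetical name):

```python
import subprocess


def classify_exit_status(code):
    """Map a test script's exit status to an Automake-style result."""
    if code == 0:
        return "PASS"
    if code == 77:
        return "SKIP"
    if code == 99:
        return "ERROR"
    return "FAIL"


# A test script whose first line is `exit 77` behaves like this:
status = subprocess.run(["sh", "-c", "exit 77"]).returncode
print(classify_exit_status(status))  # SKIP
```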
psutil (Python)
Background. It gets stuff like what’s running, what’s using the CPU, and info like that. I forget where this is in the dependency tree.
Problem. There’s a (very reasonable, IMO) test that makes sure that when you ask for the per-CPU usage stats, it will give you the same number of stats as when you ask it how many CPUs there are. Buuuuuut apparently nobody told the Glitch project container that it should do that, so:
$ ls -ld /sys/devices/system/cpu/cpu*
drwxr-xr-x 5 nobody nogroup 0 Mar 11 05:17 /sys/devices/system/cpu/cpu0
drwxr-xr-x 5 nobody nogroup 0 Mar 11 05:17 /sys/devices/system/cpu/cpu1
drwxr-xr-x 5 nobody nogroup 0 Mar 11 05:17 /sys/devices/system/cpu/cpu2
drwxr-xr-x 5 nobody nogroup 0 Mar 11 05:17 /sys/devices/system/cpu/cpu3
... (four CPUs)
$ cat /proc/stat
cpu 127534558 260746675 123066597 623338256 868652 0 13165171 3480 0 0
cpu0 45460379 84252698 41219229 207262759 288906 0 4222751 1168 0 0
cpu1 36692422 91793351 42164118 207109486 303190 0 4763176 1156 0 0
cpu2 45381757 84700626 39683250 208966011 276556 0 4179244 1156 0 0
... (three CPUs)
And the library sees that, and it gives you different numbers of CPUs, and that causes the test to fail (and again, the tests are part of the build process).
Resolution. I added a patch to let the test pass even when there are fewer stats than CPUs. I have no idea why this happens. I have no idea if any downstream software will explode because of this. But the psutil library itself isn’t going to make it any worse than if you actually looked in sysfs and procfs manually, so eh.
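Here is a small Python sketch of the mismatch and of the relaxed check, using the /proc/stat output quoted above as sample data. It does not use psutil itself, and the function names are made up; the assumption is that the patched test effectively accepts "fewer or equal" instead of "equal".

```python
import re

# The /proc/stat lines quoted above, used as sample input.
PROC_STAT_SAMPLE = """\
cpu 127534558 260746675 123066597 623338256 868652 0 13165171 3480 0 0
cpu0 45460379 84252698 41219229 207262759 288906 0 4222751 1168 0 0
cpu1 36692422 91793351 42164118 207109486 303190 0 4763176 1156 0 0
cpu2 45381757 84700626 39683250 208966011 276556 0 4179244 1156 0 0
"""


def count_per_cpu_lines(proc_stat_text):
    """Count the `cpuN ...` lines, ignoring the aggregate `cpu ...` line."""
    return sum(
        1 for line in proc_stat_text.splitlines() if re.match(r"cpu\d+\s", line)
    )


def per_cpu_count_is_acceptable(per_cpu_entries, logical_cpus):
    """Relaxed check: accept fewer per-CPU entries than CPUs, never more."""
    return per_cpu_entries <= logical_cpus


per_cpu = count_per_cpu_lines(PROC_STAT_SAMPLE)
print(per_cpu)                                   # 3
print(per_cpu_count_is_acceptable(per_cpu, 4))   # True (sysfs lists cpu0..cpu3)
```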
My overlays and patches
You can see my “overlays” for Nixpkgs, which is how you configure it to use custom patches and other tweaks, and my patches in this project.
In the next post, I’ll talk about making it easier to build multiple packages in parallel. (The “1” in the above project link kind of gives away the gist of it though.)