Actually building Nix packages on Glitch

last thread: Explore the insides of Nix packages from the comfort of your browser

I’m quite late on the release of NixOS 22.11, but I’ve built the packages in the “small” distribution for use on Glitch. A really cool thing about this time is that I actually built it on Glitch itself. This post is about that and a few other neat things along the way.

Actually building on Glitch

There had been a few things that didn’t compile on Glitch. Previously I had built those outside of Glitch, and they would run fine on Glitch (mostly). But now that I had some experience configuring custom patches, I went ahead and made it so that I could build these on Glitch as well.

AWS SDKs

Background. These are used in Nix, because Nix supports uploading to AWS S3.

Problem. These had, as part of the build process, some tests that would try to set a thread’s CPU affinity. The calls to do that would fail with EPERM on Glitch.

Resolution. I added a patch to disable the CPU affinity setting in those tests. These packages now build. You still can’t use the SDK to set CPU affinities, but Nix doesn’t do that, so we’re fine.

libredirect

Background. This is Nixpkgs’s internal LD_PRELOAD shim for simulating remapping parts of the filesystem without needing root privileges. It’s needed as a build dependency of OpenSSH.
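
To give a sense of how it’s typically invoked, here’s a rough sketch (the NIX_REDIRECTS variable and its original=replacement format are how Nixpkgs documents libredirect, but treat the specific paths here as my illustration):

    # run a program with part of the filesystem "remapped", no root needed
    # NIX_REDIRECTS lists original=replacement pairs, separated by colons
    export NIX_REDIRECTS=/etc/protocols=/tmp/my-protocols
    LD_PRELOAD=/path/to/libredirect.so some-program   # placeholder path to the built shim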

Problem. libredirect is meant to support the system function, which internally runs /bin/sh. However, on Glitch, /bin/sh is the old one from our Ubuntu 16.04 containers, which uses a glibc incompatible with the one that libredirect itself is built against. The result is that our container’s loader tries to load this newfangled library into the /bin/sh process with its older glibc, and that fails. I searched around, and there don’t seem to be many uses of libredirect, and among those, I’m not using any software that uses system, so it seems okay to go without it. However, the tests (again, part of the build process) try to exercise it, causing the build to fail.

Resolution. I’ve added a patch to skip the tests that use system. It now builds, but we can’t use system when using libredirect. Additionally, using basically any other “host” programs (i.e. programs that came with the project container, rather than the newer software managed by Nix) will cause the same problem. Usually it is /bin/sh. The Nixpkgs maintainers have been pretty good about trying not to use host software. But /bin/sh continues to be a pain point: The hermeticity of /bin/sh · Issue #6081 · NixOS/nix · GitHub.

OpenSSH

Problem. The build process for this package uses libredirect. Lots of the tests in the test suite (again, tests are part of the build process) use /bin/sh as the interpreter. Nixpkgs usually fixes up executable scripts by rewriting the “shebang” line at the top to use a specific Nix store path. However, these are just in tests, so they don’t get rewritten. This causes the sh from the project container to try to run under libredirect, which crashes.

Resolution. I added an overlay to append a step to the preCheck script in the derivation to rewrite the test scripts to use the right shell from the build dependencies. If someone who knows what they’re doing could check whether I’m doing this right, that would be great :pray:.

    preCheck = old.preCheck + ''
      substituteInPlace regress/*.sh \
        --replace '#!/bin/sh' '#!${self.stdenv.shell}'
    '';

Problem. Unfortunately, in some deep corners of the test suite, they use egrep and fgrep. In GNU grep, these are shell scripts that exec grep -E and grep -F respectively. And in Nixpkgs’s distribution, they’re rewritten to use specifically /bin/sh as the interpreter (gnugrep: Fix bootstrap-tools reference · NixOS/nixpkgs@a1d9c56 · GitHub). (Seems to have something to do with grep being needed early in the toolchain? I think it’s used to build the shell itself, so the only interpreter it could legitimately reference is an earlier phase’s shell. And I suppose people don’t want to keep that early-phase shell around in an installed system.) That /bin/sh from the project container can’t handle libredirect. As a result, trying to run egrep or fgrep under libredirect crashes.

Resolution. I added a patch to use grep -E and grep -F instead, respectively. With that, we avoid those crashes. And we only use libredirect during the test, so we shouldn’t encounter any weirdness when using the built package.
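
Conceptually the substitution is tiny; these pairs behave the same (this is just standard grep, nothing specific to the patch):

    egrep 'foo|bar' some.log    # before: a shell-script wrapper with a /bin/sh shebang
    grep -E 'foo|bar' some.log  # after: same matches, no extra interpreter involved
    fgrep 'a.b' some.log        # before
    grep -F 'a.b' some.log      # after: fixed-string matching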

Problem. There’s this long-standing issue with trying to run the SSH server as a non-root user, where OpenSSH tries to chown a pseudoterminal file that it creates. That, by the way, is why Snail’s SSH proxy (snail ssh proxy — Snail) can’t start a tty session. I recall there was a test for it too, and that fails.

Resolution. It turns out OpenSSH has some exceptions where it’s willing to give up on chown-ing/chmod-ing the /dev/pts/xx file (openssh-portable/sshpty.c at V_9_1_P1 · openssh/openssh-portable · GitHub): that it failed because the filesystem is read-only, and that the current ownership is reasonably sane. It also turns out that we already meet that ‘current ownership is reasonably sane’ criterion. I consider having no permission to be similar enough to the read-only filesystem scenario, so I added a patch to expand the exception to cover the case where we don’t have permission to do the chown-ing/chmod-ing. With that, tty allocation succeeds. I wouldn’t mind seeing this patch upstreamed! And with this second patch, OpenSSH now builds okay on Glitch.

elfutils

Problem. elfutils includes some utilities for examining running processes. These rely on ptrace, which we aren’t allowed to use on Glitch. By the way,

I am once again asking for ptrace on Glitch.

There are tests for these utilities (and again, tests are included in the build process). So the build fails.

Resolution. I’ve added a patch to skip these tests. The way you do that is oddly memorable: you add an exit 77 line to the tests’ shell scripts. With that, elfutils builds on Glitch. Obviously you won’t be able to use those tools, though. But we don’t, so yay.
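
Here’s a minimal sketch of what such a skipped test looks like (a hypothetical script; 77 is the exit code Automake-style test harnesses treat as “skipped”):

    #!/bin/sh
    # this test needs ptrace, which the Glitch container doesn't allow
    echo "ptrace not available here; skipping" >&2
    exit 77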

psutil (Python)

Background. It gets info like what’s running, what’s using the CPU, and so on. I forget where this is in the dependency tree.

Problem. There’s a (very reasonable, IMO) test that makes sure that when you ask for the per-CPU usage stats, it will give you the same number of stats as when you ask it how many CPUs there are. Buuuuuut apparently nobody told the Glitch project container that it should do that, so:

$ ls -ld /sys/devices/system/cpu/cpu*
drwxr-xr-x 5 nobody nogroup 0 Mar 11 05:17 /sys/devices/system/cpu/cpu0
drwxr-xr-x 5 nobody nogroup 0 Mar 11 05:17 /sys/devices/system/cpu/cpu1
drwxr-xr-x 5 nobody nogroup 0 Mar 11 05:17 /sys/devices/system/cpu/cpu2
drwxr-xr-x 5 nobody nogroup 0 Mar 11 05:17 /sys/devices/system/cpu/cpu3
... (four CPUs)
$ cat /proc/stat
cpu  127534558 260746675 123066597 623338256 868652 0 13165171 3480 0 0
cpu0 45460379 84252698 41219229 207262759 288906 0 4222751 1168 0 0
cpu1 36692422 91793351 42164118 207109486 303190 0 4763176 1156 0 0
cpu2 45381757 84700626 39683250 208966011 276556 0 4179244 1156 0 0
... (three CPUs)

And the library sees that, and it gives you different numbers of CPUs, and that causes the test to fail (and again, the tests are part of the build process).

Resolution. I added a patch to let the test pass even when there are fewer stats than CPUs. I have no idea why this happens. I have no idea if any downstream software will explode because of this. But the psutil library itself isn’t going to make it any worse than if you actually looked in sysfs and procfs manually, so eh.
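
If you want to check whether a container shows the same mismatch, here’s a quick sketch counting CPUs the two ways the test compares:

    # sysfs's idea of how many CPUs there are
    ls -d /sys/devices/system/cpu/cpu[0-9]* | wc -l
    # versus how many per-CPU lines /proc/stat reports
    grep -c '^cpu[0-9]' /proc/stat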

My overlays and patches

You can see my “overlays” for Nixpkgs, which is how you configure it to use custom patches and other tweaks, and my patches in this project.


In the next post, I’ll talk about making it easier to build multiple packages in parallel. (The “1” in the above project link kind of gives away the gist of it though :sweat_smile:.)


Other build problems from 22.05 and before

A quick addendum on the last post: weren’t there a few other packages that couldn’t be built on Glitch back when I was building the 22.05 release (Things encountered while building nixpkgs 22.05)? Yes. But for some reason they build now.

  • Git wouldn’t build due to some exotic malloc-instrumenting LD_PRELOAD that it wanted to do in its test suite (did I remember to say? the tests are part of the build process), which somehow didn’t work right. I looked through the recent changes in Git’s source code and Nixpkgs, but I didn’t find anything that would have fixed the problem. Yet it built fine this time. This is going to be one of those things that I never figure out. Ugh!
  • pytest-asyncio had a test that (almost forgot to say, the tests are part of the build process) would try to listen on an IPv6 address. Well somehow that wasn’t a problem this time. Maybe something fixed something? Eh.

Cloneable builders

As you may have noticed, the link in the previous post goes to a project called “tmpnix-cb1.” That “1” is there because there’s now more than one project doing the building.

Deploying files with secrets

Previously I had split up the build process into a big multi-project system of derivation instantiator, builder, and uploader. That made some things complicated, because the various parts needed all sorts of credentials to authenticate with each other: Nix signing keys, netrc files, and Snail persistent token files. These are all in the .data directory so that remixes (they’re public projects, after all) don’t get copies.

As a result, it used to take many steps to set up another project to use as a builder, creating various files in .data from the project terminal.

I’ve simplified the process of setting up the various secret-containing files by shipping an encrypted tarball containing all of those files I need in .data. Now, there’s still some manual setup, but it’s all in one step with much less switching around between tabs. All I do is decrypt and extract the big file with a password that’s saved in the main project’s .env file.
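
The one-step setup looks roughly like this (a sketch only: the file name and the use of GPG symmetric encryption are stand-ins for illustration, not necessarily what the project actually uses; the passphrase variable is explained below):

    # decrypt the bundled secrets and unpack them into .data in one go
    gpg --batch --pinentry-mode loopback --passphrase "$SECRETS_PASSPHRASE" \
      --decrypt secrets.tar.gz.gpg | tar -xzf - -C .data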

Saving a password in .env

I needed a place to put it so that I could easily copy it from there. .env seems nice, because it shows up in the editor. A downside though is that it puts the password into the environment of all your processes, which seems unnecessarily risky. The processes don’t need to know the password.

I’m working around this “things in .env actually go into the environment” issue by doing this in .env:

SECRETS_PASSPHRASE=...
# don't actually tell every process lol
SECRETS_PASSPHRASE=1

Thus processes only see SECRETS_PASSPHRASE=1. Unless they specifically go looking in .env, but I can’t do anything about that, haha.
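
And when I do need the real value, it can be pulled straight out of .env instead of the environment; a sketch:

    # the first SECRETS_PASSPHRASE= line is the real one; the later dummy only affects the environment
    passphrase="$(grep -m1 '^SECRETS_PASSPHRASE=' .env | cut -d= -f2-)"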

Deploying dotfiles

There are some dotfiles that are usually ignored, but which I want to replicate in these cloneable builders. So far, that’s these:

  • .config/nix: Nix config
  • .config/nixpkgs: my Nixpkgs overlays
  • .emacs: my precious editor settings lol. Mostly to tell Emacs to use a dark theme. It boggles the mind that it defaults to using a light theme when running in a terminal. And “using a light theme” here doesn’t even mean drawing dark text on a light background. It simply uses dark colors on what is still a black terminal background :new_moon_with_face:
  • .gitconfig: it’s got my username and command aliases :sunglasses:. Note that this is my “global” Git config as a dotfile in the app user’s home directory, distinct from .git/config.
  • .gitignore: this is the file that’s going to be programming the inclusion of these dotfiles, so it has to come along too
  • .profile: needed for setting up a Nix environment. Basically it puts the stuff installed into your “profile” on your PATH (see the sketch after this list). Possibly a few other things, I should learn what.
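
The relevant part of .profile is basically the line the single-user Nix installer adds, something like this (the exact path can differ between Nix versions):

    # source the Nix profile script so installed packages end up on PATH
    if [ -e "$HOME/.nix-profile/etc/profile.d/nix.sh" ]; then
      . "$HOME/.nix-profile/etc/profile.d/nix.sh"
    fi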

.emacs, .gitconfig, and .gitignore are tracked by default, so there’s no issue including them.

For .profile, you can put a !/.profile in your .gitignore easily enough.

There’s a trick to un-ignoring .config/nix and .config/nixpkgs. /etc/gitignore-global includes .config as the whole directory. The way Git works, that makes it not explore the directory at all, so even if you put !/.config/nix in your .gitignore, it still wouldn’t pick it up.

You have to un-ignore the directory (!/.config), ignore everything else inside it (/.config/*), and then un-ignore the things you want to track (!/.config/nix, !/.config/nixpkgs).
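
Put together, the additions to .gitignore are these three layers of rules (shown here being appended from the terminal):

    # un-ignore .config itself, re-ignore its contents, then allow the two subdirectories we want
    printf '%s\n' '!/.config' '/.config/*' '!/.config/nix' '!/.config/nixpkgs' >> .gitignore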

Weird thing I encountered while setting up dotfiles’ ignored-ness: Glitch by default includes .gnupg in remixes. So if you’ve generated a GPG key on Glitch (in a public project), uh-oh.

You can kind of make these files visible in the editor too. Un-ignoring them doesn’t cause them to be included in the editor, but what I’ve done is to make the dotfiles symlinks to other file names that would normally show up in the editor. For example:

$ ls -al .config
total 4
...
lrwxrwxrwx  1 app app   28 Mar  6 05:55 nix -> ../tmpnix/visible.config/nix
...


Coordinating which project builds what

Nix has its own system for running builds remotely. It’s done over SSH. Nix has a little protocol to send over the build dependencies, tell the remote builder to build, and send back the build outputs. But I don’t have all the pieces in place to use it. I have some experiments connecting to Glitch projects over SSH (Connecting via SSH, editing a project in VS Code), but for one thing, I haven’t gotten it all set up here. Besides that, there’s this lurking evil of sending sizeable build dependencies and outputs over the Glitch terminal, which uh… does not have backpressure (Experience report: building an administrative connection to Glitch - #5 by wh0). Oh and there’s the obvious problem of Glitch project containers going to sleep when the editor isn’t open. So I’m not doing this coordination in the Nix remote build way.

Currently I’m doing this (a command-level sketch follows the list):

  1. Have the first builder try to build everything (nix-store -rk ...).
  2. When it gets to a big package that I know takes a long time to build, I interrupt that build (it suffices to kill the running make or whatever).
  3. The -k in the nix-store -r command causes the first builder to carry on building other packages that it can build.
  4. I manually (lol) open up a second builder and run a command to build specifically that big package. The build dependencies for this package are ready, because the first builder tried to build it. Thus the second builder only has to build this one package.
  5. The second builder finishes and uploads the outputs to the shared cache.
  6. Eventually the first builder finishes everything it can do and reports that building the big package failed.
  7. I rerun the command to build everything on the first builder, and it discovers that the big package is available from the cache. So it continues building other things that depend on that big package.
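
In terms of commands, the two builders run something like this (a sketch; the placeholder variables stand for whatever .drv paths the instantiation step produced):

    # builder 1: realize everything it can, continuing past failures (-k)
    nix-store -rk $drvs_to_build     # placeholder for the full list of .drv paths
    # builder 2, opened by hand: realize only the big package; its inputs are already in place
    nix-store -r $big_package_drv    # placeholder for that one .drv path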

And there are enough known “big” packages to get good time savings this way:

  • gcc (~2.5h)
  • ghc (~3h)
  • llvm (~6.5h)
  • clang (~4h)
  • rustc (~2.5h)
  • openjdk-headless (~1h)
  • nodejs (~6.5h)

But it’s kind of a pain to be switching through multiple editor tabs to wiggle the mouse around in each of them.

(dramatization of me keeping more than one Glitch editor open)


In the next post, I’ll talk about re-bootstrapping the whole build.

Re-bootstrapping the whole build

There’s a slightly creepy thing about this build of Nix, from the 22.11 release. It was built under the supervision of the Nix from the 22.05 release that I built. And that one was built under some version I built from Nixpkgs unstable. And that one from unstable was built under an earlier one also from Nixpkgs unstable. And that one… the origin of that one was kind of weird. I think I had built it on a virtual machine somewhere on a cloud provider, under a stock Nix installed using whatever curl to sh thing they had on their website at the time. It’s all kind of like a tall stack of blocks, and I wonder if that’s getting too high.

Supposedly Nix has a focus on reproducible builds, but we’re cutting a lot of corners in how we’re building it. One big thing is that we don’t do our builds in a “sandbox”—Nix normally sets up some isolation to make sure everything even outside the Nix store looks the same when running a build. We don’t, because Glitch doesn’t support the kernel features needed for it, namely user namespaces. I’m not sure if there are any other things that could make our builds less reproducible. I do have a bunch of custom patches in very central parts of the software stack. But those should apply equally to any build, hm.

For a while I’ve been trying to reproduce the build of Nix without using earlier versions that I built myself.

nixStatic

The NixOS project builds a statically linked version of Nix Hydra - nixos:release-22.11:nixpkgs.nixStatic.x86_64-linux. My previous attempts to use this had run into weird issues, like Nixpkgs being somehow incompatible, or the NIX_BIN_DIR environment variable not working for whatever reason. But all that stuff seems to have been sorted out at this moment.

You have to set the NIX_BIN_DIR environment variable because some parts of Nix try to run other parts of itself, and it thinks it can do that by going where it thinks it’s installed, somewhere in /nix/store/...-nix-static-.../bin. With this, you can unpack it somewhere in /tmp and tell Nix where to find its other programs. But wait, unpacking it—how do you do that?
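
Roughly, once the binary is extracted (next section), using it looks like this (a sketch; the /tmp location is just where I happen to put it):

    # point Nix at its companion programs, instead of the baked-in /nix/store path
    export NIX_BIN_DIR=/tmp/nix-static
    /tmp/nix-static/nix --version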

Getting the executable from a .nar file

The NixOS project does build this statically compiled version of Nix, but it provides it in a compressed .nar file. .nar is an archive format (like .tar) and, well, you’re supposed to use Nix to extract it. Oh no… we need Nix to unpack Nix?

(a pair of scissors in clamshell packaging, which you’d need some scissors to open)

There’s a way out of this. The NixOS project also publishes a “.ls” file of the package https://cache.nixos.org/4l4mjrsmsx35d00l0062k126d9j52ncg.ls.

"nix":{"type":"regular","size":24218048,"executable":true,"narOffset":400}

You see that? That narOffset and size are actual numbers directing where in the .nar file you find this nix program. You could take an X-Acto knife and cut out exactly that range of the .nar and get the program.

curl ....nar.xz | \
  xz -d | \
  dd \
    iflag=count_bytes,skip_bytes \
    bs=4k \
    count=24218048 \
    skip=400 \
    of=/tmp/nix-static/nix

While that .ls file looks easily parseable as JSON, when I tried to automate this process, I ran into an annoying problem. Their S3-powered cache serves it Brotli-compressed, and whatever version of curl we have on Glitch can’t decompress it for us. And if you try to ask the server to use a different compression algorithm, they ignore you and send you the Brotli version anyway. Ugh. So for now, I’ve done it manually and hardcoded it.

(It has come to my attention that Node.js has a module that can decompress Brotli. Will investigate.)

Instantiating our derivations

The first step once we get Nix running is to download the Nixpkgs “channel” and “instantiate” a bunch of “expressions” into “derivations.” But to make sure we get the same derivations as we had been building before, we need to make sure we have the same “overlays.”

Glossary for the above (vague descriptions):

  • channel: a repository of build instructions, containing functional programming expressions
  • instantiate: evaluate a bunch of functional programs to generate actual build instructions from channel source code
  • expression: a piece of functional programming code, in particular one that ought to evaluate into build instructions
  • derivation: the actual build instructions for a package
  • overlay: a declaration of tweaks, custom patches, etc. that you apply to a channel

So I needed a way to ship the set of patches I had going over to this new clean Glitch project. See this other thread (Dump files into a self-extracting shell script) for a neat thing that came from me doing that.

After this step, I was able to confirm that the generated derivations were the same as what I was working with before.
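
The check itself is pleasantly simple; a sketch (with a stand-in attribute name):

    # instantiate the same attribute on the old and new setups; the printed .drv store paths
    # include a hash of the build instructions, so identical paths mean identical derivations
    nix-instantiate '<nixpkgs>' -A hello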

Actually rebuilding stuff

And the second step is to build packages from those derivations.

I ran the build for a little while, compiling a few packages, but I never actually finished, nor compared the newly rebuilt packages with the ones I had in my cache from before :sleeping_bed:. But it was neat just getting this far along.


Tell me if they ever make computers easy again.


The actual code for re-bootstrapping the whole build

I had written all that in the last post and never even posted a link to the project.

Here you go.

Node.js having Brotli built in

It does! Behold:

  curl -sfo "/tmp/$out_hash.ls.br" "https://cache.nixos.org/$out_hash.ls"
  node -e "process.stdin.pipe(require('zlib').createBrotliDecompress()).pipe(process.stdout)" <"/tmp/$out_hash.ls.br" >"/tmp/$out_hash.ls"

Running the rebuild

I’m trying it outside of Glitch so that it can run faster. Also I have to figure out how to compare the build outputs afterwards.


Since it sounds like you have nix-store already, you can unpack .nar files to a local dir using nix-store --restore. Did that not work?

In some sense we do have Nix, including nix-store, already. And that unpacks things just fine. It was the Nix I built myself though. The goal of re-bootstrapping was to simulate getting started without earlier iterations of this “get Nix on Glitch” project.

Welcome back to the support/soon-to-be-community forum by the way!


Cheers! It’s been a while >_>;

For a clean bootstrap, you should be able to run curl -L https://nixos.org/nix/install | sh (as a non-root user, which is pretty nice) to at least get a working nix-store command, if that helps at all.

I think you’re mistaken about that working as a non-root user. In the usual case, including in a Glitch project container, /nix doesn’t exist, so the installer fails right away.

Valid suggestion if it’s about reproducing the build outside of Glitch though :+1:

Ah, right. The absence of sudo means you won’t be able to install to /nix at any point - there are instructions on using nix-user-chroot so that you can install nix in ~/nix instead, but that requires cargo (or rustup), neither of which will get you much further. That said, if you can create a precompiled ubuntu nix-user-chroot binary, that might be an easier path forward.

(mind you, that’s more than I tried to do =)

nix-user-chroot isn’t viable on Glitch. From the installation guide:

[nix-user-chroot] also requires user namespaces to be enabled on the system.

which, in project containers, they’re not.

It seems like dropbear is also affected by that.

