Explore the insides of Nix packages from the comfort of your browser

edit: this project is now deployed on GitHub Pages:
https://wh0.github.io/nix-cache-view/

and the source is here:

I originally posted this project URL, which is where I was developing it on Glitch. I still use that workspace to develop the project, so it might be in a weird state sometimes.
https://trusted-friendly-sesame.glitch.me/

view source: Glitch :・゚✧

cross posted:

There’s this format that the Nix package manager uses for distributing packages online, in these .nar files. I wrote a client-side app to parse those package files so you can explore what’s inside them without having to set up nix and download the package (and all its dependencies :grimacing:). See the post linked above from the NixOS Discourse for instructions on how to use it, if you’re interested.

Nix also uses “derivation” files, which formally specify the instructions for building a package, and you can also distribute these online similar to the packages that they describe how to build. And they’re also in .nar files. I coded up an unglamorous webpage to view those as well, with hyperlinks going to the various packages referenced in the build inputs, so you can navigate the dependency tree.

Side note: The derivation files usually aren’t shared online. It’s more common to share a “flake,” which is a functional program that generates a bunch of derivations. However, the one giant flake representing all packages of the official “nixpkgs” distribution contains references to cryptocurrency mining software, so the usual way of distributing the flake is not usable on Glitch. So I’m indeed transmitting derivation files, which allows me to pick and choose only specific non-cryptocurrency-mining-related build instructions. So it’s weird. And for that reason, I have not mentioned this second tool in my NixOS Discourse post :face_with_hand_over_mouth: .

That’s a brief overview of the project. It’s late as it always tends to be when I get around to posting on the forum. I’ll be back with some technical content on what it was like building this.

Last thread: Broadening Nix package builds to the NixOS "small" set

7 Likes

Wow…

1 Like

Alright: details about the journey to this point.

High level binary cache structure

It’s a set of files, mainly (1) .nar.xz files containing a compressed serialization of a package’s files (think like .tar.xz) and (2) .narinfo files containing some metadata about each package, such as its dependencies.

To get a package from a binary cache, you… well you normally use the Nix package manager. But if you were to do it manually, here’s how.

To start with, you identify the package by a “hash,” which acts as a cryptographic summary of the source files and build instructions that went into it. I’ll assume you already know the hash (somehow :person_shrugging:). You construct a URL with that hash to retrieve the .narinfo file, and that .narinfo file contains the URL where you can download the .nar.xz file.

Let’s go through an example to get a sense of what these URLs are like. To get the package with hash 7ghhnlwla2mddkg7hgqa5v0sr8g5hga8 (a build of the Nix package manager itself) from the cache https://cache.nixos.org, you download https://cache.nixos.org/7ghhnlwla2mddkg7hgqa5v0sr8g5hga8.narinfo, which is a plaintext file with the following (some parts removed to save space):

StorePath: /nix/store/7ghhnlwla2mddkg7hgqa5v0sr8g5hga8-nix-2.8.1
URL: nar/1rryqxbx49mdpk0bwk93r8nlzl75lw6af3kndpxx77qrzjdvpn5k.nar.xz
Compression: xz
FileHash: sha256:1rryqxbx49mdpk0bwk93r8nlzl75lw6af3kndpxx77qrzjdvpn5k
FileSize: 3281096
NarHash: sha256:1pmwslm3q78dps40hnb68zajv6kih2f7zq096n96f5cabhgq7xcq
NarSize: 16058632
References: (several, omitted)
Deriver: wy4lpx9qr8nla90y51219icx0969pvxn-nix-2.8.1.drv
Sig: cache.nixos.org-1:(some base 64 stuff)

You look at that URL field and then download the .nar.xz at https://cache.nixos.org/nar/1rryqxbx49mdpk0bwk93r8nlzl75lw6af3kndpxx77qrzjdvpn5k.nar.xz.

Then you can decompress that .nar.xz file and parse out the files.

The .narinfo and stuff

At this point I should mention that the official Nix cache https://cache.nixos.org is served with permissive CORS, and so is the cache I’m building (Building Nix packages and saving them in Glitch assets).

So with that, it’s pretty simple to download the .narinfo with fetch and do various string operations to get the URL. I actually checked how the nix program itself does this, and it turns out to be as simple as you’d think. There’s no hidden support for commented-out lines, multiline values, or the like. Phew!

It looks like you could do some string concatenation to get the .nar.xz URL from the URL field and the cache URL, but I must resist the urge. In my own binary cache, I had to upload some .nar.xz files to a different host (Building Nix packages and saving them in Glitch assets), so I need to support at least absolute URLs as well. (The nix program supports this too, thank goodness.)

It turns out there’s not too much fiddly work to be done to support these different cases of relative and absolute URLs. The URL constructor URL() - Web APIs | MDN works it out for you either way:

const narUrl = new URL(urlFromNarinfo, cacheBase).href;
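
Putting the field splitting and the URL resolution together, the whole .narinfo step is only about this much. This is a sketch with my own helper name and no signature checking:

async function fetchNarinfo(cacheBase, storeHash) {
  // e.g. fetchNarinfo('https://cache.nixos.org/', '7ghhnlwla2mddkg7hgqa5v0sr8g5hga8')
  const res = await fetch(new URL(storeHash + '.narinfo', cacheBase).href);
  if (!res.ok) throw new Error('narinfo request failed: ' + res.status);
  // each line is "Key: value"; no comments, no multiline values
  const fields = {};
  for (const line of (await res.text()).split('\n')) {
    const sep = line.indexOf(': ');
    if (sep !== -1) fields[line.slice(0, sep)] = line.slice(sep + 2);
  }
  // fields.URL may be relative (nar/....nar.xz) or absolute
  return { fields, narUrl: new URL(fields.URL, cacheBase).href };
}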

In the next post I’ll talk about decompressing the .nar.xz file.

2 Likes

We’re at the point where we have a .nar.xz file downloaded from the cache and saved in memory. Now we need to decompress it.

Decompressing

I really took the scenic route getting this done.

I couldn’t find any JavaScript implementations of XZ decompression. There’s a library that uses Web Assembly though, GitHub - SteveSanderson/xzwasm: XZ decompression for the browser via WebAssembly. If you just need to fetch and decompress an xz-compressed file, that library is probably your best bet.
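
Going from memory of its README (so treat the exact names here as my assumption), using it looks roughly like this:

import { XzReadableStream } from 'xzwasm';

const compressed = await fetch(narXzUrl); // narXzUrl: wherever the .nar.xz lives
const decompressed = new Response(new XzReadableStream(compressed.body));
const narBytes = new Uint8Array(await decompressed.arrayBuffer()); // the raw .nar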

Contrary to my own advice though, I ended up trying to build my own library from the same materials: the xz-embedded implementation XZ Embedded and wasi-sdk GitHub - WebAssembly/wasi-sdk: WASI-enabled WebAssembly C/C++ toolchain.

Doing our own stunts

It seemed doable: the xzwasm project showed that it took no patching to make xz-embedded build. And I was going to settle for a simpler non-streaming workflow. Moreover, we actually know the size of the decompressed .nar file beforehand, as it’s listed in the .narinfo:

NarSize: 16058632

Unfortunately, I wasn’t able to run the prebuilt wasi-sdk release on Glitch. I think it had to do with the glibc there being too old. I ended up doing the compilation on a different computer.

It compiled fine. I’m using the built-in dlmalloc allocator, resulting in a ~28 KB .wasm file. The xzwasm project claims to be under 8 KB; notably, they use the more compact walloc. Good on them.

But wouldn’t it be fun to be able to build Web Assembly programs in Glitch? At this point, without any warning, I’ll be sharing a story from five years ago.

Digging up ancient history

I’ve written about some security-related findings on Glitch before, but my first such post Root access on Glitch (already fixed) actually wasn’t about the first privilege escalation vulnerability I found on Glitch.

The very first one I found was back in 2017. I was in contact with Fog Creek Customer Service about it, in case FC3186524. A part of the discussion went as follows:

(me) Is this [privilege escalation] a security concern? If not, it would mean a lot to me if I could just install packages through the package manager.
(support) Certainly, an authenticated user should not be able to run commands as root in the project container.
(support) If there’s a package you feel is missing from the Glitch container, please let me know and we’ll make sure we install it.

That offer, that Glitch would let me have a package added to the project container image. This was before I was a big show-off on the forums, so I had never brought it up, but I’ve always thought of it as one of my most prized intangible possessions.

I know that Fog Creek has since been on a long journey, becoming Glitch and becoming part of Fastly. And I know some of the key people from back then no longer work there (:pray:). But I have to try. Today’s the day that I say the magic words.

 

Angelo, I feel that wasi-sdk is missing from the Glitch container.

 

Glitch, if there’s meant to be any sense of continuity, please consider this request :bowing_man:


In the next post, I’ll comment on the fate of the other technologies I mentioned in a post from a few days ago Community Open Thread 2 - September 16, 2022 - #11 by wh0 : web workers and the cache API.

4 Likes

Alright the fate of those two things—web workers and cache API—was that I ended up not using them ):

Web workers

Goal
Running the decompression Web Assembly code takes time for large .nar.xz files. It would be nice to run it without freezing the page.

Findings
It turned out to take about half a second to launch the web worker, compared to ~4 ms to run the decompression (for small files). I also tried launching a nearly empty web worker from a data: URI, but I didn’t see any measurements less than about 300 ms.

The additional memory copying to get inputs and outputs through postMessage was also spiritually harming me.
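
The measurement itself was nothing fancy, roughly this kind of thing (a sketch that times the first postMessage round trip):

const t0 = performance.now();
const worker = new Worker('data:application/javascript,onmessage=e=>postMessage(e.data)');
worker.onmessage = () => {
  console.log('worker round trip:', performance.now() - t0, 'ms');
  worker.terminate();
};
worker.postMessage('ping');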

Conclusion
So I lost motivation, and now the decompression just runs in the main thread. We don’t give people anything to do on the page while it’s decompressing, so is there even much lost if the page freezes?

Cache API

Goal
Alright so if it was going to be a little painful to decompress the file, how about we cache the result? It might even happen that a user comes back to the same package twice to get different files from it or something.

Findings
Turns out the Cache API is just storage. It’s storage that you can pass a Response object to with no fuss, but there’s no built-in functionality for keeping frequently used records and discarding the rest, or anything like that. Here’s my post on that and some other things I didn’t like about it:

https://wh0.github.io/2022/09/18/cache-storage.html
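
To be fair, the “no fuss” part really is short. Roughly (a sketch, with an illustrative cache name):

const cache = await caches.open('nar-v1');
await cache.put(narXzUrl, response.clone()); // stash the Response as-is
const hit = await cache.match(narXzUrl);     // later: a Response again, or undefined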

Conclusion
I didn’t want to store the decompressed .nar file outright, so I proceeded without using the Cache API and indeed without doing any caching.


So as a result, users will be waiting at a frozen page while it does the decompression. And it’ll decompress the same file every time. So the project gets a frowny face for today’s assessment.

But you know what, so does the web platform :slightly_frowning_face: .


In the next post, I’ll talk about what it was like to parse the long awaited .nar file.

5 Likes

Finally, we have a .nar file. Now let’s slice out the directory structure and files.

Understanding .nar

First, is there a specification of the format somewhere?

No wait, zeroth, is there already a library for this? Maybe. But I just want to note that it’s better not to go search for “nar” on npm at this time, because the predominant meaning is some unrelated tool for packaging node.js applications.

Ok so back to that specification. Yes: it is specified somewhere. In the author’s PhD thesis

Scroll yourself down to section 5.2.1 on page 90 :face_with_spiral_eyes: if you want to take a peek. I’m going to summarize it here though, as the target audience is people who haven’t worked with this format before.

One. Everything is 8-byte aligned. Numbers are stored as 64-bit little-endian. String data is padded out to a multiple of 8 bytes. Other structures are compositions of these, so they end up aligned too.

Two. Strings are stored as a number giving the byte length, followed by the padded string data. It’s not actually defined how they map to characters, so maybe think of them as byte arrays instead? And everything is made of strings.

Three. There’s a dictionary-like structure delimited by parentheses strings:

"(" key value key value")"

Keys are strings; values are strings or dictionaries. Duplicate keys can appear when there can be multiple of something, e.g. multiple entries in a directory object.

Note: this isn’t written in the spec; I’ve only inferred it from common elements of the various definitions.

Four. The representation of a filesystem tree is pretty much how you’d design it yourself if you were working with nestable dictionaries:

{
  type: 'directory',
  entry: {
    name: 'bin',
    node: {
      type: 'directory',
      entry: {
        name: 'nix',
        node: {
          type: 'regular',
          executable: '',
          contents: '\x7fELF...',
        },
      },
      entry: {
        name: 'nix-store',
        node: {
          type: 'symlink',
          target: 'nix',
... and closed by however many }'s it takes

(Visualized in a JS-like way)

Directories can have multiple entry fields to represent the multiple other files/directories within.

For regular files, there are no booleans in the data structure, so executable-ness uses an empty string. Non-executable files lack the executable field entirely.

Implementation

I used DataView and TextDecoder (this project assumes UTF-8). For file contents, it was easiest to use new Uint8Array(buffer, start, length) to create a view of the .nar ArrayBuffer containing just the file contents.

And I parse it into a structure similar to the JS-like visualization above, except that executable is a boolean, and instead of repeated entry fields, there’s an entries field holding an array.
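
The lowest-level piece is basically one helper, something like this (a sketch with my own names; real code also wants bounds checking):

const decoder = new TextDecoder(); // this project assumes UTF-8

function readNarString(view, offset) {
  // 64-bit little-endian byte length, then the data, padded to a multiple of 8
  const length = Number(view.getBigUint64(offset, true));
  const bytes = new Uint8Array(view.buffer, view.byteOffset + offset + 8, length);
  const padded = Math.ceil(length / 8) * 8;
  return { bytes, text: decoder.decode(bytes), next: offset + 8 + padded };
}

For file contents you hold onto the bytes view; for everything else you compare the decoded text against the expected keywords.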

UI

:person_shrugging:

<details>
  <summary>n entr(ies)</summary>
  <dl>
    <dt>name</dt>
    <dd>node</dd>
  </dl>
</details>

You could probably do better.

And there was this problem: how do you let the user download a Uint8Array? I’m using URL.createObjectURL(new Blob([that])) and a plain old <a href="that">.
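
In full it’s about this much (a sketch; the variable names are just illustrative):

const a = document.createElement('a');
a.href = URL.createObjectURL(new Blob([fileBytes])); // fileBytes: that Uint8Array view
a.download = fileName;                               // suggested filename
a.textContent = 'download';
listItem.appendChild(a);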


In the next post, I’ll talk about the “derivation” files mentioned way back in the first post. And the segue into that topic will be in that next post too, because I’m too lazy to do it here.

1 Like

yo, small update on running wasi-sdk:

wasi-sdk actually used to be built to run on ubuntu 16.04, the version used in glitch project containers. version 12 (Release wasi-sdk-12 · WebAssembly/wasi-sdk · GitHub), the last one built for 16.04, does work.

it’s a little less polished though, giving an error about main not being defined, even when using --no-entry. I tried defining an empty main function, and it seems to compile fine after that. seems like a pretty un-disruptive workaround overall

2 Likes

We’re able to slice out a Uint8Array of a file within a .nar. In the last post, I described creating a web page that lets you explore and download those files. But there was one more thing I wanted to do with the innards of these .nar.xz files.

And that’s to get at the specifications inside “derivation” files.

Derivations in binary caches

I mentioned up in the first post:

Nix also uses “derivation” files, which formally specify the instructions for building a package, and you can also distribute these online similar to the packages that they describe how to build. And they’re also in .nar files.

Well now that we can locate a .nar.xz file in a binary cache, download it, decompress it, and extract the .drv file within, all we have to do is to parse it.

Side note before we go on to parsing it: I originally wrote

The derivation files usually aren’t shared online. It’s more common to share a “flake” …

It turns out they are available on cache.nixos.org. That being the main public cache, the contrary is true: the derivation files are usually shared online as well :roll_eyes: Behold: https://cache.nixos.org/wy4lpx9qr8nla90y51219icx0969pvxn.narinfo

Parsing a .drv

From that PhD thesis, section 5.4 on page 105,

The function printDrv returns a byte sequence that represents the store derivation.
The contents of the byte sequence is a textual ATerm. ATerms are a simple format
for the exchange of terms. …

I will not formalise the translation to ATerms here, which is straightforward.

Thanks. Not like I wasn’t just going to go over to npm anyway.


So I ended up looking at the source code https://github.com/NixOS/nix/blob/2.8.1/src/libstore/derivations.cc#L367 of the nix command line tool to learn about what’s in a .drv file. It writes out a file like this (whitespace added, stuff shortened, etc.):

Derive(
  [
    ("debug","/nix/store/xi...h8-nix-2.8.1-debug","",""),
    ...
  ],
  [
    ("/nix/store/0g...aw-editline-1.17.1.drv",["dev"]),
    ...
  ],
  [
    "/nix/store/9k...5b-default-builder.sh",
    "/nix/store/ik...9g-separate-debug-info.sh"
  ],
  "x86_64-linux",
  "/nix/store/xb...il-bash-5.1-p16/bin/bash",
  ["-e","/nix/store/9k...5b-default-builder.sh"],
  [
    ("NIX_HARDENING_ENABLE","fortify ..."),
    ...
  ]
)

It’s a bunch of "strings" in nested [lists] and (tuples). And outside it all, there’s something that looks like a FunctionCall().

I was thinking it looked a lot like JSON. If we can make a few small changes, we can hack together a parser that mostly uses JSON.parse. Those changes would be:

  1. Change the function call to an array of the arguments: Derive(...) -> [...]
  2. Change tuples to arrays: (...) -> [...]

And that seemed pretty doable.

If you’re up for a programming puzzle, see if you can come up with a plan for how to do it. Assume the following:

  1. Everything’s syntactically correct.
  2. There’s only one function call, that Derive(...) on the very outside.
  3. Strings are escaped like a subset of JSON, with \" quotes, \\ backslashes, \n newlines, and a few others that are no more complicated.

In the next post, I’ll talk about the approach I used. For a spoiler, see that post I made in community open thread #2 :shushing_face:

6 Likes

Parsing a .drv hackily

To recap, here’s what we needed to do, before we can throw the .drv into JSON.parse and call it a day:

  1. Change the function call to an array of the arguments: Derive(...) -> [...]
  2. Change tuples to arrays: (...) -> [...]

That’s stated in an unnecessarily hard way though. It speaks of the function calls and tuples. You may recognize this as a task of parsing a context-free language. Locating tuples in an arbitrarily nestable language, i.e. finding matching pairs, requires a stack. Incidentally, it turns out that .drv files have bounded nesting depth, which makes things easier. But there’s an even easier way.

We don’t have to find matching pairs. We’re replacing all parentheses, so we can be rather indiscriminate. It suffices to do the following:

  1. Remove the word Derive
  2. Replace every ( with [
  3. Replace every ) with ]
  4. Except don’t mess with strings

Formal language theory would have it that this is easier, as you can walk through the tokens alone—without paying attention to the nesting structure—as a regular language. And we have a fine regular expression engine in JS. Let’s take a look.

The bit about not messing with strings is tricky.

The trouble is trying to write a regular expression to match, for example, one of the (s to replace. You make sure it’s not preceded by a quote, unless there was a quote some time before that one… Well they open and close, so there just has to be an even number, right? Is there maybe a variant of * that matches only even numbers? I think I read something about detecting prime numbers in unary before :thinking: And all this is with the exception of escaped quotes, escaped quotes are preceded by a backslash \", which we can check for. Unless that backslash is the second character of an escaped backslash \\", so better look two behind. Oh but if there’s an escaped backslash followed by an escaped quote \\\" then that doesn’t count … It gets complicated.

But it’s all possible, right? You can picture the state machine even.

source

digraph {
rankdir=LR
start [shape=none]
not_in_string [peripheries=2]
start -> not_in_string
not_in_string -> not_in_string [label="Derive, (, ), [, ], comma"]
not_in_string -> in_string [label="\""]
in_string -> not_in_string [label="\""]
in_string -> escape [label="\\"]
escape -> in_string [label="\\, \", n, r, t"]
}

Uh hold on a second.

Do they really …

… use such exotic parentheses etc inside strings? I mean, we’re talking about package names, file paths, version numbers, etc.

Yes.

nixpkgs routinely puts entire shell scripts into the environment variables. It’s how they can have one general-purpose “default builder” script that’s flexible enough, with hooks for doing package-specific stuff at certain steps.

Okay fine back to the regex

Here’s where a technique comes in handy, one that a website dedicated to it describes as “the best regex trick.” That technique, as I would summarize it:

  1. Proactively match the things you don’t care about.
  2. If all those irrelevant things can’t match, it’s gotta be something you care about.
  3. Use a capturing group to tell if the thing you care about matched.

Here’s that trick in action. We can write an expression to match strings, and if it can’t do that, then match stuff we care about:

"(?:string innards)*"|(Derive)|(\()|(\))

And for string innards, we use the same trick: we have expressions for unspecial characters and various escaped characters, and if those don’t match, then we allow the next thing to be the closing quote:

[^"\\]|\\.

Putting this all together:

const drvMunged = drvText.replace(/"(?:[^"\\]|\\.)*"|(Derive)|(\()|(\))/g, (orig, derive, lparen, rparen) => {
  if (derive) return '';
  if (lparen) return '[';
  if (rparen) return ']';
  return orig;
});

And then yeah, we JSON.parse it and put the bits into an HTML page. Phew!
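
Going by the field order in the example above, the “bits” come out of one destructuring (the variable names are mine):

const [outputs, inputDrvs, inputSrcs, system, builder, args, env] = JSON.parse(drvMunged);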


That’s it leading up to this basic implementation.

Someone from the NixOS community suggested that I put this up in a Git repo (ya Glitch projects can be accessed over Git, but I think you and I both know what they meant), so here it is.

I’ll be opening some issues describing future plans in the coming days. Stop by later if you’re looking for stuff to do for Hacktoberfest.

6 Likes
