Experience report: building an administrative connection to Glitch

The connection loss problem: no backpressure

socket.io has a debug logging system where you set an environment variable to have it print out what’s going on internally. I tried running a long upload with this enabled. Here’s what showed up:

  1. Many large chunks of the file would be ‘sent’ very quickly.
  2. 25 seconds into the program, the socket.io client itself would send a ‘ping’ message.
  3. 5 seconds after that, the client gets mad that it doesn’t receive a ‘pong’ message.
  4. The client determines that the connection is unresponsive and disconnects it.
  5. The program exits abnormally because it can’t handle the connection loss.

The local side reads the large-ish file pretty quickly. I had been running this experiment a few times, and the whole file was probably in cache. The code to bring it over to socket.io was simple. It was something like this:

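const socket = io(...);
const src = fs.createReadStream(...);
// Forward every chunk as soon as it is read, with no regard for whether
// the socket has caught up.
src.on('data', (chunk) => {
  socket.emit('input', whateverEncodingThereWas(chunk));
});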

There was no backpressure. We would give this poor socket a huge amount of data more or less all at once. Some layer below that obviously wouldn’t be able to send it over the network instantaneously, so that data would be waiting in line somewhere.

Anyway, later, on an interval, socket.io would try to send a ping to check if the connection still works. That ‘ping’ message gets in the back of the line, but it’s so far back it won’t stand a chance. The five seconds that the client is willing to wait for a pong will pass, and the line will have only inched forward.
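
To put rough numbers on that line, here is a back-of-the-envelope sketch. The queue size and uplink speed are made-up figures, not measurements from these experiments, and pingWaitSeconds is a name invented for illustration.

```javascript
// How long a ping waits if everything queued ahead of it must be sent first.
// Both inputs are hypothetical; neither was measured in the experiments above.
function pingWaitSeconds(queuedBytes, uplinkBytesPerSecond) {
  return queuedBytes / uplinkBytesPerSecond;
}

const MiB = 1024 * 1024;
// Assume 100 MiB already queued and a 1 MiB/s uplink.
const wait = pingWaitSeconds(100 * MiB, 1 * MiB);
// The client's patience for a pong, per the debug log above.
const PONG_DEADLINE_SECONDS = 5;

console.log(wait > PONG_DEADLINE_SECONDS ? 'ping times out' : 'ping makes it');
```

With figures anywhere near these, the ping is tens of seconds away from the front of the line when its deadline passes.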

Remember that time we were looking down on those other options for reading the whole file into memory? That’s us right now. And we’re not just wasting memory, the deluge is starving out another job that’s on a deadline. We have some new problems.

Addendum to problems to solve

  5. Uncontrolled reading will make memory usage grow.
  6. Uncontrolled writing will starve out the pinging and kill our connection.

So it looks like we’ll need to get backpressure working.

But you couldn’t do the usual Node.js thing where you check if write (or emit in this case) returns false and wait for a drain event. The socket.io client API doesn’t provide you with that information. The next level down, called engine.io, has a drain event, but I found out that it’s faked for the web socket transport (see engine.io-client/websocket.js at 3.5.0 in socketio/engine.io-client on GitHub). Reaching into even lower levels of abstraction seemed too complicated, because one of engine.io’s jobs is to switch around between transports as needed.
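
For reference, this is roughly what “the usual Node.js thing” looks like on an ordinary writable stream. It’s a minimal sketch of the pattern that socket.io did not make available, not something from the prototype. The names writeAllWithBackpressure and makeSlowSink are invented here, and the slow sink is only a stand-in for a network socket.

```javascript
const { Writable } = require('stream');
const { once } = require('events');

// Write every chunk to `dest`, pausing whenever write() reports that the
// destination's internal buffer is full, and resuming on 'drain'.
async function writeAllWithBackpressure(chunks, dest) {
  let waits = 0; // how many times we had to stop and wait
  for (const chunk of chunks) {
    if (!dest.write(chunk)) {
      waits++;
      await once(dest, 'drain');
    }
  }
  dest.end();
  await once(dest, 'finish');
  return waits;
}

// Hypothetical slow destination: records chunk sizes and completes each
// write on a later turn of the event loop.
function makeSlowSink(received) {
  return new Writable({
    highWaterMark: 16 * 1024,
    write(chunk, encoding, callback) {
      received.push(chunk.length);
      setImmediate(callback);
    },
  });
}
```

The point is that the sender only ever holds one chunk more than the destination is willing to buffer, so nothing piles up in front of other traffic.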

Idea 7: Acknowledge each chunk at the application layer

Increase the complexity by running a custom program in the Glitch project that receives data chunk-by-chunk and sends back a little something to acknowledge that it received each one. Also increase the complexity by having our program watch for these acknowledgements.

And we can’t send just one chunk at a time. This sort of ‘stop and wait’ design was one of the issues with XMODEM that later protocols notably improved upon. For performance, we’d need to count out a ‘window’ of chunks that are out at the same time.

What went wrong: I couldn’t settle on the perfect window size. The window needs to be large enough that the network always has something to send. The window needs to be small enough that when a ‘ping’ gets in the back of the line, it can get to the front, get sent, and have a reply come back in time. And that all depends on the network connection, and we’re building on a library that’s not transparent about that stuff.
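
The windowing idea can be sketched like this. It’s an illustration of the technique, not the prototype’s actual code. sendWindowed and its send(chunk, onAck) hook are hypothetical names, and the hook is assumed to call onAck once the far side confirms that chunk.

```javascript
// Send `chunks` with at most `windowSize` of them unacknowledged at a time.
// `send(chunk, onAck)` is a hypothetical transport hook.
function sendWindowed(chunks, windowSize, send) {
  return new Promise((resolve) => {
    let next = 0;        // index of the next chunk to send
    let inFlight = 0;    // chunks sent but not yet acknowledged
    let maxInFlight = 0; // high-water mark, reported for inspection

    function pump() {
      // Top the window back up whenever there is room.
      while (inFlight < windowSize && next < chunks.length) {
        inFlight++;
        maxInFlight = Math.max(maxInFlight, inFlight);
        send(chunks[next++], () => {
          inFlight--;
          pump();
        });
      }
      // Done once everything is sent and acknowledged.
      if (inFlight === 0 && next === chunks.length) resolve({ maxInFlight });
    }

    pump();
  });
}
```

With socket.io, the hook could plausibly be (chunk, onAck) => socket.emit('input', chunk, onAck), since emit accepts an acknowledgement callback as its last argument. That assumes the program on the Glitch side calls the callback for each chunk. The sketch does nothing to answer the hard question of what windowSize should be.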

From backpressure to any-irrelevant-message

While experimenting with window sizes on the last prototype, I tried larger and larger ones. I even tried a size so large that the entire ‘long upload’ file could be sent without any waiting. It seemed like it ought to have died for the same starvation reason as before, but these transfers succeeded.

I found out that the client wasn’t waiting for a ‘pong’ message per se. According to engine.io-client/socket.js at 3.5.0 (socketio/engine.io-client on GitHub), “any packet counts.”
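
Here’s a simplified model of what “any packet counts” means for the deadline. It is not engine.io-client’s actual code. The class name and the explicit now arguments are inventions for illustration, and the 25-second and 5-second figures are the ones seen in the debug log earlier.

```javascript
// Simplified liveness check: after a ping goes out, any incoming packet
// (not only a pong) that arrives before the deadline cancels it.
class LivenessMonitor {
  constructor(pingTimeoutMs) {
    this.pingTimeoutMs = pingTimeoutMs;
    this.deadline = Infinity; // no ping outstanding yet
  }

  // A ping was sent at time `now` (milliseconds).
  pingSent(now) {
    this.deadline = now + this.pingTimeoutMs;
  }

  // Any packet at all arrived at time `now`.
  packetReceived(now) {
    if (now <= this.deadline) this.deadline = Infinity;
  }

  // Would the connection still be considered alive at time `now`?
  isAlive(now) {
    return now <= this.deadline;
  }
}

const monitor = new LivenessMonitor(5000);
monitor.pingSent(25000);       // ping goes out 25 s in
monitor.packetReceived(29000); // some unrelated message arrives 4 s later
console.log(monitor.isAlive(60000));
```

In this model the acknowledgements coming back from the Glitch side are enough to keep the connection alive, even if the pong itself is stuck far back in the line.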

Idea 8: ...; dd count=1234 status=progress | ...

If all we needed to keep our connection from dying was for there to be some messages flowing the other direction, we could just use a ‘noisier’ off-the-shelf program in our pipeline. This one replaces head with dd in a mode where it shows a little progress line.

What went wrong: Nothing. Well, except for the memory growing as it reads the file faster than it can send. Nothing new.

Giving up on the detail

I decided not to solve problem (5), the memory growth from uncontrolled reading. I had been working extensively with prototypes for the upload direction, and it would be a problem in the download direction as well. The code on the Glitch project side for putting data into the socket (wetty/term.ts at 496db5e5632517052fb9abaeddd5ee769e77e296 in etamponi/wetty on GitHub) was also “simple,” as I had said of my own code. It would take yet more complexity to have such a custom program on the Glitch project side to watch for acknowledgements. Heck, maybe I wouldn’t even end up using this tool to transfer files that large in the first place. People have brought up how Glitch had this weird thing where a project gets more RAM than it gets disk anyway.

Pivoting on the big picture

There was this whole second part of how to use a data transport primitive to make a file transfer utility. I was tired from all this stuff with the transport. And this second part isn’t necessarily trivial either:

  • What if the user puts the name of a directory as the destination?
  • What if the user wants to copy a file to a different filename at the destination?
  • Should we support recursive copying?
  • Might people want a prompt for overwriting?
  • Should we offer to replicate the file permissions?

And so on. It seemed that having an 8-bit clean administrative connection was enough to cut a release. Maybe we could later use that connection to run rsync and have all those fiddly little questions above settled with “yes, and many more features too.” Maybe we could use it to run Visual Studio Code’s remote plugin. Lots of possibilities.

Introducing snail t pipe

In the end, I polished up the prototype to support full duplex communication and to encode each chunk separately so that they can be sent immediately. Instead of a user interface for copying files, it lets you run a command on the Glitch project. With all the problems we solved above, you’re free to use something as simple as cat >dst as that command and pipe data into the snail invocation. It provides separate stdout and stderr, and it also forwards the command’s exit status.

See the resulting code here:

I left a big note about the remaining problems in snail-cli/index.js (at 6e6ca11a7605d0c15783fbc0ff6a2985967307d1 in wh0/snail-cli on GitHub). Notably, in the download direction, data piles up in the project container’s WeTTY process, which is controlled by root. Intriguing. That alone is why I posted this thread in the Feedback section, in case anyone was wondering.

Concluding remarks

I’ve subjected you to a rambling 3,785-word report just to get to this point: Dear Glitch, runaway printing in the terminal can elevate a root process’s memory usage, seemingly without bound. Please check that out.