Project container shutdown signal

2017: Glitch will send your program a SIGTERM. You can handle it if you want to run some things before you exit. (source)

2021: Reportedly, sometimes it doesn’t. (source)

today: I kind of feel like it never does. it does when you’re in the middle of editing and glitch auto-restarts the program without taking down the container, but that’s a separate case. when the container itself goes down, I think it never gives you SIGTERM anymore.

an analysis of in-container mechanisms:

  1. the start-container script listens for SIGHUP. we can’t see what, if anything, sends SIGHUP. it’s probably something outside the container. when start-container receives SIGHUP, it sends a SIGHUP to its child process, a runsv /etc/service/watcher.

    this SIGHUP looks like a mistake. runsv doesn’t do anything to handle SIGHUP, so it just exits (that’s the default consequence of SIGHUP), leaving the process it manages to be reparented to start-container and to continue running. maybe this happened by accident, if the runsv used to be a runsvdir. runsvdir is a program designed to run multiple runsv processes, and when it receives SIGHUP, it sends SIGTERM to each of them. glitch can restore the termination functionality by having _term send SIGTERM to runsv instead of SIGHUP.

    that start-container script, by the way, appears to implement the 5-second grace period as well. it loops for up to 5 seconds, checking if the processes it manages have exited. if they still haven’t exited by then, it exits anyway, which I believe causes the container runtime to terminate everything else in the container.

  2. runsv, from the runit suite (http://smarden.org/runit/runsv.8.html), runs a “run” script, in this case, the one for the ‘watcher’ service. and as mentioned above, when it receives SIGHUP, it simply gets killed and the run script that it was managing becomes a child of start-container.

    runsv watches for SIGTERM, which causes it to send SIGTERM and SIGCONT to the child process that it manages and then waits for its child process to exit. this, however, I believe doesn’t happen in glitch’s container shutdown.

  3. the “run” script for the watcher service has interpreter /usr/bin/dumb-init /bin/bash, so it runs dumb-init (reference). dumb-init can forward signals it receives to its child process. to my understanding, though, there is no signal to forward. if it did receive a SIGTERM, it would forward it.

  4. then there’s the actual service’s “run” script /etc/service/watcher/run, written in bash. it runs in bash, and bash runs the actual watcher node program. this extra layer of bash can stop signals too, as bash won’t pass on signals to its child process(es) by itself, and there’s no signal trapping done in this script. glitch can restore the termination functionality by having the script exec node ... to replace the whole bash process, so that signals from above will be delivered directly to node instead.

  5. the watcher service itself looks like it has two codepaths for forwarding SIGTERMs to the user’s program: one in runner.js in start() and one in index.js in main(). so we can be pretty confident that glitch intends to stop the user’s program when stopping the watcher service.

    although currently it looks like nothing will send SIGTERM to the watcher service.

  6. what’s referred to above as “the user’s program” isn’t really your node program. instead, there are a few more layers before we get to that. first is the “app type” start script. this script is written in bash, but it has several features to make it actually forward signals.

    it is meant to be run as the leader of a process group, which gives us a systematic way to address all of its descendant processes (as long as those aren’t also forked as leaders of new process groups).

    it sets up signal handlers for SIGINT and SIGTERM (there’s also a piece of functionality that uses SIGUSR1, which seems unrelated to container shutdown). it broadcasts SIGTERM to its entire process group. this functionality is in the /opt/watcher/app-types/utils.sh file, which is sourced from the app type script, if you want to take a look. (side question: wouldn’t that cause it to receive another SIGTERM itself? there appears to be a _killing flag to prevent actually looping.)

    just in case you did launch a process in a new process group, when starting your app, it tries to kill any child processes of the container’s init process (start-container) belonging to the app user. note that if the watcher process that runsv was managing gets reparented to start-container, it still would not match this filter, because that process is owned by root. also, be warned that this functionality is only present in node type apps. python and custom app types don’t get this extra killing. it’s also run at the beginning of the app launch process, not at the end, so it won’t help during container shutdown.

    these features in this script get used when you’re editing and glitch restarts your project without stopping the container, which makes it confusing when you see your programs getting SIGTERM and shutting down gracefully during development. but as far as I can tell, it doesn’t happen during project shutdown, possibly due to the multiple issues in the above layers.

  7. the way the “app type” start script runs your program, it does eval your-package-json-scripts-start &. the & and eval together put an additional subshell between the app type start script and your actual node program. but I think this doesn’t cause any extra problems because the start script signals its whole process group.

  8. finally it’s your node process, where you may have a process.on('SIGTERM'). but again, I haven’t detected this being triggered on container shutdown.
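
to make the grace period from item 1 concrete, here’s a minimal sketch of a bounded wait in bash. this is not glitch’s start-container code, which I’m only describing from reading it. the function name wait_with_grace and the sleep durations are made up for the demo.

```shell
#!/usr/bin/env bash
# hypothetical sketch of a bounded grace period; not glitch's start-container code.

# poll until the given PID is gone, or until the grace period runs out.
# returns 0 if the process exited in time, 1 otherwise.
wait_with_grace() {
  local pid=$1 seconds=$2 i
  for ((i = 0; i < seconds * 10; i++)); do
    kill -0 "$pid" 2> /dev/null || return 0   # kill -0 only checks that the PID exists
    sleep 0.1
  done
  return 1
}

sleep 0.3 &                 # exits well within a 5-second grace period
wait_with_grace "$!" 5 && quick="exited in time" || quick="still running"

sleep 30 &                  # outlives a 1-second grace period
slow_pid=$!
wait_with_grace "$slow_pid" 1 && slow="exited in time" || slow="still running"

# in the container, the supervisor would just exit here and (I believe) the
# runtime would clean up; in this demo we clean up ourselves.
kill "$slow_pid"
wait "$slow_pid" 2> /dev/null

echo "quick process: $quick"
echo "slow process:  $slow"
```

the point is just that a process which ignores or never receives the signal gets no more than the grace period before everything is torn down.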
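
for item 4, here’s a small sketch of why the extra bash layer matters. it shows generic bash behavior, not glitch’s actual run script. child.sh, wrapper-plain.sh and wrapper-exec.sh are made-up names, and child.sh is only a stand-in for the watcher. the demo sends SIGTERM to the wrapper’s PID and checks whether the child ever sees it.

```shell
#!/usr/bin/env bash
# sketch only: generic bash behavior, not glitch's actual scripts.
# all file names below are made up for this demo.
dir=$(mktemp -d)

# stand-in for the watcher: records whether SIGTERM ever reaches it.
cat > "$dir/child.sh" <<'EOF'
out=$1
trap 'echo "got SIGTERM" > "$out"; exit 0' TERM
: > "$out.ready"                       # tell the driver the trap is installed
for ((i = 0; i < 20; i++)); do sleep 0.1; done
echo "no signal" > "$out"              # gave up after about 2 seconds
EOF

# wrapper 1: bash stays in between and runs the child as a subprocess.
cat > "$dir/wrapper-plain.sh" <<'EOF'
bash "$(dirname "$0")/child.sh" "$1"
exit $?
EOF

# wrapper 2: exec replaces bash, so the child keeps the wrapper's PID.
cat > "$dir/wrapper-exec.sh" <<'EOF'
exec bash "$(dirname "$0")/child.sh" "$1"
EOF

# start a wrapper, send SIGTERM to the wrapper's PID, report what the child saw.
run_case() {
  local wrapper=$1 out=$2 pid
  bash "$wrapper" "$out" > /dev/null 2>&1 &
  pid=$!
  while [ ! -e "$out.ready" ]; do sleep 0.05; done
  kill -TERM "$pid"
  wait "$pid" 2> /dev/null
  while [ ! -s "$out" ]; do sleep 0.05; done   # wait for the child's verdict
  cat "$out"
}

plain=$(run_case "$dir/wrapper-plain.sh" "$dir/plain.out")
execd=$(run_case "$dir/wrapper-exec.sh" "$dir/exec.out")
echo "without exec: $plain"
echo "with exec:    $execd"
rm -rf "$dir"
```

I’d expect the plain wrapper to die on its own and leave the child running with “no signal”, and the exec wrapper to report “got SIGTERM”, which is the difference the exec suggestion is about.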
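
and for item 6, a sketch of the “broadcast to the process group, guarded by a flag” idea. this is my own reconstruction of the pattern, not the code in /opt/watcher/app-types/utils.sh. only the names _term and _killing come from the scripts discussed above; leader.sh, child.sh and the log file are made up. the driver uses set -m so the leader really does get its own process group, otherwise kill 0 would hit the driver too.

```shell
#!/usr/bin/env bash
# hypothetical sketch; this is not the code in /opt/watcher/app-types/utils.sh.
dir=$(mktemp -d)
log="$dir/log"

# descendant in the leader's process group; logs the SIGTERM it receives.
cat > "$dir/child.sh" <<'EOF'
log=$1
trap 'echo "child: got SIGTERM" >> "$log"; exit 0' TERM
: > "$log.ready"                 # tell the driver the trap is installed
while :; do sleep 0.1; done
EOF

# process group leader: on SIGTERM, signal the whole group exactly once.
cat > "$dir/leader.sh" <<'EOF'
log=$1
_killing=0
_term() {
  # the broadcast below comes back to this shell too; the flag stops a loop
  if [ "$_killing" -eq 1 ]; then return; fi
  _killing=1
  echo "leader: broadcasting SIGTERM" >> "$log"
  kill -TERM 0                   # pid 0 means "every process in my process group"
}
trap _term TERM
bash "$(dirname "$0")/child.sh" "$log" &
child=$!
# wait returns early whenever a trapped signal arrives, so keep waiting
while kill -0 "$child" 2> /dev/null; do wait "$child"; done
echo "leader: exiting" >> "$log"
EOF

set -m                           # job control on: the next background job gets its own process group
bash "$dir/leader.sh" "$log" > /dev/null 2>&1 &
leader=$!
set +m
while [ ! -e "$log.ready" ]; do sleep 0.05; done
kill -TERM "$leader"             # signal only the leader; it fans out to the group
wait "$leader"
cat "$log"
broadcasts=$(grep -c "broadcasting" "$log")
child_hits=$(grep -c "child: got SIGTERM" "$log")
rm -rf "$dir"
```

the child is never signaled directly here. it only hears about the shutdown because the leader re-sends SIGTERM to the group, and the flag keeps the leader from re-broadcasting when its own signal comes back around.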

questions for whoever visits this thread:

  • have you had SIGTERM sent to your program on container shutdown?
  • could you review the above to see if it makes sense?
