(Practical note: the installation above doesn’t have email set up, so if you register an account, you won’t be able to activate it. Contact me somehow and I’ll activate your account manually.)
The venerable Discourse project https://www.discourse.org/ has long been one of those things we all knew couldn’t be hosted on Glitch. Not in an officially supported way, at least. It requires Docker, which you wouldn’t be able to run in a Glitch container. And even if it could run without Docker, it needs PostgreSQL for its database, which isn’t installed. Even if we could install things, the packages in our old version of Ubuntu are out of date. Also, for a free project, the required resources–1 GB of RAM and 10 GB of disk–are way over the quota.
But I felt like we should see if we could anyway. It would be like building a ship in a bottle: it’s intentionally harder than it has to be, and you won’t be able to sail away on the result. And that’s the fun of it.
The result is a network of three projects: the frontend with the actual Discourse server, the PostgreSQL database plus the Redis store, and a background job runner using some gem called Sidekiq. It’s flaky, but I was able to try several things without any catastrophic failures. And on a good day, it doesn’t even hit any memory usage warnings!
In the coming days I’ll be flipping these projects to public and walking through the techniques I used–several of them new to me–to make this work. Although “work” might be a strong word for what it does.
We don’t have permission to use apt-get as root, but we can download the .deb files and extract them.
The postgresql-10 package has a lot of PostgreSQL-related dependencies, but it turns out it runs fine without them. Going without the client makes sense, and I don’t know why it’s even listed as a dependency. I looked into what the postgresql-common package does, and it seems to be a bunch of wrapper scripts to determine the right version to use. In this project I go without the wrappers, and the scripts access .../usr/lib/postgresql/10/bin/... directly.
This distribution of PostgreSQL normally tries to open a UNIX domain socket in /var/run/postgresql, which we don’t have permission to write to in our project containers. I changed the configuration not to bother with the UNIX domain sockets. We’ll use TCP to connect to it, which I’ll describe later in this segment.
Setting up Redis
I originally tried compiling Redis from source, but I later found that they also publish a prebuilt version for Ubuntu 16.04. Lucky!
I made a copy of the config file (from the wrong version I realize now 🤦) and edited it not to daemonize (discussed later), to disconnect idle clients (discussed later), and to put its logs and data into the project’s .data directory.
Connectivity between Glitch projects
There won’t be enough RAM, disk, and CPU in one Glitch project to run all of Discourse, so we’ll need to run some components on separate projects and let them access the services in this project. But as far as I know, there is no secret passageway that lets them communicate with each other. That leaves only the great big front door that is the open Internet.
These services have TCP interfaces. In previous work, “Cramming a Terraria server into a free project container,” I showed that Glitch allows arbitrary streaming communication after an HTTP upgrade. I use that technique here too. This time I added some logic to route requests for different paths to different ports.
And because it’ll be connecting over a public HTTP server, I also added some simple authentication. PostgreSQL and Redis probably have their own authentication, and it would be prudent to set those up for defense in depth. But in this project they aren’t set up, and I really hope I wrote this part correctly. (Disclosure: I am aware that the secret token is not compared in constant time.)
One half of the result is in services/doors/doors.js. The other half is roughly the opposite, and we’ll see it in a later post in this writeup.
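As a rough illustration of the idea described above (not the contents of doors.js), here is a minimal Node sketch of an upgrade-based door. Everything specific in it is an assumption: the `/postgres` and `/redis` paths, the `Bearer` header scheme, the `Upgrade: tcp` protocol name, and the function names are all hypothetical. The ports are just the PostgreSQL and Redis defaults. Unlike what the disclosure above describes, this sketch compares the token in constant time by hashing both sides first.

```javascript
// Hypothetical sketch of an HTTP-upgrade "door"; not the actual doors.js.
const http = require("http");
const net = require("net");
const crypto = require("crypto");

// Assumed path-to-port table (5432 and 6379 are the stock defaults).
const ROUTES = new Map([
  ["/postgres", 5432],
  ["/redis", 6379],
]);

// Hash both values to equal-length digests, then compare in constant time.
function tokenMatches(presented, expected) {
  const a = crypto.createHash("sha256").update(String(presented)).digest();
  const b = crypto.createHash("sha256").update(String(expected)).digest();
  return crypto.timingSafeEqual(a, b);
}

// Return the local port for this request, or null if it should be refused.
function resolveDoor(req, expectedToken) {
  const presented = req.headers["authorization"] || "";
  if (!tokenMatches(presented, `Bearer ${expectedToken}`)) return null;
  return ROUTES.has(req.url) ? ROUTES.get(req.url) : null;
}

function createDoorServer(expectedToken) {
  // Plain (non-upgrade) requests get a polite refusal.
  const server = http.createServer((req, res) => {
    res.writeHead(426, { "Content-Type": "text/plain" });
    res.end("Upgrade required\n");
  });

  server.on("upgrade", (req, socket, head) => {
    const port = resolveDoor(req, expectedToken);
    if (port === null) {
      socket.end("HTTP/1.1 403 Forbidden\r\n\r\n");
      return;
    }
    // Connect to the local service, then splice the two streams together.
    const upstream = net.connect(port, "127.0.0.1", () => {
      socket.write(
        "HTTP/1.1 101 Switching Protocols\r\n" +
          "Connection: Upgrade\r\nUpgrade: tcp\r\n\r\n"
      );
      if (head.length > 0) upstream.write(head);
      socket.pipe(upstream);
      upstream.pipe(socket);
    });
    upstream.on("error", () => socket.destroy());
    socket.on("error", () => upstream.destroy());
  });

  return server;
}
```

Something like `createDoorServer(secret).listen(port)` would start it, with `secret` and `port` coming from wherever the project keeps them.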
An annoying thing about this is that we can’t let connections be idle for too long, or Glitch will close the connection for us. The programs don’t like this. So I’ve added some configuration in various places to have the programs themselves close connections that are idle for ~25 seconds. Maybe a better version of this could add some kind of keep alive system, but you’d have to be careful how you do it, or it could interfere with project container sleep.
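The post puts the idle limit in the programs’ own configuration. If one also wanted the tunnel’s sockets to enforce the same ~25 second limit, a sketch in Node might look like the following. This is an assumption about one possible place to put it, and `closeWhenIdle` is a hypothetical helper, not something from the project.

```javascript
// Hypothetical helper: end a socket after it has been idle for a while.
// 25 seconds mirrors the rough figure mentioned in the post.
const IDLE_MS = 25 * 1000;

function closeWhenIdle(socket, idleMs = IDLE_MS) {
  // setTimeout on a net.Socket only reports inactivity via the callback;
  // it does not close anything itself, so we end the socket explicitly.
  socket.setTimeout(idleMs, () => socket.end());
  return socket;
}
```

Because this closes connections rather than sending traffic to keep them open, it shouldn’t interfere with project container sleep the way a keep-alive might.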
Running multiple processes
We have three things, really. There’s a PostgreSQL server, a Redis server, and a little program to accept connections over upgraded HTTP. Now we need to run all three of them from our start script.
I found that there’s already a suite of programs for managing services, runit http://smarden.org/runit/. Glitch uses it to manage some services in the container, and I think it’s not uncommon in the Docker scene where there’s a desire for a simple daemon supervisor. From its manual,
runsv switches to the directory service and starts ./run.
runsvdir starts a runsv(8) process for each subdirectory … and restarts a runsv(8) process if it terminates.
How’s that for simple? So we make a bunch of directories and write up some run scripts. That’s the rest of this project in the services directory.
Because runsv expects the program to run until the service dies, I’ve configured the programs not to daemonize.
Thanks! All we have is our best effort, after all.
Thanks! That makes it all worthwhile, which is important because it’s a little unpractical.
Yup, there’s tunneling in the form of HTTP upgrades. And yet it is exposed to the Internet, with only HTTP authentication between us and oblivion. Well, and TLS.
P.S. I think Glitch is sometimes a little un-gentle when it stops your app, and I sometimes see leftover pidfiles. I’ve rigged up the commands in package.json to delete them carelessly.
The counterpart to the HTTP upgrade server on the state and coordination project is an HTTP upgrade client, in services/doorc/doorc.js. With that, we’ll be able to connect to the PostgreSQL and Redis servers on our other project.
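A minimal sketch of what such a client could look like, under assumptions: it listens on a local TCP port and, for each local connection, makes an HTTPS upgrade request to the other project and splices the two streams together. The `Bearer` header, the `Upgrade: tcp` protocol name, and the function names are hypothetical and are not taken from doorc.js.

```javascript
// Hypothetical sketch of an HTTP-upgrade client; not the actual doorc.js.
const https = require("https");
const net = require("net");

// Build the request options for one tunnel. The project name is given
// without the .glitch.me suffix.
function buildUpgradeOptions(projectName, path, token) {
  return {
    host: `${projectName}.glitch.me`,
    port: 443,
    path,
    headers: {
      Connection: "Upgrade",
      Upgrade: "tcp", // assumed protocol name
      Authorization: `Bearer ${token}`, // assumed auth scheme
    },
  };
}

// Returns a local TCP server; each local connection gets its own tunnel.
function createDoorClient(projectName, path, token) {
  return net.createServer((local) => {
    const req = https.request(buildUpgradeOptions(projectName, path, token));

    req.on("upgrade", (res, remote, head) => {
      // Anything the far side sent right after the 101 arrives in `head`.
      if (head.length > 0) local.write(head);
      local.pipe(remote);
      remote.pipe(local);
      remote.on("error", () => local.destroy());
    });

    // A normal response means the far side refused to upgrade.
    req.on("response", (res) => {
      res.resume();
      local.destroy();
    });
    req.on("error", () => local.destroy());
    local.on("error", () => req.destroy());
    req.end();
  });
}
```

With something like this listening on a local port, a program on the same project could be pointed at `127.0.0.1` and that port as if the remote service were local.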
Setting up Ruby
We have Ruby 2.3 in our project containers, but Discourse needs Ruby 2.6+. I couldn’t find a prebuilt copy of it for Ubuntu 16.04, so I installed it with the ruby-build plugin https://github.com/rbenv/ruby-build for rbenv.
Installing Ruby this way takes a long time because it has to compile it. To save time on later installations, I archived up the compiled files and uploaded them to Glitch’s asset CDN. The install script downloads the archive and extracts it instead of compiling Ruby from source again.
My tooling for this part could use some work. This time I served up the archive through my project, downloaded it to my computer, and uploaded it through the Glitch editor. That’s unnecessarily slow, because my home Internet isn’t as fast as Glitch going from AWS EC2 to S3. If I have time, I’ll see if I can make a script for uploading files from the container to the CDN.
Setting up Discourse’s dependencies
In all, there are 209 dependency gems, taking up a total of 612 MB. Some gems have native dependencies that have to be compiled, and the whole installation takes a long time. I did the same thing as I did with Ruby for this: I archived up the compiled gems and uploaded them to the CDN. The install script downloads the archive and extracts it so that it doesn’t have to compile them from source.
At this point, we can try a little something with a piece of Discourse interacting with our PostgreSQL and Redis servers. There’s a background worker component to Discourse, which amounts to running a Ruby program called Sidekiq. With a lot of configuration though.
# You may be surprised production is not here, it is sourced from application.rb using a monkey patch
# This is done for 2 reasons
# 1. we need to support blank settings correctly and rendering nothing in yaml/erb is a PITA
# 2. why go from object -> yaml -> object, pointless
I tried to follow the path down to whatever “sourced from application.rb using a monkey patch” meant, but they brought out the big guns in making the configuration hard to follow: dynamically constructed ungreppable environment variable names, reflection for accessing the configuration model, reflection for declaring the configuration model, dynamic dispatch to multiple duck typed implementations of configuration, it’s all there. I’m just kidding around, of course. Except that to this day, I still don’t know what calls register https://github.com/discourse/discourse/blob/v2.6.0/app/models/global_setting.rb#L5 here.
With the esoteric knowledge that the production configuration ultimately comes from subtly differently named environment variables, the next step is to figure out how we’re supposed to configure the database even if we had been able to write YAML directly. And if you were to consult the Rails documentation for that, you’d find out how poorly the authors think of, well, anyone trying to do anything. The guide https://guides.rubyonrails.org/v6.0/configuring.html#configuring-a-database says as much as this:
Using the config/database.yml file you can specify all the information needed to access your database:
The lack of any reference material led me to joke about it when my configuration wasn’t working right:
by_the_way: "don't connect over UNIX socket. go for the TCP port"
Rails is just one of those projects that likes ‘implicit’ over ‘explicit,’ including in documentation.
Anyway, I’ve put the variables that I divined into source.sh so that I could source it from a runit script as well as from an interactive shell. Putting them in .env might work too, but the rbenv setup uses an eval, which I’ve never tried putting in .env. This stuff is non-secret anyway, and being able to see it could probably save you some time.
I had earlier tried running Discourse in development mode, but it was too slow. We don’t have enough RAM to run many Unicorn processes in parallel, so all the resources on a page have to be loaded a few at a time. And each one takes a couple of seconds because it has to package it from source. The result is that you couldn’t practically load anything because the requests for the styles and scripts from the bottom of a page would time out.
So for now it runs in production mode, which notably
runs the multithreaded Puma server,
operates on a different database, and
prepares static assets in advance
I learned about using diff and patch to apply a change to the hardcoded configuration in Discourse. But at one point when I was trying to patch a file through a symlinked directory, the patch program would fail due to a symlink loop. Couldn’t figure out why. Anyway, now I have it cd into the Discourse directory and run the patch relative to that. That’s in discourse-config.patch.
Setting up a production database
It turned out that the development database alone was over 100 MB already. So I instead configured it to use the same development database in production.
Precompiling the static assets is basically a rake command. For some reason it needs to connect to Redis and PostgreSQL before it’s willing to precompile the assets. Because of that, I couldn’t automate this step: I would have expected to do it during the ‘install’ step, but we don’t start the doorc service until the ‘start’ step.
For now it’s done manually, and I’ve archived up the results and uploaded them to the CDN. The install step downloads and extracts the archive instead of running this weird rake command.
Stuff to try
Now this whole thing is flipped to public. It’s over to you to figure out where to go from here. There’s a lot that needs doing.
Set up email so people can actually register.
Or turn on some SSO solution.
Come up with some non-abusive way to run the background worker on demand.
Actually rally up a community to use a Discourse instance.
Wait, don’t. Because it’ll be really slow and flaky and they’ll resent that.
Contribute sane info to Rails docs.
Oh and make a better websocket version of doors/doorc.
Make discourse less centered around email. I’d always consider forking discourse/discourse or talking to the folks on meta.discourse.org if there is a way to do it. Not everyone has access to a mail server (and not all users want to have their inbox filled) and a captcha or challenge can take the place of email verification.
Also, I recommend you share this on meta.discourse.org. They would be fascinated that you can operate software that requires 2 gigs of RAM on hardware with a limit of 600 MB.
“fascinated” nice use of words.
I wonder if this could be how some advanced installs work to distribute load. For example, I wonder if multiple discourse instances can be run connecting to the same database and still function.
About discourse without email, why not add something like github/discord/google authentication?
This segment is more like a set of instructions to get started with your own copy of this stuff. It’s definitely not a sequence of steps that I can verify will work. What we’re starting out with is just a bunch of things that I can think of that I’m pretty sure need to be done. Maybe with your help, we can get this into a working sequence of steps.
And fortunately a lot of the stuff I had written about above is either something that doesn’t need to be redone on each remix (e.g. compiling Ruby from source) or is automated with scripts that should copy over fine. When, for example, Discourse releases a new version, I imagine we’ll all band together and figure something out.
Go to the .env file, transfer the same AUTH_... values from your remix of spotted-hot-swift and put the project name of your remix of spotted-hot-swift in the DOMAIN_... variable values (don’t include .glitch.me).
Set up the variables the same way as for your remix of wealthy-noon-agreement, but use the project name of your remix of wealthy-noon-agreement in the DOMAIN_DISCOURSE variable value (don’t include .glitch.me).
And when this inevitably flies off the uh… what’s the name of those metal strips that trains drive on? Somehow that word is blocked from my memory. But when that happens, let’s discuss what we need to fix with these steps.
edit: oops forgot to add a section for replies!
I want to but the instructions for turning on SSO are really intimidating.
I saw something saying they need 1 GB of RAM, and we’re running it with 1.5 GB across all the projects. Also I think there’s too much tension between my remark about the team not caring about people who don’t pay for the hosted version and the “I’m joking” for me to do that.
Cool idea. Discourse is designed to have a reverse proxy in front of it. I forgot to mention this–I had to turn on some obscure switch to make it actually serve static assets in production mode. The expectation is that you’d have nginx or something do that. Maybe a more advanced setup could have yet another project reverse proxy to a number of frontend replicas. And serve the darned static assets.
For the static assets, oOo we can do some interesting things in front of that. For example we can first try to load from ipfs and then fall back to something like github pages, vercel, netlify, or just glitch static sites.
I just found out that node_modules is copied over, so it’ll be easier if you wait for the project you’re remixing to finish its installation. You should be able to visit the source projects’ web URLs to see if they’re up; if they are, they’ve finished installing. Avoid remixing during installation, because then you’ll get a copy that’s partially installed, and my scripts aren’t fancy enough to work through that case.
In the frontend project, create services/discourse/down before filling in .env so that it doesn’t waste memory and CPU when you’re doing it. Glitch also restarts the project when you change .env, so it makes it easier for runsv to pick up the new down file. I’ve updated the instructions to reflect this.
I’ve updated the instructions so that you don’t do any cds other than in subshells. I think this will make writing these notes easier.
In the frontend project, ./init.sh runs a database migration, which takes a long time. If it fails partway due to network flakiness, run this from the same terminal to retry it:
I have a copy of the finished structure.sql in the project and copied into the discourse directory. If you’re up for an experiment, try bundle exec rake db:setup instead of the ... db:create and ... db:migrate in init.sh.
In the frontend project, if you’ve done all the steps and puma is ready with this in top:
and the ‘App Status’ button still shows a spinner, and the project still shows some kind of ‘waking up’ page, and there’s no wait-for-ports.sh running, try running refresh in the terminal. I don’t know why this happens sometimes, but it seems we can make Glitch forget that it’s broken.
I fixed a couple of dumb things in the frontend project that would prevent these steps from working even if you did everything right. (1) It turns out that, while the contents of .data aren’t copied on remix, the existence of the .data directory propagates. So that mkdir .data is now a mkdir -p .data. (2) That horrible thing in package.json to rm the pidfiles would crash and burn on the first run when there are no leftover pidfiles to delete. It’s a rm -f now.
Thanks to the testers who have been working to smooth out these remixing instructions.