I backed up every topic on the Glitch Forum

I backed up all 17,000 topics from the forum so you don’t have to. I wrote a simple script that took about 4-5 hours to run.

Download (~11 MB):
https://cdn.riverside.rocks/backups/data.json

Simple PHP script:
https://haste.red-panda.red/wumokopuxo.php

Some more details:

I ran the script on a Raspberry Pi with 4 GB of RAM. At its peak, the script used less than 1% of the machine’s RAM. I have roughly a 350 Mbps download and 12 Mbps upload connection, so if your internet is faster, you can probably run this a bit faster than I did.

If you plan to try this yourself, don’t do it too often, as it’s quite harsh on the Discourse servers (it generates ~50k requests). If you run it over SSH, keep your computer online for the full duration of the script so the session doesn’t disconnect.
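For anyone rerunning this, here is a minimal Python sketch of a gentler fetch loop (not the author’s PHP script). It assumes the forum lives at `https://support.glitch.com` and that Discourse serves each topic at `/t/<id>.json`; `fetch_topic`, `backup_topics` and the one-second `delay` are hypothetical choices. Missing or hidden topics are skipped rather than stored as nulls, and an HTTP 429 (rate limited) stops the run instead of silently dropping topics.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable, Iterable, List, Optional

# Assumption: the forum's base URL; Discourse serves topic JSON at /t/<id>.json.
BASE_URL = "https://support.glitch.com"


def fetch_topic(topic_id: int) -> Optional[dict]:
    """Fetch one topic as JSON; return None if it is missing, deleted, or hidden."""
    url = f"{BASE_URL}/t/{topic_id}.json"
    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 429:
            raise  # rate limited: stop rather than silently dropping topics
        return None  # 404/403 etc.: topic does not exist or is not visible


def backup_topics(
    topic_ids: Iterable[int],
    fetch: Callable[[int], Optional[dict]] = fetch_topic,
    delay: float = 1.0,
) -> List[dict]:
    """Fetch topics one at a time, pausing between requests to spare the server."""
    topics = []
    for topic_id in topic_ids:
        topic = fetch(topic_id)
        if topic is not None:  # skip gaps instead of storing nulls
            topics.append(topic)
        time.sleep(delay)  # throttle: never more than one request per `delay` seconds
    return topics
```

Something like `backup_topics(range(1, 101))` would fetch the first hundred topic IDs. Because `fetch` is a parameter, the loop can be tested offline with a fake fetcher.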

Enjoy! I’d love to see what everyone creates with this data.

5 Likes

Wait, so all the precious knowledge of this forum is in a single array?

1 Like

Every topic from the forum; I didn’t include replies.

1 Like

Now that’s how you say you didn’t use Node.js without saying you didn’t use Node.js

3 Likes

You should keep it from adding null entries to the array.

1 Like
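If you already have the file, the nulls can be stripped afterwards. A minimal sketch, assuming `data.json` is a single top-level JSON array; `drop_nulls` and `clean_backup` are hypothetical helper names.

```python
import json
from typing import Any, List


def drop_nulls(topics: List[Any]) -> List[Any]:
    """Return the entries with every JSON null (Python None) removed."""
    return [topic for topic in topics if topic is not None]


def clean_backup(src: str, dst: str) -> int:
    """Rewrite a JSON array file without its nulls; return how many were removed."""
    with open(src, encoding="utf-8") as f:
        topics = json.load(f)  # assumes the top level is a JSON array
    cleaned = drop_nulls(topics)
    with open(dst, "w", encoding="utf-8") as f:
        json.dump(cleaned, f, ensure_ascii=False)
    return len(topics) - len(cleaned)
```

`clean_backup("data.json", "data-clean.json")` writes a copy without the nulls and reports how many it dropped. Only `None` is removed, so falsy but real values such as `0` or `""` survive.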

A wonderful resource for data scientists who want to analyze what users are saying on the forums.
Maybe run this through a frequency counter and see which word comes up most, besides common words like “the” and, obviously, the name Glitch.

2 Likes
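A minimal word-frequency sketch in Python. The shape of each entry in `data.json` isn’t documented here, so this takes plain strings (for example, topic titles or bodies you extract yourself). The stop-word list is a small illustrative assumption, with `glitch` included as suggested.

```python
import re
from collections import Counter
from typing import Iterable, List, Optional, Set, Tuple

# Deliberately small, illustrative stop-word list; extend it as needed.
DEFAULT_STOPWORDS = {
    "the", "a", "an", "and", "or", "to", "of", "in", "on", "for", "is", "it",
    "i", "you", "my", "this", "that", "with", "be", "are", "was", "glitch",
}


def word_frequencies(
    texts: Iterable[str],
    stopwords: Optional[Set[str]] = None,
    top_n: int = 20,
) -> List[Tuple[str, int]]:
    """Count words across all texts, ignoring case and stop words."""
    stop = DEFAULT_STOPWORDS if stopwords is None else stopwords
    counts: Counter = Counter()
    for text in texts:
        # Lowercase, then treat runs of letters/apostrophes as words.
        for word in re.findall(r"[a-z']+", text.lower()):
            if word not in stop:
                counts[word] += 1
    return counts.most_common(top_n)
```

For example, `word_frequencies(["The app is up", "app app Glitch"], top_n=1)` returns `[("app", 3)]`.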

That file is too big even for Firefox to pretty-print. What would be an ideal way to use it in code? Splitting the JSON data into parts?
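At ~11 MB the file is small enough for an ordinary JSON parser; it is mostly the browser’s viewer that struggles. Python’s built-in `python -m json.tool data.json pretty.json` will pretty-print it to a new file, and the sketch below splits it into smaller pretty-printed parts. It assumes the top level is a JSON array; `chunk`, `split_backup`, the `part-NNN.json` naming and the 1,000-item default are hypothetical choices.

```python
import json
import os
from typing import Any, List


def chunk(items: List[Any], size: int) -> List[List[Any]]:
    """Split a list into consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


def split_backup(src: str, out_dir: str, size: int = 1000) -> List[str]:
    """Write each chunk of the top-level array to its own pretty-printed file."""
    with open(src, encoding="utf-8") as f:
        topics = json.load(f)  # an ~11 MB file fits comfortably in memory
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for n, part in enumerate(chunk(topics, size)):
        path = os.path.join(out_dir, f"part-{n:03d}.json")
        with open(path, "w", encoding="utf-8") as f:
            json.dump(part, f, indent=2, ensure_ascii=False)
        paths.append(path)
    return paths
```

`split_backup("data.json", "parts")` would produce `parts/part-000.json`, `parts/part-001.json`, and so on, each small enough to open in a browser or editor.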

What’s with all these null elements?

Deleted/hidden threads?

Each null entry is a post. I didn’t record the posts (replies), though I should have.

Xfinity - Pay 300 dollars a month for little to no upload speed.

2 Likes

I’m attempting to make a much more detailed JSON backup of every topic on this forum, since @RiversideRocks’ version only includes the content of each thread.

1 Like
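For a backup that includes replies, here is a sketch of one approach. It assumes Discourse’s topic JSON carries a `post_stream` whose `posts` holds only the first batch of posts and whose `stream` lists every post ID, and that the remaining posts can be requested from `/t/<id>/posts.json` with repeated `post_ids[]` parameters. Check these against the Discourse API docs before relying on them. `missing_post_ids`, `post_batch_urls` and the batch size of 20 are hypothetical.

```python
from typing import List
from urllib.parse import urlencode


def missing_post_ids(topic: dict) -> List[int]:
    """IDs listed in post_stream.stream whose posts were not included inline."""
    stream = topic["post_stream"]
    loaded = {post["id"] for post in stream.get("posts", [])}
    return [pid for pid in stream.get("stream", []) if pid not in loaded]


def post_batch_urls(base_url: str, topic: dict, batch: int = 20) -> List[str]:
    """Build follow-up URLs that request the missing posts, `batch` IDs at a time."""
    ids = missing_post_ids(topic)
    urls = []
    for i in range(0, len(ids), batch):
        # Repeated post_ids[] parameters; urlencode escapes the brackets.
        query = urlencode([("post_ids[]", pid) for pid in ids[i:i + batch]])
        urls.append(f"{base_url}/t/{topic['id']}/posts.json?{query}")
    return urls
```

Each URL can then be fetched with the same throttled loop used for topics, and the returned posts merged back into the topic before saving.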

On a side note, my second email account has mailing list mode on, so it technically holds an archive of the last month or two of topics, including deleted ones, unless your topic had the misfortune of landing in the spam folder.

1 Like