Temporary service interruption

We’ve encountered an issue during a deploy to update our docker rbd plugin. Some application containers are failing to start. We expect to resolve everything in the next few minutes and will update this topic with more info as we have it.

App containers are now 95% up, currently investigating the remaining 5%.

We encountered another issue with the update and are rolling back to the previous version.

Thanks for the updates, just want to point out that the system status page doesn’t show any incidents: http://status.glitch.com/

All app containers are currently down during the rollback. We’ll restart the docker workers after the code is deployed to all machines (~5 min). Sorry for the interruptions!

We still have a partial interruption, but most containers are successfully running on the reverted version. We’re restoring the remaining nodes and expect to have everything at 100% in the next ~10 minutes.

We had to suspend some containers and manually restore their volumes from backups because they got corrupted when some docker nodes got stuck in an in-between state. The final couple of hours were spent resolving the last issues for those ~10 applications.

We’ve tied up any loose ends and now things are running normally :slight_smile:

Let us know if you see anything amiss, and thanks again for your patience!