Stateless Nextcloud on Kubernetes

Published July 17 2020

Since early 2018, I have used Nextcloud in a production capacity, providing domestic, privacy-respecting cloud file storage for myself and my family. Over the years, I have spent a considerable amount of time finding a way to run it safely, ensuring that decades of documents, photos, and other critical data do not go missing.

In recent years, Kubernetes has powered my production infrastructure, and so finding a way to force Nextcloud into the stateless container model has been a goal of mine for a while, as that would provide the desired reliability and horizontal scaling capabilities.

Like many legacy-ridden PHP applications, Nextcloud is heavily filesystem-driven, and demands a mutable filesystem to operate in any capacity. This filesystem dependence was a large problem in the migration to Kubernetes, but with a bit of late-night research, a few experiments, and a couple of shell scripts, I am finally content with the redundancy and elasticity of my current solution.

As of now, I have achieved all the goals I set as a baseline for me to trust Nextcloud:

  • All user content is stored in object storage, removing "storage limit anxiety" and offering unmatched data resiliency compared to a single disk or an array of physical drives.
  • Pods can be scaled quickly to meet demand during heavy usage periods, and can utilise existing autoscaling mechanisms such as the Horizontal Pod Autoscaler (HPA).
  • Containers are stateless, and do not share a filesystem with each other or the user data, meaning rollbacks and scaling carry minimal risk.

Over the past year I've slowly worked toward these goals, introducing one at a time to ensure stability.

Stage One: Object Storage

Before even considering moving to Kubernetes, I had to first get object storage working. I had previously experimented with the "External Storages [sic]" support, which seemed to work well but wasn't as transparent as I wanted.

Luckily, there is official documentation outlining how to set this up. It's quite simple, and just requires a few lines in config.php:

config.php
<?php
[...]
  'objectstore' => array (
    'class' => 'OC\\Files\\ObjectStore\\S3',
    'arguments' => array (
      'bucket' => 'my-nextcloud-data-bucket',
      'autocreate' => false,
      'use_path_style' => true,
      'port' => 9000,
      'key' => 'AKIAxxx',
      'secret' => 'asd+fx/xxxx=',
      'hostname' => 'nxc-minio',
      'use_ssl' => false,
      'region' => 'ca-central-1',
    ),
  ),
[...]

Astute readers may notice I am not pointing to S3 directly, but more on that later.
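
Since autocreate is disabled in the config above, the bucket has to exist and be reachable before Nextcloud first uses it. A minimal sketch of a pre-flight check, assuming the AWS CLI is installed and already holds the same key and secret; endpoint_url and check_bucket are hypothetical helper names, and the values in the usage line simply mirror config.php:

```shell
#!/bin/sh
# Build the endpoint URL from the same hostname/port/use_ssl values as config.php.
endpoint_url() {
	host="$1"; port="$2"; use_ssl="$3"
	if [ "$use_ssl" = "true" ]; then
		scheme="https"
	else
		scheme="http"
	fi
	printf '%s://%s:%s\n' "$scheme" "$host" "$port"
}

# List the bucket through that endpoint. A non-zero exit suggests the
# endpoint, credentials, or bucket name need another look.
check_bucket() {
	aws --endpoint-url "$(endpoint_url "$1" "$2" "$3")" s3 ls "s3://$4" >/dev/null
}

# Usage (not run here):
#   check_bucket nxc-minio 9000 false my-nextcloud-data-bucket
```

Running the check from a throwaway pod in the same namespace would exercise the same internal hostname that Nextcloud itself resolves.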

The only thing I was unsure about was migrating existing data to a new primary backend, which I'm still not sure is possible. Nextcloud stores files in object storage as flat data-only blobs, so it's not as simple as copying some files around.
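
A short sketch of why a plain copy is not enough, assuming the key scheme used by Nextcloud's S3 primary storage ("urn:oid:" followed by the file's numeric ID from the database). The helper name object_key is hypothetical:

```shell
#!/bin/sh
# Assumption: primary object storage names each blob "urn:oid:<fileid>",
# where <fileid> is the file's numeric ID in Nextcloud's database.
# Paths and filenames live only in the database, not in the bucket.
object_key() {
	printf 'urn:oid:%s\n' "$1"
}

# A file with ID 1234 would be stored under this key, whatever its path is:
object_key 1234
```

Copying an existing data directory into the bucket would therefore produce objects that Nextcloud has no database entries for.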

Ultimately, I ended up simply having a downtime event and manually re-syncing all the files for each user. This was still early on in our usage of Nextcloud--only dozens of gigabytes at most--so it wasn't a huge deal.

With this, I now had a single server running Nextcloud, but with the benefits of more resilient storage, and an unlimited amount of it. I ran things this way for a while, but eventually the perceived risk of that single server wore me down. It was time to scale.

Stage Two: Kubernetes

Because Nextcloud is primarily filesystem-driven, just putting it on Kubernetes or using the official Nextcloud Docker images didn't help much, as we would still be bound to a single PVC containing the webroot.

There were maybe a dozen different things I tried, from Azure Storage's CIFS integration in AKS, to running my own NFS server, and eventually settling on what I am currently running: webroot snapshots.

Everything starts with a custom Docker image, which is mostly just a normal container with Apache HTTPD and PHP, running on an Alpine Linux base image. It was incredibly simple to get working, and contains a few dozen PHP extensions to maximize compatibility. I settled on this more generic approach because a lot of the world runs on PHP and while I try my best to avoid it, I may need to run some other PHP software in the future, and this leaves a door open.

It's also probably worth noting this container currently sits at just over 200MB, which is 73% smaller than the official Nextcloud images.

Inside of this PHP container I then place a couple of shell scripts: a container entrypoint, and one which generates a tarball of the webroot and stores it in object storage.

This means that the container entrypoint fetches the webroot, extracts it, and only then starts HTTPD. By late-starting HTTPD synchronously, we can rely on Kubernetes' readinessProbes to achieve zero-downtime rollouts. That entrypoint looks something like this:

start.sh
#!/bin/sh
set -e

if [ "$WEBROOT_S3_URL" = "" ] ; then
	# [snip] some error message printing
	exit 1
fi

echo "Extracting webroot..."
aws s3 cp "$WEBROOT_S3_URL" /tmp/webroot.tar.gz
tar -xzf /tmp/webroot.tar.gz -C /srv/www
rm /tmp/webroot.tar.gz
chown -R apache:www-data /srv/www

echo "Starting httpd..."
# exec replaces the shell, so httpd receives the container's stop signals directly
exec /usr/sbin/httpd -D FOREGROUND -f /etc/apache2/httpd.conf

With this, we need only set WEBROOT_S3_URL in the container environment to the full URL to the webroot tarball on S3. Then on boot it fetches and extracts that tarball, ensures that HTTPD can access it, and finally starts HTTPD.
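
Because HTTPD only starts once the webroot has been extracted, any check that waits for an HTTP response doubles as a "webroot is in place" check. A minimal sketch of an exec-style readiness check, assuming curl is present in the image and that Nextcloud's status.php is served from the webroot; probe_ready is a hypothetical name, and a plain httpGet probe would do the same job:

```shell
#!/bin/sh
# Hypothetical readiness check: succeeds only once httpd answers on the URL.
probe_ready() {
	# -f: treat HTTP error codes as failure; --max-time: never hang the probe
	curl -fsS --max-time 2 "$1" >/dev/null 2>&1
}

# Usage inside the pod (not run here):
#   probe_ready http://127.0.0.1/status.php && echo ready
```

Until the tarball is fetched and unpacked nothing is listening, so the check fails and the pod stays out of the Service's endpoints.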

This is paired with another script for updating the tarball in S3 with what is currently in the webroot:

update-source.sh
#!/bin/sh
set -e

echo "Building source tarball..."
# -C archives relative to the webroot without changing the shell's directory
tar -czf /tmp/webroot.tar.gz -C /srv/www ./
aws s3 cp /tmp/webroot.tar.gz "$WEBROOT_S3_URL"

This script comes with two caveats:

  1. You must make whatever changes you need on a single pod only. If those changes go through the load-balanced UI, you either need a more complex dynamic routing configuration, or you accept the possibility of degraded performance while you scale down to 1 replica.
  2. You must exec into the pod yourself, and manually execute this script.

Updating is still a bit of a pain point, because I still just scale everything to 1 replica to make sure my traffic hits the same pod every time. I am planning on investigating some sort of "sticky routing" with cookies or alternative hostnames so that I can target a specific pod without compromising the performance of the entire service. As of right now, that's still a future improvement, however.
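
The manual flow described above can be sketched as a pair of helpers. The deployment name, replica count, and script path are assumptions for illustration, and with DRY_RUN=1 (the default here) the kubectl commands are only printed, not executed:

```shell
#!/bin/sh
# Sketch of the manual update flow. DEPLOY, REPLICAS and the script path
# below are hypothetical; adjust them to the actual deployment.
DEPLOY="${DEPLOY:-nextcloud}"
REPLICAS="${REPLICAS:-3}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
	if [ "$DRY_RUN" = "1" ]; then
		echo "+ $*"
	else
		"$@"
	fi
}

# Step 1: funnel all traffic to a single pod before changing anything.
begin_update() {
	run kubectl scale "deployment/$DEPLOY" --replicas=1
}

# Step 2: after making changes, snapshot the webroot and scale back up.
# New pods fetch the fresh tarball when their entrypoint runs.
finish_update() {
	run kubectl exec "deployment/$DEPLOY" -- /usr/local/bin/update-source.sh
	run kubectl scale "deployment/$DEPLOY" --replicas="$REPLICAS"
}
```

Calling begin_update, making the change in the UI, and then calling finish_update reproduces the sequence; setting DRY_RUN=0 runs the commands for real.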

Regarding point #2 above, I am additionally considering building a small Nextcloud plugin which would let me trigger the update script from the Admin UI, thereby eliminating the need to manually exec into a pod. Again, as of right now, this is only a theoretical future endeavour.

These very simple little scripts are what made all the difference for me, and were the final piece of the puzzle that allowed me to deploy Nextcloud with confidence.
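
The snapshot technique itself can be tried without a cluster. The sketch below packs a stand-in webroot the way update-source.sh does and unpacks it the way start.sh does, then compares the two trees. The S3 upload and download are replaced by a plain file in a temporary directory, and the file names are placeholders:

```shell
#!/bin/sh
set -e
# Local round trip of the webroot snapshot: pack, unpack, compare.
work="$(mktemp -d)"
mkdir -p "$work/src/apps" "$work/dst"
echo "<?php // placeholder" > "$work/src/index.php"
echo "demo" > "$work/src/apps/app.txt"

# Pack (the update-source.sh side)
tar -czf "$work/webroot.tar.gz" -C "$work/src" ./

# Unpack (the start.sh side)
tar -xzf "$work/webroot.tar.gz" -C "$work/dst"

# diff -r exits non-zero if the restored tree differs from the original
if diff -r "$work/src" "$work/dst" >/dev/null; then
	roundtrip="identical"
else
	roundtrip="different"
fi
echo "$roundtrip"
rm -rf "$work"
```

The same comparison against a real pod's /srv/www is a cheap way to confirm that a freshly uploaded tarball restores cleanly.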

Stage Three: Minio Gateway

The move from the official Nextcloud Docker images to my own Alpine-based image provided a noticeable improvement in performance. Intoxicated by this unexpected gain, I then looked at other improvements that could be made. One such change, which I am now running in production, is Minio Gateway.

Minio Gateway for S3 can be configured to store local caches of objects, which can dramatically increase performance on heavy workloads such as Nextcloud. It also can lead to lower transfer costs, as hot objects served from the cache don't have to be repeatedly downloaded from S3, which can become expensive.

Currently, Minio Gateway is running alongside Nextcloud in the same Kubernetes namespace, and is only available internally. Right now, I am taking advantage of the somewhat-generous temporary disk provided on the Kubernetes nodes I am running, and am mounting a temporary emptyDir into Minio's container. Since this data is ephemeral and its performance is not critical to running the app, the potential limits of temporary storage and the potential loss of the cache are unimportant. This gives me approximately 70GB of space to work with, which is plenty.
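
A rough sketch of how the cache could be wired up in the gateway container's start command. The variable names follow MinIO's disk-cache documentation from the gateway era and should be checked against the release actually in use; the /cache mount path, the 80% quota, and the helper names are assumptions:

```shell
#!/bin/sh
# Assumption: MINIO_CACHE_DRIVES and MINIO_CACHE_QUOTA are the cache settings
# understood by the MinIO release in use; verify before relying on them.
gateway_cache_env() {
	export MINIO_CACHE_DRIVES="$1"   # cache directory, e.g. the emptyDir mount
	export MINIO_CACHE_QUOTA="$2"    # percentage of that drive the cache may use
}

start_gateway() {
	gateway_cache_env /cache 80
	# Upstream S3 credentials are expected in the environment already.
	exec minio gateway s3
}

# start_gateway would be the container command; it is not invoked here.
```

Keeping the quota below 100 leaves headroom on the node's temporary disk for other pods that share it.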

Minio Gateway has a few other features that I have not yet experimented with, but plan to in the future. Namely, it can provide S3-compliant access to non-S3 object storage such as GCS or Azure Storage -- this means I am finally decoupled from AWS and can mirror or lift-and-shift my Nextcloud data to any object storage service supported by Minio Gateway, and have to change very little on the Nextcloud side.

... Profit?

Now that everything is working and has been running smoothly for a while, I have been reaping the benefits. Nextcloud has been operating faster than it ever has, and my trust in it has strengthened greatly.

This new "decoupled" infrastructure also has helped to simplify and hone my backup strategies and disaster recovery scenarios, as all important data -- including configuration and the webroot -- can just be cloned from object storage to an off-site location.

This also means I have reached the point where I can redeploy the service to any Kubernetes cluster in seconds, and restore the data to any object storage backend supported by Minio Gateway (I plan to document my DR strategies in a future article).

I have always hesitated to commit fully to Nextcloud, simply because of how difficult it is to scale long-term without a multi-million dollar NetApp SAN. This collection of Docker containers, shell scripts, and object storage buckets has now solved my concerns around scaling and performance, and I have never felt more confident about its resiliency.

That said, I still have at least four backups at all times, in three different locations, and in two different formats. Just in case. I am confident, not foolish.