Goodbye Google, Hello Matomo

What's in This Post

When I redeployed my blog and put it through its security paces earlier this year, I wrote about it. In that post I mentioned a few things I still wasn't happy with. Among them were my use of Google Analytics and the unsafe-inline entries in my Content Security Policies.

I'll be tackling comment and font add-ons in the future. For now I want to start using a more privacy-friendly form of analytics.

Log Analytics vs Javascript

Most modern analytics tools use a JavaScript snippet on each page of your site to send information about browsing activity to a dedicated analytics endpoint. In the majority of cases, the analytics endpoint is owned and operated by someone other than the website owner. Google Analytics fits this pattern. The alternative is to rely on web server logs to provide the analytics. I'll make a sweeping generalisation by saying that JavaScript-based analytics tools offer better overall analytics options, while it is far easier to provide a privacy-respecting experience with log-based analytics.

While there are some very interesting privacy-respecting JavaScript web analytics projects, the asymmetry of a web log analysing tool appeals to my desire to keep things simple. But I'll be keeping an eye on offen for the future.

I chose Matomo because of the project's maturity and its support for both JavaScript and log-based analytics.
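
To make the log-based approach concrete, here is a minimal sketch of the kind of counting a log analyser does. The three log lines are made up and assume nginx's default combined format. The helpers hits_per_path and unique_clients are hypothetical names and not part of Matomo, which does far more than this.

```shell
# Made-up sample lines, assuming nginx's default "combined" log format.
sample_log=$(mktemp)
cat > "${sample_log}" <<'EOF'
203.0.113.7 - - [10/Oct/2022:13:55:36 +0000] "GET /posts/matomo/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0"
203.0.113.7 - - [10/Oct/2022:13:55:40 +0000] "GET /css/site.css HTTP/1.1" 200 812 "https://example.com/posts/matomo/" "Mozilla/5.0"
198.51.100.23 - - [10/Oct/2022:14:02:11 +0000] "GET /posts/matomo/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0"
EOF

# Hypothetical helper: count requests per path, busiest first.
# With whitespace splitting, field 7 of a combined-format line is the path.
hits_per_path () {
  awk '{ count[$7]++ } END { for (p in count) print count[p], p }' "${1}" | sort -rn
}

# Hypothetical helper: count distinct client addresses (field 1).
unique_clients () {
  awk '{ print $1 }' "${1}" | sort -u | wc -l | tr -d ' '
}

hits_per_path "${sample_log}"
echo "unique clients: $(unique_clients "${sample_log}")"
```

Everything here works from what the web server already records, so no script has to run in the visitor's browser.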

Matomo on Kubernetes

Because everything is better when it's on Kubernetes, I decided I'd deploy Matomo on my local cluster. The official Matomo GitHub site contains a Docker repository that I was able to use to mock up some k8s manifests.

Preparing to Deploy

Before we deploy Matomo, there are a few items we should prepare first. The db.env available in the Matomo examples folder provides some hints as to what we need.

As we deploy MySQL/MariaDB we'll need to provide a root password, a database that will be created if it doesn't exist, a user with access to that database and that user's password. We'll be using the same user and database names as the official Matomo Docker examples: the user is matomo and the database is matomo.

REMINDER

The base64 encoding used to create Kubernetes secrets should be treated as clear text for the purposes of security. Follow your organisation's secure practices for the generation and storage of secrets. At the very least, ensure that any manifest you create has limited access and any console or terminal history is cleared.

It's convenient to store the user as an additional variable within the same secret as the password, so we'll encode that as well:

~ echo -n 'my root pass' | base64
bXkgcm9vdCBwYXNz
~ echo -n 'matomo' | base64
bWF0b21v
~ echo -n 'my user pass' | base64
bXkgdXNlciBwYXNz

Here's the resulting manifest to create the secrets we'll need.

---
apiVersion: v1
kind: Secret
metadata:
  name: mysql-root-passwd
  namespace: matomo
data:
  MARIADB_ROOT_PASSWORD: bXkgcm9vdCBwYXNz
---
apiVersion: v1
kind: Secret
metadata:
  name: mysql-user-secret
  namespace: matomo
data:
  MARIADB_USER: bWF0b21v
  MARIADB_PASSWORD: bXkgdXNlciBwYXNz
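
To underline the reminder above, this short sketch shows that anyone who can read the manifest can recover the password with a single command. It reuses the example root password from this post. The --decode long option is assumed to be available in your base64 (it is in GNU coreutils).

```shell
# The value stored in the Secret manifest for MARIADB_ROOT_PASSWORD.
encoded='bXkgcm9vdCBwYXNz'

# Decoding needs no key: base64 is an encoding, not encryption.
decoded=$(printf '%s' "${encoded}" | base64 --decode)

echo "${decoded}"
```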

Deploy Matomo

For security and my own sanity, I'll be deploying into a dedicated namespace for Matomo.

---
apiVersion: v1
kind: Namespace
metadata:
  name: matomo

I'm using small persistent storage claims, but you can make them as big as you need.

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: matomo-claim
  namespace: matomo
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 500Mi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: db-claim
  namespace: matomo
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 500Mi

Next, I created a service through which Matomo will reach the database.

---
apiVersion: v1
kind: Service
metadata:
  name: matomo-db
  namespace: matomo
spec:
  selector:
    app: matomo-db
  ports:
    - port: 3306
      name: mariadb

I'll also need a service for the Matomo app.

---
apiVersion: v1
kind: Service
metadata:
  name: matomo-svc
  namespace: matomo
spec:
  ports:
  - port: 80
    protocol: TCP
  selector:
    app: matomo-app

Finally, here are the pod definitions. Note that I've used the MARIADB prefix for the database environment variables. These are functionally equivalent to the matching MYSQL environment variables, with the exception that if both are defined the MARIADB variables will be preferred.

---
apiVersion: v1
kind: Pod
metadata:
  name: matomo-db
  namespace: matomo
  labels:
    app: matomo-db
spec:
  containers:
    - name: matomo-db
      image: mariadb:latest
      env:
        - name: MARIADB_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-root-passwd
              key: MARIADB_ROOT_PASSWORD
        - name: MARIADB_DATABASE
          value: matomo
        - name: MARIADB_USER
          valueFrom:
            secretKeyRef:
              name: mysql-user-secret
              key: MARIADB_USER
        - name: MARIADB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-user-secret
              key: MARIADB_PASSWORD
      volumeMounts:
        - mountPath: /var/lib/mysql
          name: dbv
  volumes:
    - name: dbv
      persistentVolumeClaim:
        claimName: db-claim
---
apiVersion: v1
kind: Pod
metadata:
  name: matomo-app
  namespace: matomo
  labels:
    app: matomo-app
spec:
  containers:
    - name: matomo-app
      image: matomo:latest
      env:
        - name: MATOMO_DATABASE_HOST
          value: matomo-db
        - name: MATOMO_DATABASE_ADAPTER
          value: mysql
        - name: MATOMO_DATABASE_TABLES_PREFIX
          value: matomo_
        - name: MATOMO_DATABASE_USERNAME
          valueFrom:
            secretKeyRef:
              name: mysql-user-secret
              key: MARIADB_USER
        - name: MATOMO_DATABASE_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-user-secret
              key: MARIADB_PASSWORD
        - name: MATOMO_DATABASE_DBNAME
          value: matomo
      volumeMounts:
        - mountPath: /var/www/html
          name: matomov
  volumes:
    - name: matomov
      persistentVolumeClaim:
        claimName: matomo-claim
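
The precedence between the MARIADB and MYSQL variables described before the pod definitions can be pictured with shell parameter expansion. This illustrates the behaviour only and is not the image's actual entrypoint code. The name effective_db_user and the value "legacy" are hypothetical.

```shell
# Hypothetical helper: return the MARIADB_ value when it is set,
# otherwise fall back to the matching MYSQL_ value.
effective_db_user () {
  printf '%s\n' "${MARIADB_USER:-${MYSQL_USER:-}}"
}

MYSQL_USER="legacy"
MARIADB_USER="matomo"
effective_db_user        # both defined: the MARIADB_ value is used

unset MARIADB_USER
effective_db_user        # only MYSQL_ defined: the fallback is used
```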

Ingress

In my k8s Self Signing and Trusting your CA post I created a CA which I could use to issue certificates to my ingress resources. Other than the TLS configuration, the rest of the ingress definition is quite normal.

ingressClassName

From K8s 1.22 onwards you should set ingressClassName explicitly, unless your cluster defines a default IngressClass. If you're using an older version you can leave it out.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: name-virtual-host-ingress-matomo
  namespace: matomo
  annotations:
    cert-manager.io/cluster-issuer: my-lab-root-issuer
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - matomo.example.com
    secretName: mat-ing-cert
  rules:
  - host: matomo.example.com
    http:
      paths:
      - pathType: Prefix
        path: "/"
        backend:
          service:
            name: matomo-svc
            port:
              number: 80

Uploading Log files

Now that we have a working version of Matomo, I'll need to send the log files to the Matomo API for processing. Since I use nginx, which is supported out of the box, no file manipulation is required.

Matomo API Token

In order to push logs via the API you'll need to generate a token. Follow these instructions.

WARNING

Pay attention to the Security considerations section and treat the token as a password.
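
One way to treat the token as a password is to keep it out of scripts and read it at run time from a file only your user can read. A minimal sketch, assuming a hypothetical read_token helper and token file. The demo writes a throwaway token to a temporary file so it can run anywhere.

```shell
# Hypothetical helper: print the token stored in a file, without
# surrounding whitespace. Fails if the file cannot be read.
read_token () {
  if [ ! -r "${1}" ]; then
    echo "Token file ${1} is not readable" >&2
    return 1
  fi
  tr -d '[:space:]' < "${1}"
}

# Demo with a throwaway file. In real use this might be a path such as
# ~/.matomo-token (an assumption, not a Matomo convention).
token_file=$(mktemp)
chmod 600 "${token_file}"          # owner read/write only
echo "1234567890ABCDEFG" > "${token_file}"

MTOKEN=$(read_token "${token_file}")
echo "token length: ${#MTOKEN}"
```

A script can then pass ${MTOKEN} to the importer without the token ever being committed alongside the code.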

The import_logs.py tool

I scheduled an Ansible job to ensure that I always have the latest version of the import_logs.py tool.

---
  - name: Fetch Matomo import log tool
    hosts: myhost
    become: no
    vars:
      myuser: "myuser"
      matomo_release: "4.x-dev"
      matomo_repo: "https://github.com/matomo-org/matomo-log-analytics.git"
      matomo_local: "/home/{{ myuser }}/matomo-import/"
    tasks:

      - name: "Clone Matomo release locally."
        ansible.builtin.git:
          repo: "{{ matomo_repo }}"
          dest: "{{ matomo_local }}"
          version: "{{ matomo_release }}"
          force: yes

Fetching Logs and Importing them

I wrote a simple bash script to fetch logs from my webservers and import them to Matomo. Before you can import any logs, you'll need to know the site ID of each website whose logs you'll be sending to the Matomo API. You can find the ID in the Matomo user interface under Administration (the cogwheel icon)/Websites/Manage.

Because my web server logs rotate daily, I always import the log labelled access.log.1, which is yesterday's log. This allows me to run the job once a day. If you want to update Matomo more often, consider syncing the live log (in my case that would be access.log) and running this script several times a day.

#!/bin/bash
# Fetch web logs and push to matomo

MTOKEN="1234567890ABCDEFG"
MYUSER="myuser"
MYNAME=$(basename -s .sh "${0}")
DAILY=$(date +'%Y%m%d')
TIME=$(date +'%H%M%S')
WORKDIR="/opt/weblogs"
LOGDIR="${WORKDIR}/log"
LOGFILE="${LOGDIR}/${MYNAME}-${DAILY}.log"
LOGENTRY="${DAILY} ${TIME} :"
WEBLOGS="/var/log/nginx/"
IDSITE="1"
MATOMO_DIR="/home/${MYUSER}/matomo-import"
MATOMO="${MATOMO_DIR}/import_logs.py"
MATOMO_URL="https://matomo.example.com"

# Print usage information.
function help_me () {
  echo "usage ${MYNAME} <web server> <domain> <matomo idsite>"
  echo ""
  echo "**** Additional parameters will be ignored ****"
}

# Append a timestamped entry to this script's own log file.
function logme () {
  if [ ! -d "${LOGDIR}" ]; then
    script_err "Log directory not found. Exiting."
  fi

  if [ -z "${1}" ]; then
    echo "${LOGENTRY} --------- " >> "${LOGFILE}"
  else
    echo "${LOGENTRY} ${1}" >> "${LOGFILE}"
  fi
}

# Send yesterday's rotated log to the Matomo API.
function push_logs () {
  logme "--------- Starting import"
  # options which I've left out: --enable-bots --enable-http-errors
  python3 "${MATOMO}" --token-auth="${MTOKEN}" --url="${MATOMO_URL}" --enable-static --enable-reverse-dns --recorders=4 --idsite="${IDSITE}" "${WORKDIR}/${DOMAIN}/access.log.1" >> "${LOGFILE}"
  logme "--------- Import finished"
}

# Print an error message to stderr and stop.
function script_err () {
  if [ -z "${1}" ]; then
    echo "Unknown error. Exiting" >&2
  else
    echo "Experienced the following error: " >&2
    echo "${1}" >&2
  fi
  exit 1
}

# Copy the web server's log directory to the local work directory.
function sync_logs () {
  logme "starting sync from ${WEB_SERVER} for ${DOMAIN}"
  rsync -a "${MYUSER}@${WEB_SERVER}:${WEBLOGS}" "${WORKDIR}/${DOMAIN}/" || script_err "rsync from ${WEB_SERVER} failed"
}

function check_args () {
  if [ -z "${1}" ]; then
    help_me
    script_err "No arguments received"
  fi
  case ${1} in
    help)
      help_me
      exit 0
    ;;
    *)
      if [[ $# -lt 3 ]]; then
        help_me
        script_err "Not enough arguments, need 3"
      fi
      WEB_SERVER=${1}
      DOMAIN=${2}
      IDSITE=${3}
      sync_logs
      push_logs
    ;;
  esac
}

# $1= server $2= domain $3= matomo site id.
check_args "$@"
exit 0
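
Since the daily job depends on access.log.1 having arrived, a small guard before the import can avoid sending a missing or empty file. This is a sketch only, assuming a hypothetical log_ready helper that is not part of the script above. The demo uses a temporary directory in place of ${WORKDIR}/${DOMAIN}.

```shell
# Hypothetical helper: succeed only if the file exists and is not empty.
log_ready () {
  [ -s "${1}" ]
}

# Demo in a temporary directory standing in for ${WORKDIR}/${DOMAIN}.
demo_dir=$(mktemp -d)
: > "${demo_dir}/access.log"                          # empty file
echo 'sample log line' > "${demo_dir}/access.log.1"   # non-empty file

# access.log.2 is deliberately never created.
for f in access.log access.log.1 access.log.2; do
  if log_ready "${demo_dir}/${f}"; then
    echo "${f}: ready to import"
  else
    echo "${f}: skipped (missing or empty)"
  fi
done
```

In the script above, push_logs could call log_ready on ${WORKDIR}/${DOMAIN}/access.log.1 and hand off to script_err when the check fails.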

Get the Code

All code shown in this post can be found here