Goodbye Google, Hello Matomo
When I redeployed my blog and put it through its security paces earlier this year, I wrote about it. In that post I mentioned a few things I still wasn't happy with. One of those was using Google Analytics and having unsafe-inline entries in my Content Security Policies.
I'll be tackling comment and font add-ons in the future. For now, I want to start using a more privacy-friendly form of analytics.
Log Analytics vs JavaScript
Most modern analytics tools use a JavaScript snippet on each page of your site to send information about browsing activity to a dedicated analytics endpoint. In most cases, that endpoint is owned and operated by someone other than the website owner; Google Analytics fits this pattern. The alternative is to derive the analytics from web server logs. To make a sweeping generalisation: JavaScript-based tools offer richer analytics overall, while log-based analytics makes it far easier to provide a privacy-respecting experience.
While there are some very interesting privacy-respecting JavaScript analytics projects, a tool that only analyses web server logs appeals to my desire to keep things simple. I'll be keeping an eye on Offen for the future, though.
I chose Matomo because of the project's maturity and its support for both JavaScript and log-based analytics.
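To make the log-based approach concrete, here is a minimal bash sketch of the kind of information a log analyser extracts. It is not how Matomo's importer works internally; it simply counts successful GET requests per path from lines in nginx's default combined format. The function name count_paths and the sample log lines are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: tally successful (2xx) GET requests per path from
# nginx "combined" format log lines read on stdin.
count_paths() {
  # Splitting on double quotes puts the request line ("GET /path HTTP/1.1")
  # in field 2 and " status bytes " in field 3.
  awk -F'"' '{
    split($2, req, " ")    # req[1] = method, req[2] = path
    split($3, rest, " ")   # rest[1] = status, rest[2] = bytes
    if (req[1] == "GET" && rest[1] ~ /^2/) hits[req[2]]++
  } END {
    for (p in hits) printf "%d %s\n", hits[p], p
  }' | sort -k1,1nr -k2,2   # most requested paths first
}

# Made-up sample lines (documentation IP ranges, invented paths)
sample='203.0.113.5 - - [10/Mar/2022:10:00:01 +0000] "GET /posts/matomo/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0"
203.0.113.7 - - [10/Mar/2022:10:00:05 +0000] "GET /posts/matomo/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0"
198.51.100.2 - - [10/Mar/2022:10:01:00 +0000] "GET /about/ HTTP/1.1" 200 2048 "-" "curl/7.81.0"
198.51.100.2 - - [10/Mar/2022:10:01:02 +0000] "GET /missing HTTP/1.1" 404 153 "-" "curl/7.81.0"'

printf '%s\n' "$sample" | count_paths
```

Everything here comes from the server's own log, so nothing runs in the visitor's browser; the trade-off is that details a browser snippet could report (screen size, time on page) are simply not available.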
Matomo on Kubernetes
Because everything is better when it's on Kubernetes, I decided to deploy Matomo on my local cluster. The official Matomo GitHub organisation hosts a docker repository that I was able to use to mock up some k8s manifests.
Preparing to Deploy
Before we deploy Matomo, there are a few items we should prepare. The db.env file in the Matomo examples folder provides some hints as to what we need.
When we deploy MySQL/MariaDB, we'll need to provide a root password, a database that will be created if it doesn't exist, a user with access to that database, and that user's password. We'll use the same user and database as the official Matomo docker examples:
MATOMO_USER: matomo
MATOMO_DATABASE: matomo
The base64 encoding used for Kubernetes secrets is not encryption and should be treated as clear text for security purposes. Follow your organisation's secure practices for generating and storing secrets. At the very least, ensure that any manifest you create has limited access and that any console or terminal history is cleared.
~ echo -n 'my root pass' | base64
bXkgcm9vdCBwYXNz
~ echo -n 'matomo' | base64
bWF0b21v
~ echo -n 'my user pass' | base64
bXkgdXNlciBwYXNz
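Because base64 is only an encoding, anyone who can read a Secret manifest can recover the values. The short sketch below demonstrates this with the root password encoded above, and also shows why the -n flag matters: without it, echo appends a newline, which gets encoded and silently becomes part of the secret.

```shell
#!/usr/bin/env bash
# Decoding needs no key: the "secret" is recovered in clear text.
decoded="$(printf '%s' 'bXkgcm9vdCBwYXNz' | base64 -d)"
echo "decoded: ${decoded}"

# With -n, only the six characters of "matomo" are encoded.
with_n="$(echo -n 'matomo' | base64)"
# Without -n, the trailing newline is encoded too (the extra "Cg==").
without_n="$(echo 'matomo' | base64)"
echo "echo -n : ${with_n}"
echo "echo    : ${without_n}"
```

A value with a stray newline would make the database user's password differ from what you expect, which is an easy mistake to miss.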
Here's the resulting manifest to create the secrets we'll need.
---
apiVersion: v1
kind: Secret
metadata:
  name: mysql-root-passwd
  namespace: matomo
data:
  MARIADB_ROOT_PASSWORD: bXkgcm9vdCBwYXNz
---
apiVersion: v1
kind: Secret
metadata:
  name: mysql-user-secret
  namespace: matomo
data:
  MARIADB_USER: bWF0b21v
  MARIADB_PASSWORD: bXkgdXNlciBwYXNz
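Rather than copying base64 strings by hand, the mysql-user-secret manifest above could be generated from shell variables. This is a hypothetical helper (gen_pass, b64 and user_secret_manifest are names I've made up), assuming bash and a base64 binary. If you have cluster access, kubectl create secret generic with --from-literal, --dry-run=client and -o yaml can produce a similar manifest.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: build the mysql-user-secret manifest so the
# base64 values are never typed or pasted by hand.

# Random alphanumeric password of the requested length (default 24).
gen_pass() {
  local len="${1:-24}"
  LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | head -c "${len}"
}

# Encode a value with no trailing newline and no line wrapping.
b64() {
  printf '%s' "$1" | base64 | tr -d '\n'
}

# Print the Secret manifest for the given user and password.
user_secret_manifest() {
  local user="$1" pass="$2"
  cat <<EOF
---
apiVersion: v1
kind: Secret
metadata:
  name: mysql-user-secret
  namespace: matomo
data:
  MARIADB_USER: $(b64 "${user}")
  MARIADB_PASSWORD: $(b64 "${pass}")
EOF
}

# Usage (umask 077 keeps the file readable only by you):
#   (umask 077; user_secret_manifest matomo "$(gen_pass 24)" > user-secret.yaml)
```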
Deploy Matomo
For security and my own sanity, I'll be deploying into a dedicated namespace for Matomo.
---
apiVersion: v1
kind: Namespace
metadata:
  name: matomo
I'm using small persistent volume claims, but you can make them as big as you need.
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: matomo-claim
  namespace: matomo
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 500Mi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: db-claim
  namespace: matomo
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 500Mi
Next, I created a service through which Matomo will reach the database.
---
apiVersion: v1
kind: Service
metadata:
  name: matomo-db
  namespace: matomo
spec:
  selector:
    app: matomo-db
  ports:
    - port: 3306
      name: mariadb
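Before pointing Matomo at the database, it can help to confirm that the matomo-db service actually reaches MariaDB. This is a minimal sketch, assuming bash is available wherever you run it; port_open is a hypothetical helper that opens a TCP connection through bash's /dev/tcp pseudo-device under a timeout, so it needs no mysql client or nc.

```shell
#!/usr/bin/env bash
# Hypothetical check: succeed if host:port accepts a TCP connection
# within the given number of seconds (default 3).
port_open() {
  local host="$1" port="$2" secs="${3:-3}"
  # /dev/tcp is handled by bash itself, so this must run under bash, not sh.
  timeout "${secs}" bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Usage from a pod in the matomo namespace, where the service name resolves:
#   port_open matomo-db 3306 && echo "MariaDB service reachable"
```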
I'll also need a service for the Matomo app.
---
apiVersion: v1
kind: Service
metadata:
  name: matomo-svc
  namespace: matomo
spec:
  ports:
    - port: 80
      protocol: TCP
  selector:
    app: matomo-app
Finally, here are the pod definitions. Note that I've used the MARIADB prefix for the database environment variables. These are functionally equivalent to the matching MYSQL environment variables, with the exception that if both are defined the MARIADB variables will be preferred.
---
apiVersion: v1
kind: Pod
metadata:
  name: matomo-db
  namespace: matomo
  labels:
    app: matomo-db
spec:
  containers:
    - name: matomo-db
      image: mariadb:latest
      env:
        - name: MARIADB_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-root-passwd
              key: MARIADB_ROOT_PASSWORD
        - name: MARIADB_DATABASE
          value: matomo
        - name: MARIADB_USER
          valueFrom:
            secretKeyRef:
              name: mysql-user-secret
              key: MARIADB_USER
        - name: MARIADB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-user-secret
              key: MARIADB_PASSWORD
      volumeMounts:
        - mountPath: /var/lib/mysql
          name: dbv
  volumes:
    - name: dbv
      persistentVolumeClaim:
        claimName: db-claim
---
apiVersion: v1
kind: Pod
metadata:
  name: matomo-app
  namespace: matomo
  labels:
    app: matomo-app
spec:
  containers:
    - name: matomo-app
      image: matomo:latest
      env:
        - name: MATOMO_DATABASE_HOST
          value: matomo-db
        - name: MATOMO_DATABASE_ADAPTER
          value: mysql
        - name: MATOMO_DATABASE_TABLES_PREFIX
          value: matomo_
        - name: MATOMO_DATABASE_USERNAME
          valueFrom:
            secretKeyRef:
              name: mysql-user-secret
              key: MARIADB_USER
        - name: MATOMO_DATABASE_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-user-secret
              key: MARIADB_PASSWORD
        - name: MATOMO_DATABASE_DBNAME
          value: matomo
      volumeMounts:
        - mountPath: /var/www/html
          name: matomov
  volumes:
    - name: matomov
      persistentVolumeClaim:
        claimName: matomo-claim
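The MARIADB_ over MYSQL_ precedence mentioned before the pod definitions can be modelled in a few lines of bash. This is a simplified, hypothetical illustration (resolve_env is my own name), not the mariadb image's actual entrypoint, which also handles *_FILE variants and more.

```shell
#!/usr/bin/env bash
# Hypothetical model of the precedence rule: for a given suffix such as
# ROOT_PASSWORD, use MARIADB_<suffix> if it is set and non-empty,
# otherwise fall back to MYSQL_<suffix>.
resolve_env() {
  local suffix="$1"
  local maria="MARIADB_${suffix}" mysql="MYSQL_${suffix}"
  if [ -n "${!maria:-}" ]; then      # ${!name} is bash indirect expansion
    printf '%s\n' "${!maria}"
  else
    printf '%s\n' "${!mysql:-}"
  fi
}

MYSQL_ROOT_PASSWORD='from-mysql'
MARIADB_ROOT_PASSWORD='from-mariadb'
resolve_env ROOT_PASSWORD    # prints: from-mariadb
```

In practice this means you can pick either prefix, but defining both with different values is a recipe for confusion.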
Ingress
In my k8s Self Signing and Trusting your CA post I created a CA which I could use to issue certificates to my ingress resources. Other than the TLS configuration, the rest of the ingress definition is quite normal.
With K8s 1.22+ and the 1.x ingress-nginx controller, an ingress needs its ingressClassName defined (unless your cluster has a default IngressClass). If you're using an older version you can leave it out.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: name-virtual-host-ingress-matomo
  namespace: matomo
  annotations:
    # cert-manager reads the hyphenated "cluster-issuer" annotation
    cert-manager.io/cluster-issuer: my-lab-root-issuer
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - matomo.example.com
      secretName: mat-ing-cert
  rules:
    - host: matomo.example.com
      http:
        paths:
          - pathType: Prefix
            path: "/"
            backend:
              service:
                name: matomo-svc
                port:
                  number: 80
Uploading Log Files
Now that we have a working version of Matomo, I need to send the log files to the Matomo API for processing. Since I use nginx, whose log format Matomo's log import tool supports out of the box, no file manipulation is required.
Matomo API Token
In order to push logs via the API you'll need to generate a token. Follow these instructions.
{% notice warning "Warning" %}
Pay attention to the Security considerations section and treat the token as a password.
{% /notice %}
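Since the token is effectively a password, one option is to keep it out of the import script and read it from a file that only your user can read. This is a minimal sketch, assuming GNU coreutils stat (BSD stat takes different flags); read_token and the token path in the usage comment are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the Matomo API token stored in a file,
# refusing the file if group or other permission bits are set.
read_token() {
  local file="$1" mode
  [ -r "${file}" ] || { echo "token file ${file} not readable" >&2; return 1; }
  mode="$(stat -c '%a' "${file}")"          # octal mode, e.g. 600
  if [ $(( 8#${mode} & 8#077 )) -ne 0 ]; then
    echo "token file ${file} has mode ${mode}; run: chmod 600 ${file}" >&2
    return 1
  fi
  # Drop any trailing newline or whitespace an editor may have added.
  tr -d '[:space:]' < "${file}"
}

# Usage, instead of a hard-coded MTOKEN:
#   MTOKEN="$(read_token "${HOME}/.config/matomo/token")" || exit 1
```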
The import_logs.py tool
I scheduled an Ansible job to ensure that I always have the latest version of the import_logs.py tool.
---
- name: Fetch Matomo import log tool
  hosts: myhost
  become: no
  vars:
    myuser: "myuser"
    matomo_release: "4.x-dev"
    matomo_repo: "https://github.com/matomo-org/matomo-log-analytics.git"
    matomo_local: "/home/{{myuser}}/matomo-import/"
  tasks:

    - name: "Clone Matomo release locally."
      ansible.builtin.git:
        repo: "{{matomo_repo}}"
        dest: "{{matomo_local}}"
        version: "{{matomo_release}}"
        force: yes
Fetching Logs and Importing Them
I wrote a simple bash script to fetch logs from my web servers and import them into Matomo. Before you can import any logs, you'll need to know the site ID of each website whose logs you'll be sending to the Matomo API. You can find the ID in the Matomo user interface under Administration (the cogwheel icon) > Websites > Manage.
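If you push logs for several sites, keeping the domain to site ID mapping in one place makes it harder to pass the wrong ID. This is a hypothetical sketch using a bash 4 associative array; the domains, IDs, the server name web01 and the script name fetch-weblogs.sh are all placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical mapping of domain -> Matomo site ID (placeholder values;
# take the real IDs from Administration > Websites > Manage).
declare -A SITE_IDS=(
  [example.com]=1
  [blog.example.com]=2
)

# Print the site ID for a domain, or fail loudly if it is unknown.
site_id_for() {
  local id="${SITE_IDS[$1]:-}"
  if [ -z "${id}" ]; then
    echo "no Matomo site ID configured for $1" >&2
    return 1
  fi
  printf '%s\n' "${id}"
}

# Usage, assuming the script below is saved as fetch-weblogs.sh:
#   for domain in "${!SITE_IDS[@]}"; do
#     ./fetch-weblogs.sh web01 "${domain}" "$(site_id_for "${domain}")"
#   done
```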
Because my web server logs rotate daily, I always import the log named access.log.1, which is yesterday's log. This allows me to run the job once a day. If you want to update Matomo more often, consider syncing the live log (in my case that would be access.log) and running this script several times a day.
#!/bin/bash
# Fetch web logs and push to matomo

MTOKEN="1234567890ABCDEFG"
MYUSER="myuser"
MYNAME=$(basename -s .sh "${0}")
DAILY=$(date +'%Y%m%d')
TIME=$(date +'%H%M%S')
WORKDIR="/opt/weblogs"
LOGDIR="${WORKDIR}/log"
LOGFILE="${LOGDIR}/${MYNAME}-${DAILY}.log"
LOGENTRY="${DAILY} ${TIME} :"
WEBLOGS="/var/log/nginx/"
IDSITE="1"
MATOMO_DIR="/home/${MYUSER}/matomo-import"
MATOMO="${MATOMO_DIR}/import_logs.py"
MATOMO_URL="https://matomo.example.com"

function help_me () {
  echo "usage ${MYNAME} <web server> <domain> <matomo idsite>"
  echo    # a plain echo prints a blank line; echo "\n" prints a literal \n
  echo "**** Additional parameters will be ignored ****"
}

# Append a timestamped entry (or a separator line) to today's log file.
function logme () {
  if [ ! -d "${LOGDIR}" ]; then
    script_err "Log directory not found. Exiting."
  fi

  if [ -z "${1}" ]; then
    echo "${LOGENTRY} --------- " >> "${LOGFILE}"
  else
    echo "${LOGENTRY} ${1}" >> "${LOGFILE}"
  fi
}

# Import yesterday's rotated log (access.log.1) for DOMAIN into site IDSITE.
function push_logs () {
  logme "--------- Starting import"
  # options which I've left out: --enable-bots --enable-http-errors
  # 2>&1 also captures import_logs.py error messages in the log file.
  python3 "${MATOMO}" --token-auth="${MTOKEN}" --url="${MATOMO_URL}" \
    --enable-static --enable-reverse-dns --recorders=4 --idsite="${IDSITE}" \
    "${WORKDIR}/${DOMAIN}/access.log.1" >> "${LOGFILE}" 2>&1 \
    || script_err "import_logs.py failed, see ${LOGFILE}"
  logme "--------- Import finished"
}

# Print an error to stderr and stop the script.
function script_err () {
  if [ -z "${1}" ]; then
    echo "Unknown error. Exiting" >&2
  else
    echo "Experienced the following error: ${1}" >&2
  fi
  exit 1
}

# Copy the web server's nginx logs into a per-domain working directory.
function sync_logs () {
  logme "starting sync from ${WEB_SERVER} for ${DOMAIN}"
  rsync -a "${MYUSER}@${WEB_SERVER}:${WEBLOGS}" "${WORKDIR}/${DOMAIN}/" \
    || script_err "rsync from ${WEB_SERVER} failed"
}

function check_args () {
  if [ -z "${1}" ]; then
    help_me
    script_err "No arguments received"
  fi
  case ${1} in
    help)
      help_me
      exit 0
      ;;
    *)
      if [[ $# -lt 3 ]]; then
        help_me
        script_err "Not enough arguments, need 3"
      fi
      WEB_SERVER=${1}
      DOMAIN=${2}
      IDSITE=${3}
      sync_logs
      push_logs
      ;;
  esac
}
# $1= server $2= domain $3= matomo site id.
# "$@" (quoted) passes each argument through unchanged.
check_args "$@"
exit 0
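Because the job always imports access.log.1, running it twice on the same day would send the same lines to Matomo again, which would most likely inflate the numbers. A hypothetical guard (already_imported, mark_imported and the state file path are my assumptions, not part of the script above) records a checksum of each imported file. It uses GNU sha256sum; on macOS, shasum -a 256 is the usual substitute.

```shell
#!/usr/bin/env bash
# Hypothetical guard against importing the same rotated log twice.
# The state file holds one SHA-256 checksum per imported file.
STATE_FILE="${STATE_FILE:-/opt/weblogs/imported.sha256}"

# Succeed if this file's content has been imported before.
already_imported() {
  local sum
  sum="$(sha256sum "$1" | cut -d' ' -f1)"
  [ -f "${STATE_FILE}" ] && grep -qx "${sum}" "${STATE_FILE}"
}

# Record this file's checksum after a successful import.
mark_imported() {
  sha256sum "$1" | cut -d' ' -f1 >> "${STATE_FILE}"
}

# Possible use inside push_logs:
#   log="${WORKDIR}/${DOMAIN}/access.log.1"
#   if already_imported "${log}"; then logme "skipping ${log}"; return 0; fi
#   python3 "${MATOMO}" ... "${log}" >> "${LOGFILE}" 2>&1 && mark_imported "${log}"
```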
Get the Code
All code shown in this post can be found here