tim olson

A site reliability engineer targeting all things Kubernetes. Coming from software engineering, I love working on all parts of the stack. I specifically provide Kubernetes as a platform, along with the administration, onboarding, and life-cycle management of applications. Long live Linux.

St. Paul, MN

townwatch: an alerting tool for log line regex matches

Sat, May 1, 2021


go development

I recently completed a project for watching server log messages: townwatch. I named it after an upgrade in AoE2 (Age of Empires II) that isn’t very popular. Townwatch, my tool, is not a replacement for other monitoring or threat-prevention tools; its aim is to provide easy visibility into service log files on a Linux host.

This project was inspired by Dave from tutoriaLinux. Some of his very early sysadmin basics videos were very helpful to me as a college student and a new intern in a DevOps organization. I’ve stayed a subscriber because many of his newer videos are worth having in my queue, as are his occasional reminders about O’Reilly Humble Bundles. He produced a video on building an SSH alerting tool, and, rather coincidentally, almost a year later I completed my own version. Anyway, check out Dave’s stuff; he’s great.

My version of the tool came about from a couple of things.

Necessity. At work I was diving back into Go after 6-8 months away from it. I had forgotten many constructs and other bits of the language, specifically interfaces, goroutines, and channels (which I used for the work project but didn’t need here).

Panic! Like many in tech, I’m often conscripted by my elders (parents, grandparents, family friends) into solving their tech challenges. A few years ago I set up an IP camera stack for my parents’ cabin. Back in February, their NVR server went offline, and I was the closest person, 4.5 hours away. I could resolve the host, so DynDNS was working. I could ping the host, so it was up. I could reach the SSH port, but was being denied. So what was wrong? The server was using a non-standard SSH port, had fail2ban in place, etc. Stumped, and with time to think, I started to suspect my box had been p0wned.

A little later I thought of other possibilities: it could be a bad RAM module or disk, and malice might not be in play at all. Ready to act, I assembled a crash cart: monitor, keyboard, mouse, spare disk, RAM, CPU, PSU, and finally my trusty laptop. I headed north.

Upon arrival, the immediate problem became clear. One of the case fans had burned out, which must have cascaded into overworking the CPU fans, which also burned out. The server looked mostly healthy, but it had thermally throttled itself. That explained my SSH failures: once I connected my peripherals, I could see the PC was essentially soft-locked. Further along in my debugging, I noticed that one of the disks had turned to dust. Quite frankly, I’m still not sure what the root cause was. The NVR is a Frankenstein’s monster of old gaming hardware, and its time had likely come. I replaced the PSU on the assumption that over-volting had nuked the gear.

With parts replaced and the OS working, I ventured home.

So how does townwatch help here? Ironically, it doesn’t. Since running into this issue, I’ve added a node exporter to this box and added it to my homelab monitoring. However, that brief “have I been p0wned” panic was still unsettling. I connect to my parents’ cabin for offsite backups. If I had been p0wned, could someone have crawled over to my other box? I don’t know. Something that alerted me to SSH logins, fail2ban bans, or authentication to ZoneMinder (the NVR of choice) would go a long way toward helping me sleep until I could remedy the issues.

And so this is where the main function of townwatch comes in. It’s not for server health checks. It’s for server visibility. Specifically, visibility into certain logs.

I won’t rehash the README at the top of this post, but I’ll describe how a snippet of the config works and how I deploy the tool.

I have ARM and amd64 builds available. I pull the binary onto the host, along with the townwatch.service file from the repo:

```ini
[Unit]
Description=Townwatch service - Alerts for logline regex matches

[Service]
Type=simple
# Of course, change this to your own config path, or leave it out to use the default /etc/townwatch/townwatch.yaml
Environment="TOWNWATCH_CONFIG_ARGS=--config=/home/zonebox/townwatch.yaml"
ExecStart=/usr/local/bin/townwatch patrol $TOWNWATCH_CONFIG_ARGS
Restart=always

[Install]
WantedBy=multi-user.target
```

With this, we’re off to the races. Townwatch supports a variety of alert notifiers (see the repo). I quite like Gotify and can’t say enough good things about it; maybe I’ll post about it sometime.

Here’s the watcher snippet from my config:

```yaml
watchers:
  - name: SSHD Login
    regex: '(?P<Line>(?P<Method>sshd|su(?:do)?)(?:\[\d+\])?: pam_unix\(\w+:session\): session opened for user (?P<User>\w+))'
    path: /var/log/auth.log
    examples:
      - 'Apr 27 11:01:19 edoras sudo: pam_unix(sudo:session): session opened for user root by theoden(uid=0)'
    title: '{{.User}} has logged into server with SSH'
    message: |
      The user [{{.User}}] has logged in via method: {{.Method}}
```

This creates a child process that monitors /var/log/auth.log. That’s a common default location for sshd logs (Debian and Ubuntu use it; RHEL-family distributions log to /var/log/secure), so your mileage may vary.

The regex looks, quite simply, for a session being opened. I don’t care about failed attempts, because I track those through fail2ban’s log instead. In short, if someone gets a root shell on my box, whether over SSH or even through su, I want to know. That’s it. It could be fun to rig an ESP32 smart plug so another box can cut this one’s power; that way, if I ever need to pull the rip cord, I can.
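Go’s `regexp` package (RE2 syntax) supports the `(?P<name>...)` named groups used in the config, so captures can be collected into a map by name. A minimal sketch; the `captures` helper is hypothetical, and the regex is copied verbatim from the config above:

```go
package main

import (
	"fmt"
	"regexp"
)

// sshLogin is the watcher regex from the config, verbatim.
var sshLogin = regexp.MustCompile(`(?P<Line>(?P<Method>sshd|su(?:do)?)(?:\[\d+\])?: pam_unix\(\w+:session\): session opened for user (?P<User>\w+))`)

// captures returns the named groups of the first match in line,
// or nil when the line does not match (hypothetical helper).
func captures(re *regexp.Regexp, line string) map[string]string {
	m := re.FindStringSubmatch(line)
	if m == nil {
		return nil
	}
	groups := make(map[string]string)
	for i, name := range re.SubexpNames() {
		if name != "" { // index 0 (whole match) and unnamed groups have no name
			groups[name] = m[i]
		}
	}
	return groups
}

func main() {
	line := "Apr 27 11:01:19 edoras sudo: pam_unix(sudo:session): session opened for user root by theoden(uid=0)"
	g := captures(sshLogin, line)
	fmt.Println(g["Method"], g["User"]) // sudo root
}
```

One caveat worth checking on your own hosts: on some distributions `su -` logs its PAM service as `su-l`, i.e. `pam_unix(su-l:session)`, and `\w+` will not match the hyphen; `[\w-]+` would.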

The title and message are used in the notification. Both are required, and both can use the regex’s named capture groups, not just for formatting but also to change the content. `{{ if (eq .User "root")}}URGENT MESSAGE:{{end}}` is one such example. It would be cool if a match could change the notification itself too, such as marking an SMTP email as “URGENT” or raising a Gotify message’s priority, but maybe another time.
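The `{{.User}}` and `{{ if ... }}` syntax is Go’s `text/template`, and a template can index a `map[string]string` of captures directly. A minimal sketch, assuming the captures arrive as such a map; the `render` helper is hypothetical, and `missingkey=error` is my own choice here so that a typo like `{{.Usr}}` fails loudly instead of rendering `<no value>`.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// render executes a title/message template against the regex captures
// (hypothetical helper). .User resolves to groups["User"].
func render(tmpl string, groups map[string]string) (string, error) {
	t, err := template.New("notification").Option("missingkey=error").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var b strings.Builder
	if err := t.Execute(&b, groups); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	title := `{{ if (eq .User "root")}}URGENT MESSAGE: {{end}}{{.User}} has logged into server with SSH`
	for _, user := range []string{"root", "theoden"} {
		s, err := render(title, map[string]string{"User": user, "Method": "sshd"})
		if err != nil {
			panic(err)
		}
		fmt.Println(s)
	}
	// URGENT MESSAGE: root has logged into server with SSH
	// theoden has logged into server with SSH
}
```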

Finally, I think it’s useful to test the regex against examples at startup. You can supply as many examples as you want; if a single one fails to match, townwatch exits with a fatal error.
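A startup check like this can be sketched in a few lines: compile each regex, run every example through it, and refuse to start on the first failure. The `Watcher` struct and `validate` function are hypothetical stand-ins that mirror a few fields of the YAML config, not townwatch’s actual types.

```go
package main

import (
	"fmt"
	"log"
	"regexp"
)

// Watcher mirrors a few fields of the YAML config (hypothetical type).
type Watcher struct {
	Name     string
	Regex    string
	Examples []string
}

// validate compiles each watcher's regex and checks that every example
// matches, returning the first failure it finds.
func validate(ws []Watcher) error {
	for _, w := range ws {
		re, err := regexp.Compile(w.Regex)
		if err != nil {
			return fmt.Errorf("watcher %q: bad regex: %w", w.Name, err)
		}
		for i, ex := range w.Examples {
			if !re.MatchString(ex) {
				return fmt.Errorf("watcher %q: example %d does not match: %q", w.Name, i, ex)
			}
		}
	}
	return nil
}

func main() {
	ws := []Watcher{{
		Name:  "SSHD Login",
		Regex: `(?P<Line>(?P<Method>sshd|su(?:do)?)(?:\[\d+\])?: pam_unix\(\w+:session\): session opened for user (?P<User>\w+))`,
		Examples: []string{
			"Apr 27 11:01:19 edoras sudo: pam_unix(sudo:session): session opened for user root by theoden(uid=0)",
		},
	}}
	// Refuse to start with a watcher that can't match its own examples.
	if err := validate(ws); err != nil {
		log.Fatal(err)
	}
	fmt.Println("all examples matched")
}
```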

So there you go! My light panic turned into an excuse to do a fun project that had been on my list for a while. If you try out townwatch or have comments or questions, hit me up at tolson@tolson.io, or go right to GitHub issues. I watch both :)