Thinkcentre Watchdog
A Docker-based monitoring solution for detecting and auto-rebooting hung Kubernetes machines via Home Assistant integration.
Overview
This watchdog monitors a target service URL for 502 Bad Gateway errors (indicating a hung machine). When the service returns a 502:
- A 5-minute grace period begins (allowing for deployment recoveries)
- If the service recovers within 5 minutes, the error is cleared (normal deployment scenario)
- If still failing after 5 minutes, an automatic power-cycle is triggered via Home Assistant
- The machine powers off for 10 seconds, then powers back on
All activity is logged with timestamps for monitoring and troubleshooting.
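As a rough illustration of the health check described above, the sketch below probes a URL with curl and tests for a 502. The function names http_status and is_hung are hypothetical and are not taken from thinkcenter_monitor.sh; the 10-second timeout is also an assumption.

```shell
# Hypothetical helpers; names and the 10 s timeout are illustrative only.

# Print the HTTP status code for the URL in $1.
# curl prints 000 when it gets no HTTP response at all (e.g. connection refused).
http_status() {
    curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$1" || true
}

# Succeed (exit status 0) only when the status code in $1 is 502 Bad Gateway.
is_hung() {
    [ "$1" = "502" ]
}
```

A single check could then look like: is_hung "$(http_status "$TARGET_URL")" && echo "502 detected". Note that only 502 counts as a failure in this sketch, matching the description above; timeouts and other status codes are ignored.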
Prerequisites
- Docker and Docker Compose installed
- A running Home Assistant instance that is reachable over the network from the Docker host
- A power switch entity configured in Home Assistant
- Long-lived access token from Home Assistant
Installation
1. Download/Organize Files
Clone or download this repository to your machine:
git clone <repository-url>
cd Thinkcentre-watchdog
The directory should contain:
- Dockerfile - Container definition
- thinkcenter_monitor.sh - Monitoring script
- docker-compose.yml - Docker Compose configuration
- .env.example - Environment variable template
- README.md - This file
2. Create Configuration File
Copy the example environment file and edit it with your actual values:
cp .env.example .env
Edit .env and configure:
# Your target service URL
TARGET_URL=http://your-kubernetes-service:8080
# Home Assistant configuration
HA_URL=http://homeassistant:8123
HA_TOKEN=your_long_lived_access_token_here
HA_ENTITY=switch.your_power_switch_entity
# Optional: Adjust timing if needed
GRACE_PERIOD=300 # 5 minutes
CHECK_INTERVAL=30 # Check every 30 seconds
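Before starting the container, it can help to confirm that the required values are actually set. The following is a minimal sketch, assuming the variables have already been loaded into the current shell (for example with set -a; . ./.env; set +a). The function name check_required_vars is hypothetical.

```shell
# Hypothetical pre-flight check; not part of the shipped script.
# Prints the name of each unset or empty required variable to stderr
# and returns 1 if any is missing, 0 otherwise.
check_required_vars() {
    missing=0
    for name in TARGET_URL HA_URL HA_TOKEN HA_ENTITY; do
        # Indirect lookup of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Usage: check_required_vars && docker compose up -d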
3. Generate Home Assistant Token
- Open Home Assistant web interface
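- Open your user profile (click your name at the bottom of the sidebar) and find the Long-lived access tokens section, which recent versions place under the Security tab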
- Click Create Token
- Name it (e.g., "Thinkcentre Watchdog")
- Copy the token and paste it in your .env file as HA_TOKEN
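To confirm the token works before relying on it, one option is to call the Home Assistant REST API root, which answers with HTTP 200 for a valid token and 401 for an invalid one. This is a sketch under the assumption that HA_URL and HA_TOKEN are set in the current shell; ha_token_ok is a hypothetical name.

```shell
# Hypothetical check; assumes HA_URL and HA_TOKEN are set in the environment.
# Returns 0 if Home Assistant answers the authenticated request with HTTP 200.
ha_token_ok() {
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 \
        -H "Authorization: Bearer ${HA_TOKEN}" \
        "${HA_URL}/api/") || true
    [ "$code" = "200" ]
}
```

Usage: ha_token_ok && echo "token accepted" || echo "token rejected or Home Assistant unreachable"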
4. Configure Power Switch in Home Assistant
Ensure you have a switch entity in Home Assistant that controls the machine's power. Common options:
- Smart Outlet/Relay: If using a smart power outlet
- IPMI/Redfish: For datacenter machines
- Smart Plug: Like Tasmota, Zigbee, or Z-Wave devices
Configure the entity ID in your .env as HA_ENTITY (e.g., switch.thinkcentre_power)
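A quick way to check that the entity ID is correct is to read its current state through the REST API endpoint /api/states/<entity_id>. The sketch below assumes HA_URL, HA_TOKEN, and HA_ENTITY are set in the current shell. Both function names are hypothetical, and the sed expression is a deliberately simple stand-in for a real JSON parser such as jq.

```shell
# Hypothetical helpers; assume HA_URL, HA_TOKEN and HA_ENTITY are set.

# Print the raw JSON state object for the configured switch entity.
ha_entity_state() {
    curl -s --max-time 10 \
        -H "Authorization: Bearer ${HA_TOKEN}" \
        "${HA_URL}/api/states/${HA_ENTITY}"
}

# Read JSON on stdin and print the value of its "state" field (e.g. on, off).
# Simplistic text match; prints nothing if no "state" field is present.
entity_state_value() {
    sed -n 's/.*"state"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}
```

Usage: ha_entity_state | entity_state_value. Empty output suggests the entity ID is wrong or the request was rejected.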
5. Build and Run
Start the monitoring container:
docker compose up -d
The container will:
- Build from the Dockerfile
- Start with the restart: unless-stopped policy
- Mount logs to a named volume
- Apply resource limits (0.1 CPU, 64MB memory)
6. View Logs
Monitor real-time logs:
docker compose logs -f thinkcenter-monitor
Or view persistent logs from the volume:
docker volume inspect thinkcenter_logs
# Look at the Mountpoint directory
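The Mountpoint can also be extracted directly with the --format option of docker volume inspect. This is a small sketch; log_volume_path is a hypothetical name, and the log file name inside the volume is an assumption based on the path the script logs to.

```shell
# Hypothetical helper: print the host path backing the named log volume.
# Defaults to the volume name used in this README.
log_volume_path() {
    docker volume inspect --format '{{ .Mountpoint }}' "${1:-thinkcenter_logs}"
}
```

Usage (assumed file name): sudo tail -f "$(log_volume_path)/thinkcenter_monitor.log". Reading the mountpoint usually requires root. On Docker Desktop the path lives inside the Docker VM and is not directly visible from the host.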
7. Stop or Restart
Stop the container:
docker compose down
Restart the container:
docker compose restart thinkcenter-monitor
Deploying Multiple Instances
To monitor multiple machines:
For Machine 2:
Create a separate directory:
mkdir thinkcentre-watchdog-machine2
cd thinkcentre-watchdog-machine2
# Copy files, including dotfiles such as .env.example (a plain * glob skips them)
cp -a /path/to/original/. .
# Create unique .env
cp .env.example .env
# Edit .env for machine 2
nano .env
# Change: HA_ENTITY=switch.machine2_power
# Change: TARGET_URL to machine 2's service URL
Then run:
docker compose up -d
Using Namespace (Alternative)
Or manage from one directory with unique service names:
docker compose -f docker-compose.yml -f docker-compose.machine2.yml up -d
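Another way to run several instances from one directory, shown here only as a sketch, is to give each machine its own Compose project name and env file using the -p and --env-file options. The .env.<machine> file naming and the start_instance function are hypothetical. This approach only works if docker-compose.yml takes its values through variable interpolation and does not hard-code a container name or volume name, which this README does not confirm.

```shell
# Hypothetical helper: start one Compose project per machine.
# Assumes a per-machine env file named .env.<machine> exists in this directory.
start_instance() {
    docker compose -p "watchdog-$1" --env-file ".env.$1" up -d
}
```

Usage: start_instance machine2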
Configuration Variables
| Variable | Default | Description |
|---|---|---|
| TARGET_URL | http://localhost:8080 | Service URL to monitor |
| HA_URL | http://homeassistant:8123 | Home Assistant base URL |
| HA_TOKEN | (required) | Home Assistant long-lived access token |
| HA_ENTITY | switch.thinkcentre_power | Home Assistant switch entity ID |
| GRACE_PERIOD | 300 | Seconds to wait before power-cycling (5 minutes) |
| CHECK_INTERVAL | 30 | Seconds between health checks |
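In a shell script, defaults like these are commonly applied with parameter expansion. The sketch below shows one way to do so, using the values from the table; it is an illustration, not necessarily how thinkcenter_monitor.sh does it, and apply_defaults is a hypothetical name.

```shell
# Hypothetical helper: apply the documented defaults to any unset or empty variable.
apply_defaults() {
    : "${TARGET_URL:=http://localhost:8080}"
    : "${HA_URL:=http://homeassistant:8123}"
    : "${HA_ENTITY:=switch.thinkcentre_power}"
    : "${GRACE_PERIOD:=300}"
    : "${CHECK_INTERVAL:=30}"
    # No default for the token: abort with an error message if it is missing.
    : "${HA_TOKEN:?HA_TOKEN is required}"
}
```

Values already present in the environment are left untouched, so settings from .env take precedence over the defaults.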
Troubleshooting
Container won't start
Check if HA_TOKEN is set:
docker compose config | grep HA_TOKEN
No logs appearing
Check the volume mount:
docker volume ls | grep thinkcenter_logs
docker volume inspect thinkcenter_logs
Power-cycle not triggering
- Verify HA_TOKEN is valid (check Home Assistant logs)
- Confirm HA_ENTITY exists in Home Assistant
- Check network connectivity:
docker compose exec thinkcenter-monitor curl -v http://homeassistant:8123
Service not responding correctly
Test the target URL directly:
docker compose exec thinkcenter-monitor curl -v http://your-service:8080
How It Works
- Health Check: Every CHECK_INTERVAL seconds, HTTP response code is checked
- Grace Period: First 502 error triggers a 5-minute window for recovery
- Recovery Detection: If service returns non-502 during grace period, error resets
- Power Cycle: After grace period expires with continued 502s, power cycle triggers:
  - Send turn_off to HA switch entity
  - Wait 10 seconds
  - Send turn_on to HA switch entity
- Logging: All events timestamped and logged to /var/log/thinkcenter_monitor.log
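The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the contents of thinkcenter_monitor.sh: all function names, the OFF_SECONDS variable, and the choice to reset the failure timer after a power cycle are hypothetical. The Home Assistant calls use the REST endpoints /api/services/switch/turn_off and /api/services/switch/turn_on.

```shell
# Illustrative sketch only; names and structure are hypothetical.

# Print a timestamped log line.
log() {
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*"
}

# Call a Home Assistant switch service ($1 = turn_off or turn_on) for HA_ENTITY.
ha_switch() {
    curl -s -o /dev/null --max-time 10 -X POST \
        -H "Authorization: Bearer ${HA_TOKEN}" \
        -H "Content-Type: application/json" \
        -d "{\"entity_id\": \"${HA_ENTITY}\"}" \
        "${HA_URL}/api/services/switch/$1"
}

# Power off, wait (10 seconds unless OFF_SECONDS overrides it), power on.
power_cycle() {
    log "power cycle: turning off ${HA_ENTITY}"
    ha_switch turn_off
    sleep "${OFF_SECONDS:-10}"
    log "power cycle: turning on ${HA_ENTITY}"
    ha_switch turn_on
}

# Decide the next step.
# $1 = HTTP status, $2 = epoch time of first 502 (empty if none), $3 = current epoch time.
# Prints one of: ok, start, wait, cycle.
next_action() {
    if [ "$1" != "502" ]; then
        echo ok
    elif [ -z "$2" ]; then
        echo start
    elif [ $(( $3 - $2 )) -ge "${GRACE_PERIOD:-300}" ]; then
        echo cycle
    else
        echo wait
    fi
}

# Main loop: defined here but not started automatically.
monitor_loop() {
    first_fail=""
    while true; do
        status=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$TARGET_URL") || true
        case $(next_action "$status" "$first_fail" "$(date +%s)") in
            ok)    first_fail="" ;;
            start) first_fail=$(date +%s); log "502 detected, grace period started" ;;
            cycle) power_cycle; first_fail="" ;;
            wait)  : ;;
        esac
        sleep "${CHECK_INTERVAL:-30}"
    done
}
```

Keeping the decision in next_action, separate from the curl and sleep calls, makes the grace-period logic easy to test with fixed timestamps. Resetting the timer after a power cycle gives the machine a fresh grace period to boot before another cycle can trigger.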
Resource Limits
- CPU: 0.1 cores (limited to prevent resource hogging)
- Memory: 64MB (minimal requirements for bash + curl)
- Logging: JSON file driver, max 10MB per file, keeps 3 files (30MB total)
Debugging
Show the 50 most recent log lines:
docker compose logs --tail 50 thinkcenter-monitor
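To test the script locally (without Docker), export the variables from .env first if they are not already in your environment:
set -a; . ./.env; set +a
bash thinkcenter_monitor.sh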
License
Monitoring solution for Thinkcentre machines.
Support
For issues or improvements, check the logs first and verify all environment variables are correctly set in your .env file.