Prometheus Exporters
Worked on:
17.03.2021
Ideas borrowed from the course “Alerting on Issues with Prometheus Alertmanager”.
Alertmanager download - https://prometheus.io/download/#alertmanager
Starting Node Exporter
Installing Alertmanager
$ cd ~/tmp
$ wget https://github.com/prometheus/alertmanager/releases/download/v0.21.0/alertmanager-0.21.0.linux-amd64.tar.gz
$ tar xvfz alertmanager-0.21.0.linux-amd64.tar.gz
$ cd alertmanager-0.21.0.linux-amd64
$ ./alertmanager --config.file=alertmanager.yml > alert.out 2>&1 &
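The command above reads alertmanager.yml from the unpacked directory. Alertmanager refuses to start without a root route whose receiver is defined under receivers, so a minimal sketch of that file could look like this. The receiver name and the webhook URL are hypothetical placeholders, not values from the course:

```yaml
# Minimal alertmanager.yml sketch (hypothetical values)
route:
  receiver: 'default'          # every alert falls through to this receiver
receivers:
  - name: 'default'
    webhook_configs:
      - url: 'http://127.0.0.1:5001/'   # hypothetical local webhook listener
```

The later sections replace this placeholder receiver with Slack, email and Zulip receivers.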
Demo: Connecting Prometheus to Alertmanager
In the same directory as prometheus.yml:
$ vi rules.yml
groups:
  - name: example
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: Instance is down
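Rule annotations are templated, so the summary can name the target that went down. A variant of the same rule, assuming you want the instance and job labels in the message, uses the `$labels` variable that Prometheus exposes to alerting-rule templates. The description wording is illustrative:

```yaml
groups:
  - name: example
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 1m                      # must stay down for 1 minute before firing
        labels:
          severity: critical
        annotations:
          # $labels holds the labels of the series that triggered the alert
          summary: 'Instance {{ $labels.instance }} is down'
          description: '{{ $labels.job }} target {{ $labels.instance }} has been unreachable for more than 1 minute.'
```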
The default file looks like this:
prometheus.yml
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'node_exporter'
    static_configs:
      - targets: ['172.31.27.27:9100']
The final version should look like this:
prometheus.yml
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 192.168.1.9:9093
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - 'rules.yml'
# A scrape configuration containing exactly one endpoint to scrape:
# here it's Node Exporter.
scrape_configs:
  - job_name: 'node_exporter'
    static_configs:
      - targets: ['192.168.1.9:9100']
Exec into the container and create this file by hand:
/etc/prometheus/rules.yml
Then restart the service with docker-compose:
$ docker-compose restart prometheus
Adding the file as a volume in docker-compose.yaml didn't work: the darn thing kept getting mounted as a directory.
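A likely cause: with the short `host:container` volume syntax, if the host path does not exist when the container starts, Docker creates an empty directory there and mounts that. A sketch of a compose file that bind-mounts single files, assuming the service is called `prometheus`, uses the official `prom/prometheus` image, and keeps both files next to docker-compose.yaml. The 9091:9090 port mapping is a guess based on the Prometheus URL below:

```yaml
version: '3.7'
services:
  prometheus:
    image: prom/prometheus            # assumption: official image, config in /etc/prometheus
    ports:
      - '9091:9090'                   # assumption: host port 9091 -> container port 9090
    volumes:
      # Single-file bind mounts. Create ./prometheus.yml and ./rules.yml on the host
      # BEFORE running docker-compose up; a missing host path is created as an empty directory.
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - ./rules.yml:/etc/prometheus/rules.yml
```

Relative paths under rule_files are resolved against the directory of prometheus.yml, so `'rules.yml'` points at /etc/prometheus/rules.yml inside the container.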
Node Exporter:
http://192.168.1.9:9100/metrics
Alertmanager:
http://localhost:9093/
Prometheus:
http://localhost:9091/
OK!
Sending Alerts with Receivers
Receiver documentation - https://prometheus.io/docs/alerting/latest/configuration/#receiver
Demo: Slack Receiver
alertmanager.yml for a Slack receiver
global:
  # add your Slack integration URL below
  slack_api_url: 'https://hooks.slack.com…'
route:
  receiver: 'slack-notifications'
receivers:
  - name: 'slack-notifications'
    slack_configs:
      - channel: '#prometheus-alerts'
        send_resolved: true
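`slack_api_url` under global is only a default. Each slack_configs entry can set its own `api_url`, and one receiver can hold several entries, so the same alert can go to two workspaces. A sketch; the second webhook URL and the `#ops-alerts` channel are hypothetical:

```yaml
receivers:
  - name: 'slack-notifications'
    slack_configs:
      # Uses the global slack_api_url
      - channel: '#prometheus-alerts'
        send_resolved: true
      # Overrides the webhook for this entry only
      - api_url: 'https://hooks.slack.com/services/REPLACE_ME'   # hypothetical second webhook
        channel: '#ops-alerts'                                    # hypothetical channel
        send_resolved: true
```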
Reload the Alertmanager config after changing alertmanager.yml:
sudo killall -HUP alertmanager
Configuring an Email Receiver
alertmanager.yml for an email receiver (Gmail)
global:
route:
  receiver: 'my-gmail'
receivers:
  - name: my-gmail
    email_configs:
      - to: [email protected]
        send_resolved: true
        from: [email protected]
        smarthost: smtp.gmail.com:587
        auth_username: [email protected]
        auth_identity: [email protected]
        auth_password: kfeydjkgighudfhe
alertmanager.yml for an email receiver (another email service provider, here Sendinblue)
global:
route:
  receiver: 'email'
receivers:
  - name: email
    email_configs:
      - to: [email protected]
        send_resolved: true
        from: [email protected]
        smarthost: smtp-relay.sendinblue.com:587
        auth_username: [email protected]
        auth_identity: [email protected]
        auth_password: 3jduJ74JurdkD9Fv
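Both email examples repeat the SMTP settings inside the receiver. The global section also accepts `smtp_smarthost`, `smtp_from`, `smtp_auth_username`, `smtp_auth_identity` and `smtp_auth_password` as defaults for every email_configs entry. A sketch with shared settings; all addresses and the password are hypothetical placeholders:

```yaml
global:
  # Defaults inherited by every email_configs entry below
  smtp_smarthost: 'smtp-relay.sendinblue.com:587'
  smtp_from: 'alerts@example.com'            # hypothetical sender
  smtp_auth_username: 'alerts@example.com'   # hypothetical login
  smtp_auth_identity: 'alerts@example.com'   # hypothetical identity
  smtp_auth_password: 'CHANGE_ME'            # hypothetical; use your provider's SMTP key
route:
  receiver: 'email'
receivers:
  - name: email
    email_configs:
      - to: 'oncall@example.com'             # hypothetical recipient
        send_resolved: true
```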
Demo: Webhook Receiver
alertmanager.yml for Zulip (https://zulip.com/integrations/doc/alertmanager)
global:
  resolve_timeout: 1m
route:
  receiver: 'zulip-notifications'
receivers:
  - name: 'zulip-notifications'
    webhook_configs:
      - url: 'https://yourdomain.zulipchat.com/api/v1/external/alertmanager?api_key=adfaSDFES934asfdas8vasdvU37&stream=alertmanager'
        send_resolved: true
Filtering, Managing, and Customizing Alerts
Managing Alerts with Routing
test-rules.yml
groups:
  - name: sample-alerts
    rules:
      - alert: App1Slow
        expr: 1
        labels:
          severity: warning
          service: app1
        annotations:
          summary: App 1 is running slow
      - alert: App1Down
        expr: 1
        labels:
          severity: critical
          service: app1
        annotations:
          summary: App 1 is down
      - alert: App2Down
        expr: 1
        labels:
          severity: critical
          service: app2
        annotations:
          summary: App 2 is down
      - alert: Server1LowDisk
        expr: 1
        labels:
          severity: warning
          service: servers
        annotations:
          summary: Low disk space on Server 1
      - alert: Server2LowDisk
        expr: 1
        labels:
          severity: warning
          service: servers
        annotations:
          summary: Low disk space on Server 2
      - alert: Server1Down
        expr: 1
        labels:
          severity: critical
          service: servers
        annotations:
          summary: Server 1 is down
      - alert: Server2Down
        expr: 1
        labels:
          severity: critical
          service: servers
        annotations:
          summary: Server 2 is down
      - alert: NetworkDown
        expr: 1
        labels:
          severity: critical
          service: network
        annotations:
          summary: Network is down
You can validate the rules file using promtool, which is included in the prometheus directory:
./promtool check rules test-rules.yml
Make sure to reference the new test-rules.yml file in prometheus.yml. The node_exporter job is also removed for this example.
prometheus.yml
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  scrape_timeout: 10s
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "test-rules.yml"
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ['localhost:9090']
Then start or restart prometheus however you configured it.
sudo service prometheus start
alertmanager.yml file
route:
  receiver: 'email' # default
  routes:
    - match:
        service: app1
      receiver: 'z-dev-team1'
    - match:
        service: app2
      receiver: 'z-dev-team2'
    - match:
        service: servers
      receiver: 'z-server-team'
    - match:
        service: network
      receiver: 'z-network-team'
receivers:
  - name: 'z-dev-team1'
    webhook_configs:
      - url: 'https://yourdomain.zulipchat.com/api/v1/external/alertmanager?api_key=asdf&stream=DevTeam1'
        send_resolved: true
  - name: 'z-dev-team2'
    webhook_configs:
      - url: 'https://yourdomain.zulipchat.com/api/v1/external/alertmanager?api_key=asdf&stream=DevTeam2'
        send_resolved: true
  - name: 'z-server-team'
    webhook_configs:
      - url: 'https://yourdomain.zulipchat.com/api/v1/external/alertmanager?api_key=asdf&stream=Servers'
        send_resolved: true
  - name: 'z-network-team'
    webhook_configs:
      - url: 'https://yourdomain.zulipchat.com/api/v1/external/alertmanager?api_key=asdf&stream=Network'
        send_resolved: true
  - name: email
    email_configs:
      - to: [email protected]
        send_resolved: true
        from: [email protected]
        smarthost: smtp-relay.sendinblue.com:587
        auth_username: [email protected]
        auth_identity: [email protected]
        auth_password: 4jdisCHl043S2rNi
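Child routes are checked top-down and, by default, the first match wins. Two more options from the same `match`-style config: `match_re` takes a regular expression (Alertmanager anchors it to the whole label value) and `continue: true` keeps checking sibling routes after a match. A sketch for a hypothetical case where dev team 1 also wants app2 alerts; receivers are the ones defined above:

```yaml
route:
  receiver: 'email'                # used only when no child route matches
  routes:
    - match_re:
        service: 'app1|app2'       # regex, matches either dev service
      receiver: 'z-dev-team1'
      continue: true               # keep evaluating the routes below
    - match:
        service: app2
      receiver: 'z-dev-team2'      # app2 alerts reach both dev teams
```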
Validate the file:
./amtool check-config alertmanager.yml
Routing tree visualizer (be sure to remove any sensitive data before pasting in your routes):
https://prometheus.io/webtools/alerting/routing-tree-editor/
Grouping Alerts
By default (without group_by), Alertmanager batches all alerts that reach a route into a single group. To turn off grouping entirely, add the following under route:
route:
  group_by: ['...']
Group by the value of the service label to send one message per distinct service value to the same receiver:
alertmanager.yml
Email version:
route:
  group_by: ['service']
  receiver: 'email'
receivers:
  - name: email
    email_configs:
      - to: [email protected]
        send_resolved: true
        from: [email protected]
        smarthost: smtp-relay.sendinblue.com:587
        auth_username: [email protected]
        auth_identity: [email protected]
        auth_password: 4jdisCHl043S2rNi
Zulip version:
route:
  group_by: ['service']
  receiver: 'z-alerts'
receivers:
  - name: 'z-alerts'
    webhook_configs:
      - url: 'https://yourdomain.zulipchat.com/api/v1/external/alertmanager?api_key=asdf&stream=Alerts'
        send_resolved: true
Add an alert with a new value for the service label (app3) and it will be grouped into its own message.
test-rules.yml
groups:
  - name: sample-alerts
    rules:
      - alert: App3Down
        expr: 1
        labels:
          severity: critical
          service: app3
        annotations:
          summary: App 3 is down
Managing Alerts with Throttling and Repetition
Default values for Alertmanager delivery settings
route:
  group_by: ['service']
  group_wait: 30s # default
  group_interval: 5m # default
  repeat_interval: 4h # default
  receiver: 'z-alerts'
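Child routes inherit group_by, the three timing settings and the receiver from their parent, and can override any of them. A sketch, assuming critical alerts (the severity label used in test-rules.yml) should be delivered sooner and repeated more often than the defaults; the 10s and 1h values are illustrative:

```yaml
route:
  group_by: ['service']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  receiver: 'z-alerts'
  routes:
    # Inherits group_by, group_interval and receiver from the parent route
    - match:
        severity: critical
      group_wait: 10s        # illustrative: send the first notification sooner
      repeat_interval: 1h    # illustrative: re-notify more often while still firing
```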
Add an alert to observe group_wait:
- alert: App5Down
  expr: 1
  labels:
    severity: critical
    service: app5
  annotations:
    summary: App 5 is down
Filtering Alerts with Inhibition and Silencing
alertmanager.yml
inhibit_rules:
  - source_match:
      service: 'network'
    target_match:
      service: 'servers'
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['service']
Tailor Alerts with Notification Templates
Default Alertmanager template: https://github.com/prometheus/alertmanager/blob/master/template/default.tmpl
Notification template reference:
https://prometheus.io/docs/alerting/latest/notifications/
Add an info annotation that references the expression value:
- alert: NetworkDown
  expr: 1
  labels:
    severity: critical
    service: network
  annotations:
    summary: Network is down
    info: 'Expression evaluating at {{ $value }}'
Override template values
receivers:
  - name: 'slack-notifications'
    slack_configs:
      - channel: '#prometheus-alerts'
        send_resolved: true
        text: 'Custom text message in Slack notification'
Creating a template file
/yourpath/alertmanager/templates/custom.tmpl
Custom text message in Slack notification from template file
alertmanager.yml
receivers:
  - name: 'slack-notifications'
    slack_configs:
      - channel: '#alerts'
        text: ''
templates:
  - '/yourpath/alertmanager/templates/custom.tmpl'
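The `text` field pulls in a named template from the file listed under templates. A sketch assuming custom.tmpl wraps its content as `{{ define "custom.slack.text" }}Custom text message in Slack notification from template file{{ end }}`; the name `custom.slack.text` is hypothetical, so use whatever name your define block declares:

```yaml
receivers:
  - name: 'slack-notifications'
    slack_configs:
      - channel: '#alerts'
        # Render the named template; "." passes the notification data into it
        text: '{{ template "custom.slack.text" . }}'
templates:
  - '/yourpath/alertmanager/templates/custom.tmpl'
```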