Prometheus is an open-source monitoring and alerting system and time-series database (TSDB) originally developed at SoundCloud. It is written in Go and is an open-source counterpart of Google's BorgMon monitoring system.
In 2016 the Cloud Native Computing Foundation (CNCF), launched by Google under the Linux Foundation, accepted Prometheus as its second hosted project.
Prometheus currently has a very active open-source community.
Compared with Heapster (a Kubernetes subproject for collecting cluster performance data), Prometheus is more complete and comprehensive, and its performance is sufficient for clusters of more than ten thousand nodes.
Multi-dimensional data model
Efficient and flexible query language (PromQL; see the example below)
No reliance on distributed storage; single server nodes are autonomous
Time-series data collected over HTTP using a pull model
Time-series data can also be pushed via an intermediary gateway
Targets discovered via service discovery or static configuration
Support for many kinds of graphs and dashboards, e.g. Grafana
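A minimal sketch of the query language in action, assuming a Prometheus server is already reachable on localhost:9090; the /api/v1/query endpoint and the prometheus_http_requests_total self-metric are standard, the rest is illustrative:

```bash
# Instant query: per-second rate of Prometheus's own HTTP requests over the last 5 minutes.
curl -G 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=rate(prometheus_http_requests_total[5m])'
```

The same expression can be pasted into the Graph tab of the web UI or into a Grafana panel.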
prometheus server: retrieves and stores time-series data.
exporters: act as agents that collect metrics and expose them for the prometheus server to scrape; different targets are handled by different exporters, e.g. node-exporter for host metrics and redis-exporter for Redis.
For more exporters, see EXPORTERS AND INTEGRATIONS.
For their default ports, see Default port allocations · prometheus/prometheus Wiki · GitHub.
pushgateway: allows short-lived and batch jobs to push their data towards prometheus; because such jobs may not live long enough to be scraped, they push their metrics to the pushgateway, which prometheus then scrapes (see the example below).
alertmanager: implements alerting for prometheus.
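A minimal sketch of how a short-lived job hands its metrics to the Pushgateway, assuming one is running on localhost:9091 (its default port); the /metrics/job/<job_name> push path is the standard Pushgateway API, while the job and metric names here are made up for illustration:

```bash
# Push one gauge sample for the job "db_backup"; a later push to the same
# path replaces it, and Prometheus scrapes the Pushgateway as usual.
echo "db_backup_last_duration_seconds 42" | \
  curl --data-binary @- http://localhost:9091/metrics/job/db_backup
```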
Visualization: fast and flexible client-side graphs; panel plugins provide many different ways to visualize metrics and logs, and the official library ships rich dashboard panels such as heatmaps, line charts and other chart types.
Data sources: Graphite, InfluxDB, OpenTSDB, Prometheus, Elasticsearch, CloudWatch, KairosDB and more (see the sketch below).
Alerting: define alert rules visually for your most important metrics; Grafana evaluates them continuously and sends notifications via Slack, PagerDuty, etc. when data crosses a threshold.
Mixed display: mix different data sources in the same graph; the data source can be specified per query, and even custom data sources are supported.
Annotations: annotate graphs with rich events from different data sources; hovering over an event shows its full metadata and tags.
Filters: ad-hoc filters allow new key/value filters to be created dynamically and applied automatically to all queries that use that data source.
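As a sketch of how the data-source integration looks in practice, Prometheus can be registered through Grafana's HTTP API; this assumes Grafana on localhost:3000 still using the default admin/admin credentials, which should be changed in any real deployment:

```bash
curl -X POST http://admin:admin@localhost:3000/api/datasources \
  -H 'Content-Type: application/json' \
  -d '{
        "name": "Prometheus",
        "type": "prometheus",
        "url": "http://localhost:9090",
        "access": "proxy",
        "isDefault": true
      }'
```

The same data source can of course be added interactively under Configuration → Data Sources.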
wget https://github.com/prometheus/prometheus/releases/download/v*/prometheus-*.*-amd64.tar.gz
tar xvf prometheus-*.*-amd64.tar.gz
cd prometheus-*
nohup ./prometheus --config.file=./prometheus.yml &
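To confirm the server came up, Prometheus exposes built-in health and readiness endpoints; a quick check, assuming it listens on the default port 9090 of the same host:

```bash
curl http://localhost:9090/-/healthy   # HTTP 200 once the process is healthy
curl http://localhost:9090/-/ready     # HTTP 200 once it is ready to serve queries
```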
wget https://dl.grafana.com/oss/release/grafana-7.5.6-1.x86_64.rpm
sudo yum install grafana-7.5.6-1.x86_64.rpm
cd /opt
mkdir -p prometheus/config/
mkdir -p grafana/data
chmod 777 grafana/data
mkdir -p /data/prometheus
chmod 777 /data/prometheus
Create the prometheus.yml configuration file:
cd /opt/prometheus/config/
touch prometheus.yml
The prometheus.yml configuration file is as follows:
# my global config
global:
scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
#scrape_timeout is set to the global default (10s).
#Alertmanager configuration
alerting:
alertmanagers:
- static_configs:
- targets:
#- alertmanager:9093
#Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
#- "first_rules.yml"
#- "second_rules.yml"
#A scrape configuration containing exactly one endpoint to scrape:
#Here it's Prometheus itself.
scrape_configs:
#The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
- job_name: 'prometheus'
#metrics_path defaults to '/metrics'
#scheme defaults to 'http'.
static_configs:
- targets: ['192.168.9.140:9090']
- job_name: "node"
static_configs:
- targets: ["192.168.9.140:9100"]
- job_name: "qianmingyanqian"
static_configs:
- targets: ["11.12.108.226:9100","11.12.108.225:9100"]
## config for the multiple Redis targets that the exporter will scrape
- job_name: "redis_exporter_targets"
scrape_interval: 5s
static_configs:
- targets:
- redis://192.168.9.140:6379
- redis://192.168.9.140:7001
- redis://192.168.9.140:7004
metrics_path: /scrape
relabel_configs:
- source_labels: [__address__]
target_label: __param_target
- source_labels: [__param_target]
target_label: instance
- target_label: __address__
replacement: 192.168.9.140:9121
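Before (re)starting Prometheus it is worth validating the file with promtool, which ships in the same tarball as the prometheus binary; a sketch assuming you run it from the extracted prometheus-* directory:

```bash
./promtool check config /opt/prometheus/config/prometheus.yml
# SUCCESS plus a list of rule files means the configuration parses cleanly.
```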
Create the docker-compose_prometheus_grafana.yml file:
cd /opt
mkdir docker-compose
cd docker-compose
touch docker-compose_prometheus_grafana.yml
The docker-compose_prometheus_grafana.yml file is as follows:
version: '2'
networks:
monitor:
driver: bridge
services:
prometheus:
image: prom/prometheus:latest
container_name: prometheus
hostname: prometheus
restart: always
volumes:
- /opt/prometheus/config:/etc/prometheus
- /data/prometheus:/prometheus
ports:
- "9090:9090"
expose:
- "8086"
command:
- '--config.file=/etc/prometheus/prometheus.yml'
- '--log.level=info'
- '--web.listen-address=0.0.0.0:9090'
- '--storage.tsdb.path=/prometheus'
- '--storage.tsdb.retention=15d'
- '--query.max-concurrency=50'
networks:
- monitor
grafana:
image: grafana/grafana:latest
container_name: grafana
hostname: grafana
restart: always
volumes:
- /opt/grafana/data:/var/lib/grafana
ports:
- "3000:3000"
- "26:26"
networks:
- monitor
depends_on:
- prometheus
docker-compose -p prometheus_grafana -f docker-compose_prometheus_grafana.yml up -d
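A quick way to confirm both containers started before opening the web UIs, using the same project name and compose file as above:

```bash
docker-compose -p prometheus_grafana -f docker-compose_prometheus_grafana.yml ps
# Both the prometheus and grafana services should be in the "Up" state.
```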
Visit http://192.168.9.140:9090/ to access the Prometheus service.
Visit http://192.168.9.140:9090/config to view the Prometheus configuration.
Install node_exporter from the binary release:
wget https://github.com/prometheus/node_exporter/releases/download/v*/node_exporter-*.*-amd64.tar.gz
tar xvfz node_exporter-*.*-amd64.tar.gz
cd node_exporter-*.*-amd64
nohup ./node_exporter &
Alternatively, run node_exporter with Docker Compose. Create the docker-compose_node-exporter.yml file:
cd /opt/docker-compose
touch docker-compose_node-exporter.yml
The docker-compose_node-exporter.yml file is as follows:
---
version: '3.8'
services:
node_exporter:
image: quay.io/prometheus/node-exporter:latest
container_name: node_exporter
command:
- '--path.rootfs=/host'
network_mode: host
pid: host
restart: unless-stopped
volumes:
- '/:/host:ro,rslave'
docker-compose -p node_exporter -f docker-compose_node-exporter.yml up -d
curl http://192.168.9.140:9100/metrics
# HELP node_xfs_read_calls_total Number of read(2) system calls made to files in a filesystem.
# TYPE node_xfs_read_calls_total counter
node_xfs_read_calls_total{device="dm-1"} 10196
node_xfs_read_calls_total{device="dm-2"} 17401
node_xfs_read_calls_total{device="dm-3"} 970
node_xfs_read_calls_total{device="dm-4"} 10
node_xfs_read_calls_total{device="dm-5"} 19
node_xfs_read_calls_total{device="dm-6"} 132
node_xfs_read_calls_total{device="sda2"} 16378
node_xfs_read_calls_total{device="sda3"} 2.67817784e+09
node_xfs_read_calls_total{device="sda6"} 1.053587e+06
Append the following to prometheus.yml:
- job_name: "node"
static_configs:
- targets: ["192.168.9.140:9100"]
Restart the prometheus container (docker restart <CONTAINER ID>).
In Grafana, add a Prometheus data source with the URL http://192.168.9.140:9090, as shown in the figure, and save it.
Import the dashboard https://grafana.com/grafana/dashboards/1860 and load it.
Install redis_exporter from the binary release:
wget https://github.com/oliver006/redis_exporter/releases/download/v1.23.1/redis_exporter-v1.23.1.linux-386.tar.gz
tar zxvf redis_exporter-v1.23.1.linux-386.tar.gz
nohup ./redis_exporter -redis.addr 192.168.9.140:6379 -redis.password 111111 &
docker run -d --name redis_exporter -p 9121:9121 oliver006/redis_exporter --redis.addr=192.168.9.140:6379 --redis.password=111111
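Once the exporter is running, its multi-target /scrape endpoint can be exercised directly; this is the same path and target parameter that the relabel_configs in the cluster job below rely on (a sketch assuming the exporter listens on 192.168.9.140:9121):

```bash
curl -s "http://192.168.9.140:9121/scrape?target=redis://192.168.9.140:6379" | grep redis_up
# redis_up 1 means the exporter could connect to (and authenticate against) that instance.
```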
- For a single Redis instance, append the following to `prometheus.yml`:
```yaml
## config for scraping the exporter itself
- job_name: 'redis_exporter'
  scrape_interval: 5s
  static_configs:
    - targets: ['192.168.9.140:9121']
```
- For a Redis cluster, append the following to `prometheus.yml`:
```yaml
## config for the multiple Redis targets that the exporter will scrape
- job_name: "redis_exporter_targets"
scrape_interval: 5s
static_configs:
- targets:
- redis://192.168.9.140:6379
- redis://192.168.9.140:7001
- redis://192.168.9.140:7004
metrics_path: /scrape
relabel_configs:
- source_labels: [__address__]
target_label: __param_target
- source_labels: [__param_target]
target_label: instance
- target_label: __address__
replacement: 192.168.9.140:9121
```
Restart the prometheus container (docker restart <CONTAINER ID>).
Import the dashboard https://grafana.com/grafana/dashboards/11835 and load it.
Create a MySQL account for mysqld_exporter:
root@localhost 14:43: [(none)]>CREATE USER 'exporter'@'localhost' IDENTIFIED BY 'mysql_exporter';
Query OK, 0 rows affected (0.04 sec)
root@localhost 14:43: [(none)]>GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'exporter'@'localhost';
Query OK, 0 rows affected (0.03 sec)
Create the .my.cnf configuration file:
cd /opt
touch .my.cnf
vim .my.cnf
[client]
user = exporter
password = mysql_exporter
wget https://github.com/prometheus/mysqld_exporter/releases/download/v*/mysqld_exporter-*.*-amd64.tar.gz
tar xvfz mysqld_exporter-*.*-amd64.tar.gz
cd mysqld_exporter-*.*-amd64
nohup ./mysqld_exporter --config.my-cnf=/opt/.my.cnf &
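A quick check that the exporter can reach MySQL, assuming it listens on its default port 9104; mysql_up is a standard mysqld_exporter metric:

```bash
curl -s http://192.168.9.140:9104/metrics | grep '^mysql_up'
# mysql_up 1 means the exporter connected to MySQL successfully.
```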
Alternatively, run mysqld_exporter with Docker Compose. Create the docker-compose_mysqld-exporter.yml file:
cd /opt/docker-compose
touch docker-compose_mysqld-exporter.yml
The docker-compose_mysqld-exporter.yml file is as follows:
version: '2'
services:
mysql-exporter:
image: prom/mysqld-exporter
container_name: mysql-exporter
hostname: mysql-exporter
restart: always
ports:
- "9104:9104"
networks:
- my-mysql-network
environment:
DATA_SOURCE_NAME: "exporter:mysql_exporter@(192.168.9.140:3306)/"
networks:
my-mysql-network:
driver: bridge
docker-compose -p mysql_exporter -f docker-compose_mysqld-exporter.yml up -d
Append the following to prometheus.yml:
- job_name: 'mysql'
static_configs:
- targets: ['192.168.9.140:9104']
labels:
instance: mysql
Restart the prometheus container (docker restart <CONTAINER ID>).
Import the dashboard https://grafana.com/grafana/dashboards/11323 and load it.
If you only need to monitor MySQL or MongoDB, consider PMM (Percona Monitoring and Management) instead; it adds extra features such as slow-query collection.
Install cAdvisor with Docker Compose. Create the docker-compose_cadvisor.yml file:
cd /opt/docker-compose
touch docker-compose_cadvisor.yml
The docker-compose_cadvisor.yml file is as follows:
version: '3.2'
services:
cadvisor:
image: google/cadvisor:latest
container_name: cadvisor
restart: unless-stopped
ports:
- '18080:8080'
volumes:
- /:/rootfs:ro
- /var/run:/var/run:rw
- /sys:/sys:ro
- /var/lib/docker/:/var/lib/docker:ro
docker-compose -p cadvisor -f docker-compose_cadvisor.yml up -d
http://11.12.110.38:18080/containers/
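cAdvisor also exposes Prometheus-format metrics on the same port under /metrics, which is what the scrape job below targets; a quick check, assuming the 18080:8080 mapping above and adjusting the host to wherever the container runs:

```bash
curl -s http://192.168.9.140:18080/metrics | grep -m 5 '^container_'
# Seeing container_* samples confirms the endpoint Prometheus will scrape.
```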
Append the following to prometheus.yml:
- job_name: 'cadvisor'
scrape_interval: 5s
static_configs:
- targets: ['192.168.9.140:18080']
Restart the prometheus container (docker restart <CONTAINER ID>).
Import the dashboard https://grafana.com/grafana/dashboards/8321 and load it.
If the Prometheus web UI shows the warning "Error fetching server time: Detected 785.6099998950958 seconds time difference between your browser and the server.", synchronize the server clock:
ntpdate time3.aliyun.com
An incorrect prometheus.yml configuration may produce the error "too many redis instances".
After redis_exporter starts, the scrape status on the Prometheus web UI under /targets may also show the following error. This usually means the password was configured incorrectly when redis_exporter was started, e.g. a password was supplied for Redis instances that do not require one, or a required password was omitted:
- redis_exporter_last_scrape_error{err="dial redis: unknown network redis"} 1
If MySQL reports the following error when creating the exporter account, the server was started with --skip-grant-tables:
ERROR 1290 (HY000): The MySQL server is running with the --skip-grant-tables option so it cannot execute this statement
flush privileges;
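flush privileges; reloads the grant tables that --skip-grant-tables left unloaded, after which account-management statements are accepted again; a sketch of the full recovery sequence, reusing the account and grants from the MySQL section above:

```sql
-- Reload the grant tables, then recreate the monitoring account for mysqld_exporter.
FLUSH PRIVILEGES;
CREATE USER 'exporter'@'localhost' IDENTIFIED BY 'mysql_exporter';
GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'exporter'@'localhost';
```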