Performance testing tools: real-time monitoring of performance tests

king.yu · May 17, 2017 · last reply by KD on November 22, 2018 · 4715 views
This post has been marked as a featured post!

Real-time performance monitoring

Tool list

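  • logstash : log parsing and filtering
  • elasticsearch : log search and storage
  • kibana : real-time log viewing
  • grafana : front-end dashboards and reports
  • filebeat : log shipper
  • dstat : system resource monitoring (zabbix can be used instead)
  • jmxtrans : JMX monitoring
  • influxdb : time-series database (stores the JMX data)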

Monitored metrics

  • jmxtrans (GC data collection: in jconsole, open MBeans -> java.lang -> GarbageCollector and copy the ObjectName)
    • HeapMemoryUsage
    • NonHeapMemoryUsage
    • CMS GC
    • ParNew GC
  • dstat
    • total-cpu-usage : usr、sys、idl、wai、hiq、siq
    • dsk/total : read 、writ
    • paging : in、out
    • interrupts
    • load-avg : 1m、5m、15m
    • memory-usage : used、buff、cach、free
    • net/total : recv、send
    • procs : run、blk、new
    • io/total : read、writ
    • swap : used、free
    • system : int、csw
  • tsung (everything in tsung.log)
    • request,page,session : 10sec_count, 10sec_mean, 10sec_stddev, max, min, mean, count
    • size_sent,size_rcv,error : count(during the last 10sec), totalcount(since the beginning)
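The two tsung.log line shapes listed above (seven values for samples such as request, two values for counters such as size_sent) can be told apart by value count alone. The following is a minimal sketch under that assumption; the function name `parse_tsung_stats` and the `last_10sec` / `totalcount` key suffixes are illustrative choices, not part of tsung.

```python
from typing import Dict, Union

# Field names for "sample" statistics (request, page, session, connect).
SAMPLE_FIELDS = ("10sec_count", "10sec_mean", "10sec_stddev", "max", "min", "mean", "count")
# Field names for "counter" statistics (size_sent, size_rcv, error_*, users).
COUNTER_FIELDS = ("last_10sec", "totalcount")


def parse_tsung_stats(line: str) -> Dict[str, Union[str, float]]:
    """Parse one 'stats: <name> <values...>' line into a flat dict of floats."""
    prefix, _, rest = line.strip().partition(":")
    if prefix != "stats":
        raise ValueError(f"not a stats line: {line!r}")
    name, *values = rest.split()
    if len(values) == len(SAMPLE_FIELDS):
        fields = SAMPLE_FIELDS
    elif len(values) == len(COUNTER_FIELDS):
        fields = COUNTER_FIELDS
    else:
        raise ValueError(f"unexpected number of values in: {line!r}")
    record: Dict[str, Union[str, float]] = {"name": name}
    for field, raw in zip(fields, values):
        # Keys follow the <name>_<field> convention, e.g. request_10sec_mean.
        record[f"{name}_{field}"] = float(raw)
    return record


sample = parse_tsung_stats("stats: request 1869 0.371 0.422 5.207 0.115 0.431 7444062")
counter = parse_tsung_stats("stats: size_sent 1024 204800")
print(sample["request_max"], counter["size_sent_totalcount"])
```

Keeping every value as a float avoids losing the fractional part of the mean and stddev columns.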

Deployment architecture

elasticsearch

docker run --name myes  -p 9200:9200 -v /path/elasticsearch/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml -d docker.elastic.co/elasticsearch/elasticsearch:5.4.0
If installing from the rpm package:
rpm -ivh elasticsearch.rpm
cp -r /etc/elasticsearch /usr/share/elasticsearch/config
groupadd elsearch
useradd elsearch -g elsearch -p elsearch
cd /usr/share/elasticsearch
chown -R elsearch:elsearch .
chmod -R 775 config
su elsearch
bin/elasticsearch

kibana

docker run --name mykibana -v /path/kibana/kibana.yml:/usr/share/kibana/config/kibana.yml -p 5601:5601 -d docker.elastic.co/kibana/kibana:5.4.0

logstash

docker run --name mylogstash -v /path/pipeline/:/usr/share/logstash/pipeline/ -p 5044:5044 -e xpack.monitoring.enabled=false -d docker.elastic.co/logstash/logstash:5.4.0
vim /path/pipeline/filebeat-pipeline.conf
input {
  beats {
    port => 5044
  }
}

filter {
  if [type] == 'dstat'{
    csv{
      source => "message"
      columns => [ "cpu_usr","cpu_sys","cpu_idl","cpu_wai","cpu_hiq","cpu_siq","dsk_read","dsk_writ","paging_in","paging_out","interrupts_171","interrupts_172","interrupts_173","load_1m","load_5m","load_15m","memory_used","memory_buff","memory_cach","memory_free","net_recv","net_send","procs_run","procs_blk","procs_new","io_read","io_writ","swap_used","swap_free","system_int","system_csw"]
      convert => {  "cpu_usr" => "float"}
      convert => {  "cpu_sys" => "float"}
      convert => {  "cpu_idl" => "float"}
      convert => {  "cpu_wai" => "float"}
      convert => {  "cpu_hiq" => "float"}
      convert => {  "cpu_siq" => "float"}
      convert => {  "dsk_read" => "float"}
      convert => {  "dsk_writ" => "float"}
      convert => {  "paging_in" => "float"}
      convert => {  "paging_out" => "float"}
      convert => {  "interrupts_171" => "float"}
      convert => {  "interrupts_172" => "float"}
      convert => {  "interrupts_173" => "float"}
      convert => {  "load_1m" => "float"}
      convert => {  "load_5m" => "float"}
      convert => {  "load_15m" => "float"}
      convert => {  "memory_used" => "float"}
      convert => {  "memory_buff" => "float"}
      convert => {  "memory_cach" => "float"}
      convert => {  "memory_free" => "float"}
      convert => {  "net_recv" => "float"}
      convert => {  "net_send" => "float"}
      convert => {  "procs_run" => "float"}
      convert => {  "procs_blk" => "float"}
      convert => {  "procs_new" => "float"}
      convert => {  "io_read" => "float"}
      convert => {  "io_writ" => "float"}
      convert => {  "swap_used" => "float"}
      convert => {  "swap_free" => "float"}
      convert => {  "system_int" => "float"}
      convert => {  "system_csw" => "float"}
    }
  }else if [type] == 'tsung'{
    if [message] =~ "dump "{
       grok {
            match => { "message" => "[# stats]\:%{SPACE}[dump]+%{SPACE}%{USERNAME:at}%{SPACE}%{BASE10NUM:dump_time}"}
       }
       mutate {
         convert => { "dump_time" => "integer" }
         remove_field => [ "SPACE" ]
       }
    }else if [message] =~ "cpu"{
       grok {
            match => { "message" => "%{USERNAME:stats}\:%{SPACE}[{cpu,]+%{SPACE}%{DATA}"}
       }
    }else if [message] =~ "load"{
       grok {
            match => { "message" => "%{USERNAME:stats}\:%{SPACE}[{load,]+%{SPACE}%{DATA}"}
       }
    }else if [message] =~ "freemem"{
       grok {
            match => { "message" => "%{USERNAME:stats}\:%{SPACE}[{freemem,]+%{SPACE}%{DATA}"}
       }
    }else if [message] =~ "page"{
       grok {
            match => { "message" => "%{USERNAME:stats}\:%{SPACE}[page]+%{SPACE}%{BASE10NUM:page_10sec_count}%{SPACE}%{BASE16FLOAT:page_10sec_mean}%{SPACE}%{BASE16FLOAT:page_10sec_stddev}%{SPACE}%{BASE16FLOAT:page_max}%{SPACE}%{BASE16FLOAT:page_min}%{SPACE}%{BASE16FLOAT:page_mean}%{SPACE}%{BASE16FLOAT:page_count}"}
       }
       mutate {
         convert => { "page_10sec_count" => "integer" }
         convert => { "page_10sec_mean" => "integer" }
         convert => { "page_10sec_stddev" => "integer" }
         convert => { "page_max" => "integer" }
         convert => { "page_min" => "integer" }
         convert => { "page_mean" => "integer" }
         convert => { "page_count" => "integer" }
         remove_field => [ "SPACE" ]
       }
    }else if [message] =~ "request"{
       grok {
            match => { "message" => "%{USERNAME:stats}\:%{SPACE}[request]+%{SPACE}%{BASE10NUM:request_10sec_count}%{SPACE}%{BASE16FLOAT:request_10sec_mean}%{SPACE}%{BASE16FLOAT:request_10sec_stddev}%{SPACE}%{BASE16FLOAT:request_max}%{SPACE}%{BASE16FLOAT:request_min}%{SPACE}%{BASE16FLOAT:request_mean}%{SPACE}%{BASE16FLOAT:request_count}"}
       }
       mutate {
         convert => { "request_10sec_count" => "integer" }
         convert => { "request_10sec_mean" => "integer" }
         convert => { "request_10sec_stddev" => "integer" }
         convert => { "request_max" => "integer" }
         convert => { "request_min" => "integer" }
         convert => { "request_mean" => "integer" }
         convert => { "request_count" => "integer" }
         remove_field => [ "SPACE" ]
       }
    }else if [message] =~ "size_rcv"{
       grok {
            match => { "message" => "%{USERNAME:stats}\:%{SPACE}[size_rcv]+%{SPACE}%{BASE10NUM:size_rcv_last_10sec}%{SPACE}%{BASE10NUM:size_rcv_totalcount}"}
       }
       mutate {
         convert => { "size_rcv_last_10sec" => "integer" }
         convert => { "size_rcv_totalcount" => "integer" }
         remove_field => [ "SPACE" ]
       }
    }else if [message] =~ "size_sent"{
       grok {
            match => { "message" => "%{USERNAME:stats}\:%{SPACE}[size_sent]+%{SPACE}%{BASE10NUM:size_sent_last_10sec}%{SPACE}%{BASE10NUM:size_sent_totalcount}"}
       }
       mutate {
         convert => { "size_sent_last_10sec" => "integer" }
         convert => { "size_sent_totalcount" => "integer" }
         remove_field => [ "SPACE" ]
       }
    }else if [message] =~ "error_connect_etimedout"{
       grok {
            match => { "message" => "[stats]\:%{SPACE}[error_connect_etimedout]+%{SPACE}%{BASE10NUM:error_connect_etimedout_last_10sec}%{SPACE}%{BASE10NUM:error_connect_etimedout_totalcount}"}
       }
       mutate {
         convert => { "error_connect_etimedout_last_10sec" => "integer" }
         convert => { "error_connect_etimedout_totalcount" => "integer" }
         remove_field => [ "SPACE" ]
       }
    }else if [message] =~ "connected"{
       grok {
            match => { "message" => "%{USERNAME:stats}\:%{SPACE}[connected]+%{SPACE}%{BASE10NUM:connected_last_10sec}%{SPACE}%{BASE10NUM:connecte_totalcount}"}
       }
       mutate {
         convert => { "connected_last_10sec" => "integer" }
         convert => { "connecte_totalcount" => "integer" }
         remove_field => [ "SPACE" ]
       }
    }else if [message] =~ "connect"{
       grok {
            match => { "message" => "%{USERNAME:stats}\:%{SPACE}[connect]+%{SPACE}%{BASE10NUM:connect_10sec_count}%{SPACE}%{BASE16FLOAT:connect_10sec_mean}%{SPACE}%{BASE16FLOAT:connect_10sec_stddev}%{SPACE}%{BASE16FLOAT:connect_max}%{SPACE}%{BASE16FLOAT:connect_min}%{SPACE}%{BASE16FLOAT:connect_mean}%{SPACE}%{BASE16FLOAT:connect_count}"}
       }
       mutate {
         convert => { "connect_10sec_count" => "integer" }
         convert => { "connect_10sec_mean" => "integer" }
         convert => { "connect_10sec_stddev" => "integer" }
         convert => { "connect_max" => "integer" }
         convert => { "connect_min" => "integer" }
         convert => { "connect_mean" => "integer" }
         convert => { "connect_count" => "integer" }
         remove_field => [ "SPACE" ]
       }
    }else if [message] =~ "finish_users_count"{
       grok {
            match => { "message" => "%{USERNAME:stats}\:%{SPACE}[finish_users_count]+%{SPACE}%{BASE10NUM:finish_users_count_last_10sec}%{SPACE}%{BASE10NUM:finish_users_count_totalcount}"}
       }
       mutate {
         convert => { "finish_users_count_last_10sec" => "integer" }
         convert => { "finish_users_count_totalcount" => "integer" }
         remove_field => [ "SPACE" ]
       }
    }else if [message] =~ "users_count"{
       grok {
            match => { "message" => "%{USERNAME:stats}\:%{SPACE}[users_count]+%{SPACE}%{BASE10NUM:users_count_last_10sec}%{SPACE}%{BASE10NUM:users_count_totalcount}"}
       }
       mutate {
         convert => { "users_count_last_10sec" => "integer" }
         convert => { "users_count_totalcount" => "integer" }
         remove_field => [ "SPACE" ]
       }
    }else if [message] =~ "users"{
       grok {
            match => { "message" => "%{USERNAME:stats}\:%{SPACE}[users]+%{SPACE}%{BASE10NUM:user_last_10sec}%{SPACE}%{BASE10NUM:user_totalcount}"}
       }
       mutate {
         convert => { "user_last_10sec" => "integer" }
         convert => { "user_totalcount" => "integer" }
         remove_field => [ "SPACE" ]
       }
    }
  }
}

output {
  elasticsearch {
    hosts => "host:port"
    user => "elastic"
    password => "changeme"
  }
}  
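To check a dstat CSV row against the column list before wiring it into Logstash, the csv branch of the filter can be approximated offline. This is a rough sketch, not the Logstash implementation: `parse_dstat_row` is a hypothetical helper, and unlike the csv filter it rejects rows whose column count does not match. The three interrupts_* names come from the config above and depend on the interrupt numbers present on the monitored host.

```python
import csv
from typing import Dict, List

# Column names copied from the csv filter in the pipeline config.
DSTAT_COLUMNS: List[str] = [
    "cpu_usr", "cpu_sys", "cpu_idl", "cpu_wai", "cpu_hiq", "cpu_siq",
    "dsk_read", "dsk_writ",
    "paging_in", "paging_out",
    "interrupts_171", "interrupts_172", "interrupts_173",
    "load_1m", "load_5m", "load_15m",
    "memory_used", "memory_buff", "memory_cach", "memory_free",
    "net_recv", "net_send",
    "procs_run", "procs_blk", "procs_new",
    "io_read", "io_writ",
    "swap_used", "swap_free",
    "system_int", "system_csw",
]


def parse_dstat_row(line: str) -> Dict[str, float]:
    """Map one dstat CSV data row onto the named columns, converting to float."""
    values = next(csv.reader([line]))
    if len(values) != len(DSTAT_COLUMNS):
        raise ValueError(
            f"expected {len(DSTAT_COLUMNS)} columns, got {len(values)}"
        )
    return {name: float(value) for name, value in zip(DSTAT_COLUMNS, values)}


# Synthetic row: 31 increasing numbers, one per column.
row = ",".join(str(i) for i in range(len(DSTAT_COLUMNS)))
event = parse_dstat_row(row)
print(event["cpu_usr"], event["system_csw"])
```

A column-count mismatch usually means the host exposes a different number of interrupt columns than the three assumed in the config.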

grafana

docker run --name grafana -p 3000:3000 -v /etc/localtime:/etc/localtime:ro -v /opt/docker_v/grafana/var/lib/grafana:/var/lib/grafana -d  grafana/grafana

filebeat

rpm -ivh filebeat-5.4.0-x86_64.rpm
vim /etc/filebeat/filebeat.yml
- input_type: log
  paths:
    - /path/dstat*.csv
  document_type: dstat
  exclude_lines: ['Dstat','Author','Host','Cmdline','total','usr']

#output.elasticsearch:
  # Array of hosts to connect to.
  # hosts: ["http://host:port"]

#----------------------------- Logstash output --------------------------------
output.logstash:
  # The Logstash hosts
  hosts: ["host:port"] 
sh filebeat.sh -e -c /etc/filebeat/filebeat.yml -d "Publish"
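The exclude_lines setting drops every line that matches any of the listed regular expressions, which is how the dstat CSV header lines are kept out of Logstash. A small sketch of that behavior, assuming unanchored regex matching; `drop_excluded` is a hypothetical helper and the header text in `SAMPLE` is illustrative rather than captured from a real run.

```python
import re
from typing import Iterable, List

# Patterns copied from exclude_lines in the filebeat.yml fragment.
EXCLUDE_PATTERNS = ["Dstat", "Author", "Host", "Cmdline", "total", "usr"]


def drop_excluded(lines: Iterable[str],
                  patterns: Iterable[str] = EXCLUDE_PATTERNS) -> List[str]:
    """Return only the lines that match none of the patterns."""
    compiled = [re.compile(p) for p in patterns]
    return [line for line in lines
            if not any(rx.search(line) for rx in compiled)]


# Illustrative input: header lines in the style dstat writes at the top of a
# CSV file, followed by one data row. The exact header text is an assumption.
SAMPLE = [
    '"Dstat 0.7.2 CSV output"',
    '"Author:","Dag Wieers <dag@wieers.com>"',
    '"Host:","app-server-1"',
    '"Cmdline:","dstat -cdgilmnprsy --output dstat.csv 5"',
    '"total cpu usage",,,,,,"dsk/total",',
    '"usr","sys","idl","wai","hiq","siq","read","writ"',
    "1.5,0.5,97.0,1.0,0.0,0.0,1024.0,2048.0",
]

kept = drop_excluded(SAMPLE)
print(kept)
```

Only the numeric data row survives, so the csv filter never sees a header row it would fail to convert to floats.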

influxdb

yum localinstall influxdb-1.2.2.x86_64.rpm

jmxtrans

rpm -i jmxtrans-265.rpm
vim /var/lib/xx.json
{
    "servers": [
        {
            "port": "port",
            "host": "host",
            "queries": [
                {
                    "obj": "java.lang:type=GarbageCollector,name=PS MarkSweep",
                    "attr": [
                        "CollectionCount",
                        "CollectionTime"
                    ],
                    "resultAlias": "GarbageCollector_PS_MarkSweep",
                    "outputWriters": [
                        {
                            "@class": "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
                            "url": "http://host:port/",
                            "username": "root",
                            "password": "root",
                            "database": "jmxdb"
                        }
                    ]
                },
                {
                    "obj": "java.lang:type=GarbageCollector,name=PS Scavenge",
                    "attr": [
                        "CollectionCount",
                        "CollectionTime"
                    ],
                    "resultAlias": "GarbageCollector_PS_Scavenge",
                    "outputWriters": [
                        {
                            "@class": "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
                            "url": "http://host:port/",
                            "username": "root",
                            "password": "root",
                            "database": "jmxdb"
                        }
                    ]
                },
                {
                    "obj": "java.lang:type=Memory",
                    "attr": [
                        "HeapMemoryUsage",
                        "NonHeapMemoryUsage"
                    ],
                    "resultAlias": "JVM_Memory",
                    "outputWriters": [
                        {
                            "@class": "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
                            "url": "http://host:port/",
                            "username": "root",
                            "password": "root",
                            "database": "jmxdb"
                        }
                    ]
                },
                {
                    "obj": "java.lang:type=ClassLoading",
                    "attr": [
                        "TotalLoadedClassCount",
                        "LoadedClassCount",
                        "UnloadedClassCount"
                    ],
                    "resultAlias": "JVM_ClassLoading",
                    "outputWriters": [
                        {
                            "@class": "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
                            "url": "http://host:port/",
                            "username": "root",
                            "password": "root",
                            "database": "jmxdb"
                        }
                    ]
                }
            ]
        }
    ]
}
/usr/share/jmxtrans/bin/jmxtrans start
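The four queries in the config repeat the same outputWriters block, so the file can be generated instead of edited by hand. A minimal sketch under the assumption that every query writes to the same InfluxDB target; `influx_writer` and `build_config` are hypothetical helpers, and "host" / "port" are the same placeholders used in the config.

```python
import json
from typing import Any, Dict, List, Tuple

WRITER_CLASS = "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory"

# (MBean object name, attributes, resultAlias), copied from the config.
QUERIES: List[Tuple[str, List[str], str]] = [
    ("java.lang:type=GarbageCollector,name=PS MarkSweep",
     ["CollectionCount", "CollectionTime"], "GarbageCollector_PS_MarkSweep"),
    ("java.lang:type=GarbageCollector,name=PS Scavenge",
     ["CollectionCount", "CollectionTime"], "GarbageCollector_PS_Scavenge"),
    ("java.lang:type=Memory",
     ["HeapMemoryUsage", "NonHeapMemoryUsage"], "JVM_Memory"),
    ("java.lang:type=ClassLoading",
     ["TotalLoadedClassCount", "LoadedClassCount", "UnloadedClassCount"],
     "JVM_ClassLoading"),
]


def influx_writer(url: str, username: str, password: str,
                  database: str) -> Dict[str, str]:
    """Build the outputWriters entry shared by every query."""
    return {"@class": WRITER_CLASS, "url": url, "username": username,
            "password": password, "database": database}


def build_config(host: str, port: str, writer: Dict[str, str]) -> Dict[str, Any]:
    """Assemble the jmxtrans server definition with one writer copy per query."""
    queries = [
        {"obj": obj, "attr": list(attrs), "resultAlias": alias,
         "outputWriters": [dict(writer)]}
        for obj, attrs, alias in QUERIES
    ]
    return {"servers": [{"port": port, "host": host, "queries": queries}]}


writer = influx_writer("http://host:port/", "root", "root", "jmxdb")
config = build_config("host", "port", writer)
print(json.dumps(config, indent=4))
```

Adding another MBean then only requires one more tuple in `QUERIES`.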

dstat

dstat -cdgilmnprsy --nocolor --noheaders --float --output /path/logs/"dstat-"`date "+%Y_%m_%d_%H_%M_%S"`".csv"  5
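Each letter in -cdgilmnprsy enables one column group, and the groups are assumed to appear in the CSV in the order the letters are given, which is why the Logstash column list follows the same order. The lookup below is a sketch of that mapping; `groups_for` is a hypothetical helper and the group labels are the ones listed under the monitored metrics.

```python
from typing import Dict, List

# Single-letter dstat options used in the command, mapped to the column
# group each one enables.
FLAG_GROUPS: Dict[str, str] = {
    "c": "total-cpu-usage",
    "d": "dsk/total",
    "g": "paging",
    "i": "interrupts",
    "l": "load-avg",
    "m": "memory-usage",
    "n": "net/total",
    "p": "procs",
    "r": "io/total",
    "s": "swap",
    "y": "system",
}


def groups_for(flags: str) -> List[str]:
    """Return the column groups for a flag string, in the order given."""
    unknown = [flag for flag in flags if flag not in FLAG_GROUPS]
    if unknown:
        raise ValueError(f"flags not covered by this sketch: {unknown}")
    return [FLAG_GROUPS[flag] for flag in flags]


groups = groups_for("cdgilmnprsy")
print(groups)
```

If the flag order in the dstat command changes, the columns list in the Logstash csv filter has to be reordered to match.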

Dashboard views



11 replies

I'd like to know roughly how much overhead the monitoring itself adds with this whole setup (ELK included).

Reply to 天琴圣域:

The application server only runs filebeat and dstat; filebeat uses about 30% of a single core. Everything else is deployed on separate virtual machines.

Awesome... exactly what I needed.

思寒_seveniruby marked this post as featured · May 18, 01:14

Bookmarking this for later.

Nice. The dashboard colors could use some tuning; it's a bit dark right now.


What are the drawbacks of this toolset?


What is RT?

Reply #11 @dancingcat_: response time.


Reply to king.yu:

Ah, silly of me. Mainly I didn't understand the English below it 😂

Upvoted. I'll study this more carefully when I get around to using it.
