Collecting Custom Data on Monitored Machines with the nGrinder Performance Testing Tool, and a Source Code Analysis

胡刚 · March 2, 2016 · last reply by 小新 on April 13, 2022 · 4234 views

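0. Background

The performance testing tool nGrinder can collect custom data from the target server without any source code changes; up to 5 custom values are supported.

On the detailed report page, under Target Hosts -> the tab for your machine's IP, only 4 kinds of data are collected by default: CPU, Memory, Received Byte/s, and Sent Byte Per Second.

You may also need other performance statistics for your analysis (for example load or Full GC). This article first shows how to collect them, then walks through the nGrinder source code to see how the feature is implemented.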

1. Implementation

1-1. Install the monitor

Download the monitor from your nGrinder system:

[Screenshot: monitor download]

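Copy it to the machine that hosts the service under test, extract the tar archive, and run sh run_monitor_bg.sh.

The script simply starts a Java process in monitor mode.

As covered in an earlier post, the Agent has 2 modes:

agent mode: runs the processes and threads that put load on the target service;

monitor mode: monitors the performance of the target system (cpu/memory).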

[root@10 ngrinder-monitor]# cat run_monitor.sh
#!/bin/sh
curpath=`dirname $0`
cd ${curpath}
java -server -cp "lib/*" org.ngrinder.NGrinderAgentStarter --mode monitor --command run $@

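The Agent home directory is /root/.ngrinder_agent. By default, sh run_monitor_bg.sh reads its configuration from /root/.ngrinder_agent/agent.conf; with -o, it reads __agent.conf from the directory where the monitor is installed instead. This configuration file defines the Agent's mode, IP, and port.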

[root@10 .ngrinder_agent]# cat agent.conf 
common.start_mode=monitor
#If you want to monitor bind to the different local ip not automatically selected ip. Specify below field.
#monitor.binding_ip=hostname_or_ip
monitor.binding_port=13243

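Custom data must be written to /root/.ngrinder_agent/monitor/custom.data in the following format:

value1,value2,value3,value4,value5

Up to 5 values are supported, separated by ",". Note that the file holds the current values, not an accumulated history: overwrite it on every update (like > in shell, not >>), so that at any moment the file contains exactly one line.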
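If you would rather produce custom.data from a JVM process than from a shell script, the format rules above can be sketched in Java as follows. This is only an illustration: CustomDataWriter and its methods are hypothetical names, not part of nGrinder, and the only facts taken from this article are the limit of 5 comma-separated values and the overwrite-on-every-update behaviour.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of nGrinder.
final class CustomDataWriter {
    static final int MAX_VALUES = 5; // limit stated in this article

    // Joins the values into the single comma-separated line the monitor reads.
    static String format(List<String> values) {
        if (values.isEmpty() || values.size() > MAX_VALUES) {
            throw new IllegalArgumentException(
                    "expected 1 to " + MAX_VALUES + " values, got " + values.size());
        }
        return String.join(",", values);
    }

    // Overwrites the file (like '>' in shell), so it always holds exactly one line.
    static void overwrite(Path file, List<String> values) {
        byte[] line = (format(values) + System.lineSeparator()).getBytes(StandardCharsets.UTF_8);
        try {
            // Files.write creates the file or truncates it by default.
            Files.write(file, line);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList("4.93", "49"); // e.g. load, full GC count
        System.out.println(format(sample));
        if (args.length > 0) {
            // Pass the target path explicitly, e.g. /root/.ngrinder_agent/monitor/custom.data
            overwrite(Paths.get(args[0]), sample);
        }
    }
}
```

Run without arguments it only prints the formatted line; pass the path of custom.data as the first argument to actually write the file.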

1-2. Write a collection script

Taking load and full GC as an example:

[root@10 bin]# cat updateCustomData.sh 
#!/bin/sh
#@author hugang

customDataRoot=/root/.ngrinder_agent/monitor/custom.data;
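#  get the 1-minute load average
load=`/bin/cat /proc/loadavg | awk '{print $1}'`;
#  get the full GC count (FGC is column 8 of jstat -gcutil on JDK 7, column 9 on JDK 8 and later)
if [ "${1:-0}" -gt 0 ]; then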
  fgc=`jstat -gcutil $1 | tail -1 | awk '{print $8}'`;
  echo $load,$fgc > $customDataRoot;
else
  echo $load > $customDataRoot;
fi;

While the performance test is running, execute the script every second to refresh custom.data:

watch -n 1 sh updateCustomData.sh 5528

5528 is the pid of the Java service process to monitor.

When the performance test finishes, the data collected by the monitor is stored in /root/.ngrinder/perftest/0_999/${test_id}/report/monitor_system_${ip}.data:

[root@10 report]# cat monitor_system_10.13.1.139.data 
ip,system,collectTime,freeMemory,totalMemory,cpuUsedPercentage,receivedPerSec,sentPerSec,customValues
10.13.1.139,LINUX,20160302151441,97102768,132112072,26.895683,32954,27897,4.93,49
10.13.1.139,LINUX,20160302151443,97075896,132112072,30.513468,45702,32306,4.93,49
10.13.1.139,LINUX,20160302151445,97034772,132112072,30.411074,110306,65391,5.02,49
10.13.1.139,LINUX,20160302151447,96972504,132112072,22.073017,84813,57503,5.02,49
...
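To make the column layout concrete, here is a small Java sketch that parses one data line of this file. MonitorRecord is a hypothetical class written for this article, not nGrinder code. It assumes the column order of the header shown above, treats everything after the 8 fixed columns as custom values, and does not handle the "null" placeholders that the report code tolerates.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical parser for one data line of monitor_system_${ip}.data; not nGrinder code.
final class MonitorRecord {
    final String ip;
    final String collectTime;
    final long freeMemory;
    final long totalMemory;
    final double cpuUsedPercentage;
    final List<String> customValues;

    private MonitorRecord(String[] f) {
        ip = f[0];
        collectTime = f[2];
        freeMemory = Long.parseLong(f[3]);
        totalMemory = Long.parseLong(f[4]);
        cpuUsedPercentage = Double.parseDouble(f[5]);
        // Everything after the 8 fixed columns is custom data (0 to 5 values).
        customValues = Arrays.asList(Arrays.copyOfRange(f, 8, f.length));
    }

    static MonitorRecord parse(String line) {
        String[] fields = line.trim().split(",");
        if (fields.length < 8) {
            throw new IllegalArgumentException("expected at least 8 columns: " + line);
        }
        return new MonitorRecord(fields);
    }

    // Used memory as total minus free, the same arithmetic getMonitorGraph() applies.
    long usedMemory() {
        return totalMemory - freeMemory;
    }

    public static void main(String[] args) {
        MonitorRecord r = parse(
                "10.13.1.139,LINUX,20160302151441,97102768,132112072,26.895683,32954,27897,4.93,49");
        System.out.println(r.ip + " used=" + r.usedMemory() + " custom=" + r.customValues);
    }
}
```

In the sample line, the last two columns (4.93 and 49) are the load and full GC count written by updateCustomData.sh.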

1-3. Results

[Screenshot: result charts]

2. Source Code Analysis

nGrinder uses the Sigar library (https://support.hyperic.com/display/SIGAR/Home) to collect system information. Sigar can collect the following data:

System memory, swap, cpu, load average, uptime, logins
Per-process memory, cpu, credential info, state, arguments, environment, open files
File system detection and metrics
Network interface detection, configuration info and metrics
TCP and UDP connection tables
Network route table

Sigar (http://download.csdn.net/download/neven7/9450930) example:

[root@10 testsigar]# ls
libsigar-amd64-linux.so  sigar-1.6.4.jar  sigar-1.6.4.jar.zip
[root@10 testsigar]# 
[root@10 testsigar]# java  -jar ./sigar-1.6.4.jar
sigar> free
             total       used       free
Mem:     132112072   96855372   35256700
-/+ buffers/cache:   34855500   97256572
Swap:      8388600     264980    8123620
RAM:      129016MB
sigar> 

The Java file that collects system data is:
ngrinder-core/src/main/java/org/ngrinder/monitor/collector/SystemDataCollector.java

Inheritance and implementation relationships:

SystemDataCollector extends DataCollector 

DataCollector implements Runnable

The thread body of SystemDataCollector:

public void run() {
        // initialize sigar
        initSigar();
        SystemMonitoringData systemMonitoringData = (SystemMonitoringData) getMXBean(SYSTEM);
        // execute() obtains the system information through the sigar API
        systemMonitoringData.setSystemInfo(execute());
    }

execute() builds the SystemInfo (the system info object that stores the data collected by the monitor):

/**
     * Execute the collector to get the system info model.
     *
     * @return SystemInfo in current time
     */
    public synchronized SystemInfo execute() {
        SystemInfo systemInfo = new SystemInfo();
        systemInfo.setCollectTime(System.currentTimeMillis());
        try {
            BandWidth networkUsage = getNetworkUsage();
            BandWidth bandWidth = networkUsage.adjust(prev.getBandWidth());
            systemInfo.setBandWidth(bandWidth);
            systemInfo.setCPUUsedPercentage((float) sigar.getCpuPerc().getCombined() * 100);
            Cpu cpu = sigar.getCpu();
            systemInfo.setTotalCpuValue(cpu.getTotal());
            systemInfo.setIdleCpuValue(cpu.getIdle());
            Mem mem = sigar.getMem();
            systemInfo.setTotalMemory(mem.getTotal() / 1024L);
            systemInfo.setFreeMemory(mem.getActualFree() / 1024L);
            systemInfo.setSystem(OperatingSystem.IS_WIN32 ? SystemInfo.System.WINDOW : SystemInfo.System.LINUX);
            systemInfo.setCustomValues(getCustomMonitorData());
        } catch (Throwable e) {
            LOGGER.error("Error while getting system perf data:{}", e.getMessage());
            LOGGER.debug("Error trace is ", e);
        }
        prev = systemInfo;
        return systemInfo;
    }

Here, getCustomMonitorData() fetches the custom data by reading one line from the custom.data file:

private String getCustomMonitorData() {
        if (customDataFile != null && customDataFile.exists()) {
            BufferedReader customDataFileReader = null;
            try {
                customDataFileReader = new BufferedReader(new FileReader(customDataFile));
                return customDataFileReader.readLine(); // these data will be parsed at
                // monitor client side.
            } catch (IOException e) {
                // Error here is very natural
                LOGGER.debug("Error to read custom monitor data", e);
            } finally {
                IOUtils.closeQuietly(customDataFileReader);
            }
        }
        return prev.getCustomValues();
    }
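The "first line, otherwise the previous value" behaviour can be reproduced in a few lines of standalone Java. The sketch below is not the nGrinder implementation: FirstLineReader is a hypothetical name, and it deliberately differs from getCustomMonitorData() in one respect, which is that an empty file also falls back to the previous value instead of returning null.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not part of nGrinder.
final class FirstLineReader {

    // Returns the first line of the source, or 'previous' if nothing can be read.
    static String firstLineOr(Reader source, String previous) {
        try (BufferedReader reader = new BufferedReader(source)) {
            String line = reader.readLine(); // only the first line is ever used
            return line != null ? line : previous;
        } catch (IOException e) {
            return previous;
        }
    }

    // Same as above for a file; a missing or unreadable file yields 'previous'.
    static String firstLineOr(Path file, String previous) {
        if (!Files.exists(file)) {
            return previous;
        }
        try {
            return firstLineOr(Files.newBufferedReader(file, StandardCharsets.UTF_8), previous);
        } catch (IOException e) {
            return previous;
        }
    }

    public static void main(String[] args) {
        Path file = Paths.get(args.length > 0 ? args[0] : "custom.data");
        System.out.println(firstLineOr(file, "(no previous value)"));
    }
}
```

Because only the first line is read, any extra lines appended to custom.data are ignored, which is why the file has to be overwritten rather than appended to.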

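In short, SystemDataCollector serves as the thread body: on every run it obtains the system information (a SystemInfo) through sigar and assigns it to the SystemInfo member of SystemMonitoringData.

As mentioned earlier, starting the monitor actually runs the org.ngrinder.NGrinderAgentStarter class, so let's look at that file next: ngrinder-core/src/main/java/org/ngrinder/NGrinderAgentStarter.java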

/**
     * Agent starter.
     *
     * @param args arguments
     */
    public static void main(String[] args) {
        NGrinderAgentStarter starter = new NGrinderAgentStarter();
        final NGrinderAgentStarterParam param = new NGrinderAgentStarterParam();
        checkJavaVersion();
        JCommander commander = new JCommander(param);
        commander.setProgramName("ngrinder-agent");
        commander.setAcceptUnknownOptions(true);
        try {
            commander.parse(args);
        } catch (Exception e) {
            LOG.error(e.getMessage());
            return;
        }
        final List<String> unknownOptions = commander.getUnknownOptions();
        modeParam = param.getModeParam();
        modeParam.parse(unknownOptions.toArray(new String[unknownOptions.size()]));

        if (modeParam.version != null) {
            System.out.println("nGrinder v" + getStaticVersion());
            return;
        }

        if (modeParam.help != null) {
            modeParam.usage();
            return;
        }

        System.getProperties().putAll(modeParam.params);
        starter.init();

        final String startMode = modeParam.name();
        if ("stop".equalsIgnoreCase(param.command)) {
            starter.stopProcess(startMode);
            System.out.println("Stop the " + startMode);
            return;
        }
        starter.checkDuplicatedRun(startMode);
        if (startMode.equalsIgnoreCase("agent")) {
            starter.startAgent();
        } else if (startMode.equalsIgnoreCase("monitor")) {
            starter.startMonitor();
        } else {
            staticPrintHelpAndExit("Invalid agent.conf, '--mode' must be set as 'monitor' or 'agent'.");
        }
    }

In monitor mode, this method is executed: starter.startMonitor()


 /**
 * Start the performance monitor.
 */
public void startMonitor() {
    printLog("***************************************************");
    printLog("* Start nGrinder Monitor... ");
    printLog("***************************************************");
    try {
        MonitorServer.getInstance().init(agentConfig);
        MonitorServer.getInstance().start();
    } catch (Exception e) {
        LOG.error("ERROR: {}", e.getMessage());
        printHelpAndExit("Error while starting Monitor", e);
    }
}

MonitorServer.getInstance().start():

/**
 * Start monitoring.
 *
 * @throws IOException exception
 */
public void start() throws IOException {
    if (!isRunning()) {
        jmxServer.start();
        DataCollectManager.getInstance().init(agentConfig);
        DataCollectManager.getInstance().start();
        isRunning = true;
    }
}

DataCollectManager.getInstance().start();

/**
 * start a scheduler for the data collector jobs.
 */
public void start() {
    int collectorCount = MXBeanStorage.getInstance().getSize();
    scheduler = Executors.newScheduledThreadPool(collectorCount);
    if (!isRunning()) {
        Collection<MXBean> mxBeans = MXBeanStorage.getInstance().getMXBeans();
        for (MXBean mxBean : mxBeans) {
            DataCollector collector = mxBean.gainDataCollector(agentConfig.getHome().getDirectory());
            scheduler.scheduleWithFixedDelay(collector, 0L, getInterval(), TimeUnit.SECONDS);
            LOG.info("{} started.", collector.getClass().getSimpleName());
        }
        LOG.info("Collection interval : {}s).", getInterval());
        isRunning = true;
    }
}

scheduler.scheduleWithFixedDelay(collector, 0L, getInterval(), TimeUnit.SECONDS);

The thread pool periodically executes run() of SystemDataCollector to obtain the system data.

@Override
public void run() {
    initSigar();
    SystemMonitoringData systemMonitoringData = (SystemMonitoringData) getMXBean(SYSTEM);
    systemMonitoringData.setSystemInfo(execute());
}
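For readers less familiar with scheduleWithFixedDelay, here is a minimal standalone sketch of the same scheduling pattern. It is not nGrinder code: FixedDelayDemo and runRepeatedly are hypothetical names, and the collector is replaced by an arbitrary Runnable. With a fixed delay, the executor waits the given delay after one run finishes before starting the next one.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; it stands in for DataCollectManager only for illustration.
final class FixedDelayDemo {

    // Schedules 'task' with a fixed delay, waits until it has completed at least
    // 'runs' times (or 10 seconds pass), stops the scheduler, and returns the
    // number of completed runs observed.
    static int runRepeatedly(Runnable task, int runs, long delayMillis) {
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
        AtomicInteger completed = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(runs);
        scheduler.scheduleWithFixedDelay(() -> {
            task.run();
            completed.incrementAndGet();
            done.countDown();
        }, 0L, delayMillis, TimeUnit.MILLISECONDS);
        try {
            done.await(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            scheduler.shutdownNow();
        }
        return completed.get();
    }

    public static void main(String[] args) {
        int n = runRepeatedly(
                () -> System.out.println("collect at " + System.currentTimeMillis()), 3, 1000L);
        System.out.println("completed runs: " + n);
    }
}
```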

3. Summary

The monitor started in the background is a Java service:

java -server -cp "lib/*" org.ngrinder.NGrinderAgentStarter --mode monitor --command run

A thread pool periodically collects the system information (through sigar) and stores it in SystemInfo.

startSampling() in ngrinder-controller/src/main/java/org/ngrinder/perftest/service/samplinglistener/MonitorCollectorPlugin.java:

@Override
    public void startSampling(final ISingleConsole singleConsole, PerfTest perfTest,
                              IPerfTestService perfTestService) {
        final List<String> targetHostIP = perfTest.getTargetHostIP();
        final Integer samplingInterval = perfTest.getSamplingInterval();
        for (final String target : targetHostIP) {
            scheduledTaskService.runAsync(new Runnable() {
                @Override
                public void run() {
                    LOGGER.info("Start JVM monitoring for IP:{}", target);
                    MonitorClientService client = new MonitorClientService(target, MonitorCollectorPlugin.this.port);
                    client.init();
                    if (client.isConnected()) {
                        File testReportDir = singleConsole.getReportPath();
                        File dataFile = null;
                        try {
                            dataFile = new File(testReportDir, MONITOR_FILE_PREFIX + target + ".data");
                            FileWriter fileWriter = new FileWriter(dataFile, false);
                            BufferedWriter bw = new BufferedWriter(fileWriter);
                            // write header info
                            bw.write(SystemInfo.HEADER);
                            bw.newLine();
                            bw.flush();
                            clientMap.put(client, bw);
                        } catch (IOException e) {
                            LOGGER.error("Error to write to file:{}, Error:{}", dataFile.getPath(), e.getMessage());
                        }
                    }
                }
            });
        }
        assignScheduledTask(samplingInterval);
    }


The SystemInfo data is then written to /root/.ngrinder/perftest/0_999/${test_id}/report/monitor_system_${ip}.data.

getMonitorGraph() in ngrinder-controller/src/main/java/org/ngrinder/perftest/PerfTestService.java reads the system data back from /root/.ngrinder/perftest/0_999/${test_id}/report/monitor_system_${ip}.data:

 /**
 * Get system monitor data and wrap the data as a string value like "[22,11,12,34,....]", which can be used directly
 * in JS as a vector.
 *
 * @param testId       test id
 * @param targetIP     ip address of the monitor target
 * @param dataInterval interval value to get data. Interval value "2" means, get one record for every "2" records.
 * @return return the data in map
 */
public Map<String, String> getMonitorGraph(long testId, String targetIP, int dataInterval) {
    Map<String, String> returnMap = Maps.newHashMap();
    File monitorDataFile = new File(config.getHome().getPerfTestReportDirectory(String.valueOf(testId)),
            MONITOR_FILE_PREFIX + targetIP + ".data");
    BufferedReader br = null;
    try {

        StringBuilder sbUsedMem = new StringBuilder("[");
        StringBuilder sbCPUUsed = new StringBuilder("[");
        StringBuilder sbNetReceived = new StringBuilder("[");
        StringBuilder sbNetSent = new StringBuilder("[");
        StringBuilder customData1 = new StringBuilder("[");
        StringBuilder customData2 = new StringBuilder("[");
        StringBuilder customData3 = new StringBuilder("[");
        StringBuilder customData4 = new StringBuilder("[");
        StringBuilder customData5 = new StringBuilder("[");

        br = new BufferedReader(new FileReader(monitorDataFile));
        br.readLine(); // skip the header.
        // "ip,system,collectTime,freeMemory,totalMemory,cpuUsedPercentage,receivedPerSec,sentPerSec"
        String line = br.readLine();
        int skipCount = dataInterval;
        // to be compatible with previous version, check the length before
        // adding
        while (StringUtils.isNotBlank(line)) {
            if (skipCount < dataInterval) {
                skipCount++;
            } else {
                skipCount = 1;
                String[] datalist = StringUtils.split(line, ",");
                if ("null".equals(datalist[4]) || "undefined".equals(datalist[4])) {
                    sbUsedMem.append("null").append(",");
                } else {
                    sbUsedMem.append(Long.valueOf(datalist[4]) - Long.valueOf(datalist[3])).append(",");
                }
                addCustomData(sbCPUUsed, 5, datalist);
                addCustomData(sbNetReceived, 6, datalist);
                addCustomData(sbNetSent, 7, datalist);
                addCustomData(customData1, 8, datalist);
                addCustomData(customData2, 9, datalist);
                addCustomData(customData3, 10, datalist);
                addCustomData(customData4, 11, datalist);
                addCustomData(customData5, 12, datalist);
            }
            line = br.readLine();
        }
        completeCustomData(returnMap, "cpu", sbCPUUsed);
        completeCustomData(returnMap, "memory", sbUsedMem);
        completeCustomData(returnMap, "received", sbNetReceived);
        completeCustomData(returnMap, "sent", sbNetSent);
        completeCustomData(returnMap, "customData1", customData1);
        completeCustomData(returnMap, "customData2", customData2);
        completeCustomData(returnMap, "customData3", customData3);
        completeCustomData(returnMap, "customData4", customData4);
        completeCustomData(returnMap, "customData5", customData5);
    } catch (IOException e) {
        LOGGER.info("Error while getting monitor {} data file at {}", targetIP, monitorDataFile);
    } finally {
        IOUtils.closeQuietly(br);
    }
    return returnMap;
}
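The Javadoc above describes dataInterval as "get one record for every N records". A standalone sketch of that sampling rule, using the same skipCount idea on an in-memory list, could look like this. Downsampler is a hypothetical class written for illustration, not part of nGrinder.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of nGrinder.
final class Downsampler {

    // Keeps the first record and then one record out of every 'interval' records.
    static <T> List<T> everyNth(List<T> records, int interval) {
        if (interval < 1) {
            throw new IllegalArgumentException("interval must be >= 1");
        }
        List<T> kept = new ArrayList<>();
        int skipCount = interval; // start "full" so that the first record is kept
        for (T record : records) {
            if (skipCount < interval) {
                skipCount++;      // skip this record
            } else {
                skipCount = 1;    // keep this record and restart the count
                kept.add(record);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(everyNth(Arrays.asList(1, 2, 3, 4, 5), 2));
    }
}
```

With an interval of 1 every record is kept; larger intervals thin out the series so that the chart does not receive more points than it can usefully draw.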

The data is exposed through the Controller:
ngrinder-controller/src/main/java/org/ngrinder/perftest/controller/PerfTestController.java

private Map<String, String> getMonitorGraphData(long id, String targetIP, int imgWidth) {
    int interval = perfTestService.getMonitorGraphInterval(id, targetIP, imgWidth);
    Map<String, String> sysMonitorMap = perfTestService.getMonitorGraph(id, targetIP, interval);
    PerfTest perfTest = perfTestService.getOne(id);
    sysMonitorMap.put("interval", String.valueOf(interval * (perfTest != null ? perfTest.getSamplingInterval() : 1)));
    return sysMonitorMap;
}

/**
 * Get the monitor data of the target having the given IP.
 *
 * @param id       test Id
 * @param targetIP targetIP
 * @param imgWidth image width
 * @return json message
 */
@RestAPI
@RequestMapping("/api/{id}/monitor")
public HttpEntity<String> getMonitorGraph(@PathVariable("id") long id,
                                          @RequestParam("targetIP") String targetIP, @RequestParam int imgWidth) {
    return toJsonHttpEntity(getMonitorGraphData(id, targetIP, imgWidth));
}
8 replies in total

Hi, I'd like to ask whether you have ever run into a problem like the following:

2016-03-11 17:15:06,337 INFO agent daemon : worker PVGN50897276A-0 started
2016-03-11 17:15:36,560 INFO agent daemon : received a stop message
2016-03-11 17:15:36,561 INFO agent daemon : Don't start anymore by message from controller.
2016-03-11 17:15:39,289 INFO agent daemon : received a stop message
2016-03-11 17:15:41,807 INFO agent daemon : received a stop message
2016-03-11 17:15:41,807 INFO agent daemon : Terminating unresponsive processes by force
2016-03-11 17:15:41,994 INFO agent daemon : All workers are finished
2016-03-11 17:15:41,994 INFO agent daemon : communication shut down
2016-03-11 17:15:41,994 INFO agent daemon : Test shuts down.
2016-03-11 17:15:41,994 INFO agent controller : Send log for test_3
2016-03-11 17:15:42,010 INFO agent controller : Clean up the perftest logs

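The controller starts normally and I add a test on the console page. After I click start, the test starts normally and begins generating the report, but less than 30s later the agent receives an abnormal stop message and the test is terminated immediately.

I am sure I did not send any stop message. I tried many times with the same result. I also tried deleting all nGrinder-related directories on the machine and then redeploying and restarting the controller and agent, but the problem still occurs.

I really cannot find the cause. Under what circumstances is the stop message sent, or is there a more detailed log somewhere that would reveal the cause?
Thanks~~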

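#1 @yusufchang

Conditions under which nGrinder automatically aborts a test:
TPS stays below 0.001 for 1 minute;
the transaction error rate is 50% or higher within 10s. For details see: https://testerhome.com/topics/4382

The nGrinder source code gets CPU and memory through the Sigar library, but there is one problem: I went through Sigar's cpu class API and could not find a way to get the CPU load value. I am not sure whether I misread it or it is simply not supported.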

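#4 @neven7 Hello, after reading your posts about nGrinder I wanted to deploy it myself, but I cannot get it to start in my environment:
MAC OSX 10.12.2 ; JDK 1.8 ; TOMCAT 7.075 ; ngrinder 3.4.1 (supports JDK1.8)
Tomcat runs fine, but after deploying ngrinder-controller-3.4.1.war, visiting http://localhost:8080/ngrinder or http://localhost:8080/ngrinder-controller returns 404.
Error log: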

ERROR ContextLoader.java:307 : Context initialization failed
org.springframework.beans.factory.parsing.BeanDefinitionParsingException: Configuration problem: Failed to import bean definitions from relative location [applicationContext-springdata.xml]
Offending resource: class path resource [applicationContext.xml]; nested exception is org.springframework.beans.factory.parsing.BeanDefinitionParsingException: Configuration problem: Failed to read candidate component class: file [/Users/a58/Library/Tomcat/webapps/ngrinder-controller-3.3/WEB-INF/classes/org/ngrinder/home/controller/HomeController$1.class]; nested exception is java.lang.ArrayIndexOutOfBoundsException: 1320
Offending resource: class path resource [applicationContext-springdata.xml]; nested exception is java.lang.ArrayIndexOutOfBoundsException: 1320
    at org.springframework.beans.factory.parsing.FailFastProblemReporter.error(FailFastProblemReporter.java:68) ~[spring-beans-3.1.0.RELEASE.jar:3.1.0.RELEASE]
    at org.springframework.beans.factory.parsing.ReaderContext.error(ReaderContext.java:85) ~[spring-beans-3.1.0.RELEASE.jar:3.1.0.RELEASE]
    at org.springframework.beans.factory.parsing.ReaderContext.error(ReaderContext.java:76) ~[spring-beans-3.1.0.RELEASE.jar:3.1.0.RELEASE]
    at org.springframework.beans.factory.xml.DefaultBeanDefinitionDocumentReader.importBeanDefinitionResource(DefaultBeanDefinitionDocumentRea

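#5 @softblank You said you are using version 3.4.1, so why does the stack trace show "/Users/a58/Library/Tomcat/webapps/ngrinder-controller-3.3", which points to version 3.3?

#5 @softblank HTTP code 404 means the resource you requested does not exist. Because the directory deployed under Tomcat's webapps is named ngrinder-controller-3.3, the URL you request should be: http://localhost:8080/ngrinder-controller-3.3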

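Brother Hu, I have run into a problem and do not know what to do. I pulled the source code and integrated it myself. It is usable now, but I had never paid attention to this monitoring feature, that is, viewing the load and full GC data. My steps were as follows: I downloaded the monitor, changed monitor.binding_ip to the IP of my own controller, deleted the old .ngrinder_agent folder under root, and then started run_monitor.sh. I checked, and the port was up. Then I started my agent (the load generator) and began the test. During the test the agent looked normal. After the run finished I went to view the full GC information, but when I click it the page flashes by and goes blank. What is going on? I have never touched this part of the code.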
