nGrinder Performance Testing Tool Source Analysis: Automatic Interruption of Test Tasks

胡刚 · 2016-03-13 · Last reply by 胡刚 on 2016-06-02 · 4875 views

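1. Background

When running an nGrinder test, the test is sometimes interrupted by the system. Why does it get interrupted automatically? Is it a bug or a protection mechanism? This article answers the question by reading the source code.

2. Source Code Analysis

An earlier article, "nGrinder Performance Testing Tool: Project Anatomy and Custom Development", introduced nGrinder's overall architecture. From it we know that ngrinder-core/src/main/java/net/grinder/SingleConsole.java distributes the test script to the agents and runs the performance test on them. The class also collects the sampled data during the test and records it in two places:

Record files, used for the time-series charts (*.data and CSV)
Memory, used for the detailed report (private Map<String, Object> statisticData, persisted to the DB)

The method that writes the sampled data to the first destination (*.data and CSV) is implemented as follows: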

/*
     * (non-Javadoc)
     *
     * @see
     * net.grinder.console.model.SampleListener#update(net.grinder.statistics
     * .StatisticsSet, net.grinder.statistics.StatisticsSet)
     */
    @Override
    public void update(final StatisticsSet intervalStatistics,
            final StatisticsSet cumulativeStatistics) {
        try {
            if (!capture) {
                return;
            }
            samplingCount++;
            long currentPeriod = cumulativeStatistics.getValue(getSampleModel()
                    .getPeriodIndex());
            setTpsValue(sampleModel.getTPSExpression().getDoubleValue(
                    intervalStatistics));
            checkTooLowTps(getTpsValues());
            updateStatistics(intervalStatistics, cumulativeStatistics);
            // Write the sampled data to the CSV file
            // hugang
            writeIntervalCsvData(intervalStatistics);
            int interval = getSampleModel().getSampleInterval();
            long gap = 1;
            if (samplingCount == 1) {
                lastSamplingPeriod = currentPeriod;
            } else {
                lastSamplingPeriod = lastSamplingPeriod + interval;
                gap = ((currentPeriod - lastSamplingPeriod) / interval);
            }
            // Adjust sampling delay.. run write data multiple times... when it
            // takes longer than 1
            // sec.

            samplingLifeCycleListener
                    .apply(new Informer<SamplingLifeCycleListener>() {
                        @Override
                        public void inform(SamplingLifeCycleListener listener) {
                            listener.onSampling(getReportPath(),
                                    intervalStatistics, cumulativeStatistics);
                        }
                    });
            for (int i = 0; i < (gap + 1); i++) {
                final boolean lastCall = (samplingCount == 1 && i == 0)
                        || (samplingCount != 1 && i == gap);
                // Write the sampled data to the *.data files
                // hugang
                writeIntervalSummaryData(intervalStatistics, lastCall);
                if (interval >= (MIN_SAMPLING_INTERVAL_TO_ACTIVATE_TPS_PER_TEST)) {
                    writeIntervalSummaryDataPerTest(
                            intervalStatisticMapPerTest, lastCall);
                }
                samplingLifeCycleFollowupListener
                        .apply(new Informer<SamplingLifeCycleFollowUpListener>() {
                            @Override
                            public void inform(
                                    SamplingLifeCycleFollowUpListener listener) {
                                listener.onSampling(getReportPath(),
                                        intervalStatistics,
                                        cumulativeStatistics, lastCall);
                            }
                        });
            }
            checkTooManyError(cumulativeStatistics);
            lastSamplingPeriod = lastSamplingPeriod + (interval * gap);
        } catch (RuntimeException e) {
            LOGGER.error("Error occurred while updating the statistics : {}",
                    e.getMessage());
            LOGGER.debug("Details : ", e);
            throw e;
        }
    }

Here, the calls

writeIntervalCsvData(intervalStatistics);

writeIntervalSummaryData(intervalStatistics, lastCall);

write the sampled data to output.csv and *.data respectively.

Note that there are 2 check methods:

checkTooLowTps(getTpsValues());
checkTooManyError(cumulativeStatistics);

checkTooLowTps(getTpsValues()) checks whether the TPS has stayed below 0.001 for 1 minute. If so, it sends a stop signal to the ConsoleShutdownListener listeners.

/**
     * Check if the TPS is too low. the TPS is lower than 0.001 for 1 minutes,
     * It emits a shutdown event to the {@link ConsoleShutdownListener}
     *
     * @param tps
     *            current TPS
     */
    private void checkTooLowTps(double tps) {
        // If the tps is too low, which means the agents or scripts went wrong.
        if (tps < 0.001) {
            if (momentWhenTpsBeganToHaveVerySmall == 0) {
                momentWhenTpsBeganToHaveVerySmall = System.currentTimeMillis();
            } else if (new Date().getTime() - momentWhenTpsBeganToHaveVerySmall >= TOO_LOW_TPS_TIME) {
                LOGGER.warn(
                        "Stop the test because its tps is less than 0.001 for more than {} minitue.",
                        TOO_LOW_TPS_TIME / 60000);
                getListeners().apply(new Informer<ConsoleShutdownListener>() {
                    public void inform(ConsoleShutdownListener listener) {
                        listener.readyToStop(StopReason.TOO_LOW_TPS);
                    }
                });
                momentWhenTpsBeganToHaveVerySmall = 0;

            }
        } else {
            momentWhenTpsBeganToHaveVerySmall = 0;
        }
    }
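
To make the timing logic above easier to follow in isolation, here is a minimal sketch of the same state machine. It is not nGrinder's code: the class name LowTpsWatchdog, its constructor parameters, and passing the clock in as nowMillis are all assumptions made for illustration, and the 60,000 ms window simply mirrors the "1 minute" mentioned in the Javadoc.

```java
/** Hypothetical, simplified model of the low-TPS check; this is not nGrinder's class. */
final class LowTpsWatchdog {
    private final double tpsThreshold; // 0.001 in the quoted source
    private final long windowMillis;   // assumed 60_000 ms, i.e. the "1 minute" from the Javadoc
    private long lowSince = 0;         // 0 means "TPS is not currently considered low"

    LowTpsWatchdog(double tpsThreshold, long windowMillis) {
        this.tpsThreshold = tpsThreshold;
        this.windowMillis = windowMillis;
    }

    /** Feeds one TPS sample taken at nowMillis; returns true when the test should be stopped. */
    boolean onSample(double tps, long nowMillis) {
        if (tps >= tpsThreshold) {
            lowSince = 0;              // TPS recovered: forget the low period
            return false;
        }
        if (lowSince == 0) {
            lowSince = nowMillis;      // first low sample: only start the timer
            return false;
        }
        if (nowMillis - lowSince >= windowMillis) {
            lowSince = 0;              // reset after signalling, as the quoted code does
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        LowTpsWatchdog watchdog = new LowTpsWatchdog(0.001, 60_000);
        System.out.println(watchdog.onSample(0.0, 1_000));  // false: timer starts
        System.out.println(watchdog.onSample(0.0, 30_000)); // false: only 29 s elapsed
        System.out.println(watchdog.onSample(0.0, 61_000)); // true: 60 s of low TPS
    }
}
```

The point to take away is that a single sample at or above the threshold resets the timer, so only an uninterrupted low-TPS period leads to a stop.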

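private void checkTooManyError(StatisticsSet cumulativeStatistics) checks whether errors have accounted for more than half of all transactions for 10 s. The comparison in the code is a strict <, so an error rate of exactly 50% does not trigger it. If the condition holds, it notifies the ConsoleShutdownListener to terminate the test.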

/**
     * Check if too many error has been occurred. If the half of total
     * transaction is error for the last 10 secs. It notifies the
     * {@link ConsoleShutdownListener}
     *
     * @param cumulativeStatistics
     *            accumulated Statistics
     */
    private void checkTooManyError(StatisticsSet cumulativeStatistics) {
        StatisticsIndexMap statisticsIndexMap = getStatisticsIndexMap();
        long testSum = cumulativeStatistics.getCount(statisticsIndexMap
                .getLongSampleIndex("timedTests"));
        long errors = cumulativeStatistics.getValue(statisticsIndexMap
                .getLongIndex("errors"));
        // testSum: number of successful transactions; errors: number of failed transactions
        // hugang
        if (((double) (testSum + errors)) / 2 < errors) {
            if (lastMomentWhenErrorsMoreThanHalfOfTotalTPSValue == 0) {
                lastMomentWhenErrorsMoreThanHalfOfTotalTPSValue = System
                        .currentTimeMillis();
            } else if (isOverLowTpsThreshold()) {
                LOGGER.warn(
                        "Stop the test because the count of test error is more than"
                                + " half of total tps for last {} seconds.",
                        TOO_MANY_ERROR_TIME / 1000);
                getListeners().apply(new Informer<ConsoleShutdownListener>() {
                    public void inform(ConsoleShutdownListener listener) {
                        listener.readyToStop(StopReason.TOO_MANY_ERRORS);
                    }
                });
                lastMomentWhenErrorsMoreThanHalfOfTotalTPSValue = 0;
            }
        }
    }
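
The comparison in checkTooManyError is easy to misread, so here is a small sketch that isolates just that predicate. The class name ErrorRateGuard and the method moreThanHalfFailed are hypothetical. The timing part is deliberately left out, because isOverLowTpsThreshold() is not shown in the quoted source.

```java
/** Hypothetical helper that isolates the comparison used in checkTooManyError. */
final class ErrorRateGuard {
    /**
     * successes plays the role of testSum and errors the role of errors in the quoted code.
     * Returns true only when errors are strictly more than half of all transactions.
     */
    static boolean moreThanHalfFailed(long successes, long errors) {
        return ((double) (successes + errors)) / 2 < errors;
    }

    public static void main(String[] args) {
        System.out.println(moreThanHalfFailed(50, 50)); // false: exactly 50% errors
        System.out.println(moreThanHalfFailed(49, 51)); // true: 51% errors
        System.out.println(moreThanHalfFailed(0, 0));   // false: no transactions yet
    }
}
```

Because the values passed in come from cumulativeStatistics, the ratio is computed over the whole run so far rather than over a sliding window.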

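3. Summary

nGrinder's automatic interruption of a test is a protection mechanism: when the system under test is already performing very poorly, nGrinder stops the test so that it no longer puts load on that system.

Criteria:

TPS stays below 0.001 for 1 minute
Errors exceed 50% of all transactions for 10 s

If either condition holds, the test is interrupted automatically.

Appendix:
The nGrinder debug log is located at:
/root/.ngrinder/logs/ngrinder.log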

30 replies

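chengaomin #1 · 2016-03-14

Great work!!! Looking forward to more good articles.

chengaomin #2 · 2016-03-14

Is the nGrinder open-source project no longer maintained? Is the latest release still the one from 2014?

Yusufchang #3 · 2016-03-14

Giving this a thumbs-up. I'll keep following your posts.

胡刚 #4 · 2016-03-14 Author

#1 @chengaomin Thanks.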

胡刚 #5 · 2016-03-14 Author

#2 @chengaomin
It is still being updated. The commit history on GitHub shows:
Latest commit 263cf51 on Nov 18, 2015 by @GwonGisoo: Merge pull request #86 from GwonGisoo/easy-script-generator

胡刚 #6 · 2016-03-14 Author

#3 @yusufchang Thanks. If you like the article, don't forget to give it a like.

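chengaomin #7 · 2016-03-15

A question: say I have 2 agents in Beijing and 2 agents in Shanghai, and for an upcoming load test I need to use the Beijing machines. Is there a way to specify that?

胡刚 #8 · 2016-03-15 Author

#7 @chengaomin The test configuration does not let you pick specific agents. As a workaround, go to the agent management page and mark your 2 Shanghai agents as unapproved. The remaining available agents will then be the Beijing ones.

chengaomin #9 · 2016-03-15

#8 @neven7 That raises another question: can several people run tests on this platform at the same time? If there are only 2 agents and 4 people run scripts simultaneously, does that work?

chengaomin #10 · 2016-03-15

#8 @neven7 Also, can user B use an agent deployed by user A?

chengaomin #11 · 2016-03-15

#8 @neven7 Have you tried nGrinder's cluster mode? I wonder whether it can handle my case of separate data centers in Beijing and Shanghai: http://my.oschina.net/u/939534/blog/103943

胡刚 #12 · 2016-03-15 Author

#9 @chengaomin Multiple users are supported. A test uses one or more agents. If an agent is occupied by a test, a new test blocks until that agent is released. With 2 agents, at most 2 people can run tests at the same time, one agent each.

胡刚 #13 · 2016-03-15 Author

#10 @chengaomin Yes.

胡刚 #14 · 2016-03-15 Author

#11 @chengaomin Yes. There are 2 ways to set up a cluster: 1. Easy Clustering Guide: http://www.cubrid.org/wiki_ngrinder/entry/easy-clustering-guide 2. Advanced Clustering Guide: http://www.cubrid.org/wiki_ngrinder/entry/advanced-clustering-guide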

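chengaomin #15 · 2016-03-15

#14 @neven7 I read 1. Easy Clustering Guide: http://www.cubrid.org/wiki_ngrinder/entry/easy-clustering-guide and 2. Advanced Clustering Guide: http://www.cubrid.org/wiki_ngrinder/entry/advanced-clustering-guide and tried to set one up, but it did not work. It reports an error on startup. Have you managed to get it running yourself?

米阳MeYoung #16 · 2016-03-15

Much respect, I have been following along. nGrinder has few users in China, so it needs more people who are willing to dig into it.

胡刚 #17 · 2016-03-16 Author

#15 @chengaomin 1. The first way is fairly simple. You need to install H2 and start its server: wget http://www.h2database.com/h2-2016-01-21.zip
From the bin directory, run java -cp h2-1.4.191.jar org.h2.tools.Server -tcp &, then start the rest in turn. 2. The second way is more involved and requires setting up NFS and Cubrid.
If what you want is for several people to share the platform and you do not have enough agents, you could write your own Dockerfile (see the docker folder in its GitHub repository, which covers both standalone and cluster setups; the cluster is somewhat costly), build an image, and let each person do a docker run when they need it.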

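chengaomin #18 · 2016-03-16

#17 @neven7 OK, I'm still trying... no success yet.

chengaomin #19 · 2016-03-16

#17 @neven7 JMeter has a Constant Throughput Timer that controls throughput by capping the number of requests per minute. Can nGrinder do the same?

胡刚 #20 · 2016-03-17 Author

#18 @chengaomin Keep going. Persistence pays off.

胡刚 #21 · 2016-03-17 Author

#19 @chengaomin QPS is controlled by the number of virtual users in your test configuration. If you set n, the maximum QPS is n requests/s. Requests per minute / 60 gives the number of virtual users to put in your test configuration.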

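chengaomin #22 · 2016-03-17

#21 @neven7 Is TPS in nGrinder the same concept as QPS in JMeter? As I see it, with n virtual users the maximum QPS is not n requests/s and may well be higher than n. Virtual users send requests continuously, so a single virtual user does not necessarily send only one request per second.

chengaomin #23 · 2016-03-17

#20 @neven7 The first (easy) mode is finally up. One big pitfall is the H2 database: it has to be the 2014 version, and the newer 2016 version reports an error.

胡刚 #24 · 2016-03-17 Author

#23 @chengaomin Nice!

胡刚 #25 · 2016-03-17 Author

#22 @chengaomin TPS is the number of transactions the system processes per second and is a metric of system performance. QPS is the number of requests per second and models the real request volume. Alibaba has a formula: daily PV (in units of 10,000) * (80% / 20%) / (24 * 3600) = estimated QPS. A virtual user count of n (really n threads) in nGrinder means n threads executing @Test, which can be understood as n requests/s.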
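
The estimate quoted in this reply can be worked through with a small sketch. The class and method names are hypothetical and the traffic figure is made up. Converting the PV figure from units of 10,000 into an absolute count before applying the formula is an assumption about how the formula is meant to be used, since the reply does not spell it out.

```java
/** Hypothetical worked example of the QPS estimate quoted in the reply above. */
final class QpsEstimate {
    private static final int SECONDS_PER_DAY = 24 * 3600;

    /** dailyPageViews is an absolute count (assumption: PV in units of 10,000 times 10,000). */
    static double estimateQps(long dailyPageViews) {
        return dailyPageViews * (0.80 / 0.20) / SECONDS_PER_DAY;
    }

    public static void main(String[] args) {
        long dailyPageViews = 864L * 10_000;             // made-up figure: 8.64 million PV per day
        System.out.println(estimateQps(dailyPageViews)); // about 400 requests per second
    }
}
```

With 8.64 million page views per day, the plain daily average is 100 requests/s, and the 80% / 20% factor scales that up by 4.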

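chengaomin #26 · 2016-03-17

#25 @neven7 Thank you for patiently answering my questions over the past days!

chengaomin #27 · 2016-03-23

#25 @neven7 I've run into another question: how do I debug a Python test script locally?

胡刚 #28 · 2016-03-27 Author

#27 @chengaomin Local debugging just comes down to getting hold of the packages the script depends on. Look at the script file that nGrinder generates automatically, find the packages it depends on online, and download them locally.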

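米阳MeYoung #29 · 2016-04-01

#27 @chengaomin Have you tried downloading the source code and building the war package yourself?

Reply #30 has been deleted

胡刚 #31 · 2016-06-02 Author

#22 @chengaomin Sorry, my earlier understanding was wrong. I'd like to correct this part of my answer in #25: "A virtual user count of n (really n threads) in nGrinder means n threads executing @Test, which can be understood as n requests/s." If the @Test body defines only one request and there is no think time, then Vusers = TPS * RT. TPS can be understood as the system's throughput (it can also be understood as QPS). When the growth rate of TPS falls below the growth rate of the response time, that is the performance inflection point, which is also the most reasonable number of concurrent users. When TPS stops growing or starts to drop, the load at that point is the maximum load, and the number of concurrent users in use is the maximum number of concurrent users. If the TPS at that point does not meet your requirements, you need to find the bottleneck and optimize it.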
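
The relation Vusers = TPS * RT in the correction above is Little's law applied to a closed load test. Here is a minimal sketch with hypothetical names and made-up numbers, assuming RT is measured in seconds, each @Test iteration issues one request, and there is no think time.

```java
/** Hypothetical worked example of Vusers = TPS * RT (RT in seconds, no think time). */
final class LittlesLaw {
    /** Number of concurrent virtual users needed to sustain tps at the given response time. */
    static double requiredVusers(double tps, double responseTimeSeconds) {
        return tps * responseTimeSeconds;
    }

    /** Throughput that vusers can generate when each request takes responseTimeSeconds. */
    static double achievableTps(double vusers, double responseTimeSeconds) {
        return vusers / responseTimeSeconds;
    }

    public static void main(String[] args) {
        System.out.println(requiredVusers(200, 0.25)); // 50.0 virtual users
        System.out.println(achievableTps(50, 0.25));   // 200.0 requests per second
        System.out.println(achievableTps(1, 0.25));    // 4.0: one user, more than one request/s
    }
}
```

The last line matches the point raised in #22: a single virtual user whose requests take 0.25 s completes 4 requests per second, so n virtual users do not imply n requests/s.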
