1.背景

在运行 nGrinder 任务时,会出现任务被系统中断,为什么会自动中断呢?是 bug 还是一种保护机制?本文通过源码分析的方式来解读。

2.源码分析

之前的一篇文章:性能测试工具 nGrinder 项目剖析及二次开发,介绍了 nGrinder 的整体架构,知道 ngrinder-core/src/main/java/net/grinder/SingleConsole.java 是将测试脚本发布到 Agent 并在其上执行性能测试,该类实现了收集测试过程中的采样数据,将数据分成两部分记录:

  1. 写入记录文件,供时序图展示(*.data 和 csv)
  2. 写入内存,供详细数据展示(private Map<String, Object> statisticData, 持久化到 DB)

其中将采样数据写到第一部分(*.data 和 csv)的实现方法如下:

/*
     * (non-Javadoc)
     *
     * @see
     * net.grinder.console.model.SampleListener#update(net.grinder.statistics
     * .StatisticsSet, net.grinder.statistics.StatisticsSet)
     */
    @Override
    public void update(final StatisticsSet intervalStatistics,
            final StatisticsSet cumulativeStatistics) {
        try {
            if (!capture) {
                return;
            }
            samplingCount++;
            long currentPeriod = cumulativeStatistics.getValue(getSampleModel()
                    .getPeriodIndex());
            setTpsValue(sampleModel.getTPSExpression().getDoubleValue(
                    intervalStatistics));
            checkTooLowTps(getTpsValues());
            updateStatistics(intervalStatistics, cumulativeStatistics);
            // 将采样数据写入csv数据
            // hugang
            writeIntervalCsvData(intervalStatistics);
            int interval = getSampleModel().getSampleInterval();
            long gap = 1;
            if (samplingCount == 1) {
                lastSamplingPeriod = currentPeriod;
            } else {
                lastSamplingPeriod = lastSamplingPeriod + interval;
                gap = ((currentPeriod - lastSamplingPeriod) / interval);
            }
            // Adjust sampling delay.. run write data multiple times... when it
            // takes longer than 1
            // sec.

            samplingLifeCycleListener
                    .apply(new Informer<SamplingLifeCycleListener>() {
                        @Override
                        public void inform(SamplingLifeCycleListener listener) {
                            listener.onSampling(getReportPath(),
                                    intervalStatistics, cumulativeStatistics);
                        }
                    });
            for (int i = 0; i < (gap + 1); i++) {
                final boolean lastCall = (samplingCount == 1 && i == 0)
                        || (samplingCount != 1 && i == gap);
                // 将采样数据写到*.data文件
                // hugang
                writeIntervalSummaryData(intervalStatistics, lastCall);
                if (interval >= (MIN_SAMPLING_INTERVAL_TO_ACTIVATE_TPS_PER_TEST)) {
                    writeIntervalSummaryDataPerTest(
                            intervalStatisticMapPerTest, lastCall);
                }
                samplingLifeCycleFollowupListener
                        .apply(new Informer<SamplingLifeCycleFollowUpListener>() {
                            @Override
                            public void inform(
                                    SamplingLifeCycleFollowUpListener listener) {
                                listener.onSampling(getReportPath(),
                                        intervalStatistics,
                                        cumulativeStatistics, lastCall);
                            }
                        });
            }
            checkTooManyError(cumulativeStatistics);
            lastSamplingPeriod = lastSamplingPeriod + (interval * gap);
        } catch (RuntimeException e) {
            LOGGER.error("Error occurred while updating the statistics : {}",
                    e.getMessage());
            LOGGER.debug("Details : ", e);
            throw e;
        }
    }

其中:


writeIntervalCsvData(intervalStatistics);

writeIntervalSummaryData(intervalStatistics, lastCall);

分别将采样数据写到 output.csv 和 *.data。

注意,有 2 个 check 方法:

checkTooLowTps(getTpsValues());
checkTooManyError(cumulativeStatistics);

checkTooLowTps(getTpsValues());会判断 1 分钟内 TPS 小于 0.01,如果为 true,将向 ConsoleShutdownListener 监听器发送停止信号。

/**
     * Check if the TPS is too low. the TPS is lower than 0.001 for 1 minutes,
     * It emits a shutdown event to the {@link ConsoleShutdownListener}
     *
     * @param tps
     *            current TPS
     */
    private void checkTooLowTps(double tps) {
        // If the tps is too low, which means the agents or scripts went wrong.
        if (tps < 0.001) {
            if (momentWhenTpsBeganToHaveVerySmall == 0) {
                momentWhenTpsBeganToHaveVerySmall = System.currentTimeMillis();
            } else if (new Date().getTime() - momentWhenTpsBeganToHaveVerySmall >= TOO_LOW_TPS_TIME) {
                LOGGER.warn(
                        "Stop the test because its tps is less than 0.001 for more than {} minitue.",
                        TOO_LOW_TPS_TIME / 60000);
                getListeners().apply(new Informer<ConsoleShutdownListener>() {
                    public void inform(ConsoleShutdownListener listener) {
                        listener.readyToStop(StopReason.TOO_LOW_TPS);
                    }
                });
                momentWhenTpsBeganToHaveVerySmall = 0;

            }
        } else {
            momentWhenTpsBeganToHaveVerySmall = 0;
        }
    }


private void checkTooManyError(StatisticsSet cumulativeStatistics):会判断 10s 内事务数错误率>=50%,如果为 true,通知监听器 ConsoleShutdownListener listener 终止任务。

/**
     * Check if too many error has been occurred. If the half of total
     * transaction is error for the last 10 secs. It notifies the
     * {@link ConsoleShutdownListener}
     *
     * @param cumulativeStatistics
     *            accumulated Statistics
     */
    private void checkTooManyError(StatisticsSet cumulativeStatistics) {
        StatisticsIndexMap statisticsIndexMap = getStatisticsIndexMap();
        long testSum = cumulativeStatistics.getCount(statisticsIndexMap
                .getLongSampleIndex("timedTests"));
        long errors = cumulativeStatistics.getValue(statisticsIndexMap
                .getLongIndex("errors"));
        // testSum 成功事务数, errors 失败事务数
        // hugang
        if (((double) (testSum + errors)) / 2 < errors) {
            if (lastMomentWhenErrorsMoreThanHalfOfTotalTPSValue == 0) {
                lastMomentWhenErrorsMoreThanHalfOfTotalTPSValue = System
                        .currentTimeMillis();
            } else if (isOverLowTpsThreshold()) {
                LOGGER.warn(
                        "Stop the test because the count of test error is more than"
                                + " half of total tps for last {} seconds.",
                        TOO_MANY_ERROR_TIME / 1000);
                getListeners().apply(new Informer<ConsoleShutdownListener>() {
                    public void inform(ConsoleShutdownListener listener) {
                        listener.readyToStop(StopReason.TOO_MANY_ERRORS);
                    }
                });
                lastMomentWhenErrorsMoreThanHalfOfTotalTPSValue = 0;
            }
        }
    }

3.总结

nGrinder 中断测试任务,是一种保护机制;当被测系统性能已经很差,nGrinder 通过自动中断任务,不再继续对该系统产生压力。

判断标准:

  1. TPS 在 1 分钟内小于 0.001
  2. 事务错误率在 10s 内大于等于 50%

如果上述条件,某一条为 true,自动中断任务。

附:
nGrinder 的 debug 日志为:
/root/.ngrinder/logs/ngrinder.log
这里写图片描述


↙↙↙阅读原文可查看相关链接,并与作者交流