Chaos Toolkit: A Tool That Simplifies Chaos Engineering Experiments

Chaos Toolkit is a tool that simplifies running chaos engineering experiments. The official documentation describes it as follows:

The Chaos Toolkit aims to be the simplest and easiest way to explore building your own Chaos Engineering Experiments.

The rest of this post uses the official sample application to show how to use Chaos Toolkit.

Installation

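mkdir demo && cd demo
# Create a virtual environment for the demo (Python 3.7)
pipenv --python 3.7
# pipenv install -U pip
# Install the Chaos Toolkit CLI plus the Kubernetes and reporting extensions
pip3 install -U chaostoolkit
pip3 install -U chaostoolkit-kubernetes
pip3 install -U chaostoolkit-reporting
# Native libraries used by chaostoolkit-reporting (macOS, Homebrew)
brew install cairo pango gdk-pixbuf libffi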

Starting the application

Copy the files from the a-simple-walkthrough directory of chaostoolkit-demo into the demo directory.

Install the dependencies:

pipenv install astral
pipenv install cherrypy
pipenv install pytz
pipenv install requests

Run:

cp valid-cert.pem cert.pem
nohup python3 astre.py &
nohup python3 sunset.py &

Running the chaos experiment

Run the experiment:

chaos run experiment.json

Result:

[2019-05-20 14:35:29 INFO] Validating the experiment's syntax
[2019-05-20 14:35:29 INFO] Experiment looks valid
[2019-05-20 14:35:29 INFO] Running experiment: What is the impact of an expired certificate on our application chain?
[2019-05-20 14:35:29 INFO] Steady state hypothesis: Application responds
[2019-05-20 14:35:29 INFO] Probe: the-astre-service-must-be-running
[2019-05-20 14:35:29 CRITICAL] Steady state probe 'the-astre-service-must-be-running' is not in the given tolerance so failing this experiment
[2019-05-20 14:35:29 INFO] Let's rollback...
[2019-05-20 14:35:29 INFO] Rollback: swap-to-vald-cert
[2019-05-20 14:35:29 INFO] Action: swap-to-vald-cert
[2019-05-20 14:35:29 INFO] Rollback: None
[2019-05-20 14:35:29 INFO] Action: restart-astre-service-to-pick-up-certificate
[2019-05-20 14:35:29 INFO] Rollback: None
[2019-05-20 14:35:29 INFO] Action: restart-sunset-service-to-pick-up-certificate
[2019-05-20 14:35:29 INFO] Pausing after activity for 1s...
[2019-05-20 14:35:30 INFO] Experiment ended with status: failed

The line that shows why the experiment failed:

[2019-05-20 14:35:29 CRITICAL] Steady state probe 'the-astre-service-must-be-running' is not in the given tolerance so failing this experiment
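
To make that CRITICAL line concrete, here is a minimal Python sketch of the idea behind a steady-state check: each probe produces a value, and the experiment only goes ahead if that value matches the declared tolerance. This is not Chaos Toolkit's own implementation. The helper names `within_tolerance` and `check_steady_state` are hypothetical, only scalar and list tolerances are handled, and the sample probe value (`False` against a tolerance of `True`) is an assumption, not something read from the demo's experiment.json.

```python
def within_tolerance(value, tolerance):
    """Return True if a probe's value satisfies the declared tolerance.

    Hypothetical helper: a list tolerance means "value must be one of
    these", anything else means "value must equal this".
    """
    if isinstance(tolerance, list):
        return value in tolerance
    return value == tolerance


def check_steady_state(probes):
    """Return the name of the first failing probe, or None if all pass."""
    for probe in probes:
        if not within_tolerance(probe["value"], probe["tolerance"]):
            return probe["name"]
    return None


# Assumed sample data: the probe returned False where True was expected.
probes = [
    {
        "name": "the-astre-service-must-be-running",
        "value": False,
        "tolerance": True,
    },
]

failed = check_steady_state(probes)
print(failed)
```

When a probe falls outside its tolerance before the method runs, there is no point injecting a fault, which is why the log above goes straight to the rollbacks.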

Generating a report

chaos report --export-format=html journal.json report.html
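
The report is built from journal.json, so the same file can also be inspected directly. The sketch below is a rough illustration only: the keys it reads (`status`, `run`, `activity`, `name`) are my assumptions about the journal layout, the function `summarize_journal` is hypothetical, and the sample data is hand-written rather than taken from the run above.

```python
import json


def summarize_journal(journal):
    """Build a short text summary from a journal dictionary.

    The keys used here are assumptions about the journal layout;
    missing keys fall back to placeholders instead of raising.
    """
    lines = ["status: {}".format(journal.get("status", "unknown"))]
    for entry in journal.get("run", []):
        activity = entry.get("activity", {})
        lines.append("{} {}: {}".format(
            activity.get("type", "?"),
            activity.get("name", "?"),
            entry.get("status", "unknown"),
        ))
    return lines


# Hand-written sample standing in for the contents of journal.json.
sample = json.loads("""
{
  "status": "failed",
  "run": [
    {"activity": {"type": "action", "name": "swap-to-expired-cert"},
     "status": "succeeded"}
  ]
}
""")

for line in summarize_journal(sample):
    print(line)
```

To try it on a real run, load the file with `json.load` instead of using the inline sample, and adjust the keys if the actual journal differs.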

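Understanding the purpose of the experiment from experiment.json

A chaos experiment is described entirely by the experiment.json file. The core of this one is the following part of its method:

This example checks whether the system keeps working after its SSL certificate has expired.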

        {
            "type": "action",
            "name": "swap-to-expired-cert",
            "provider": {
                "type": "process",
                "path": "cp",
                "arguments": "expired-cert.pem cert.pem"
            }
        },
        {
            "type": "probe",
            "name": "read-tls-cert-expiry-date",
            "provider": {
                "type": "process",
                "path": "openssl",
                "arguments": "x509 -enddate -noout -in cert.pem"
            }
        },
        {
            "type": "action",
            "name": "restart-astre-service-to-pick-up-certificate",
            "provider": {
                "type": "process",
                "path": "pkill",
                "arguments": "--echo -HUP -F astre.pid"
            }
        },
        {
            "type": "action",
            "name": "restart-sunset-service-to-pick-up-certificate",
            "provider": {
                "type": "process",
                "path": "pkill",
                "arguments": "--echo -HUP -F sunset.pid"
            },
            "pauses": {
                "after": 1
            }
        }
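
Each step with a "process" provider boils down to a program plus its arguments. As a rough sketch, the Python below parses two of the steps above and shows the command line each one describes. The function `command_for` is hypothetical, and splitting the arguments with `shlex` is my assumption about how such a string maps to an argument list, not a statement about how Chaos Toolkit does it internally.

```python
import json
import shlex

# Two steps copied from the fragment above, wrapped in a list so they parse.
METHOD_JSON = """
[
  {"type": "action", "name": "swap-to-expired-cert",
   "provider": {"type": "process", "path": "cp",
                "arguments": "expired-cert.pem cert.pem"}},
  {"type": "probe", "name": "read-tls-cert-expiry-date",
   "provider": {"type": "process", "path": "openssl",
                "arguments": "x509 -enddate -noout -in cert.pem"}}
]
"""


def command_for(step):
    """Return the argv list that a "process" provider describes."""
    provider = step["provider"]
    if provider["type"] != "process":
        raise ValueError("only process providers are handled in this sketch")
    return [provider["path"]] + shlex.split(provider["arguments"])


steps = json.loads(METHOD_JSON)
for step in steps:
    print(step["type"], step["name"], command_for(step))
```

Reading the method this way makes the experiment's story easy to follow: swap in the expired certificate, read its expiry date, then signal both services so they pick up the new file.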

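That covers a complete run of Chaos Toolkit from start to finish, and using it is fairly straightforward. In my view the key question is how to extend the kinds of actions available in the method. The chaostoolkit-incubator repository defines quite a few plugins and extensions, which I plan to look into later.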
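
One way to extend the available actions is a provider of type "python", which calls a function from an importable module instead of running a process. The following is a minimal sketch under stated assumptions: the module name `my_actions` and the function `swap_certificate` are hypothetical, and the function simply mirrors what the `cp` step in this experiment does.

```python
import shutil
import tempfile
from pathlib import Path


def swap_certificate(source: str, target: str) -> str:
    """Copy the certificate at `source` over `target` and return the target path.

    Hypothetical action: a Python equivalent of the `cp` step in the method.
    """
    shutil.copyfile(source, target)
    return target


# How an experiment step might reference the function, assuming it lives in
# a hypothetical importable module named "my_actions".
STEP = {
    "type": "action",
    "name": "swap-to-expired-cert",
    "provider": {
        "type": "python",
        "module": "my_actions",
        "func": "swap_certificate",
        "arguments": {"source": "expired-cert.pem", "target": "cert.pem"},
    },
}

# Quick self-check in a temporary directory, without touching real files.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, "expired-cert.pem")
    dst = Path(tmp, "cert.pem")
    src.write_text("dummy expired certificate")
    dst.write_text("dummy valid certificate")
    swap_certificate(str(src), str(dst))
    copied = dst.read_text()

print(copied)
```

Keeping the action a plain function with keyword arguments makes it easy to test on its own before wiring it into an experiment.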

