dirtyhand-tester Chaos Toolkit: a tool that simplifies chaos engineering experiments

simonpatrick · May 20, 2019 · 1099 views

Chaos Toolkit is a tool that simplifies running chaos engineering experiments. The official documentation describes it as follows:

The Chaos Toolkit aims to be the simplest and easiest way to explore building your own Chaos Engineering Experiments.

The walkthrough below uses the official sample application to show how to use Chaos Toolkit.

Installation

  • Set up a local environment for the sample application
mkdir demo && cd demo
pipenv --python 3.7
# pipenv install -U pip
  • Install Chaos Toolkit and its related components
pip3 install -U chaostoolkit
pip3 install -U chaostoolkit-kubernetes
pip3 install -U chaostoolkit-reporting
  • On macOS, install the other related dependencies
brew install cairo pango gdk-pixbuf libffi
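
As a quick sanity check after installation, the sketch below reports which of the expected Python modules can be located in the active environment. The module names `chaostoolkit`, `chaosk8s`, and `chaosreport` are assumptions about what the three packages install (this post does not state them), and `find_missing` is a hypothetical helper.

```python
import importlib.util

# Assumed top-level module names for the three packages installed above.
EXPECTED_MODULES = ["chaostoolkit", "chaosk8s", "chaosreport"]


def find_missing(module_names):
    """Return the names in module_names that cannot be located for import."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(EXPECTED_MODULES)
    if missing:
        print("Not importable:", ", ".join(missing))
    else:
        print("All expected modules were found.")
```

If a name is reported as missing, the package was probably installed into a different interpreter than the one running the check.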

Start the application

Copy the files from a-simple-walkthrough in chaostoolkit-demo into the demo directory.

Install the dependencies:

pipenv install astral
pipenv install cherrypy
pipenv install pytz
pipenv install requests

Run the services:

cp valid-cert.pem cert.pem
nohup python3 astre.py &
nohup python3 sunset.py &

Run the chaos experiment

Run the experiment:

chaos run experiment.json

Result:

[2019-05-20 14:35:29 INFO] Validating the experiment's syntax
[2019-05-20 14:35:29 INFO] Experiment looks valid
[2019-05-20 14:35:29 INFO] Running experiment: What is the impact of an expired certificate on our application chain?
[2019-05-20 14:35:29 INFO] Steady state hypothesis: Application responds
[2019-05-20 14:35:29 INFO] Probe: the-astre-service-must-be-running
[2019-05-20 14:35:29 CRITICAL] Steady state probe 'the-astre-service-must-be-running' is not in the given tolerance so failing this experiment
[2019-05-20 14:35:29 INFO] Let's rollback...
[2019-05-20 14:35:29 INFO] Rollback: swap-to-vald-cert
[2019-05-20 14:35:29 INFO] Action: swap-to-vald-cert
[2019-05-20 14:35:29 INFO] Rollback: None
[2019-05-20 14:35:29 INFO] Action: restart-astre-service-to-pick-up-certificate
[2019-05-20 14:35:29 INFO] Rollback: None
[2019-05-20 14:35:29 INFO] Action: restart-sunset-service-to-pick-up-certificate
[2019-05-20 14:35:29 INFO] Pausing after activity for 1s...
[2019-05-20 14:35:30 INFO] Experiment ended with status: failed

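The line that identifies the problem is the failed steady-state probe, which was checked before any method step ran: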

[2019-05-20 14:35:29 CRITICAL] Steady state probe 'the-astre-service-must-be-running' is not in the given tolerance so failing this experiment
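
When a run produces more output than this one, it can help to filter the log by level. The sketch below assumes the `[date time LEVEL] message` line format shown in the output above; `lines_at_level` is a hypothetical helper and the sample is a shortened copy of that output.

```python
import re

# Matches lines such as "[2019-05-20 14:35:29 CRITICAL] message".
LOG_LINE = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\w+)\] (.*)$"
)


def lines_at_level(log_text, level):
    """Return the messages of all log lines recorded at the given level."""
    messages = []
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match and match.group(2) == level:
            messages.append(match.group(3))
    return messages


SAMPLE = """\
[2019-05-20 14:35:29 INFO] Probe: the-astre-service-must-be-running
[2019-05-20 14:35:29 CRITICAL] Steady state probe 'the-astre-service-must-be-running' is not in the given tolerance so failing this experiment
[2019-05-20 14:35:29 INFO] Let's rollback...
"""

if __name__ == "__main__":
    for message in lines_at_level(SAMPLE, "CRITICAL"):
        print(message)
```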

Generate a report

chaos report --export-format=html journal.json report.html
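
The journal.json file passed to the report command can also be read directly. The sketch below rests on assumptions: the field names `status`, `steady_states`, `before`, `after`, `steady_state_met`, and `run` reflect my understanding of the journal layout and are not shown in this post, so verify them against your own journal.json. `summarize_journal` and `summarize_journal_file` are hypothetical helpers.

```python
import json


def summarize_journal(journal):
    """Collect a few headline facts from a journal dictionary.

    Every key is read with .get() because the layout is assumed, not verified.
    """
    steady_states = journal.get("steady_states") or {}
    before = steady_states.get("before") or {}
    after = steady_states.get("after") or {}
    return {
        "status": journal.get("status"),
        "steady_before": before.get("steady_state_met"),
        "steady_after": after.get("steady_state_met"),
        "activities_run": len(journal.get("run") or []),
    }


def summarize_journal_file(path="journal.json"):
    """Load a journal file from disk and summarize it."""
    with open(path, encoding="utf-8") as handle:
        return summarize_journal(json.load(handle))
```

A value of `None` in the summary means the corresponding field was absent, which may simply indicate that the assumed key name is wrong for your version.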

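Understanding the purpose of the experiment through experiment.json

A chaos experiment is described entirely by the experiment.json file, which is organized into the following sections: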

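  • steady-state-hypothesis: the hypothesis. It defines the expected results and is checked twice, once at the start of the experiment and once at the end
  • method: the changes. It lists the steps of the experiment, which alter the state of the system or application
  • rollbacks: state restoration. It lists the operations that undo the changes once the experiment is done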
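
The ordering of these three sections can be sketched as a small model. This is not Chaos Toolkit's implementation, only an illustration of the sequence described above and of what the log earlier in this post shows, where the rollbacks still ran after the first hypothesis check failed. The function `run_experiment_model` and the callables passed to it are hypothetical.

```python
def run_experiment_model(hypothesis, method, rollbacks):
    """Model the order: hypothesis, method, hypothesis again, rollbacks.

    hypothesis: zero-argument callable, True when the steady state holds.
    method, rollbacks: lists of zero-argument callables.
    """
    phases = ["hypothesis-before"]
    steady_before = hypothesis()
    steady_after = None
    if steady_before:
        # Only disturb the system if it was healthy to begin with.
        phases.append("method")
        for step in method:
            step()
        phases.append("hypothesis-after")
        steady_after = hypothesis()
    # In this model the rollbacks run in every case.
    phases.append("rollbacks")
    for step in rollbacks:
        step()
    return {
        "phases": phases,
        "steady_before": steady_before,
        "steady_after": steady_after,
    }


if __name__ == "__main__":
    state = {"cert": "valid"}
    result = run_experiment_model(
        hypothesis=lambda: state["cert"] == "valid",
        method=[lambda: state.update(cert="expired")],
        rollbacks=[lambda: state.update(cert="valid")],
    )
    print(result["phases"])
```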

The experiment in this example checks whether the system still works after its SSL certificate has expired.
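
To make the premise concrete, here is a minimal sketch that decides whether a certificate has expired from the output of `openssl x509 -enddate -noout -in cert.pem`, the command this experiment's method also runs. It assumes the output has the form `notAfter=May 20 14:35:29 2019 GMT`; that sample date is invented for illustration, and `parse_not_after` and `is_expired` are hypothetical helpers.

```python
from datetime import datetime, timezone


def parse_not_after(openssl_output):
    """Parse 'notAfter=May 20 14:35:29 2019 GMT' into a UTC datetime."""
    _, _, value = openssl_output.strip().partition("=")
    parsed = datetime.strptime(value, "%b %d %H:%M:%S %Y GMT")
    return parsed.replace(tzinfo=timezone.utc)


def is_expired(openssl_output, now=None):
    """Return True when the certificate's end date is not in the future."""
    now = now or datetime.now(timezone.utc)
    return parse_not_after(openssl_output) <= now


if __name__ == "__main__":
    # Invented sample line, not taken from the demo certificates.
    print(is_expired("notAfter=May 20 14:35:29 2019 GMT"))
```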

  • steady-state-hypothesis: the JSON fragment below confirms that the system is running, that is, the pid files exist and the service returns an HTTP response

    "steady-state-hypothesis": {
        "title": "Application responds",
        "probes": [
            {
                "type": "probe",
                "name": "the-astre-service-must-be-running",
                "tolerance": true,
                "provider": {
                    "type": "python",
                    "module": "os.path",
                    "func": "exists",
                    "arguments": {
                        "path": "astre.pid"
                    }
                }
            },
            {
                "type": "probe",
                "name": "the-sunset-service-must-be-running",
                "tolerance": true,
                "provider": {
                    "type": "python",
                    "module": "os.path",
                    "func": "exists",
                    "arguments": {
                        "path": "sunset.pid"
                    }
                }
            },
            {
                "type": "probe",
                "name": "we-can-request-sunset",
                "tolerance": 200,
                "provider": {
                    "type": "http",
                    "timeout": 3,
                    "verify_tls": false,
                    "url": "https://localhost:8443/city/Paris"
                }
            }
        ]
    }
    
  • method: changes the state of the system

        {
            "type": "action",
            "name": "swap-to-expired-cert",
            "provider": {
                "type": "process",
                "path": "cp",
                "arguments": "expired-cert.pem cert.pem"
            }
        },
        {
            "type": "probe",
            "name": "read-tls-cert-expiry-date",
            "provider": {
                "type": "process",
                "path": "openssl",
                "arguments": "x509 -enddate -noout -in cert.pem"
            }
        },
        {
            "type": "action",
            "name": "restart-astre-service-to-pick-up-certificate",
            "provider": {
                "type": "process",
                "path": "pkill",
                "arguments": "--echo -HUP -F astre.pid"
            }
        },
        {
            "type": "action",
            "name": "restart-sunset-service-to-pick-up-certificate",
            "provider": {
                "type": "process",
                "path": "pkill",
                "arguments": "--echo -HUP -F sunset.pid"
            },
            "pauses": {
                "after": 1
            }
        }
  • After the method completes, the steady-state-hypothesis is run once more as a check
  • rollbacks: restore the original state
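
The two provider types used in the fragments above, `python` and `process`, can be approximated in a few lines. This is a simplified sketch for intuition, not Chaos Toolkit's own code: `run_provider` and `within_tolerance` are hypothetical names, the returned dictionary layout is my own, and the tolerance check is a plain equality comparison, which is an assumption that only covers simple cases such as `"tolerance": true`.

```python
import importlib
import shlex
import subprocess
import sys


def run_provider(provider):
    """Execute a 'python' or 'process' provider and return its result."""
    if provider["type"] == "python":
        # Import the module and call the named function with keyword args.
        module = importlib.import_module(provider["module"])
        func = getattr(module, provider["func"])
        return func(**provider.get("arguments", {}))
    if provider["type"] == "process":
        # Split the argument string like a shell would, without a shell.
        argv = [provider["path"]] + shlex.split(provider.get("arguments", ""))
        done = subprocess.run(
            argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )
        return {
            "status": done.returncode,
            "stdout": done.stdout,
            "stderr": done.stderr,
        }
    raise ValueError("unsupported provider type: " + provider["type"])


def within_tolerance(value, tolerance):
    """Simplified check: the probe passes when value equals tolerance."""
    return value == tolerance


if __name__ == "__main__":
    probe = {
        "type": "python",
        "module": "os.path",
        "func": "exists",
        "arguments": {"path": "astre.pid"},
    }
    print(within_tolerance(run_provider(probe), True))
    # A harmless process call that uses the current interpreter.
    print(run_provider({"type": "process", "path": sys.executable,
                        "arguments": "--version"})["status"])
```

With the first probe from the hypothesis, `run_provider` returns `False` whenever astre.pid is absent from the working directory, which is one way to end up with the tolerance failure shown in the log.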

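That covers a complete run of Chaos Toolkit from start to finish. Using it is fairly straightforward. In my view the key question is how to extend the kinds of actions available to a method. chaostoolkit-incubator hosts quite a few plugins and extensions, which I plan to look into later.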
