I've recently been using locust for performance testing of an LLM Q&A service, and I noticed that the first-token time measured under locust comes out a bit longer. At first I assumed the load-generator machine was hitting a bottleneck under concurrency, but switching to a different machine made no difference. I then found that even with a single locust user, the model's first-token time is still longer than when the same request is sent outside locust.
To make it easier for everyone to review, I've reduced the scripts to the bare minimum and stripped out the business logic, comparing the locust script against a plain loop script; both simply print the model's first-token time.
"""locust脚本"""
import json
import requests
import urllib3
urllib3.disable_warnings()
import time
from locust import TaskSet, task
from locust.contrib.fasthttp import FastHttpUser
class ModelRequestSet(TaskSet):
def on_start(self):
data = {"account": "xxxx",
"password": "e01fe316609bfa4c374abfe4ff9fa081d298c9a1d0102f991557eefdcd6903c1a229e03f24a483b5f63d74677b8c4a66"}
response = requests.post('http://192.168.224.171:58033/chat/user/login', json=data)
self.login_token = response.json()['result']['token']
@task
def send_request(self):
data = {
"inputs": {"rag": "True", "deep_think": "True", "online_search": "False"},
"query": '十一、请问在建设初期,在解决饮水问题、增加产值、改善生态环境等方面,xxxx的预期效益如何?',
"conversation_id": "",
"files": []}
headers = {"accept": "text/event-stream", "X-Access-Token":self.login_token}
start_time = time.time()
has_executed = False
response = requests.post('http://192.168.224.171:58033/chat/askChat', json=data, headers=headers, verify=False,
stream=True)
for chunk in response.iter_lines(chunk_size=None):
chunk = chunk.decode('utf-8')
if chunk.startswith("data"):
chunk_str = chunk.split('data:', 1)[1].strip()
chunk_json = json.loads(chunk_str)
if chunk_json['event'] == 'message':
if not has_executed:
first_packet_time = time.time()
first_elapsed_time = first_packet_time - start_time
# 打印首token时间
print(first_elapsed_time)
has_executed = True
class ModelUser(FastHttpUser):
tasks = [ModelRequestSet, ]
host = 'http://192.168.224.171:58033/'
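Side note, not part of the comparison: the locust script above only prints the first-token time to stdout, so the number never shows up in locust's own report. Below is a minimal sketch, assuming locust 2.x, of how the hand-measured value could be pushed into locust's statistics via the request event; the "SSE" request type, the "askChat_first_token" name, and the placeholder GET are illustrative stand-ins, not part of my real script.
"""sketch: report first-token time into locust's statistics (assumes locust 2.x)"""
import time

import requests
from locust import task
from locust.contrib.fasthttp import FastHttpUser


class FirstTokenUser(FastHttpUser):
    host = 'http://192.168.224.171:58033/'

    @task
    def send_request(self):
        start_time = time.time()
        # placeholder for the streaming /chat/askChat call; in the real script the
        # clock stops on the first SSE 'message' event instead
        requests.get('http://192.168.224.171:58033/', verify=False)
        first_elapsed_time = time.time() - start_time
        # feed the hand-measured value into locust's stats so it appears in the
        # web UI / CSV as its own entry instead of only in stdout
        self.environment.events.request.fire(
            request_type="SSE",
            name="askChat_first_token",
            response_time=first_elapsed_time * 1000,  # locust expects milliseconds
            response_length=0,
            exception=None,
            context={},
        )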
"""普通脚本"""
import json
import requests
import time
data = {"account": "xxxx",
"password": "e01fe316609bfa4c374abfe4ff9fa081d298c9a1d0102f991557eefdcd6903c1a229e03f24a483b5f63d74677b8c4a66"}
response = requests.post('http://192.168.224.171:58033/chat/user/login', json=data)
login_token = response.json()['result']['token']
headers = {"accept": "text/event-stream", "X-Access-Token":login_token}
data = {
"appId": "",
"conversation_id": "",
"conversation_type": "chat",
"files": [],
"inputs": {"online_search": "True", "deep_think": "True", "rag": "False"},
"query": "十一、请问在建设初期,在解决饮水问题、增加产值、改善生态环境等方面,xxxx的预期效益如何?"
}
for i in range(0,100):
start_time = time.time()
has_executed = False
response = requests.post('http://192.168.224.171:58033/chat/askChat', json=data, verify=False, headers=headers, stream=True)
for chunk in response.iter_lines(chunk_size=None):
chunk = chunk.decode('utf-8')
if chunk.startswith("data"):
chunk_str = chunk.split('data:', 1)[1].strip()
chunk_json = json.loads(chunk_str)
if chunk_json['event'] == 'message':
if not has_executed:
first_packet_time = time.time()
first_elapsed_time = first_packet_time - start_time
# 打印首token时间
print(first_elapsed_time)
has_executed = True
The results are as above. Since first-token time isn't returned by the request, it has to be computed by hand. With a single locust user, first-token time is roughly 1-2 seconds higher overall than with the plain single-user script; earlier, with a few dozen concurrent users, it was even several seconds higher. Since token output rate also depends on these timings, it's presumably inaccurate as well. I searched around without finding a good answer; an AI suggested the extra latency comes from locust's internal coroutine switching.
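To check the coroutine-switching explanation, one experiment that might isolate it (a sketch under the assumption that gevent's monkey-patching, which locust applies at startup, is the variable): run the same plain loop but call monkey.patch_all() first. If the first-token times jump by the same 1-2 seconds, the overhead would come from the gevent-patched socket layer rather than from locust's own scheduling. The login payload below is abbreviated with placeholder credentials.
"""sketch: the plain loop again, but with gevent monkey-patching applied up front"""
from gevent import monkey

monkey.patch_all()  # locust applies this at startup; replicate it in isolation

import json
import time

import requests

login = requests.post('http://192.168.224.171:58033/chat/user/login',
                      json={"account": "xxxx", "password": "xxxx"})  # placeholder credentials
headers = {"accept": "text/event-stream",
           "X-Access-Token": login.json()['result']['token']}
data = {
    "conversation_id": "",
    "files": [],
    "inputs": {"online_search": "True", "deep_think": "True", "rag": "False"},
    "query": "十一、请问在建设初期,在解决饮水问题、增加产值、改善生态环境等方面,xxxx的预期效益如何?"
}
for i in range(100):
    start_time = time.time()
    has_executed = False
    response = requests.post('http://192.168.224.171:58033/chat/askChat',
                             json=data, verify=False, headers=headers, stream=True)
    for chunk in response.iter_lines(chunk_size=None):
        chunk = chunk.decode('utf-8')
        if chunk.startswith("data"):
            chunk_json = json.loads(chunk.split('data:', 1)[1].strip())
            if chunk_json['event'] == 'message' and not has_executed:
                # first-token time under monkey-patching; compare against the
                # un-patched loop above to see whether gevent alone adds the gap
                print(time.time() - start_time)
                has_executed = True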
Has anyone run into this with locust before, and how did you handle it?