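Everyone can take another look at this Alibaba article and learn from it together: https://mp.weixin.qq.com/s/53KZsrAIGCAdF1_LZ5ORPw
This project can't complete the testing as it stands; it should define something like: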
from typing import Dict, List

def answer_consistency_test(agent, test_questions: List[str], repetitions: int = 3) -> Dict:
    """Answer consistency test: ask each question several times and compare the replies."""
    consistency_results = {}
    for question in test_questions:
        responses = []
        for _ in range(repetitions):
            result = agent.process_message(question)
            responses.append(result["response"])
        # Compute the consistency among the responses
        # (calculate_response_similarity is not defined in this snippet)
        consistency_score = calculate_response_similarity(responses)
        consistency_results[question] = {
            "responses": responses,
            "consistency_score": consistency_score,
            "is_consistent": consistency_score > 0.8
        }
    # Average the per-question scores; 0 if there were no questions
    overall_consistency = sum(
        result["consistency_score"] for result in consistency_results.values()
    ) / len(consistency_results) if consistency_results else 0
    return {
        "overall_consistency": overall_consistency,
        "question_consistency": consistency_results,
        "consistent_questions": sum(
            1 for result in consistency_results.values() if result["is_consistent"]
        ),
        "total_questions": len(test_questions)
    }
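The function above calls calculate_response_similarity without defining it. A minimal sketch of one way to fill it in, assuming plain lexical similarity is good enough: average the pairwise difflib.SequenceMatcher ratio over all responses. The choice of metric is my assumption, not something from the original message.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import List


def calculate_response_similarity(responses: List[str]) -> float:
    """Mean pairwise character-level similarity of the responses, in [0, 1].

    Assumption: lexical overlap stands in for "consistency" here.
    """
    # With zero or one response there is nothing to disagree with.
    if len(responses) < 2:
        return 1.0
    # Compare every unordered pair of responses once.
    ratios = [
        SequenceMatcher(None, a, b).ratio()
        for a, b in combinations(responses, 2)
    ]
    return sum(ratios) / len(ratios)
```

This only measures surface overlap, so two answers that say the same thing in different words will score low. For an LLM agent, an embedding-based or judge-based comparison is probably a better fit, and the 0.8 threshold would need re-tuning either way.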
Run the numbers and see whether that solves your problem.
OK, I'll put it out later when I get the chance.
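1. List the problem data, and give the expected data and the expected benefit.
2. Go to your +1 (direct manager) to ask for support, sync with the head of R&D, then sync the metrics with the frontline developers.
3. Review regularly.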
Have the developers write the operations doc and provide the k8s YAML files.
Feels like it needs a GPU machine to go any faster... otherwise it's too slow: a single image takes over 10s to finish parsing.
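【A seasoned tester won't give a definite answer; they just spell out the risks and let product decide whether to ship】 That's really slick.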
They're all down... 😆