[Casual] Objective Quality Assessment of Smartphone Photo Super-Resolution (SR) with IQA-PyTorch

槽神 · 2026-01-22 · 3107 views

Image Quality Assessment (IQA)

Image Super-Resolution

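Super-resolution (SR) means raising the resolution of an image by hardware or software means; super-resolution reconstruction is the process of obtaining one high-resolution image from a series of low-resolution images. The core idea of this multi-frame reconstruction is to trade temporal bandwidth (capturing a sequence of frames of the same scene) for spatial resolution, converting temporal resolution into spatial resolution. Single-image SR, which many learning-based methods perform, instead infers the missing detail from one low-resolution input.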

Image SR Methods

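Image SR is the technique of turning a blurry, low-resolution image into a sharper, higher-resolution one. Mainstream methods currently fall into three categories: interpolation-based, reconstruction-based, and learning-based.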

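SR method	Principle	Pros	Cons
Interpolation-based	Estimates new pixel values between existing pixels with a mathematical formula (e.g., bilinear or bicubic) to enlarge the image	Fast to compute; suitable for real-time processing	Blurry edges and poor detail recovery; may show jaggies, blur, or blocking artifacts
Reconstruction-based	Optimizes the reconstruction using multiple low-resolution images or prior knowledge, solving the inverse problem of a degradation model	Preserves detail better; suitable for specialized fields such as medical imaging	Computationally complex and slow; may show over-smoothing, edge distortion, or ringing
Learning-based	Uses deep learning models to learn the mapping from low to high resolution from large amounts of data	Most natural detail and texture recovery; best results	Needs large amounts of data and compute; models may overfit; may show texture hallucination, structural misalignment, or high-frequency noise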
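To make the interpolation row concrete, here is a minimal bilinear upscaling sketch in NumPy for a 2-D grayscale array. It is an illustration only: the function name `bilinear_upscale` is hypothetical, it assumes an integer scale factor and maps output pixel centres back to the input with a half-pixel convention, and real code would normally call a library resize (e.g. OpenCV) instead.

```python
import numpy as np

def bilinear_upscale(img: np.ndarray, scale: int) -> np.ndarray:
    """Enlarge a 2-D grayscale image by an integer factor using bilinear interpolation."""
    src = img.astype(np.float64)
    h, w = src.shape
    # centre of each output pixel, mapped back into input pixel coordinates
    ys = np.clip((np.arange(h * scale) + 0.5) / scale - 0.5, 0, h - 1)
    xs = np.clip((np.arange(w * scale) + 0.5) / scale - 0.5, 0, w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]  # vertical weights, one per output row
    wx = (xs - x0)[None, :]  # horizontal weights, one per output column
    # interpolate along x on the two neighbouring rows, then blend the rows along y
    top = src[y0][:, x0] * (1 - wx) + src[y0][:, x1] * wx
    bottom = src[y1][:, x0] * (1 - wx) + src[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy
```

Every new pixel is a weighted average of its four nearest source pixels and no new high-frequency information is created, which is why interpolation leaves edges soft, the blur listed in the table.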

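Classification of Image Quality Assessment Methods

Subjective assessment: people score the images; the most accurate option but time-consuming and labor-intensive, e.g., the Mean Opinion Score (MOS).
Objective assessment: an algorithm computes the score; efficient and suitable for automation; typical traditional metrics include PSNR and SSIM.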
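PSNR, the classic objective metric mentioned above, is simple enough to write out: PSNR = 10 · log10(MAX² / MSE). A minimal NumPy sketch, assuming 8-bit images (peak value 255) of identical shape; `psnr` here is a hypothetical helper, not the pyiqa implementation, which may differ in details such as color-space handling:

```python
import numpy as np

def psnr(ref: np.ndarray, dist: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    if ref.shape != dist.shape:
        raise ValueError("PSNR needs two images of the same shape")
    mse = np.mean((ref.astype(np.float64) - dist.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))

ref = np.zeros((4, 4), dtype=np.uint8)
print(round(psnr(ref, ref + 1), 2))  # 48.13
```

Because PSNR only counts pixel-wise error, a plausible but pixel-wise different generated texture is penalized just like noise, which is why it carries little weight for AI-based SR.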

Objective Assessment Methods for Image SR

Categories of Assessment Methods

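NR-IQA, No-Reference Image Quality Assessment: uses no original image at all and analyzes only the processed image. Practical, but difficult.
RR-IQA, Reduced-Reference Image Quality Assessment: sits between full-reference and no-reference assessment, using only partial information (features) extracted from the reference image.
FR-IQA, Full-Reference Image Quality Assessment: compares the original image with the processed image, e.g., by computing PSNR or SSIM.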
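The three categories differ mainly in what each metric is allowed to see. The sketch below uses deliberately simple toy measures (mean squared error, a histogram-entropy gap, and mean gradient magnitude) chosen only to show the three call shapes; none of them is a real IQA metric from this article, and all function names are hypothetical.

```python
import numpy as np

def gray_entropy(img: np.ndarray) -> float:
    """Shannon entropy (bits) of an 8-bit grayscale histogram."""
    hist = np.bincount(img.ravel(), minlength=256).astype(np.float64)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(np.sum(p * np.log2(1.0 / p)))

def fr_score(dist: np.ndarray, ref: np.ndarray) -> float:
    # FR: needs the full reference image (toy measure: mean squared error)
    return float(np.mean((dist.astype(np.float64) - ref.astype(np.float64)) ** 2))

def rr_score(dist: np.ndarray, ref_feature: float) -> float:
    # RR: needs only a compact feature of the reference (toy measure: its entropy)
    return abs(gray_entropy(dist) - ref_feature)

def nr_score(dist: np.ndarray) -> float:
    # NR: needs the processed image alone (toy measure: mean gradient magnitude)
    gy, gx = np.gradient(dist.astype(np.float64))
    return float(np.mean(np.hypot(gx, gy)))
```

In this article's pipeline, detail, entropy and qualiclip play the NR role on the HR image, while sfid and lpips+ play the FR role by comparing the SR output with the SR input.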

Practical Notes on Assessment

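NR metrics can assess the quality of essentially any SR method, but different scenarios and observation goals call for different metrics. For example, when the same scene or subject is shot at different focal lengths, the FOV differs, the background content differs, and so does the amount of high- and low-frequency signal in the image. Evaluating imaging consistency between optical zoom and digital cropping is therefore hard to do with any single metric and requires weighing several together.
For reconstruction- and learning-based methods, NR metrics measure the absolute quality after SR, but FR metrics matter even more for assessing the authenticity of the SR/generated result, texture hallucination, smeared-away detail, and similar problems, because absolute sharpness equals neither authenticity nor user satisfaction. In the near future (perhaps already), there may also be models and metrics that assess the much-criticized "oil-painting look" and "plastic look" of super-resolved or generated photos.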

IQA-PyTorch and Its Application

Official Introduction (Translated)

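IQA-PyTorch is a comprehensive image quality assessment (IQA) toolbox built with pure Python and PyTorch. We provide reimplementations of many widely used full-reference (FR) and no-reference (NR) metrics, with results calibrated against the official MATLAB scripts where they exist. With GPU acceleration, our implementations are much faster than their MATLAB counterparts.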

Repository

gitcode: https://gitcode.com/gh_mirrors/iq/IQA-PyTorch
github: https://github.com/chaofengc/IQA-PyTorch

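Environment Setup

On its first run, IQA-PyTorch may need to download models from servers outside the local network (e.g., HuggingFace), so configure a proxy for VS Code or whichever IDE you use.
Using Anaconda or Miniconda to manage an isolated Python environment is recommended; ask an AI assistant or a search engine for the details, which are not repeated here.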
```
pip install pyiqa piexif exifread pillow pillow_heif opencv-python
```
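Dumping the input and output of the phone's SR pipeline is omitted here, because the hooking approach differs across systems and phone models; in the end, though, both inputs and outputs must be stored in an RGB format.
Pre-download the CLIP-related models (skip this step if you don't use them) into the %USERPROFILE%\.cache\torch\hub\clip directory:
When you start debugging (after all other preparation is done), set the "pyiqa" logger in the main program to DEBUG, then download the models it asks for and place them in the directories it reports, typically:
- %USERPROFILE%\.cache\torch\hub\pyiqa
- %USERPROFILE%\.cache\torch\hub\checkpoints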
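The cache locations and the logger switch above can also be handled in code. A minimal sketch, assuming `Path.home()` resolves to the same folder as %USERPROFILE% on Windows (and to ~ elsewhere), that the directory layout is the one listed above, and that pyiqa logs through the standard `logging` module under the name 'pyiqa' (as the main program's own `logging.getLogger('pyiqa')` call implies); both helper names are hypothetical:

```python
import logging
from pathlib import Path
from typing import Dict, Optional

def model_cache_dirs(home: Optional[Path] = None) -> Dict[str, Path]:
    """Cache directories named above, built relative to the user's home folder."""
    hub = (home or Path.home()) / ".cache" / "torch" / "hub"
    return {"clip": hub / "clip", "pyiqa": hub / "pyiqa", "checkpoints": hub / "checkpoints"}

def enable_pyiqa_debug() -> logging.Logger:
    """Raise the 'pyiqa' logger to DEBUG so its loading/download messages are shown."""
    logging.basicConfig(format="[%(asctime)s] [%(levelname)s]: %(message)s")
    pyiqa_logger = logging.getLogger("pyiqa")
    pyiqa_logger.setLevel(logging.DEBUG)
    return pyiqa_logger

if __name__ == "__main__":
    for name, path in model_cache_dirs().items():
        print(f"{name:<12} {path}  exists={path.is_dir()}")
    enable_pyiqa_debug()  # call before creating any pyiqa metric
```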

Environments Without HuggingFace Access

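import os
# set this before importing pyiqa (or anything that imports huggingface_hub): the endpoint is read at import time
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"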

Or:

# Linux/Mac
export HF_ENDPOINT=https://hf-mirror.com

# Windows (cmd.exe)
set HF_ENDPOINT=https://hf-mirror.com

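If errors like the following persist after the settings above: (MaxRetryError("HTTPSConnectionPool(host='hf-mirror.com' Max retries exceeded with url: **/model.safetensors, follow the three steps below. (To be safe, it is advisable to use this approach from the start and deploy everything locally, even if you can reach HF.)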

Step 1: download the .pth file manually

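On the HF model download page, select the target .pth file and download it locally. If you cannot reach the page, find a way around the network restriction or ask someone to download it for you.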

Step 2: modify the source code

Edit the source file %USERPROFILE%\AppData\Local\Programs\Python\Python310\Lib\site-packages\pyiqa\archs\hypernet_arch.py (the exact path depends on your Python installation or conda environment):

self.base_model = timm.create_model(
    # change pretrained=True to pretrained=False so timm does not try to download the backbone weights
    base_model_name, pretrained=True, features_only=True
)
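The Python310 path above belongs to one particular Windows install; with conda or a virtualenv the file lives elsewhere. One way to find it is to ask Python where the package is installed, sketched here with the standard library's `importlib.util.find_spec` (the helper name `installed_file` is hypothetical):

```python
import importlib.util
from pathlib import Path

def installed_file(package: str, *relative_parts: str) -> Path:
    """Path of a file inside an installed top-level package, found without importing it."""
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        raise ModuleNotFoundError(f"{package!r} is not an installed package")
    package_dir = Path(list(spec.submodule_search_locations)[0])
    return package_dir.joinpath(*relative_parts)

if __name__ == "__main__":
    # the two files this article suggests editing
    targets = [("pyiqa", ("archs", "hypernet_arch.py")), ("torch", ("nn", "modules", "module.py"))]
    for pkg, parts in targets:
        try:
            print(installed_file(pkg, *parts))
        except ModuleNotFoundError as err:
            print(err)
```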

Step 3: load the weights in your own code, whichever way you prefer

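The weight file downloaded in Step 1 can be saved directly under %USERPROFILE%\.cache\torch\hub\pyiqa, so the program loads it as the default weight file without you specifying a path. Alternatively, save it to a fixed directory and specify the path when loading: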

metric = pyiqa.create_metric('qualiclip', pretrained=False)
metric.load_weights('your_path/QualiCLIP.pth')

Or

metric = pyiqa.create_metric('qualiclip', pretrained_model_path='your_path/QualiCLIP.pth')

Fixing load_state_dict Errors

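Locate %USERPROFILE%\AppData\Local\Programs\Python\Python310\Lib\site-packages\torch\nn\modules\module.py.
In the load_state_dict function, find the call to module._load_from_state_dict (around line 2580; the exact line depends on the PyTorch version) and change the value passed as its strict argument to False. Note that this affects every model loaded in that Python environment.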
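It helps to see what strict loading actually checks before switching it off. The sketch below mimics, on plain dicts, the key comparison behind the error: keys the model expects but the checkpoint lacks are "missing", keys the checkpoint has but the model does not expect are "unexpected". It is an illustration, not PyTorch code; `compare_state_dict_keys` and the optional `module.` prefix stripping (for checkpoints saved from `nn.DataParallel`) are assumptions. With PyTorch you would pass `model.state_dict().keys()` and the dict loaded from the weight file (for pyiqa weights possibly nested under a 'params' key). PyTorch's public `Module.load_state_dict(state_dict, strict=False)` also returns the missing and unexpected keys, which can be a less invasive alternative to editing module.py.

```python
from typing import Iterable, List, Mapping, Tuple

def compare_state_dict_keys(expected_keys: Iterable[str], checkpoint: Mapping[str, object],
                            strip_prefix: str = "module.") -> Tuple[List[str], List[str]]:
    """Return (missing, unexpected) keys, the two lists a strict load complains about."""
    got = set()
    for key in checkpoint:
        # checkpoints saved from nn.DataParallel prefix every key with 'module.'
        if strip_prefix and key.startswith(strip_prefix):
            key = key[len(strip_prefix):]
        got.add(key)
    expected = set(expected_keys)
    return sorted(expected - got), sorted(got - expected)

missing, unexpected = compare_state_dict_keys(
    ["fc.weight", "fc.bias", "head.weight"],
    {"module.fc.weight": 0, "module.fc.bias": 0, "module.extra.scale": 0},
)
print("missing:", missing)
print("unexpected:", unexpected)
```

If both lists are empty and loading still fails, the cause is most likely a tensor shape mismatch, which strict=False does not bypass.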

Running the Main Program to Get Results

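Set the lr_dir, hr_dir, and ts_begin_index parameters, put the images to be evaluated into the corresponding folders, and give all file names one consistent format.
Run the main image-scoring program.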
```
python iqa_pytorch.py
```
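Pairing LR and HR files relies entirely on the timestamp key that `ts_from_file_name` in Appendix 1 extracts from each name (the program then globs for `*{timestamp}_iso*`), so it is worth checking the key for your naming scheme before a long run. The function below repeats the Appendix 1 logic unchanged so it runs standalone; the LR/HR sample names are hypothetical and follow the format given in that program's comments.

```python
def ts_from_file_name(file_name, begin_index=0):
    # same logic as Appendix 1: split the stem on '_', join date + 6-digit time
    # and, if present, a 1-digit burst index
    name_arr = file_name.rsplit(".", 1)[0].split("_")
    i_1, i_2, i_3 = begin_index, begin_index + 1, begin_index + 2
    cut_time = name_arr[i_2] if len(name_arr[i_2]) == 6 else name_arr[i_2][0:6]
    timestamp = f"{name_arr[i_1]}_{cut_time}"
    if len(name_arr) > i_3 and name_arr[i_3].isdigit() and len(name_arr[i_3]) == 1:
        timestamp += f"_{name_arr[i_3]}"
    return timestamp

lr_name = "1001_20251021_142422_iso100_20.0X_Pixel10Pro_LR.jpg"
hr_name = "1001_20251021_142422_iso100_20.0X_Pixel10Pro_HR.jpg"
print(ts_from_file_name(hr_name, 1))  # 20251021_142422
assert ts_from_file_name(lr_name, 1) == ts_from_file_name(hr_name, 1)
```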
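Fix a few groups of typical test data and repeatedly switch metrics or metric combinations. A combination of 2 to 5 metrics is generally recommended, with at least one NR and one FR metric; prefer metrics proposed after 2020.
Fix the metrics and repeatedly change the test data, validating with 10, 50, or more groups covering sufficiently varied scenes. As long as the metrics stay unchanged, this large up-front investment pays off almost once and for all.
Repeatedly inspect the output and check subjective-objective consistency to settle on the metric combination.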
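For the subjective-objective consistency check, a common yardstick in IQA is the Spearman rank-order correlation (SRCC) between human scores (e.g. MOS) and the metric or composite score: 1 means the metric orders the images exactly as people do. A dependency-free sketch using average ranks for ties; the MOS and score values are invented for illustration, and `scipy.stats.spearmanr` does the same job if SciPy is available.

```python
from typing import List, Sequence

def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def srcc(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation = Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

mos = [2.1, 4.5, 4.2, 3.0, 3.8]             # hypothetical human ratings
composite = [30.0, 80.0, 65.0, 50.0, 70.0]  # hypothetical composite scores
print(round(srcc(mos, composite), 2))  # 0.9
```

Lower-is-better metrics such as lpips+ should correlate negatively with MOS, so negate them before comparing.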

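Once tuning is done, the setup is ready for use. If you are interested, print more information as needed, such as focal length and ISO, or write the results to CSV, Excel, and so on: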

[2026-01-31 11:43:14,601] [INFO]: ======================= Evaluating images with NR metric [DETAIL (higher is better)] =======================
[2026-01-31 11:43:14,734] [INFO]: [1 / 6]: [20260121_104109] detail_score: 17.27
[2026-01-31 11:43:14,878] [INFO]: [2 / 6]: [20260127_151153] detail_score: 92.14
[2026-01-31 11:43:15,015] [INFO]: [3 / 6]: [20260127_151202] detail_score: 93.76
[2026-01-31 11:43:15,162] [INFO]: [4 / 6]: [20260127_151220] detail_score: 60.63
[2026-01-31 11:43:15,296] [INFO]: [5 / 6]: [20260127_155004] detail_score: 41.01
[2026-01-31 11:43:15,430] [INFO]: [6 / 6]: [20260127_162141] detail_score: 57.97
[2026-01-31 11:43:15,431] [INFO]: ======================= Evaluating images with NR metric [ENTROPY (higher is better)] =======================
[2026-01-31 11:43:15,530] [INFO]: [1 / 6]: [20260121_104109] entropy_score: 5.88
[2026-01-31 11:43:15,632] [INFO]: [2 / 6]: [20260127_151153] entropy_score: 7.68
[2026-01-31 11:43:15,728] [INFO]: [3 / 6]: [20260127_151202] entropy_score: 7.68
[2026-01-31 11:43:15,835] [INFO]: [4 / 6]: [20260127_151220] entropy_score: 7.54
[2026-01-31 11:43:15,926] [INFO]: [5 / 6]: [20260127_155004] entropy_score: 7.81
[2026-01-31 11:43:16,022] [INFO]: [6 / 6]: [20260127_162141] entropy_score: 7.78
[2026-01-31 11:43:16,023] [INFO]: ======================= Evaluating images with FR metric [SFID (lower is better)] =======================
[2026-01-31 11:43:24,925] [INFO]: [1 / 6]: [20260121_104109] sfid_score: 49.11
[2026-01-31 11:43:33,295] [INFO]: [2 / 6]: [20260127_151153] sfid_score: 9.26
[2026-01-31 11:43:41,543] [INFO]: [3 / 6]: [20260127_151202] sfid_score: 11.07
[2026-01-31 11:43:49,958] [INFO]: [4 / 6]: [20260127_151220] sfid_score: 18.74
[2026-01-31 11:43:57,962] [INFO]: [5 / 6]: [20260127_155004] sfid_score: 0.05
[2026-01-31 11:44:06,168] [INFO]: [6 / 6]: [20260127_162141] sfid_score: 3.48
[2026-01-31 11:44:06,175] [INFO]: ======================= Evaluating images with NR metric [QUALICLIP (higher is better)] =======================
[2026-01-31 11:44:16,503] [INFO]: [1 / 6]: [20260121_104109] qualiclip_score: 0.27
[2026-01-31 11:44:24,962] [INFO]: [2 / 6]: [20260127_151153] qualiclip_score: 0.53
[2026-01-31 11:44:34,411] [INFO]: [3 / 6]: [20260127_151202] qualiclip_score: 0.54
[2026-01-31 11:44:43,818] [INFO]: [4 / 6]: [20260127_151220] qualiclip_score: 0.44
[2026-01-31 11:44:53,056] [INFO]: [5 / 6]: [20260127_155004] qualiclip_score: 0.36
[2026-01-31 11:45:02,344] [INFO]: [6 / 6]: [20260127_162141] qualiclip_score: 0.54
[2026-01-31 11:45:02,356] [INFO]: ======================= Evaluating images with FR metric [LPIPS+ (lower is better)] =======================
[2026-01-31 11:45:05,060] [INFO]: [1 / 6]: [20260121_104109] lpips+_score: 0.60
[2026-01-31 11:45:07,380] [INFO]: [2 / 6]: [20260127_151153] lpips+_score: 0.18
[2026-01-31 11:45:09,744] [INFO]: [3 / 6]: [20260127_151202] lpips+_score: 0.27
[2026-01-31 11:45:12,128] [INFO]: [4 / 6]: [20260127_151220] lpips+_score: 0.54
[2026-01-31 11:45:14,468] [INFO]: [5 / 6]: [20260127_155004] lpips+_score: 0.00
[2026-01-31 11:45:16,790] [INFO]: [6 / 6]: [20260127_162141] lpips+_score: 0.13
[2026-01-31 11:45:16,790] [INFO]: ======================= Calculating composite image quality scores =======================
[2026-01-31 11:45:16,791] [INFO]: [1 / 6]: [20260121_104109] composite score: 27.99
[2026-01-31 11:45:16,791] [INFO]: [2 / 6]: [20260127_151153] composite score: 73.22
[2026-01-31 11:45:16,791] [INFO]: [3 / 6]: [20260127_151202] composite score: 67.94
[2026-01-31 11:45:16,791] [INFO]: [4 / 6]: [20260127_151220] composite score: 51.18
[2026-01-31 11:45:16,791] [WARNING]: [5 / 6]: [20260127_155004] AIGC may not have taken effect, input and output images are very similar
[2026-01-31 11:45:16,791] [INFO]: [6 / 6]: [20260127_162141] composite score: 72.67
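Writing the results to CSV, as suggested above, needs only the standard library. A sketch assuming the per-image results are collected into a dict shaped like `composite_scores` in Appendix 1 (timestamp → {metric_name: (raw_score, normalized_score)}) and that the composite totals, which Appendix 1 only logs, are gathered into a second dict; the function name, column layout and file name are hypothetical choices.

```python
import csv
from typing import Dict, Tuple

Scores = Dict[str, Dict[str, Tuple[float, float]]]

def write_scores_csv(path: str, composite_scores: Scores, composite: Dict[str, float]) -> None:
    """One row per image: timestamp, raw score of each metric, then the composite score."""
    metric_names = sorted({m for per_ts in composite_scores.values() for m in per_ts})
    # utf-8-sig writes a BOM so Excel detects the encoding when opening the file
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.writer(f)
        writer.writerow(["timestamp", *metric_names, "composite"])
        for ts in sorted(composite_scores):
            raw = [f"{composite_scores[ts].get(m, (0.0, 0.0))[0]:.4f}" for m in metric_names]
            total = composite.get(ts)
            writer.writerow([ts, *raw, "" if total is None else f"{total:.2f}"])
```

Images without a composite total (for example the "AIGC may not have taken effect" case) get an empty last cell rather than a misleading 0.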

Metric Usage Notes

Usage Tips and Personal Views

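NR-IQA can score LR and HR separately, which helps measure SR results across different scenes, input quality levels, and SR methods; the composite calculation in this project uses only the HR score.
FR-IQA: generally, use the SR input and output as LR and HR respectively for comparison; you can also compare different models (e.g., on-device vs. cloud), depending on your use case.
If the HR image is in HEIF (.heic) or another non-RGB format, it must be converted to RGB before it can be evaluated, and that non-RGB to RGB conversion costs quality, so avoid saving test shots in non-RGB formats where possible.
Metric usage should change across project stages. For example, when the early focus is on front-end ISP algorithms and RAW quality, consider traditional metrics such as PSNR and NIQE, or rely more on NR metrics to assess LR.
In essence, this approach uses AI to evaluate AI, and the dataset still has to be built around the model, especially for a self-developed base model. Traditional mathematical metrics, however, can hardly assess authenticity, artifacts, and similar issues in AI-based SR, so using models to test models is the only practical option.
IQA can do more than score SR results; it can also help train large models. For example, an IQA model can act as a teacher that scores SR outputs in a way closer to human perception. During training, the IQA score can be part of the loss function, steering the SR model toward more natural, more detailed images. When building a dataset, IQA can score data quality, filtering or evaluating synthesized low-quality images so that their degradation types and strength distribution are closer to real scenes, which improves the SR model's generalization. These applications are outside the scope of this article; I may try them out and write them up later.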

Custom Model Training and Datasets

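IQA-PyTorch provides default, customizable training dataset configurations for these deep learning models: CLIPIQA, CNNIQA, DBCNN, HyperNet, NIMA, QualiCLIP, TOPIQ, and WaDIQaM. They live in the project's options/train directory; taking QualiCLIP on the KonIQ-10k dataset as an example, the configuration is the file options/train/QualiCLIP/train_QualiCLIP_koniq10k.yml.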

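A training configuration typically covers the learning rate and its scheduling policy, batch size, number of epochs, optimizer choice, and loss function settings; adjust them to your hardware and dataset to get the best training results. After cloning the project, downloading the corresponding dataset, and editing the training configuration, you can start training: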

# in the project root directory
python ./pyiqa/train.py --opt options/train/QualiCLIP/train_QualiCLIP_koniq10k.yml

All configurations available for custom training:

Model	Dataset (user-specified)	Config file name (example)
CLIPIQA	KonIQ-10k	train_CLIPIQA_koniq10k.yml
CNNIQA	KonIQ-10k	train_CNNIQA.yml
DBCNN	LIVEC KonIQ-10k TID2008	train_DBCNN.yml train_DBCNN_koniq10k.yml train_DBCNN_tid.yml
HyperNet	KonIQ-10k	train_HyperNet.yml
NIMA	AVA KonIQ-10k SPAQ	train_NIMA.yml train_NIMA_inception_ava.yml train_NIMA_inception_koniq.yml train_NIMA_inception_spaq.yml
QualiCLIP	live KonIQ-10k SPAQ	train_QualiCLIP_clive.yml train_QualiCLIP_flive.yml train_QualiCLIP_koniq10k.yml train_QualiCLIP_spaq.yml
TOPIQ	resnet50_ava CGFIQA GFIQA KonIQ-10k Swin_ava Swin_CGFIQA	train_TOPIQ_res50_ava.yml train_TOPIQ_res50_cgfiqa.yml train_TOPIQ_res50_gfiqa.yml train_TOPIQ_res50_koniq.yml train_TOPIQ_swin_ava.yml train_TOPIQ_swin_cgfiqa.yml
WaDIQaM	general_iqa_dataset KonIQ-10k	train_WaDIQaM_FR_kadid.yml train_WaDIQaM_NR_koniq.yml

Appendix 1: Main Image-Scoring Program

import pyiqa
import torch
import os
import cv2
import time
import glob
import shutil
import os.path as osp
import logging
import piexif
import exifread
import metric_conf as mc
from PIL import Image
from datetime import datetime as dt
from pillow_heif import register_heif_opener

# configure logging
logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)
formatter = logging.Formatter('[%(asctime)s] [%(levelname)s]: %(message)s')
console_handler = logging.StreamHandler()
console_handler.setLevel(logging.DEBUG)
console_handler.setFormatter(formatter)
logger.addHandler(console_handler)

# when not debugging, silence most pyiqa logs (some unnecessary prints in the pyiqa source may also need deleting by hand)
pyiqa_logger = logging.getLogger('pyiqa')
pyiqa_logger.setLevel(logging.ERROR)

def get_timestamp_list(image_dir, ts_begin_index=0):
    time_set = []
    for ff in os.listdir(image_dir):
        fp = osp.join(image_dir, ff)
        if osp.isdir(fp):
            continue
        ts = ts_from_file_name(ff, ts_begin_index)
        if ts not in time_set:
            time_set.append(ts)

    return time_set

def ts_from_file_name(file_name, begin_index = 0):
    name_arr = file_name.rsplit(".", 1)[0].split("_")
    i_1, i_2, i_3 = begin_index, begin_index + 1, begin_index + 2
    cut_time = name_arr[i_2] if len(name_arr[i_2]) == 6 else name_arr[i_2][0:6]
    timestamp = f"{name_arr[i_1]}_{cut_time}"
    if len(name_arr) > i_3 and name_arr[i_3].isdigit() and len(name_arr[i_3]) == 1:
        timestamp += f"_{name_arr[i_3]}"
    return timestamp

def convert_heif_to_rgb(image_path, delete_heic=False):
    """
    Convert a HEIC image to JPEG. Quality drops considerably, even with quality set to 95
    """
    register_heif_opener()
    image_name = osp.basename(image_path)
    if image_name.lower().endswith('.jpg'):
        return image_path
    # derive the name from the stem, so an upper-case '.HEIC' cannot map back onto the source file
    jpg_path = osp.splitext(image_path)[0] + '.jpg'

    with Image.open(image_path) as image:
        rgb_image = image.convert('RGB')
        exif_bytes = image.info.get('exif')
        save_kwargs = {'quality': 95, 'subsampling': 1}
        if exif_bytes:
            try:
                exif_dict = piexif.load(exif_bytes)
                save_kwargs['exif'] = piexif.dump(exif_dict)
            except Exception as e:
                logger.error(f"error while handling exif info: {e}")
        if osp.exists(jpg_path):
            os.remove(jpg_path)
        rgb_image.save(jpg_path, **save_kwargs)
        logger.warning(f'image {image_name} converted to jpg, quality will be reduced a lot')

    if delete_heic:
        os.remove(image_path)
        logger.info(f'heif image {image_path} removed')

    return jpg_path

def get_jpeg_exif_text(image_path):
    exif = {
        'iso': 0,
        'fl': 0,
        'et': '0',
        'ap': ''
    }
    with open(image_path, 'rb') as f:
        exif_data = exifread.process_file(f)
        for key, val in exif_data.items():
            if key == 'EXIF ISOSpeedRatings':
                exif['iso'] = str(val).rjust(5, ' ')
            if key == 'EXIF FocalLengthIn35mmFilm':
                exif['fl'] = str(val).rjust(4, ' ')
            if key == 'EXIF ExposureTime':
                et_arr = str(val).split("/")
                exposure_time = et_arr[0].rjust(6, ' ')
                if len(et_arr) > 1:
                    exposure_time = f"1/{str(round(int(et_arr[1]) / int(et_arr[0])))}".rjust(6, ' ')
                exif['et'] = exposure_time
            if key == 'EXIF FNumber':  # f-number; 'EXIF ApertureValue' is an APEX value, not an f-number
                ap_arr = str(val).split("/")
                aperture_value = f'F/{ap_arr[0]}'.rjust(6, ' ')
                if len(ap_arr) > 1:
                    aperture_value = f"F/{round(int(ap_arr[0]) / int(ap_arr[1]), 1)}".rjust(6, ' ')
                exif['ap'] = aperture_value

    return f"AP:{exif['ap']}  ET:{exif['et']}s  FL:{exif['fl']}mm  ISO:{exif['iso']}"

def file_pre_handle(hr_dir, lr_dir, ts_begin_index=0):
    file_dict = dict()
    ts_list = get_timestamp_list(hr_dir, ts_begin_index)
    file_dict['file_info'] = list()

    for ts in sorted(ts_list):
        lr_files = glob.glob(osp.join(lr_dir, f"*{ts}_iso*.jpg"))
        lr_files += glob.glob(osp.join(lr_dir, f"*{ts}_iso*.heic"))        
        if len(lr_files) == 0:
            ts_list.remove(ts)
            logger.warning(f"no lr image found by timestamp {ts}")
            continue
        lr_file = convert_heif_to_rgb(lr_files[0])

        hr_files = glob.glob(osp.join(hr_dir, f"*{ts}_iso*.heic"))
        hr_files += glob.glob(osp.join(hr_dir, f"*{ts}_iso*.jpg"))
        if len(hr_files) == 0:
            ts_list.remove(ts)
            logger.warning(f"no hr image found by timestamp {ts}")
            continue
        hr_file = convert_heif_to_rgb(hr_files[0])

        file_info = {
            'timestamp': ts,
            'lr_file': lr_file,
            'hr_file': hr_file,
            'exif_text': get_jpeg_exif_text(hr_file)
        }

        file_dict['file_info'].append(file_info)
    file_dict['image_count'] = len(ts_list)
    file_dict['count_width'] = len(str(len(ts_list)))

    return file_dict

def metric_score_normalize(score, metric):
    if score == 0:
        return score

    lv = metric['value_range'][0]
    rv = metric['value_range'][1]
    lb = metric['lower_better']

    # clamp non-zero scores into [10% of the maximum, maximum]
    limit_score = max(score, rv / 10) if score < rv / 10 else min(score, rv)

    # result lies in [10, 100] when the lower bound of value_range is 0
    return 10 * rv / abs(limit_score - lv) if lb else limit_score * 100 / rv


def get_enabled_metrics():
    """
    Return the list of enabled metrics. If the weights of the enabled metrics do not sum to 1, every metric gets an equal weight
    """
    enabled_metrics = []
    total_weight = 0.0
    for metric in mc.metrics:
        if metric['current_enabled'] and metric['can_be_used']:
            enabled_metrics.append(metric)
            total_weight += metric['score_weight'] if 'score_weight' in metric else 0.0

    enabled_count = len(enabled_metrics)
    if enabled_count == 0:
        return []

    # if the weights do not sum to 1 (within float tolerance), discard them and weight all metrics equally
    if abs(total_weight - 1.0) > 1e-6:
        logger.warning(f"score weight total is not 1, for each one, reset to {1 / enabled_count}")
        for metric in enabled_metrics:
            metric['score_weight'] = 1 / enabled_count

    return enabled_metrics

def image_math_metric_calc(file_dict, metric_conf, print_lr=False):
    scores = dict()
    metric_name = metric_conf['metric_name']
    function = metric_conf['math_calc_func']
    params = {} if 'math_calc_params' not in metric_conf else metric_conf['math_calc_params']

    image_count = file_dict['image_count']
    count_width = file_dict['count_width']
    for index, file_info in enumerate(file_dict['file_info']):
        ts = file_info['timestamp']
        hr_file = file_info['hr_file']
        lr_file = file_info['lr_file']
        index_text = f'{str.zfill(str(index + 1), count_width)} / {image_count}'
        try:
            scores[ts] = dict()
            exif_text = file_info['exif_text']
            image_gray_hr = cv2.imread(hr_file, cv2.IMREAD_GRAYSCALE)
            score_hr = function(image_gray_hr, **params)
            normalized_score_hr = metric_score_normalize(score_hr, metric_conf)
            scores[ts][metric_name] = (score_hr, normalized_score_hr)
            logger.debug(f"[{index_text}]: [{ts}] [{exif_text}] {metric_name}_score_hr: {score_hr:.2f} normalized : {normalized_score_hr:.2f}")
            if print_lr:
                image_gray_lr = cv2.imread(lr_file, cv2.IMREAD_GRAYSCALE)
                score_lr = function(image_gray_lr, **params)
                normalized_score_lr = metric_score_normalize(score_lr, metric_conf)
                logger.debug(f"[{index_text}]: [{ts}] [{exif_text}] {metric_name}_score_lr: {score_lr:.2f} normalized : {normalized_score_lr:.2f}")
        except Exception as e:
            logger.error(f"[{index_text}]: [{ts}] [{exif_text}] error while calculating {metric_name} score: {e}")
            scores[ts][metric_name] = (0.0, 0.0)
    return scores

def iqa_score_calc(file_dict, metric_conf, print_lr=False):
    metric_class = metric_conf['metric_class'].upper()
    if metric_conf['metric_type'] == 'folder':
        return dir_score_calc(file_dict, metric_conf)

    if metric_conf['metric_type'] == 'math':
        return image_math_metric_calc(file_dict, metric_conf, print_lr)

    metric_name = metric_conf['metric_name']
    device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
    metric = pyiqa.create_metric(metric_name, device=device)
    # load the weight file if the user configured one
    if 'weights_path' in metric_conf and metric_conf['weights_path']:
        metric.load_weights(metric_conf['weights_path'], weight_keys='params')
    scores = dict()
    image_count = file_dict['image_count']
    count_width = file_dict['count_width']

    for index, file_info in enumerate(file_dict['file_info']):
        ts = file_info['timestamp']
        lr_file = file_info['lr_file']
        hr_file = file_info['hr_file']
        index_text = f'{str.zfill(str(index + 1), count_width)} / {image_count}'

        try:
            scores[ts] = dict()

            exif_text = file_info['exif_text']
            if metric_class == 'NR':
                score_hr = metric(hr_file).item()
                normalized_score_hr = metric_score_normalize(score_hr, metric_conf)
                scores[ts][metric_name] = (score_hr, normalized_score_hr)
                logger.debug(f"[{index_text}]: [{ts}] [{exif_text}] {metric_name}_hr_score: {score_hr:.2f} normalized : {normalized_score_hr:.2f}")
                if print_lr:
                    score_lr = metric(lr_file).item()
                    normalized_score_lr = metric_score_normalize(score_lr, metric_conf)
                    logger.debug(f"[{index_text}]: [{ts}] [{exif_text}] {metric_name}_lr_score: {score_lr:.2f} normalized : {normalized_score_lr:.2f}")
            else:
                score = metric(hr_file, lr_file).item()
                normalized_score = metric_score_normalize(score, metric_conf)
                scores[ts][metric_name] = (score, normalized_score)
                logger.debug(f"[{index_text}]: [{ts}] [{exif_text}] {metric_name}_score: {score:.2f} normalized : {normalized_score:.2f}")
        except Exception as e:
            logger.error(f"[{index_text}]: [{ts}] [{exif_text}] error while calculating {metric_name} score: {e}")
            scores[ts][metric_name] = (0.0, 0.0)

    return scores

def dir_score_calc(file_dict, metric_conf):
    metric_name = metric_conf['metric_name']
    device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
    metric = pyiqa.create_metric(metric_name, device=device)
    # load the weight file if the user configured one
    if 'weights_path' in metric_conf and metric_conf['weights_path']:
        metric.load_weights(metric_conf['weights_path'], weight_keys='params')

    scores = dict()
    image_count = file_dict['image_count']
    count_width = file_dict['count_width']

    for index, file_info in enumerate(file_dict['file_info']):
        ts = file_info['timestamp']
        index_text = f'{str.zfill(str(index + 1), count_width)} / {image_count}'
        lr_file = file_info['lr_file']
        hr_file = file_info['hr_file']

        try:
            scores[ts] = dict()
            exif_text = file_info['exif_text']
            # temporary folder next to the HR file, named after it (no dependency on the global hr_dir)
            temp_hr_dir = osp.splitext(hr_file)[0]
            if osp.exists(temp_hr_dir):
                shutil.rmtree(temp_hr_dir)
            os.makedirs(temp_hr_dir, exist_ok=True)
            # the folder metric needs at least 2 images to run
            shutil.copy(hr_file, osp.join(temp_hr_dir, '1.jpg'))
            shutil.copy(hr_file, osp.join(temp_hr_dir, '2.jpg'))

            # temporary folder next to the LR file, named after it (no dependency on the global lr_dir)
            temp_lr_dir = osp.splitext(lr_file)[0]
            if osp.exists(temp_lr_dir):
                shutil.rmtree(temp_lr_dir)
            os.makedirs(temp_lr_dir, exist_ok=True)
            # the folder metric needs at least 2 images to run
            shutil.copy(lr_file, osp.join(temp_lr_dir, '1.jpg'))
            shutil.copy(lr_file, osp.join(temp_lr_dir, '2.jpg'))

            score = metric(temp_hr_dir, temp_lr_dir).item()
            score = 0.0 if score < 0.1 else score
            normalized_score = metric_score_normalize(score, metric_conf)
            scores[ts][metric_name] = (score, normalized_score)
            logger.debug(f"[{index_text}]: [{ts}] [{exif_text}] {metric_name}_score: {score:.2f} normalized : {normalized_score:.2f}")
            shutil.rmtree(temp_hr_dir)
            shutil.rmtree(temp_lr_dir)
        except Exception as e:
            logger.error(f"[{index_text}]: [{ts}] [{exif_text}] error while calculating {metric_name} score: {e}")
            scores[ts][metric_name] = (0.0, 0.0)

    return scores

def evaluate_main(lr_dir, hr_dir, ts_begin_index=0, log_dir='', print_lr=False):
    if log_dir:
        now = dt.now().strftime('%Y%m%d_%H%M%S')
        log_file = osp.join(log_dir, f'iqa_evaluation_{now}.log')
        file_handler = logging.FileHandler(log_file)
        file_handler.setLevel(logging.DEBUG)
        file_handler.setFormatter(formatter)
        logger.addHandler(file_handler)

    enabled_metrics = get_enabled_metrics()
    if len(enabled_metrics) == 0:
        logger.error(f"no metric is enabled. please check the [current_enabled] key in metric_conf.py")
        return

    # pre-process: drop all data that fails the filters and return a dict
    file_dict = file_pre_handle(hr_dir, lr_dir, ts_begin_index)
    if file_dict['image_count'] == 0:
        logger.error(f"no valid image found. please check the log info above")
        return

    # inside the calc functions, scores must get a default value at every level of the dict, otherwise later code needs extra checks
    composite_scores = dict()

    for metric in enabled_metrics:
        scores = dict()
        metric_class = metric['metric_class'].upper()
        metric_name = metric['metric_name']
        better_mark = 'lower is better' if metric['lower_better'] else 'higher is better'

        logger.info(f"=========================== begin to evaluate image by {metric_class}-metric [{metric_name.upper()} ({better_mark})] ===========================")
        if metric_class in ['NR', 'FR']:
            scores = {**iqa_score_calc(file_dict, metric, print_lr), **scores}
        else:
            logger.error(f"unsupported metric class [{metric_class}] of metric [{metric_name}]")
            continue

        for file_info in file_dict['file_info']:
            ts = file_info['timestamp']
            if ts not in scores:
                scores[ts] = dict()
            if ts not in composite_scores:
                composite_scores[ts] = dict()
            if metric_name not in scores[ts]:
                scores[ts][metric_name] = (0.0, 0.0)

            composite_scores[ts] = {**composite_scores[ts], **scores[ts]}

    logger.info(f"=========================== begin to calculate composite scores ===========================")
    image_count = file_dict['image_count']
    count_width = file_dict['count_width']
    for index, file_info in enumerate(file_dict['file_info']):
        ts = file_info['timestamp']
        if ts not in composite_scores:
            continue
        exif_text = file_info['exif_text']
        index_text = f'{str.zfill(str(index + 1), count_width)} / {image_count}'
        composite_score, file_error, aigc_fail = 0.0, False, False
        for metric_name, (score, normalized_score) in composite_scores[ts].items():
            metric = [m for m in mc.metrics if m['metric_name'] == metric_name][0]
            score_weight = metric['score_weight']
            if score == 0.0:
                if metric['metric_type'] == 'folder':
                    aigc_fail = True
                else:
                    file_error = True
                break
            composite_score += normalized_score * score_weight

        if aigc_fail:
            logger.warning(f"[{index_text}]: [{ts}] [{exif_text}] AIGC may not have taken effect, hr is very similar to lr")
            continue
        if file_error:
            logger.warning(f"[{index_text}]: [{ts}] [{exif_text}] file error, no permission or mis-rotated")
            continue
        logger.info(f"[{index_text}]: [{ts}] [{exif_text}] composite score: {composite_score:.2f}")


if __name__ == '__main__':
    """
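    See README.md for the verified metrics; which metrics are used is configured via current_enabled in metric_conf.py
    Parameters:
        lr_dir: directory of the input/original images
        hr_dir: directory of the super-resolved images
        ts_begin_index: position of the timestamp in the file name after splitting on underscores; all file names must share one format,
            e.g. 1001_20251021_142422_iso100_20.0X_Pixel10Pro_HR.jpg (ts_begin_index=1 for this example)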
    """
    lr_dir = r'D:\images\temp\test_LR'
    hr_dir = r'D:\images\temp\test_HR'
    evaluate_main(lr_dir=lr_dir, hr_dir=hr_dir, ts_begin_index=2, log_dir=r'D:\images\temp', print_lr=False)
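To see how `metric_score_normalize` maps raw scores onto a common scale before weighting, here is the same formula restated standalone (identical logic, with the clamp written as min/max) and applied to a few values. The sample inputs are illustrative and use the `value_range` and `lower_better` settings from Appendix 2; `normalize` is a hypothetical name.

```python
from typing import Sequence

def normalize(score: float, value_range: Sequence[float], lower_better: bool) -> float:
    """Same mapping as metric_score_normalize in the program above."""
    if score == 0:
        return score  # 0 is treated as 'no result' by the caller
    lv, rv = value_range
    limited = min(max(score, rv / 10), rv)  # clamp into [10% of max, max]
    return 10 * rv / abs(limited - lv) if lower_better else limited * 100 / rv

examples = [
    ("lpips+", 0.18, [0, 1], True),
    ("lpips+", 0.05, [0, 1], True),  # below 10% of the maximum, clamped to 0.1
    ("sfid", 49.11, [0, 100], True),
    ("entropy", 7.68, [0, 8], False),
    ("qualiclip", 0.53, [0, 1], False),
]
for name, raw, value_range, lower_better in examples:
    print(f"{name:<10} raw={raw:<6} normalized={normalize(raw, value_range, lower_better):.2f}")
```

So lpips+ 0.18 maps to about 55.56, and a lower-is-better metric saturates at 100 once its raw score reaches 10% of the range maximum: every sfid value of 10 or less earns full marks.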

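Appendix: Selected Metrics Tested in Practice (my understanding is partial, so please forgive any errors or inapplicable entries)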

For the full list of metrics, see the official METRICS documentation.

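Full-reference metric	Year proposed	Score direction	Notes	Value range
sfid	2024	Lower is better	Fairly accurate; mainly used to evaluate generative models (e.g., GANs); at its core a statistical method	0~100
fid	2017	Lower is better	Fairly accurate; mainly used to evaluate generative models (e.g., GANs); at its core a statistical method	0~100
lpips	2018	Lower is better	Neural-network metric; about 3 s per 4K image on CPU	0~1
lpips+	2020	Lower is better	Neural-network metric; far more accurate than lpips; about 3 s per 4K image on CPU; first choice for FR	0~1
stlpips	2020	Lower is better	Neural-network metric; fairly accurate; about 10~15 s per 4K image on CPU	0~1
lpips-vgg	2018	Lower is better	Neural-network metric; a Windows 11 PC with 32 GB RAM cannot meet its memory requirement	0~1
lpips-vgg+	2023	Lower is better	Neural-network metric; a Windows 11 PC with 32 GB RAM cannot meet its memory requirement	0~1
stlpips-vgg	2020	Lower is better	Neural-network metric; a Windows 11 PC with 32 GB RAM cannot meet its memory requirement	0~1
nlpd	2006	Lower is better	About 2 s per 4K image on CPU	0~1
gmsd	2014	Lower is better	Gradient Magnitude Similarity Deviation	0~1
dists	2020	Lower is better	Neural-network metric; Deep Image Structure and Texture Similarity; needs at least 20 GB of free memory	0~1
psnr	2002	Higher is better	Traditional peak signal-to-noise ratio; trustworthy, but of little reference value for AIGC	0~100
ssim	2004	Higher is better	Structural similarity; needs at least 20 GB of free memory	0~1
ms_ssim	2003	Higher is better	Multi-scale structural similarity; needs at least 12 GB of free memory	0~1
cw_ssim	2010	Higher is better	Complex-wavelet structural similarity; needs at least 20 GB of free memory	0~1
fsim	2011	Higher is better	Feature similarity; about 2~5 s per 4K image on CPU; average accuracy; of little reference value for AIGC	0~1
ahiq	2014	Higher is better	Neural-network metric; about 110~120 s per 4K image on CPU	0~1
wadiqam_fr	2018	Higher is better	Neural-network metric; weighted-average deep image quality measure; the current model returns negative scores, so it is not used for now	-1~1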

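No-reference metric	Year proposed	Score direction	Notes	Value range
qualiclip	2025	Higher is better	Neural-network metric; relatively accurate sharpness assessment; about 8~10 s per 4K image on CPU; first choice for NR	0~1
dbcnn	2019	Higher is better	Neural-network metric; relatively accurate sharpness assessment; about 15~30 s per 4K image on CPU	0~1
niqe	2012	Lower is better	Relatively accurate sharpness assessment; MATLAB model; about 5 s per 4K image on CPU	0~10+
niqe_matlab	2012	Lower is better	Relatively accurate sharpness assessment; MATLAB model; about 5 s per 4K image on CPU	0~10+
cnniqa	2014	Higher is better	Neural-network metric; relatively accurate sharpness assessment; about 3 s per 4K image on CPU	0~1
musiq	2021	Higher is better	Neural-network metric; relatively accurate sharpness assessment; about 70~90 s per 4K image on CPU	0~1
ilniqe	2015	Lower is better	MATLAB model; average accuracy	0~20
hyperiqa	2020	Higher is better	Neural-network metric; the current model discriminates only moderately; leads in prediction accuracy and suits high-precision assessment, but is computationally expensive	0~100
nima	2018	Higher is better	Neural-network metric; the current model discriminates only moderately; closest to human preference in aesthetic perception, but insensitive to technical distortions	0~10
piqe	2015	Lower is better	The current model discriminates poorly	0~100
arniqa	2023	Higher is better	Neural-network metric; the current model discriminates poorly	0~1
brisque	2012	Lower is better	Natural-scene-statistics features with SVR regression (not a neural network); the current model discriminates poorly	0~100
pi	2018	Lower is better	Not usable	0~100
maniqa	2022	Higher is better	Neural-network metric; not usable	0~100
nrqm	2016	Higher is better	MATLAB model; not usable, produces no output	0~1
clipiqa	2023	Higher is better	Neural-network metric; acceptable accuracy but low discrimination; about 8 s per 4K image on CPU	0~1
maclip	2024	Higher is better	Neural-network metric; not yet implemented as of pyiqa-0.1.14.1	0~1
liqe	2023	Higher is better	Acceptable accuracy; about 1 s per 4K image on CPU	0~1
paq2piq	2020	Higher is better	Neural-network metric; leans toward human subjective perceived quality; generalizes well to complex real-world distortions (e.g., phone photos)	0~1
topiq_nr	2023	Higher is better	Neural-network metric; leans toward measuring how image quality affects downstream vision task performance; not suitable	0~1
tres	2023	Higher is better	Aims to measure how image quality affects downstream vision task performance rather than human subjective perception; not suitable	0~1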

Appendix 2: metric_conf.py

# coding=utf-8

import cv2
from math import log2
import numpy as np

"""
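    'metric_name': metric name
    'metric_class': FR (full-reference) or NR (no-reference)
    'current_enabled': whether the metric is enabled for the current assessment; metrics could also be split into enabled and disabled lists, which is handier for debugging
    'weights_path': weight file specified or trained by the user; if empty, the official default file is loaded
    'metric_type': 'file' (most metrics take image files), 'folder' (metrics such as fid and sfid take directories), or 'math' (pure math computation, also on files)
    'math_calc_func': function implementing a custom math metric; only used when metric_type is 'math'
    'math_calc_params': parameters for math_calc_func; only used when metric_type is 'math'
    'score_weight': weight of the score in the final weighted composite
    'can_be_used': whether the metric has been verified as usable; some have implementation problems, some do not work in certain network environments, some are too inaccurate
    'lower_better': whether a lower score means higher quality
    'value_range': value range, closed interval
    'created_at': year the metric was proposed/created
    'description': notes on the metric; ideally describe what it does and the results of testing and validation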
"""

def image_entropy_calc(image_gray):
    hist, bins = np.histogram(image_gray.flatten(), 256, [0, 256])
    px = hist / float(image_gray.shape[0] * image_gray.shape[1])
    score = -np.sum([px[i] * log2(px[i] + 1e-10) for i in range(256)])
    return score

def image_detail_calc(image_gray, sobel_ksize=3):
    # scale the score by Sobel kernel size into a limited range (below 200; usually within 100 for high-magnification telephoto)
    sobel_ratios = {1: 4, 3: 1, 5: 0.07, 7: 0.005}
    sobelx = cv2.Sobel(image_gray, cv2.CV_64F, 1, 0, ksize=sobel_ksize)
    sobely = cv2.Sobel(image_gray, cv2.CV_64F, 0, 1, ksize=sobel_ksize)
    sobel = cv2.magnitude(sobelx, sobely)
    score = np.mean(np.abs(sobel)) * sobel_ratios[sobel_ksize]
    return score

metrics = [
    {
        'metric_name': 'detail',
        'metric_class': 'nr',
        'current_enabled': True,
        'weights_path': '',
        'metric_type': 'math',
        'math_calc_params': {'sobel_ksize': 3},
        'math_calc_func': image_detail_calc,
        'score_weight': 0.1,
        'can_be_used': True,
        'lower_better': False,
        'value_range': [0, 100],  # fixed manually
        'created_at': 1968,
        'description': 'Computes the Sobel gradient magnitude of the grayscale image and takes its mean; a larger mean gradient means sharper edges, richer detail, and higher sharpness'
    },
    {
        'metric_name': 'entropy',
        'metric_class': 'nr',
        'current_enabled': True,
        'weights_path': '',
        'metric_type': 'math',
        'math_calc_params': {},
        'math_calc_func': image_entropy_calc,
        'score_weight': 0.1,
        'can_be_used': True,
        'lower_better': False,
        'value_range': [0, 8],
        'created_at': 1948,
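        'description': 'Entropy is the average information content of an image and measures, in information-theory terms, how much information it carries; higher entropy means more information; suited to comparing images with the same composition'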
    },
    {
        'metric_name': 'qualiclip',
        'metric_class': 'nr',
        'current_enabled': True,
        'weights_path': '',
        'metric_type': 'file',
        'score_weight': 0.2,
        'can_be_used': True,
        'lower_better': False,
        'value_range': [0, 1],
        'created_at': 2025,
        'description': 'Neural-network metric, accurate, about 8~10 s per 4K image on CPU, first choice for NR'
    },
    {
        'metric_name': 'sfid',
        'metric_class': 'fr',
        'current_enabled': True,
        'weights_path': '',
        'metric_type': 'folder',
        'score_weight': 0.4,
        'can_be_used': True,
        'lower_better': True,
        'value_range': [0, 100],
        'created_at': 2024,
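        'description': 'Fairly accurate; mainly used to evaluate generative models (e.g. GANs); measures how similar generated and real images are in their feature-space distributions; at its core a statistical method'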
    },
    {
        'metric_name': 'lpips+',
        'metric_class': 'fr',
        'current_enabled': True,
        'weights_path': '',
        'metric_type': 'file',
        'score_weight': 0.2,
        'can_be_used': True,
        'lower_better': True,
        'value_range': [0, 1],
        'created_at': 2020,
        'description': 'Neural-network metric, fairly accurate, about 2~3 s per 4K image on CPU, first choice for FR'
    }
]
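A quick sanity check of why `entropy` uses `value_range` [0, 8]: for an 8-bit image the 256-bin histogram entropy is at most log2(256) = 8 bits, reached when all gray levels are equally frequent, and 0 for a flat image. The vectorized NumPy version below is an essentially equivalent restatement of `image_entropy_calc` for illustration (it skips empty bins instead of adding 1e-10 inside the log); the name `image_entropy` is hypothetical and the program does not need it.

```python
import numpy as np

def image_entropy(image_gray: np.ndarray) -> float:
    """Shannon entropy (bits) of the 256-bin gray-level histogram of an 8-bit image."""
    hist = np.bincount(image_gray.ravel(), minlength=256).astype(np.float64)
    px = hist / image_gray.size
    px = px[px > 0]  # empty bins contribute 0 * log2(0) = 0
    return float(np.sum(px * np.log2(1.0 / px)))

flat = np.full((64, 64), 128, dtype=np.uint8)
ramp = np.arange(256, dtype=np.uint8).reshape(16, 16)  # every gray level exactly once
print(image_entropy(flat), image_entropy(ramp))
```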
