YOLO cross-class NMS: making mutually exclusive classes suppress each other

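By default, YOLO runs NMS per class: if a class A object and a class B object are detected at the same location, both are kept.

In most scenarios that is correct. A cat and the collar around its neck should both be detected.

But it falls apart when recognizing cards in a game. A given skill card can only be one of Skill Card: Active, Skill Card: Mental, or Skill Card: Trap, yet the model sometimes assigns two labels to the same location, both with fairly high confidence. Per-class NMS keeps both, and the business layer ends up with one card carrying two identities, which breaks the downstream logic on the spot.

This case calls for "cross-class NMS": designate a set of mutually exclusive classes and let them suppress one another as well.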

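A comparison

Standard NMS:

- Active card: (100, 100, 200, 200), confidence 0.9
- Mental card: (105, 105, 205, 205), confidence 0.8

→ NMS runs separately for each class
→ Active kept, Mental kept

Agnostic NMS (Active/Mental/Trap declared as one group):

- Active card: (100, 100, 200, 200), confidence 0.9
- Mental card: (105, 105, 205, 205), confidence 0.8

→ The three classes go through NMS as a single unit
→ Active kept, Mental suppressed because of its high IoU with Active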
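
To make "high IoU" concrete, here is a small worked calculation for the two boxes in the comparison. It assumes the tuples are (x1, y1, x2, y2) corner coordinates, which the text does not state explicitly; `iou_xyxy` is a hypothetical helper name used only for this illustration.

```python
def iou_xyxy(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2) corners (assumed format)."""
    # Corners of the intersection rectangle
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


active = (100, 100, 200, 200)
mental = (105, 105, 205, 205)

# Intersection is 95 x 95 = 9025, union is 10000 + 10000 - 9025 = 10975
overlap = iou_xyxy(active, mental)
print(round(overlap, 3))  # 0.822
```

An IoU of about 0.82 is well above commonly used NMS thresholds, so once the two classes share a bucket, the lower-confidence Mental box is the one that gets dropped.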

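Building the groups from model metadata

The class names are available directly in the meta exported by YOLO, so the grouping logic can live in the engine layer and work out automatically which classes need to be mutually exclusive: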

class YoloModelFromONNX:
    def __init__(self, model_path: str):
        # ...
        self._agnostic_nms_groups = self._build_agnostic_nms_groups()

    def _build_agnostic_nms_groups(self) -> list | None:
        skill_labels = {"Skill Card: Active", "Skill Card: Mental", "Skill Card: Trap"}
        group = set()

        for cid, name in self._model_meta.names.items():
            if name in skill_labels:
                group.add(cid)

        return [group] if len(group) >= 2 else None

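A group is returned only when it contains two or more classes. A single class "suppressing itself" is pointless, and standard NMS already handles that case.

Post-processing implementation

The idea: first sort the detection boxes into buckets. Boxes whose class belongs to an agnostic group go into that group's shared bucket, and every ordinary class gets its own bucket keyed by class_id. Then run cv2.dnn.NMSBoxes on each bucket independently: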

import cv2
import numpy as np

def _postprocess(
    self,
    results: np.ndarray,
    conf_threshold: float,
    iou_threshold: float,
    agnostic_nms_groups: list | None = None,
) -> YoloResult:
    boxes = []
    scores = []
    class_ids = []

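    for detection in results:
        # Indexing as in the source: box in [:4], class scores from index 5 onward
        class_scores = detection[5:]
        max_score = float(np.amax(class_scores))
        if max_score >= conf_threshold:
            # Plain Python int/float keep dict lookups and the OpenCV call simple
            class_id = int(np.argmax(class_scores))
            x, y, w, h = (float(v) for v in detection[:4])
            # Center format (cx, cy, w, h) to top-left (x, y, w, h)
            boxes.append([x - w / 2, y - h / 2, w, h])
            scores.append(max_score)
            class_ids.append(class_id)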

    if not boxes:
        return YoloResult.empty()

    # class id → group index
    agnostic_map = {}
    if agnostic_nms_groups:
        for gi, group in enumerate(agnostic_nms_groups):
            for cid in group:
                agnostic_map[cid] = gi

    # Sort detections into buckets
    grouped = {}
    for i, cid in enumerate(class_ids):
        if cid in agnostic_map:
            key = ("agnostic", agnostic_map[cid])
        else:
            key = ("class", int(cid))
        grouped.setdefault(key, []).append(i)

    # Run NMS independently on each bucket
    final_indices = []
    for indices_in_group in grouped.values():
        group_boxes = [boxes[i] for i in indices_in_group]
        group_scores = [scores[i] for i in indices_in_group]

        keep = cv2.dnn.NMSBoxes(
            group_boxes, group_scores, conf_threshold, iou_threshold
        )

        # Depending on the OpenCV version, keep may be an array or an empty tuple
        if keep is not None and len(keep) > 0:
            for idx in np.asarray(keep).flatten():
                final_indices.append(indices_in_group[idx])

    return YoloResult(
        boxes=np.array([boxes[i] for i in final_indices]),
        scores=np.array([scores[i] for i in final_indices]),
        class_ids=np.array([class_ids[i] for i in final_indices]),
    )

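The keys are tuples, ("agnostic", gi) or ("class", cid), so that the two bucketing schemes can coexist without clashing. With bare integers as keys, agnostic group index 1 would collide with class_id 1, which is one more bug that would take ages to track down.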
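
The bucketing idea can be tried out without OpenCV. The sketch below is not the engine code: it swaps cv2.dnn.NMSBoxes for a plain greedy NMS written in Python, assumes (x1, y1, x2, y2) boxes, and uses hypothetical names (`iou_xyxy`, `greedy_nms`, `grouped_nms`) and made-up class ids. The third box belongs to class 0, which is in no group, to show that group index 0 and class id 0 land in different buckets.

```python
import numpy as np


def iou_xyxy(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold):
    """Stand-in for cv2.dnn.NMSBoxes: returns indices of the kept boxes."""
    order = [int(i) for i in np.argsort(scores)[::-1]]  # highest score first
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop every remaining box that overlaps the kept one too much
        order = [j for j in order if iou_xyxy(boxes[best], boxes[j]) <= iou_threshold]
    return keep


def grouped_nms(boxes, scores, class_ids, iou_threshold, groups=None):
    """Bucket by agnostic group or by class, then run NMS per bucket."""
    group_of = {cid: gi for gi, group in enumerate(groups or []) for cid in group}
    buckets = {}
    for i, cid in enumerate(class_ids):
        key = ("agnostic", group_of[cid]) if cid in group_of else ("class", int(cid))
        buckets.setdefault(key, []).append(i)

    kept = []
    for members in buckets.values():
        local = greedy_nms(
            [boxes[i] for i in members], [scores[i] for i in members], iou_threshold
        )
        kept.extend(members[k] for k in local)  # map back to global indices
    return sorted(kept)


# Hypothetical ids: 1 = Active, 2 = Mental, 0 = some unrelated class
boxes = [(100, 100, 200, 200), (105, 105, 205, 205), (102, 102, 202, 202)]
scores = [0.9, 0.8, 0.7]
class_ids = [1, 2, 0]

per_class = grouped_nms(boxes, scores, class_ids, iou_threshold=0.5)
cross_class = grouped_nms(boxes, scores, class_ids, iou_threshold=0.5, groups=[{1, 2}])
print(per_class)    # [0, 1, 2]
print(cross_class)  # [0, 2]
```

With the group in place, the Mental box (index 1) is suppressed by the Active box, while the class 0 box survives even though it overlaps both, because it sits in its own ("class", 0) bucket rather than in ("agnostic", 0).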

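A few typical groupings

The three skill card types are mutually exclusive:

skill_labels = {"Skill Card: Active", "Skill Card: Mental", "Skill Card: Trap"}
agnostic_nms_groups = [skill_labels]

Different states of the same button are mutually exclusive (pressed / normal / disabled):

button_labels = {"Button: Normal", "Button: Pressed", "Button: Disabled"}
agnostic_nms_groups = [button_labels]

Several groups can coexist, and groups do not suppress each other, so a button never suppresses a skill card. The groups are written by label name here for readability; since _postprocess matches on class ids, the names still have to be resolved to ids, the way _build_agnostic_nms_groups does: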

agnostic_nms_groups = [
    {"Skill Card: Active", "Skill Card: Mental", "Skill Card: Trap"},
    {"Button: Normal", "Button: Pressed"},
]
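
One way to turn label-name groups like these into the class-id groups that the post-processing step looks up is sketched below. It generalizes the single-group builder shown earlier to several groups; `resolve_groups` is a hypothetical helper, and the `names` dict is a made-up stand-in for the id-to-label mapping in the model meta.

```python
def resolve_groups(names, label_groups):
    """Map groups of label names to groups of class ids.

    names: dict of class_id -> label (stand-in for the model meta)
    label_groups: list of sets of label names
    Groups that match fewer than two classes are dropped, since a
    single class has nothing to be mutually exclusive with.
    """
    resolved = []
    for labels in label_groups:
        ids = {cid for cid, name in names.items() if name in labels}
        if len(ids) >= 2:
            resolved.append(ids)
    return resolved or None


# Hypothetical class table: this model only knows one button state
names = {
    0: "Skill Card: Active",
    1: "Skill Card: Mental",
    2: "Skill Card: Trap",
    3: "Button: Normal",
}
label_groups = [
    {"Skill Card: Active", "Skill Card: Mental", "Skill Card: Trap"},
    {"Button: Normal", "Button: Pressed"},
]

groups = resolve_groups(names, label_groups)
print(groups)  # [{0, 1, 2}]
```

The button group disappears here because only one of its labels exists in this particular class table, which mirrors the "two or more classes" rule above.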

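Performance is nothing to worry about

Intuitively, "k buckets means k NMS runs" sounds slow, but in practice:

- The total number of boxes is unchanged, and each bucket holds only a part of them. NMS is O(n²) in the worst case, so several small buckets cost less than one pass over everything
- The bucketing itself is O(n) and can be ignored
- The extra constant overhead is just initializing a few lists and dicts

In practice the cost is on the same order as standard NMS, so there is no need to overthink it.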
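
A rough back-of-the-envelope check of the O(n²) argument: counting the worst-case number of box pairs that NMS might compare for a given set of bucket sizes. The box counts are invented for illustration and `worst_case_pairs` is a hypothetical helper; real NMS usually stops far short of this bound because suppressed boxes drop out early.

```python
def worst_case_pairs(bucket_sizes):
    """Upper bound on pairwise IoU checks: n * (n - 1) / 2 per bucket."""
    return sum(n * (n - 1) // 2 for n in bucket_sizes)


# Made-up scene: three skill card classes with 100 boxes each, plus 50 other boxes
per_class = worst_case_pairs([100, 100, 100, 50])  # standard per-class NMS
grouped = worst_case_pairs([300, 50])              # skill cards merged into one bucket
single_pass = worst_case_pairs([350])              # one NMS over all boxes

print(per_class)    # 16075
print(grouped)      # 46075
print(single_pass)  # 61075
```

Under these assumptions the grouped variant lands between per-class NMS and a single pass over all boxes, within a small constant factor of the default.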

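Don't use it without a reason

It is easy to write, but the default NMS is a carefully considered design. Agnostic NMS only fits scenarios where multiple objects physically cannot occupy the same location:

- A given position can hold only one skill card type → a good fit
- A cat and its collar share the same location → not a fit; the collar would get suppressed, which would be embarrassing

Misuse causes missed detections, and missed detections are harder to debug than false positives: the output looks perfectly fine while the business logic fails intermittently, a feeling anyone who has chased one of these will recognize.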