"Where is Segmentation headed?"
Segment Any Concept?!
We organize concept segmentation into three levels based on the source of semantic identity — intrinsic appearance, contextual environment, or higher-order reasoning structures.
CI: Targets whose identity is determined by intrinsic visual appearance alone. No external reference or reasoning is required; recognition is self-contained from a single observation.
CD: Targets defined by their relation to the environment: foreground-background contrast, optical properties, manufacturing deviation, or pathological contrast against healthy tissue.
CR: Targets requiring multi-step reasoning over visual and textual evidence: cross-image correspondence, temporal dynamics, logical compatibility, or functional inference.
A four-step closed-loop paradigm from task induction to promptable segmentation, unifying efficient perception with rule-induced reasoning.
Meta-GRPO infers transferable visual rules from support examples
Proxy queries validate induced rules before target application
CTM maps MLLM hidden states to multi-dimensional concept groups
SAM 3 generates precise masks from enriched concept prompts
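
The four steps above can be read as a control loop. The sketch below is illustrative only and is not ConceptSeg-R1's API: every function name (`induce_rule`, `validate_on_proxy`, `map_to_concept_tokens`, `segment`), the acceptance threshold, and the retry budget are hypothetical placeholders. Retrying rule induction when proxy validation fails is an assumed reading of "closed-loop". The stages are passed in as callables, so the loop runs with any stand-ins.

```python
from typing import Any, Callable, Sequence


def closed_loop_segment(
    support: Sequence[Any],
    query: Any,
    induce_rule: Callable[[Sequence[Any]], Any],
    validate_on_proxy: Callable[[Any, Sequence[Any]], float],
    map_to_concept_tokens: Callable[[Any, Any], Any],
    segment: Callable[[Any, Any], Any],
    accept_score: float = 0.5,  # hypothetical acceptance threshold
    max_rounds: int = 3,        # hypothetical retry budget
) -> Any:
    """Induce a rule, check it on proxy queries, then segment the target."""
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    best_rule, best_score = None, float("-inf")
    for _ in range(max_rounds):
        rule = induce_rule(support)               # step 1: rule induction
        score = validate_on_proxy(rule, support)  # step 2: proxy validation
        if score > best_score:
            best_rule, best_score = rule, score
        if score >= accept_score:
            break  # rule accepted, stop retrying
    tokens = map_to_concept_tokens(best_rule, query)  # step 3: concept tokens
    return segment(query, tokens)                     # step 4: mask generation
```

If no round reaches the threshold, the sketch falls back to the best-scoring rule rather than failing; that fallback is also an assumption.
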
We introduce a three-level concept hierarchy covering CI, CD, and CR concepts, pushing segmentation beyond category recognition.
Meta-GRPO enables the model to infer transferable task rules from visual demonstrations and apply them deductively to unseen queries.
We map MLLM reasoning states into implicit concept tokens in the SAM 3 prompt space, enabling reasoning-aware segmentation without fine-tuning SAM 3.
The Shortcut Router dynamically balances SAM 3 efficiency and reasoning depth, enabling fast perception for simple cases and deeper reasoning for complex concepts.
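
As a rough illustration of the routing idea, the sketch below tries a fast perception path first and falls back to a reasoning path when a confidence score is low. The routing criterion used by the actual Shortcut Router is not described here, so the confidence threshold and all names (`route`, `fast_segment`, `reasoning_segment`, `confidence`) are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Routed:
    mask: Any
    path: str  # "fast" or "reasoning"


def route(
    query: Any,
    fast_segment: Callable[[Any], Any],
    reasoning_segment: Callable[[Any], Any],
    confidence: Callable[[Any], float],
    threshold: float = 0.8,  # hypothetical cut-off
) -> Routed:
    """Use the cheap path when it looks reliable, else the expensive one."""
    mask = fast_segment(query)
    if confidence(mask) >= threshold:
        return Routed(mask, "fast")
    # Low confidence: pay for the slower reasoning path.
    return Routed(reasoning_segment(query), "reasoning")
```

The trade-off in this stand-in is controlled by one number: raising `threshold` sends more queries to the reasoning path, lowering it favors speed.
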
Evaluated on 16 benchmarks spanning natural, industrial, medical, and reasoning-intensive scenarios. ConceptSeg-R1 consistently outperforms all competitors across the full cognitive spectrum.
| Method | CI: Diverse Classes (Fω) | CI: Diverse Classes (mIoU) | CD: Camouflage (Fω) | CD: Camouflage (mIoU) | CD: Saliency (Fω) | CD: Saliency (mIoU) | CD: Med. Lesion (Fω) | CD: Med. Lesion (mIoU) | CR: Consistency (Fω) | CR: Consistency (mIoU) | CR: Difference (Fω) | CR: Difference (mIoU) | Mean (Fω) | Mean (mIoU) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SAM 3 | 89.5 | 91.8 | 51.3 | 61.4 | 38.1 | 59.0 | 36.3 | 48.9 | — | — | — | — | — | — |
| SAM3-Agent-3B | 55.1 | 72.3 | 45.4 | 64.8 | 59.5 | 72.1 | 17.1 | 41.4 | 20.3 | 43.2 | 3.1 | 30.9 | 29.9 | 52.3 |
| SAM3-Agent-7B | 76.8 | 84.5 | 58.9 | 71.7 | 74.4 | 80.2 | 25.1 | 42.5 | 26.7 | 46.8 | 26.7 | 53.1 | 43.1 | 59.3 |
| LENS-3B | 74.4 | 82.7 | 58.1 | 73.1 | 76.9 | 81.1 | 56.0 | 69.1 | 26.7 | 55.2 | 25.6 | 56.8 | 48.8 | 66.7 |
| Seg-Zero-7B | 86.8 | 90.1 | 75.0 | 81.1 | 72.8 | 78.6 | 60.2 | 69.8 | 16.1 | 49.6 | 5.5 | 49.2 | 45.2 | 64.7 |
| ConceptSeg-R1-3B | 89.9 | 92.0 | 83.7 | 87.7 | 89.0 | 90.5 | 69.0 | 77.4 | 63.9 | 77.0 | 52.7 | 71.8 | 70.7 | 80.1 |
| ConceptSeg-R1-7B | 89.9 | 92.1 | 84.8 | 88.3 | 92.7 | 93.5 | 72.3 | 79.3 | 70.1 | 81.0 | 57.0 | 75.0 | 74.9 | 82.8 |
ConceptSeg-R1-3B improves overall mIoU from 60.6 to 62.6 without task-specific training. Positive values in the Δ Gains row mark classes where it improves on SAM 3.
| Method | Road | Side. | Build. | Wall | Fence | Pole | T.Light | T.Sign | Veget. | Terra. | Sky | Person | Rider | Car | Truck | Bus | Train | Motor. | Bicycle | Mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SAM 3 | 91.4 | 82.2 | 84.4 | 19.1 | 45.4 | 61.1 | 64.7 | 67.2 | 88.4 | 5.7 | 92.6 | 76.2 | 39.3 | 87.6 | 42.6 | 54.3 | 29.2 | 46.6 | 72.9 | 60.6 |
| ConceptSeg-R1-3B | 97.9 | 82.6 | 83.9 | 18.8 | 47.3 | 60.8 | 60.2 | 66.6 | 87.3 | 7.2 | 93.7 | 74.8 | 39.4 | 89.4 | 52.8 | 62.3 | 42.6 | 50.5 | 72.1 | 62.6 |
| Δ Gains | +6.5 | +0.4 | −0.5 | −0.3 | +1.9 | −0.3 | −4.5 | −0.6 | −1.1 | +1.5 | +1.1 | −1.4 | +0.1 | +1.8 | +10.2 | +8.0 | +13.4 | +3.9 | −0.8 | +2.0 |
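
The Δ Gains row and the Mean column can be recomputed from the two rows above. The short check below uses only values copied from the table; the variable names are illustrative.

```python
# Per-class mIoU copied from the table above (19 classes, Mean column excluded).
classes = ["Road", "Side.", "Build.", "Wall", "Fence", "Pole", "T.Light",
           "T.Sign", "Veget.", "Terra.", "Sky", "Person", "Rider", "Car",
           "Truck", "Bus", "Train", "Motor.", "Bicycle"]
sam3 = [91.4, 82.2, 84.4, 19.1, 45.4, 61.1, 64.7, 67.2, 88.4, 5.7,
        92.6, 76.2, 39.3, 87.6, 42.6, 54.3, 29.2, 46.6, 72.9]
ours = [97.9, 82.6, 83.9, 18.8, 47.3, 60.8, 60.2, 66.6, 87.3, 7.2,
        93.7, 74.8, 39.4, 89.4, 52.8, 62.3, 42.6, 50.5, 72.1]

# Per-class gain of ConceptSeg-R1-3B over SAM 3, rounded like the table.
delta = {c: round(o - s, 1) for c, s, o in zip(classes, sam3, ours)}

# Unweighted class means, as reported in the Mean column.
mean_sam3 = round(sum(sam3) / len(sam3), 1)
mean_ours = round(sum(ours) / len(ours), 1)

improved = [c for c, d in delta.items() if d > 0]
best = max(delta, key=delta.get)
worst = min(delta, key=delta.get)

print(mean_sam3, mean_ours)                    # 60.6 62.6
print(best, delta[best], worst, delta[worst])  # Train 13.4 T.Light -4.5
print(len(improved), "of", len(classes))       # 11 of 19
```

The overall gain is concentrated in a few classes (Train, Truck, Bus, Road), while eight classes are slightly lower than SAM 3.
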
† Fine-tuned on the ReasonSeg training set. Without fine-tuning, ConceptSeg-R1-7B achieves the best Test cIoU (59.3) and ties SAM3-Agent-7B for the best Test gIoU (63.0).
| Method | Venue | Val gIoU ↑ | Val cIoU ↑ | Test gIoU ↑ | Test cIoU ↑ |
|---|---|---|---|---|---|
| LISA-7B† | CVPR'24 | 52.9 | 54.0 | 55.6 | 56.9 |
| InstructSeg-3B† | ICCV'25 | 61.9 | 65.2 | — | — |
| LENS-3B† | AAAI'26 | 62.1 | 64.9 | 57.2 | 58.0 |
| SAM4MLLM-7B | ECCV'24 | 46.7 | 48.1 | — | — |
| Seg-Zero-3B | arXiv'25 | 58.2 | 53.1 | 56.1 | 48.6 |
| Seg-Zero-7B | arXiv'25 | 62.6 | 62.0 | 57.5 | 52.0 |
| SAM-R1-7B | NeurIPS'25 | 64.0 | 55.8 | 60.2 | 54.3 |
| SAM3-Agent-7B | ICLR'26 | 62.2 | 49.1 | 63.0 | 53.5 |
| DPAD-7B | CVPR'26 | 63.1 | 61.2 | 57.7 | 54.4 |
| ConceptSeg-R1-3B | — | 62.8 | 54.0 | 61.2 | 49.3 |
| ConceptSeg-R1-7B | — | 64.4 | 55.1 | 63.0 | 59.3 |
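
For readers comparing the two columns: in the ReasonSeg convention, gIoU is the mean of per-image IoUs, while cIoU is the cumulative intersection over the cumulative union across all images. The sketch below assumes that convention and uses flat 0/1 lists as masks. Scoring an empty prediction against an empty ground truth as IoU 1 is an additional assumption; evaluation scripts differ on this point.

```python
from typing import List, Sequence, Tuple


def giou_ciou(
    preds: Sequence[Sequence[int]],
    gts: Sequence[Sequence[int]],
) -> Tuple[float, float]:
    """Return (gIoU, cIoU) for paired binary masks given as flat 0/1 lists."""
    if not preds or len(preds) != len(gts):
        raise ValueError("need the same non-zero number of predictions and ground truths")
    per_image: List[float] = []
    total_inter = 0
    total_union = 0
    for p, g in zip(preds, gts):
        inter = sum(1 for a, b in zip(p, g) if a and b)
        union = sum(1 for a, b in zip(p, g) if a or b)
        per_image.append(inter / union if union else 1.0)  # assumed empty-mask rule
        total_inter += inter
        total_union += union
    giou = sum(per_image) / len(per_image)
    ciou = total_inter / total_union if total_union else 1.0
    return giou, ciou


# A missed small object and a perfectly segmented large one.
small_pred, small_gt = [1, 0, 0, 0], [0, 1, 0, 0]   # IoU 0
large_pred, large_gt = [1] * 8, [1] * 8             # IoU 1
g, c = giou_ciou([small_pred, large_pred], [small_gt, large_gt])
print(g, c)  # 0.5 0.8
```

Because cIoU pools pixels across images, large objects dominate it, which is one reason a method can rank differently on gIoU and cIoU.
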
If you find this work useful, please consider starring ⭐ the repo and citing our paper.
@article{zhao2026conceptseg,
  title   = {ConceptSeg-R1: Segment Any Concept via Meta-Reinforcement Learning},
  author  = {Zhao, Yuan and Pang, Youwei and Zuo, Jiaming and Ji, Wei and Zhou, Kailai and Fan, Bin and Cao, Yunkang and Zhang, Lihe and Liu, Xiaofeng and Lu, Huchuan and Lin, Weisi and Tao, Dacheng and Zhao, Xiaoqi},
  journal = {arXiv preprint},
  year    = {2026}
}