arXiv 2026  ·  NTU AI4X Team

ConceptSeg-R1:
Segment Any Concept via
Meta-Reinforcement Learning

Yuan Zhao1,2*, Youwei Pang3*, Jiaming Zuo2, Wei Ji4, Kailai Zhou3, Bin Fan5, Yunkang Cao6, Lihe Zhang1‡, Xiaofeng Liu4, Huchuan Lu1, Weisi Lin3, Dacheng Tao3, Xiaoqi Zhao3‡
1 Dalian University of Technology   2 X3000 Inspection Co., Ltd   3 Nanyang Technological University   4 Yale University
5 Northwestern Polytechnical University   6 Hunan University    * Equal contribution   ‡ Corresponding authors
"Where is Segmentation headed?"

Segment Any Concept?!


Concept Hierarchy

Three-Level Concept Taxonomy

We organize concept segmentation into three levels based on the source of semantic identity — intrinsic appearance, contextual environment, or higher-order reasoning structures.

Level 1 · CI

Context-Independent

Targets whose identities are determined by intrinsic visual appearance alone. No external reference or reasoning required — recognition is self-contained from a single observation.

Living classes · Artifacts · Fine-grained · Ultra rare

Level 2 · CD

Context-Dependent

Targets defined by their relation to the environment — foreground-background contrast, optical properties, manufacturing deviation, or pathological contrast against healthy tissue.

Saliency · Camouflage · Transparency · Anomaly · Lesion

Level 3 · CR

Context-Reasoning

Targets requiring multi-step reasoning over visual and textual evidence — cross-image correspondence, temporal dynamics, logical compatibility, or functional inference.

Consistency · Difference · Logical rationality · Spatio-temporal

Method

ConceptSeg-R1 Framework

A four-step closed-loop paradigm from task induction to promptable segmentation, unifying efficient perception with rule-induced reasoning.

Task Induction

Meta-GRPO infers transferable visual rules from support examples

Rule Verification

Proxy queries validate induced rules before target application

Concept Translation

CTM maps MLLM hidden states to multi-dimensional concept groups

Segmentation

SAM 3 generates precise masks from enriched concept prompts
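
The four steps above can be read as a simple control flow. The sketch below is only an illustration of that flow: every name, signature, the acceptance threshold, and the retry-on-failed-verification behaviour are hypothetical placeholders, not the released ConceptSeg-R1 code. The steps are passed in as plain callables so that nothing about the actual models is assumed.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Sequence


@dataclass
class ConceptSegPipeline:
    """Hypothetical wiring of the four steps; all callables are placeholders."""
    induce_rule: Callable[[Sequence[Any]], str]         # step 1: support examples -> rule text
    verify_rule: Callable[[str, Sequence[Any]], float]  # step 2: rule + proxy queries -> score in [0, 1]
    translate: Callable[[str, Any], Any]                # step 3: rule + query -> concept prompt
    segment: Callable[[Any, Any], Any]                  # step 4: query + concept prompt -> mask
    min_score: float = 0.5                              # assumed acceptance threshold
    max_attempts: int = 3                               # assumed retry budget

    def run(self, support, proxies, query) -> Optional[Any]:
        for _ in range(self.max_attempts):
            rule = self.induce_rule(support)
            # Only rules that pass the proxy check reach the target query.
            if self.verify_rule(rule, proxies) >= self.min_score:
                prompt = self.translate(rule, query)
                return self.segment(query, prompt)
        return None  # no induced rule passed verification


# Toy run with stand-in callables (no real models involved).
pipeline = ConceptSegPipeline(
    induce_rule=lambda support: "segment the object that differs from its neighbours",
    verify_rule=lambda rule, proxies: 0.9,
    translate=lambda rule, query: {"rule": rule},
    segment=lambda query, prompt: f"mask({query})",
)
result = pipeline.run(support=["s1", "s2"], proxies=["p1"], query="query.png")
print(result)
```

In this reading, verification acts as a gate: a rule that fails on the proxy queries is never applied to the target image.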

[Figure: ConceptSeg-R1 architecture]
Key Contribution #1

From Objects to Concepts

We introduce a three-level concept hierarchy covering CI, CD, and CR concepts, pushing segmentation beyond category recognition.

Key Contribution #2

From Instance Solving to Rule Induction

Meta-GRPO enables the model to infer transferable task rules from visual demonstrations and apply them deductively to unseen queries.
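
As the name suggests, Meta-GRPO presumably builds on GRPO-style training, whose core step scores a group of sampled responses relative to one another. The sketch below shows only that standard group-relative advantage normalisation. How Meta-GRPO defines its rewards for rule induction is not described on this page, so the reward values and the function name are made-up placeholders.

```python
from statistics import fmean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Standard GRPO-style advantages: (r_i - mean) / (std + eps) within one group."""
    mean = fmean(rewards)
    std = pstdev(rewards)  # population standard deviation over the group
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical rewards for four sampled rule hypotheses on the same task.
rewards = [0.2, 0.8, 0.5, 0.5]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Responses that score above the group average receive a positive advantage and are reinforced; a group with identical rewards yields zero advantage for every response.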

Key Contribution #3

Latent Concept Tokens for Frozen SAM 3

We map MLLM reasoning states into implicit concept tokens in the SAM 3 prompt space, enabling reasoning-aware segmentation without fine-tuning SAM 3.

Key Contribution #4

From Heavy Reasoning to Adaptive Inference

The Shortcut Router dynamically balances SAM 3 efficiency and reasoning depth, enabling fast perception for simple cases and deeper reasoning for complex concepts.
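
One way to picture the routing decision is a gate that sends easy queries straight to the segmenter and harder ones through the reasoning path. The sketch below is a minimal illustration under that assumption; the difficulty score, the threshold, and all function names are hypothetical, and the actual Shortcut Router may decide quite differently.

```python
from typing import Any, Callable, Tuple


def route(query: Any,
          difficulty: Callable[[Any], float],
          fast_path: Callable[[Any], Any],
          reasoning_path: Callable[[Any], Any],
          threshold: float = 0.5) -> Tuple[str, Any]:
    """Return (path_name, output). `difficulty` and `threshold` are assumptions."""
    if difficulty(query) < threshold:
        return "fast", fast_path(query)          # direct perception
    return "reasoning", reasoning_path(query)    # rule induction and deeper reasoning


def toy_difficulty(query: str) -> float:
    # Stand-in heuristic for the example only: relational queries count as hard.
    return 0.9 if "different" in query else 0.1


easy = route("dog", toy_difficulty,
             lambda q: f"sam3({q})", lambda q: f"reason({q})")
hard = route("the part that is different across the two images", toy_difficulty,
             lambda q: f"sam3({q})", lambda q: f"reason({q})")
print(easy[0], hard[0])
```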

Results

State-of-the-Art across CI / CD / CR

We evaluate on 16 benchmarks spanning natural, industrial, medical, and reasoning-intensive scenarios. In Table 1, both ConceptSeg-R1 models achieve the best Fω and mIoU in every CI, CD, and CR column.

CI Concepts: 92.1 mIoU (7B model), SoTA
CD Concepts: 83.3 mIoU (7B model), +23.7 vs SAM3-Agent-7B
CR Concepts: 79.2 mIoU (7B model), +29.8 vs Seg-Zero-7B

Table 1. Concept Segmentation Benchmarks

Each cell reports Fω / mIoU.

| Method | CI: Diverse Classes | CD: Camouflage | CD: Saliency | CD: Med. Lesion | CR: Consistency | CR: Difference | Mean |
|---|---|---|---|---|---|---|---|
| SAM 3 | 89.5 / 91.8 | 51.3 / 61.4 | 38.1 / 59.0 | 36.3 / 48.9 | – | – | – |
| SAM3-Agent-3B | 55.1 / 72.3 | 45.4 / 64.8 | 59.5 / 72.1 | 17.1 / 41.4 | 20.3 / 43.2 | 3.1 / 30.9 | 29.9 / 52.3 |
| SAM3-Agent-7B | 76.8 / 84.5 | 58.9 / 71.7 | 74.4 / 80.2 | 25.1 / 42.5 | 26.7 / 46.8 | 26.7 / 53.1 | 43.1 / 59.3 |
| LENS-3B | 74.4 / 82.7 | 58.1 / 73.1 | 76.9 / 81.1 | 56.0 / 69.1 | 26.7 / 55.2 | 25.6 / 56.8 | 48.8 / 66.7 |
| Seg-Zero-7B | 86.8 / 90.1 | 75.0 / 81.1 | 72.8 / 78.6 | 60.2 / 69.8 | 16.1 / 49.6 | 5.5 / 49.2 | 45.2 / 64.7 |
| ConceptSeg-R1-3B | 89.9 / 92.0 | 83.7 / 87.7 | 89.0 / 90.5 | 69.0 / 77.4 | 63.9 / 77.0 | 52.7 / 71.8 | 70.7 / 80.1 |
| ConceptSeg-R1-7B | 89.9 / 92.1 | 84.8 / 88.3 | 92.7 / 93.5 | 72.3 / 79.3 | 70.1 / 81.0 | 57.0 / 75.0 | 74.9 / 82.8 |

Table 2. Zero-Shot Performance on Cityscapes (mIoU)

ConceptSeg-R1-3B improves overall mIoU from 60.6 to 62.6 without task-specific training, with gains on 11 of the 19 classes.

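| Class | SAM 3 | ConceptSeg-R1-3B | Δ Gains |
|---|---|---|---|
| Road | 91.4 | 97.9 | +6.5 |
| Side. | 82.2 | 82.6 | +0.4 |
| Build. | 84.4 | 83.9 | −0.5 |
| Wall | 19.1 | 18.8 | −0.3 |
| Fence | 45.4 | 47.3 | +1.9 |
| Pole | 61.1 | 60.8 | −0.3 |
| T.Light | 64.7 | 60.2 | −4.5 |
| T.Sign | 67.2 | 66.6 | −0.6 |
| Veget. | 88.4 | 87.3 | −1.1 |
| Terra. | 5.7 | 7.2 | +1.5 |
| Sky | 92.6 | 93.7 | +1.1 |
| Person | 76.2 | 74.8 | −1.4 |
| Rider | 39.3 | 39.4 | +0.1 |
| Car | 87.6 | 89.4 | +1.8 |
| Truck | 42.6 | 52.8 | +10.2 |
| Bus | 54.3 | 62.3 | +8.0 |
| Train | 29.2 | 42.6 | +13.4 |
| Motor. | 46.6 | 50.5 | +3.9 |
| Bicycle | 72.9 | 72.1 | −0.8 |
| Mean | 60.6 | 62.6 | +2.0 |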
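
The Δ row and the Mean column of Table 2 can be recomputed from the two per-class rows. The short check below assumes that Mean is the unweighted average over the 19 Cityscapes classes, which the reported values support; the variable names are illustrative.

```python
classes = ["Road", "Side.", "Build.", "Wall", "Fence", "Pole", "T.Light",
           "T.Sign", "Veget.", "Terra.", "Sky", "Person", "Rider", "Car",
           "Truck", "Bus", "Train", "Motor.", "Bicycle"]

# Per-class mIoU copied from Table 2.
sam3 = [91.4, 82.2, 84.4, 19.1, 45.4, 61.1, 64.7, 67.2, 88.4, 5.7,
        92.6, 76.2, 39.3, 87.6, 42.6, 54.3, 29.2, 46.6, 72.9]
ours = [97.9, 82.6, 83.9, 18.8, 47.3, 60.8, 60.2, 66.6, 87.3, 7.2,
        93.7, 74.8, 39.4, 89.4, 52.8, 62.3, 42.6, 50.5, 72.1]

# Per-class gain, rounded to one decimal as in the table.
deltas = {c: round(o - s, 1) for c, s, o in zip(classes, sam3, ours)}

# Unweighted mean over the 19 classes.
mean_sam3 = round(sum(sam3) / len(sam3), 1)
mean_ours = round(sum(ours) / len(ours), 1)

improved = [c for c, d in deltas.items() if d > 0]
print(mean_sam3, mean_ours, len(improved))
```

The largest gains fall on Train, Truck, and Bus, while T.Light shows the largest drop, so the overall improvement is driven by a subset of classes rather than a uniform shift.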

Table 3. Zero-Shot Performance on ReasonSeg

† Fine-tuned on the ReasonSeg training set. Without fine-tuning, ConceptSeg-R1-7B achieves the best Test cIoU (59.3) and matches SAM3-Agent-7B for the best Test gIoU (63.0).

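| Method | Venue | Val gIoU ↑ | Val cIoU ↑ | Test gIoU ↑ | Test cIoU ↑ |
|---|---|---|---|---|---|
| LISA-7B† | CVPR'24 | 52.9 | 54.0 | 55.6 | 56.9 |
| InstructSeg-3B† | ICCV'25 | 61.9 | 65.2 | – | – |
| LENS-3B† | AAAI'26 | 62.1 | 64.9 | 57.2 | 58.0 |
| SAM4MLLM-7B | ECCV'24 | 46.7 | 48.1 | – | – |
| Seg-Zero-3B | arXiv'25 | 58.2 | 53.1 | 56.1 | 48.6 |
| Seg-Zero-7B | arXiv'25 | 62.6 | 62.0 | 57.5 | 52.0 |
| SAM-R1-7B | NeurIPS'25 | 64.0 | 55.8 | 60.2 | 54.3 |
| SAM3-Agent-7B | ICLR'26 | 62.2 | 49.1 | 63.0 | 53.5 |
| DPAD-7B | CVPR'26 | 63.1 | 61.2 | 57.7 | 54.4 |
| ConceptSeg-R1-3B | – | 62.8 | 54.0 | 61.2 | 49.3 |
| ConceptSeg-R1-7B | – | 64.4 | 55.1 | 63.0 | 59.3 |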
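
The two metrics in Table 3 can rank methods differently, which is why a model may lead on one and trail on the other. The sketch below uses the definitions conventionally adopted for ReasonSeg (gIoU as the mean of per-image IoUs, cIoU as cumulative intersection over cumulative union); this page does not restate them, so treat both the definitions and the empty-mask convention as assumptions.

```python
def inter_union(pred, gt):
    """Intersection and union pixel counts for two flat binary masks."""
    inter = sum(1 for p, g in zip(pred, gt) if p and g)
    union = sum(1 for p, g in zip(pred, gt) if p or g)
    return inter, union


def giou_ciou(preds, gts):
    per_image, total_inter, total_union = [], 0, 0
    for pred, gt in zip(preds, gts):
        inter, union = inter_union(pred, gt)
        # Assumed convention: an empty prediction on an empty target counts as IoU 1.
        per_image.append(inter / union if union else 1.0)
        total_inter += inter
        total_union += union
    giou = sum(per_image) / len(per_image)                  # every image weighs the same
    ciou = total_inter / total_union if total_union else 1.0  # large objects weigh more
    return giou, ciou


# Toy data: a small object segmented poorly and a large object segmented perfectly.
preds = [[1, 0, 0, 0], [1] * 8]
gts = [[1, 1, 0, 0], [1] * 8]
giou, ciou = giou_ciou(preds, gts)
print(giou, ciou)
```

In the toy data the per-image IoUs are 0.5 and 1.0, so gIoU is 0.75, while cIoU is 9/10 because the large object dominates the pixel counts.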
Citation

BibTeX

If you find this work useful, please consider starring ⭐ the repo and citing our paper.

@article{zhao2026conceptseg,
  title   = {ConceptSeg-R1: Segment Any Concept via
              Meta-Reinforcement Learning},
  author  = {Zhao, Yuan and Pang, Youwei and Zuo, Jiaming
              and Ji, Wei and Zhou, Kailai and Fan, Bin
              and Cao, Yunkang and Zhang, Lihe
              and Liu, Xiaofeng and Lu, Huchuan
              and Lin, Weisi and Tao, Dacheng
              and Zhao, Xiaoqi},
  journal = {arXiv preprint},
  year    = {2026}
}