arXiv 2026 · NTU AI4X Team

ConceptSeg-R1: Segment Any Concept via Meta-Reinforcement Learning

Yuan Zhao¹,²*, Youwei Pang³*, Jiaming Zuo², Wei Ji⁴, Kailai Zhou³, Bin Fan⁵, Yunkang Cao⁶, Lihe Zhang¹‡, Xiaofeng Liu⁴, Huchuan Lu¹, Weisi Lin³, Dacheng Tao³, Xiaoqi Zhao³‡

¹ Dalian University of Technology · ² X3000 Inspection Co., Ltd · ³ Nanyang Technological University · ⁴ Yale University · ⁵ Northwestern Polytechnical University · ⁶ Hunan University
* Equal contribution · ‡ Corresponding authors

📄 arXiv Paper · Code · 🤗 Model (7B) · 🤗 Dataset

Concept Hierarchy

Three-Level Concept Taxonomy

We organize concept segmentation into three levels based on the source of semantic identity — intrinsic appearance, contextual environment, or higher-order reasoning structures.

Level 1 · CI

Context-Independent

Targets whose identities are determined by intrinsic visual appearance alone. No external reference or reasoning required — recognition is self-contained from a single observation.

Living classes · Artifacts · Fine-grained · Ultra rare

Level 2 · CD

Context-Dependent

Targets defined by their relation to the environment — foreground-background contrast, optical properties, manufacturing deviation, or pathological contrast against healthy tissue.

Saliency · Camouflage · Transparency · Anomaly · Lesion

Level 3 · CR

Context-Reasoning

Targets requiring multi-step reasoning over visual and textual evidence — cross-image correspondence, temporal dynamics, logical compatibility, or functional inference.

Consistency · Difference · Logical rationality · Spatio-temporal
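
The three levels and their example categories can be written down as a small lookup table. The sketch below is illustrative only: `ConceptLevel`, `TAXONOMY`, and `level_of` are hypothetical names and are not taken from the released code. The category labels are the ones listed under each level above.

```python
from enum import Enum


class ConceptLevel(Enum):
    CI = "Context-Independent"
    CD = "Context-Dependent"
    CR = "Context-Reasoning"


# Category labels as listed on this page; the mapping itself is illustrative.
TAXONOMY = {
    ConceptLevel.CI: ["Living classes", "Artifacts", "Fine-grained", "Ultra rare"],
    ConceptLevel.CD: ["Saliency", "Camouflage", "Transparency", "Anomaly", "Lesion"],
    ConceptLevel.CR: ["Consistency", "Difference", "Logical rationality", "Spatio-temporal"],
}


def level_of(category: str) -> ConceptLevel:
    """Return the taxonomy level a category label belongs to (case-insensitive)."""
    wanted = category.strip().lower()
    for level, categories in TAXONOMY.items():
        if wanted in (c.lower() for c in categories):
            return level
    raise KeyError(f"unknown category: {category!r}")
```

For example, `level_of("Camouflage")` returns `ConceptLevel.CD`, since camouflage is defined by contrast with the surrounding background.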

Method

ConceptSeg-R1 Framework

A four-step closed-loop paradigm from task induction to promptable segmentation, unifying efficient perception with rule-induced reasoning.

Task Induction

Meta-GRPO infers transferable visual rules from support examples

Rule Verification

Proxy queries validate induced rules before target application

Concept Translation

CTM maps MLLM hidden states to multi-dimensional concept groups

Segmentation

SAM 3 generates precise masks from enriched concept prompts
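
The four steps can be read as a simple control flow. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the function `run_concept_pipeline` and its four callables are hypothetical placeholders, and retrying induction when verification fails is only one possible reading of "closed-loop".

```python
from typing import Any, Callable, Optional, Sequence


def run_concept_pipeline(
    support: Sequence[Any],
    proxy_queries: Sequence[Any],
    target: Any,
    induce_rule: Callable[[Sequence[Any]], Any],
    verify_rule: Callable[[Any, Sequence[Any]], bool],
    translate: Callable[[Any, Any], Any],
    segment: Callable[[Any, Any], Any],
    max_attempts: int = 3,
) -> Optional[Any]:
    """Induce a rule, verify it on proxies, then translate and segment the target."""
    for _ in range(max_attempts):
        rule = induce_rule(support)                # 1. Task Induction
        if not verify_rule(rule, proxy_queries):   # 2. Rule Verification
            continue  # assumption: a rejected rule triggers re-induction
        prompt = translate(rule, target)           # 3. Concept Translation
        return segment(target, prompt)             # 4. Segmentation
    return None  # no rule passed verification within the attempt budget


# Usage with stand-in callables (all hypothetical):
mask = run_concept_pipeline(
    support=["support-1", "support-2"],
    proxy_queries=["proxy-1"],
    target="query-image",
    induce_rule=lambda s: f"rule-from-{len(s)}-examples",
    verify_rule=lambda rule, proxies: True,
    translate=lambda rule, image: (rule, image),
    segment=lambda image, prompt: {"image": image, "prompt": prompt},
)
```

Passing the stages in as callables keeps the sketch independent of any particular MLLM or segmentation backend.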

Key Contribution #1

From Objects to Concepts

We introduce a three-level concept hierarchy covering CI, CD, and CR concepts, pushing segmentation beyond category recognition.

Key Contribution #2

From Instance Solving to Rule Induction

Meta-GRPO enables the model to infer transferable task rules from visual demonstrations and apply them deductively to unseen queries.
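
For orientation, standard GRPO scores each sampled response relative to the other responses in its group instead of using a learned value function. The sketch below shows only that group-relative normalization step. How Meta-GRPO builds on it is not described on this page, so the function name `group_relative_advantages` and the example rewards are illustrative assumptions.

```python
from statistics import fmean, pstdev
from typing import List, Sequence


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Normalize rewards within one sampled group: (r_i - mean) / (std + eps)."""
    mean = fmean(rewards)
    std = pstdev(rewards)  # population standard deviation over the group
    return [(r - mean) / (std + eps) for r in rewards]


# Four sampled responses to the same prompt, two rewarded and two not.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Responses that beat the group average receive positive advantages and the rest receive negative ones. When every response in a group gets the same reward, all advantages are zero and that group contributes no learning signal.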

Key Contribution #3

Latent Concept Tokens for Frozen SAM 3

We map MLLM reasoning states into implicit concept tokens in the SAM 3 prompt space, enabling reasoning-aware segmentation without fine-tuning SAM 3.

Key Contribution #4

From Heavy Reasoning to Adaptive Inference

The Shortcut Router dynamically balances SAM 3 efficiency and reasoning depth, enabling fast perception for simple cases and deeper reasoning for complex concepts.
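
The routing idea can be illustrated with a threshold on a confidence score. This is a hedged sketch, not the paper's router: the page does not say how routing decisions are made, so `score_fn`, the `threshold` default of 0.5, and both path callables are hypothetical.

```python
from typing import Any, Callable, Tuple


def route(
    query: Any,
    score_fn: Callable[[Any], float],
    fast_path: Callable[[Any], Any],
    reasoning_path: Callable[[Any], Any],
    threshold: float = 0.5,
) -> Tuple[str, Any]:
    """Send a query to the fast path when direct perception looks sufficient."""
    confidence = score_fn(query)  # assumed: higher means a simpler case
    if confidence >= threshold:
        return "fast", fast_path(query)        # direct promptable segmentation
    return "reasoning", reasoning_path(query)  # deeper reasoning before segmentation


# Usage with stand-in scorers and paths (all hypothetical):
simple = route("a red car", lambda q: 0.9, lambda q: "mask-fast", lambda q: "mask-slow")
hard = route("the odd one out", lambda q: 0.2, lambda q: "mask-fast", lambda q: "mask-slow")
```

Only queries that fall below the threshold pay the extra cost of the reasoning path.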

Results

State-of-the-Art across CI / CD / CR

Evaluated on 16 benchmarks spanning natural, industrial, medical, and reasoning-intensive scenarios. In Table 1, ConceptSeg-R1-7B matches or exceeds every other method on F^ω and mIoU in every reported column, covering CI, CD, and CR concepts.

CI Concepts

92.1

mIoU — 7B model

SoTA

CD Concepts

83.3

mIoU — 7B model

+23.7 vs SAM3-Agent-7B

CR Concepts

79.2

mIoU — 7B model

+29.8 vs Seg-Zero-7B

Table 1. Concept Segmentation Benchmarks

Method	CI: Diverse Classes		CD: Camouflage		CD: Saliency		CD: Med. Lesion		CR: Consistency		CR: Difference		Mean
Method	F^ω	mIoU	F^ω	mIoU	F^ω	mIoU	F^ω	mIoU	F^ω	mIoU	F^ω	mIoU	F^ω	mIoU
SAM 3	89.5	91.8	51.3	61.4	38.1	59.0	36.3	48.9	—	—	—	—	—	—
SAM3-Agent-3B	55.1	72.3	45.4	64.8	59.5	72.1	17.1	41.4	20.3	43.2	3.1	30.9	29.9	52.3
SAM3-Agent-7B	76.8	84.5	58.9	71.7	74.4	80.2	25.1	42.5	26.7	46.8	26.7	53.1	43.1	59.3
LENS-3B	74.4	82.7	58.1	73.1	76.9	81.1	56.0	69.1	26.7	55.2	25.6	56.8	48.8	66.7
Seg-Zero-7B	86.8	90.1	75.0	81.1	72.8	78.6	60.2	69.8	16.1	49.6	5.5	49.2	45.2	64.7
ConceptSeg-R1-3B	89.9	92.0	83.7	87.7	89.0	90.5	69.0	77.4	63.9	77.0	52.7	71.8	70.7	80.1
ConceptSeg-R1-7B	89.9	92.1	84.8	88.3	92.7	93.5	72.3	79.3	70.1	81.0	57.0	75.0	74.9	82.8

Table 2. Zero-Shot Performance on Cityscapes (mIoU)

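ConceptSeg-R1-3B raises mean mIoU over SAM 3 from 60.6 to 62.6 without task-specific training. It improves 11 of the 19 classes, with the largest gains on Train (+13.4), Truck (+10.2), and Bus (+8.0); the Δ Gains row lists the per-class differences.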

Method	Road	Side.	Build.	Wall	Fence	Pole	T.Light	T.Sign	Veget.	Terra.	Sky	Person	Rider	Car	Truck	Bus	Train	Motor.	Bicycle	Mean
SAM 3	91.4	82.2	84.4	19.1	45.4	61.1	64.7	67.2	88.4	5.7	92.6	76.2	39.3	87.6	42.6	54.3	29.2	46.6	72.9	60.6
ConceptSeg-R1-3B	97.9	82.6	83.9	18.8	47.3	60.8	60.2	66.6	87.3	7.2	93.7	74.8	39.4	89.4	52.8	62.3	42.6	50.5	72.1	62.6
Δ Gains	+6.5	+0.4	−0.5	−0.3	+1.9	−0.3	−4.5	−0.6	−1.1	+1.5	+1.1	−1.4	+0.1	+1.8	+10.2	+8.0	+13.4	+3.9	−0.8	+2.0
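
The Δ Gains row and the Mean column of Table 2 can be re-derived from the two method rows. The snippet below is only a consistency check on the numbers printed above; the variable names are illustrative.

```python
CLASSES = ["Road", "Side.", "Build.", "Wall", "Fence", "Pole", "T.Light", "T.Sign",
           "Veget.", "Terra.", "Sky", "Person", "Rider", "Car", "Truck", "Bus",
           "Train", "Motor.", "Bicycle"]
SAM3 = [91.4, 82.2, 84.4, 19.1, 45.4, 61.1, 64.7, 67.2, 88.4, 5.7,
        92.6, 76.2, 39.3, 87.6, 42.6, 54.3, 29.2, 46.6, 72.9]
OURS = [97.9, 82.6, 83.9, 18.8, 47.3, 60.8, 60.2, 66.6, 87.3, 7.2,
        93.7, 74.8, 39.4, 89.4, 52.8, 62.3, 42.6, 50.5, 72.1]

# Per-class difference, rounded to the table's one-decimal precision.
deltas = {c: round(o - s, 1) for c, s, o in zip(CLASSES, SAM3, OURS)}

# Mean mIoU is the unweighted average over the 19 classes.
mean_sam3 = round(sum(SAM3) / len(SAM3), 1)
mean_ours = round(sum(OURS) / len(OURS), 1)

improved = [c for c, d in deltas.items() if d > 0]
largest_gain = max(deltas, key=deltas.get)
largest_drop = min(deltas, key=deltas.get)
```

The recomputed means are 60.6 and 62.6, matching the table. The largest gain is on Train and the largest drop is on T.Light (−4.5).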

Table 3. Zero-Shot Performance on ReasonSeg

† Fine-tuned on the ReasonSeg training set. Without fine-tuning, ConceptSeg-R1-7B achieves the best Val gIoU (64.4) and the best Test cIoU (59.3), and ties SAM3-Agent-7B for the best Test gIoU (63.0).

Method	Venue	Val gIoU ↑	Val cIoU ↑	Test gIoU ↑	Test cIoU ↑
LISA-7B†	CVPR'24	52.9	54.0	55.6	56.9
InstructSeg-3B†	ICCV'25	61.9	65.2	—	—
LENS-3B†	AAAI'26	62.1	64.9	57.2	58.0
SAM4MLLM-7B	ECCV'24	46.7	48.1	—	—
Seg-Zero-3B	arXiv'25	58.2	53.1	56.1	48.6
Seg-Zero-7B	arXiv'25	62.6	62.0	57.5	52.0
SAM-R1-7B	NeurIPS'25	64.0	55.8	60.2	54.3
SAM3-Agent-7B	ICLR'26	62.2	49.1	63.0	53.5
DPAD-7B	CVPR'26	63.1	61.2	57.7	54.4
ConceptSeg-R1-3B	—	62.8	54.0	61.2	49.3
ConceptSeg-R1-7B	—	64.4	55.1	63.0	59.3
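
Table 3 reports two IoU variants that can rank methods differently. On ReasonSeg, gIoU is conventionally the average of per-image IoUs, while cIoU is the cumulative intersection divided by the cumulative union over the whole split. The sketch below illustrates the difference on toy masks. The function name `giou_ciou` and the choice to score an empty union as 1.0 are assumptions made for this example.

```python
from typing import Iterable, Sequence, Tuple


def giou_ciou(pairs: Iterable[Tuple[Sequence[int], Sequence[int]]]) -> Tuple[float, float]:
    """pairs: (pred_mask, gt_mask) per image, each a flat sequence of 0/1 pixels."""
    per_image = []
    total_inter = total_union = 0
    for pred, gt in pairs:
        inter = sum(1 for p, g in zip(pred, gt) if p and g)
        union = sum(1 for p, g in zip(pred, gt) if p or g)
        per_image.append(inter / union if union else 1.0)  # assumed empty-mask convention
        total_inter += inter
        total_union += union
    giou = sum(per_image) / len(per_image)                      # mean of per-image IoUs
    ciou = total_inter / total_union if total_union else 1.0    # pooled over all pixels
    return giou, ciou


# A small object segmented perfectly and a large object segmented poorly.
small = ([1, 0, 0, 0], [1, 0, 0, 0])                            # IoU = 1/1
large = ([1, 1, 1, 1, 0, 0, 0, 0], [0, 0, 1, 1, 1, 1, 1, 1])    # IoU = 2/8
giou, ciou = giou_ciou([small, large])
```

Here gIoU is (1.0 + 0.25) / 2 = 0.625, while cIoU is (1 + 2) / (1 + 8) ≈ 0.333. Because cIoU pools pixels, large objects dominate it, which is one reason a method can lead on gIoU and trail on cIoU.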

Citation

BibTeX

If you find this work useful, please consider starring ⭐ the repo and citing our paper.

@article{zhao2026conceptseg,
  title   = {ConceptSeg-R1: Segment Any Concept via
              Meta-Reinforcement Learning},
  author  = {Zhao, Yuan and Pang, Youwei and Zuo, Jiaming
              and Ji, Wei and Zhou, Kailai and Fan, Bin
              and Cao, Yunkang and Zhang, Lihe
              and Liu, Xiaofeng and Lu, Huchuan
              and Lin, Weisi and Tao, Dacheng
              and Zhao, Xiaoqi},
  journal = {arXiv preprint},
  year    = {2026}
}
