Home /Research /FishDetector-R1: Unified MLLM-Based Framework with Reinforcement Fine-Tuning for Weakly Supervised Fish Detection, Segmentation, and Counting

LEARNING

FishDetector-R1: Unified MLLM-Based Framework with Reinforcement Fine-Tuning for Weakly Supervised Fish Detection, Segmentation, and Counting

Yi Liu, Jingyu Song, Vedanth Kallakuri, Katherine A. Skinner

Year: 2025
Access: Open access

Abstract

Analyzing underwater fish imagery is critical for ecological monitoring but remains difficult due to visual degradation and costly annotations. We introduce FishDetector-R1, a unified MLLM-based framework for fish detection, segmentation, and counting under weak supervision. On the DeepFish dataset, our framework achieves substantial gains over baselines, improving AP by 20% and mIoU by 10%, while reducing MAE by 30% and GAME by 35%. These improvements stem from two key components: a novel detect-to-count prompt that enforces spatially consistent detections and counts, and Reinforcement Learning from Verifiable Reward (RLVR) with a complementary scalable paradigm leveraging sparse point labels. Ablation studies further validate the effectiveness of this reward design. Moreover, the improvement generalizes well to other underwater datasets, confirming strong cross-domain robustness. Overall, FishDetector-R1 provides a reliable and scalable solution for accurate marine visual understanding via weak supervision. The project page for FishDetector-R1 is https://umfieldrobotics.github.io/FishDetector-R1.

Keywords

cs.CVcs.CYcs.ROeess.IV

FishDetector-R1: Unified MLLM-Based Framework with Reinforcement Fine-Tuning for Weakly Supervised Fish Detection, Segmentation, and Counting

Abstract

Keywords

Related papers

The Organization of Behavior

Fractional Brownian Motions, Fractional Noises and Applications

Review of deep learning: concepts, CNN architectures, challenges, applications, future directions

A guide to deep learning in healthcare