Industry-Level Benchmark
Mobile-Bench
Evaluating AI coding agents on real-world mobile development tasks. Like SWE-bench, for iOS and Android.
50 Tasks · 450 Test Cases · 9 Agents · 9 Evaluations
Leaderboard
Top-performing agents
| Rank | Agent | Model | Task Pass Rate | Test Case Pass Rate |
|---|---|---|---|---|
| 1 | Cursor | Opus 4.5 | 12.0% | 28.0% |
| 2 | Cursor | Sonnet 4.5 | 12.0% | 27.1% |
| 3 | Codex | GLM 4.6 | 12.0% | 26.0% |
| 4 | Claude Code | GLM 4.6 | 10.0% | 26.7% |
| 5 | Claude Code | Sonnet 4.5 | 8.0% | 24.0% |
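The two score columns are read here as a full-task resolution rate and a pooled test-case pass rate, which matches the 50-task / 450-test-case framing above. Below is a minimal sketch of how such scores could be computed; the names and the exact scoring rule are assumptions, not Mobile-Bench's published harness.

```kotlin
// Hypothetical per-task record: how many of a task's test cases the agent's submission passed.
data class TaskResult(val taskId: String, val testsPassed: Int, val testsTotal: Int)

// Task-level pass rate: share of tasks whose entire test suite passed.
fun taskPassRate(results: List<TaskResult>): Double =
    results.count { it.testsPassed == it.testsTotal }.toDouble() / results.size

// Test-case-level pass rate: share of all test cases passed, pooled across tasks.
fun testCasePassRate(results: List<TaskResult>): Double =
    results.sumOf { it.testsPassed }.toDouble() / results.sumOf { it.testsTotal }
```

Under this reading, a 12.0% task pass rate corresponds to 6 of the 50 tasks being fully resolved.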
Task Categories
50 industry-level mobile development tasks
| Category | Tasks | Avg. Pass Rate |
|---|---|---|
| UI Components | 18 | 24.5% |
| Gesture & Interaction | 8 | 15.2% |
| Data Management | 12 | 32.3% |
| Media & Assets | 6 | 18.8% |
| Networking | 4 | 22.2% |
| Other | 2 | 20.5% |
Task details are private. Contact us for research collaboration.
Real-World PRDs
Tasks derived from actual product requirement documents used in mobile app development.
Automated Testing
Comprehensive test suites that validate functionality, not just syntax correctness.
Reproducible Results
Standardized evaluation pipeline ensures consistent and comparable results.
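To make the Automated Testing point above concrete: a task's suite exercises behavior rather than checking that the code merely compiles. The JUnit sketch below is an invented illustration; the cart-badge rule and formatCartBadge are hypothetical, not an actual Mobile-Bench task.

```kotlin
import org.junit.Assert.assertEquals
import org.junit.Test

// Hypothetical behavior under test: a cart badge label that caps the displayed count at "99+".
// In the benchmark this logic would come from the agent's patch; it is inlined here for illustration.
fun formatCartBadge(count: Int): String =
    if (count > 99) "99+" else count.toString()

class CartBadgeFormatterTest {

    @Test
    fun countsUpToNinetyNineAreShownVerbatim() {
        assertEquals("1", formatCartBadge(1))
        assertEquals("99", formatCartBadge(99))
    }

    @Test
    fun countsAboveNinetyNineAreCapped() {
        assertEquals("99+", formatCartBadge(100))
        assertEquals("99+", formatCartBadge(1200))
    }
}
```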
Interested in Mobile-Bench?
Contact us for research collaboration or to discuss evaluating your AI coding agent.