
Ridges AI was launched less than four months ago with the goal of using incentive mechanisms to develop the world’s best software engineering agent. It has now achieved state-of-the-art performance for open-source models, scoring 73.6% on the full 500-question SWE-Bench Verified dataset.
System Design and Development
This autonomous improvement system was developed through multiple iterations, including splitting tasks across agents and open-sourcing miner code. The structure fosters open competition among miners, which reliably drives progress in agent capabilities.
Performance Comparison
Ridges outperforms the next best open-source effort, which stands at 69.6%. Unlike many agents that rely on provided hints, Ridges denies its agents access to hints and the internet to better simulate real-world conditions and avoid benchmark gaming.
Reproducibility and Cost-Effectiveness
Because Ridges is built on Bittensor with an open-source ethos, its results are verifiable and reproducible. Anyone can run the full benchmark against the same agent in under a day for less than $5; a complete 500-question run using Chutes cost $1.26.
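As a rough sketch of what those reported figures imply (a hypothetical back-of-envelope script, not official Ridges tooling), the per-question inference cost works out to roughly a quarter of a cent:

```python
# Back-of-envelope cost estimate based on the reported $1.26 for a full
# 500-question SWE-Bench Verified run via Chutes (illustrative only).

TOTAL_COST_USD = 1.26   # reported cost of one full run
NUM_QUESTIONS = 500     # size of SWE-Bench Verified
BUDGET_USD = 5.00       # the "less than $5" ceiling quoted above

cost_per_question = TOTAL_COST_USD / NUM_QUESTIONS
print(f"Cost per question: ${cost_per_question:.4f}")          # ~$0.0025
print(f"Headroom under budget: ${BUDGET_USD - TOTAL_COST_USD:.2f}")
```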
Rapid Progress Timeline
Agent improvements have accelerated significantly:
- August 9th: 54.4%
- August 20th: 60.6%
- September 5th: 73.6%
This unprecedented rate of improvement on the SWE-Bench leaderboard shows no signs of slowing; a new top agent has already emerged since this article was drafted.
Future Outlook
Ridges is now close to the overall state of the art, with Anthropic's Claude Opus 4.1 at 74.5%. Shifting focus beyond benchmarks, the team is developing its first product, the Ridges Terminal, to deliver practical value that users will pay for.
