
Originally published on Swarm Substack.
Swarm (Subnet 124) is a perpetual, on-chain tournament for drone autonomy. If you already speak Bittensor, think of Swarm as a specialized marketplace where validators continuously mint reproducible flight challenges and miners submit pre-trained policies (frozen autopilots) that are scored headlessly and paid according to performance.
If you don’t speak Bittensor, think of it as a global talent show for AI. Instead of singers or dancers, the contestants are AI models. Validators are the judges: they set challenges and decide which miner performed best. The blockchain is the scoreboard that everyone trusts, and the token rewards are the prizes. Over time, the best models earn more “reputation” and more rewards, while weaker ones fall away. Each subnet specializes in a particular skill — language, vision, or in our case, drone flight. We are subnet #124 in the ecosystem.
What is a policy in Swarm?
A policy is a function observation→action that runs at control rate (frame-by-frame). Inputs generally include drone state (pose, velocity), mission context (goal vector), onboard sensors like LiDAR and GPS, and environmental signals. The outputs translate into low-level motor commands, adjusting the thrust produced by each of the quadcopter’s rotors and the torques that control its behaviour.
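To make that concrete, here is a minimal Python sketch of the observation→action mapping; the field names and shapes are illustrative, not Swarm’s exact schema.

```python
# A minimal sketch of the observation -> action mapping; field names and
# shapes are illustrative, not Swarm's exact schema.
from dataclasses import dataclass
import numpy as np

@dataclass
class Observation:
    pose: np.ndarray         # position + orientation
    velocity: np.ndarray     # linear + angular velocity
    goal_vector: np.ndarray  # direction and distance to the mission goal
    lidar: np.ndarray        # range readings from the onboard LiDAR

def policy(obs: Observation) -> np.ndarray:
    """Called once per control step; returns one command per rotor."""
    # A trained network would replace this placeholder hover command.
    return np.zeros(4, dtype=np.float32)
```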
Swarm is explicitly pre-trained-only: miners train offline in their own stacks, freeze the weights, and submit the trained models for evaluation. No online learning, no human-in-the-loop. During evaluation, the validator loads your submitted policy file and executes it exactly as provided.
The miner’s role: bring your best pilot
In Swarm, miners are the participants who create and submit drone autopilot policies. They are free to use any approach that can produce a viable pilot: some rely on reinforcement learning methods like PPO, SAC, or DDPG; others use imitation learning, evolutionary search, or combinations of these approaches (for example, training a PPO agent using imitation learning); and there are those who continue to refine classical control algorithms with new heuristics and planning strategies. Hybrid approaches are also common, where a planner might chart a global route while a neural policy handles local avoidance and micro-adjustments in flight.
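As a rough idea of what the RL route looks like, here is a minimal training sketch using Stable-Baselines3’s PPO; the environment id is a placeholder for whatever Gymnasium-compatible drone simulator a miner actually trains in.

```python
# A rough sketch of the RL route with Stable-Baselines3 PPO. "DroneHover-v0"
# is a placeholder id for whatever Gymnasium-compatible drone simulator you
# actually train in.
import gymnasium as gym
from stable_baselines3 import PPO

env = gym.make("DroneHover-v0")           # hypothetical drone sim environment
model = PPO("MlpPolicy", env, verbose=1)  # standard actor-critic MLP policy
model.learn(total_timesteps=2_000_000)    # offline training in your own stack
model.save("ppo_pilot")                   # frozen checkpoint ready for export
```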
Regardless of the method, one requirement stands above all others: the policy must be exportable. Miners need to provide a trained model that validators can execute headlessly. Each submitted model is then recorded on-chain as a PolicyRef, ensuring provenance and reproducibility. Miners must build autopilots that can consistently outperform the competition under Swarm’s evaluation rules.
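What “exportable” means in practice depends on the harness, but a common pattern is to freeze the trained network and serialize it to a single self-contained file, for example with TorchScript. The network below is purely illustrative, and Swarm’s actual submission format may differ.

```python
# A sketch of freezing and exporting a trained policy as a single
# self-contained file (TorchScript shown as one option; Swarm's actual
# submission format may differ). The network is purely illustrative.
import torch
import torch.nn as nn

class PilotPolicy(nn.Module):
    def __init__(self, obs_dim: int = 32, act_dim: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.Tanh(),
            nn.Linear(128, 128), nn.Tanh(),
            nn.Linear(128, act_dim), nn.Tanh(),  # actions normalized to [-1, 1]
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

policy = PilotPolicy()
policy.eval()                                  # inference only: weights are frozen
example_obs = torch.zeros(1, 32)
frozen = torch.jit.trace(policy, example_obs)  # serialize graph + weights together
frozen.save("pilot_policy.pt")                 # the artifact a PolicyRef would point to
```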
Validator role: mint the arena, judge the flights
Validators are the counterpart to miners, keeping the entire competition alive and credible. Their role begins with generating MapTasks, which define a mission for the drone: where it starts, where it must go, and under what constraints of geometry, time, or environment it must operate. Each task is seeded and reproducible, guaranteeing that every submitted policy faces the exact same scenario.
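A simple way to picture seeded, reproducible task generation: derive every random quantity from a single seed, so the same seed always mints the same mission. The fields below are illustrative, not the actual MapTask schema.

```python
# A sketch of seeded, reproducible task generation: every random quantity is
# derived from one seed, so the same seed always mints the same mission.
# Field names are illustrative, not the actual MapTask schema.
from dataclasses import dataclass
import numpy as np

@dataclass
class MapTask:
    seed: int
    start: np.ndarray        # spawn position (m)
    goal: np.ndarray         # target position (m)
    time_limit_s: float      # mission time budget

def mint_map_task(seed: int) -> MapTask:
    rng = np.random.default_rng(seed)          # deterministic RNG
    start = rng.uniform(-50.0, 50.0, size=3)
    goal = rng.uniform(-50.0, 50.0, size=3)
    return MapTask(seed=seed, start=start, goal=goal, time_limit_s=60.0)
```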
Once tasks are created, validators load the miners’ trained models and run them in simulation. This evaluation is strictly headless: the miner has no influence over the process once the model is submitted. The policy is stepped through the physics simulation without any live adjustments, secret knobs, or side channels. After the run, validators compute a score that prioritizes mission success: reaching and stabilizing at the goal. Among those that succeed, additional weight is given to time efficiency. These scores are then written back on-chain, adjusting the miners’ weights and influencing how emissions are split. By design, Swarm follows a winner-takes-all reward mechanism, keeping the competition intense and pushing participants to continually raise their game.
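The exact weighting validators apply isn’t spelled out here, but the scoring priorities described above could be sketched like this: failure scores zero, success earns a base reward, and faster successful flights earn a time bonus.

```python
# A sketch of the scoring priorities described above: failure scores zero,
# success earns a base reward, and faster successful flights earn a time
# bonus. The real weighting used by validators may differ.
def score_flight(reached_goal: bool, flight_time_s: float, time_limit_s: float) -> float:
    if not reached_goal:
        return 0.0                                  # mission success comes first
    time_bonus = max(0.0, 1.0 - flight_time_s / time_limit_s)
    return 1.0 + time_bonus                         # faster finishes rank higher
```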
So why pre-trained policies?
The design choice is deliberate:
- Reproducibility. Frozen weights guarantee that a run is reproducible: anyone can replay the same policy + seed and match the score.
- Security & fairness. Headless evaluation prevents in-flight adjustments, logging tricks, or environment peeking.
- Iteration speed. The loop is tight: train offline → submit → see on-chain metrics → iterate. Miners improve weekly without changing subnet rules.
- Real-time constraints. Drones must run inference 50 times per second, so the model needs to live onboard during flight; remote inference is not an option (see the timing sketch below).
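A quick way to sanity-check that last constraint is to time your frozen policy against the roughly 20 ms budget a 50 Hz control loop allows; the policy callable and observation size below are assumptions, not Swarm’s actual tooling.

```python
# A sketch of checking a frozen policy against the ~20 ms budget that a
# 50 Hz onboard control loop allows. The policy callable and observation
# size are assumptions, not Swarm's actual harness.
import time
import numpy as np

def meets_realtime_budget(policy, obs_dim: int = 32, steps: int = 500,
                          budget_ms: float = 20.0) -> bool:
    obs = np.zeros(obs_dim, dtype=np.float32)
    start = time.perf_counter()
    for _ in range(steps):
        policy(obs)                                 # one inference per control tick
    avg_ms = (time.perf_counter() - start) / steps * 1e3
    return avg_ms < budget_ms
```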
Headless evaluation: the guardrails
Swarm’s headless rule is our greatest equalizer:
- Determinism. Same weights + same seed ⇒ same trajectory and score.
- Auditability. Anyone can verify a score with just the model and the seed (see the replay sketch after this list).
- Isolation. Validators never execute opaque miner codepaths; they only run the exported policy file inside the standard harness.
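Put together, determinism and auditability reduce to a simple check: re-run the policy file on the same seed and compare against the reported score. The run_episode callable below is an assumption standing in for the standard evaluation harness.

```python
# A sketch of audit-by-replay: given determinism, re-running the submitted
# policy file on the same seed should reproduce the reported score.
# run_episode stands in for the standard evaluation harness (an assumption).
def verify_score(run_episode, policy_path: str, seed: int,
                 reported_score: float, tol: float = 1e-6) -> bool:
    replayed_score = run_episode(policy_path, seed)   # deterministic re-run
    return abs(replayed_score - reported_score) <= tol
```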
Continuous tournament
Unlike other competitions, Swarm never ends.
- Validators keep generating new MapTasks.
- Miners keep submitting updated policies.
- Scores keep adjusting weights on-chain.
This turns the subnet into an evolutionary pressure cooker for flight intelligence.
Swarm is where drone pilots (the AI kind) get to prove themselves, crash, learn, and come back stronger. The rules are simple, the competition is tough, and the leaderboard doesn’t lie. If you’re curious about the future of autonomous flight, or think you’ve got a model that can out-fly the rest, join the swarm.