METHODOLOGY
Step 1 – Data Collection
We refresh and clean publicly available UFC data before each event to ensure accuracy. This includes:
- Fighter performance: Striking (SLpM, SAPM, accuracy, defense, pace) and grappling (takedowns, takedown defense, control time, submission attempts)
- Physical context: Height, reach, stance, weight class, age, and days since last fight
- Market data: Opening/closing odds and line movement, which capture public perception and late-breaking news
​
Collecting and standardizing this data ensures our model reflects both historical performance and up-to-date context.
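As a rough illustration of what standardizing can involve, the sketch below normalizes fighter names so that spelling variants from different sources match, and derives the days-since-last-fight field. The function names are hypothetical and this is not the production pipeline, only a minimal example using the Python standard library.

```python
import unicodedata
from datetime import date


def normalize_name(raw: str) -> str:
    """Lowercase, strip accents, and collapse whitespace so name variants match."""
    decomposed = unicodedata.normalize("NFKD", raw)
    # Drop combining marks (accents) left over after decomposition.
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.lower().split())


def days_since_last_fight(last_fight: date, event_date: date) -> int:
    """Layoff length in days between the previous fight and the upcoming event."""
    return (event_date - last_fight).days


print(normalize_name("  José  Aldo "))                            # jose aldo
print(days_since_last_fight(date(2024, 1, 20), date(2024, 6, 1)))  # 133
```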
​
Step 2 – Feature Engineering
Raw stats only tell part of the story. To capture a fuller picture, we engineer new features such as:
- Activity & form: Time since last fight, win/loss streaks, and recent performance trends
- Comparative gaps: Differences between opponents on key stats (e.g., TD Accuracy − Opponent TDD, SLpM − Opponent SAPM)
- Market signals: Historical/implied probabilities and line movement to account for information flow and public sentiment
We then run feature importance analysis to identify the variables that drive accuracy and remove those that add noise.
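The comparative gaps listed above can be sketched as simple differences between one fighter's offensive stat and the opponent's matching defensive stat. The dictionary keys and sample numbers below are hypothetical placeholders, not real fighter data, and the exact feature set used in production may differ.

```python
def gap_features(fighter: dict, opponent: dict) -> dict:
    """Matchup features: fighter offense minus opponent defense."""
    return {
        # TD Accuracy - Opponent TDD
        "td_gap": fighter["td_acc"] - opponent["td_def"],
        # SLpM - Opponent SAPM
        "strike_gap": fighter["slpm"] - opponent["sapm"],
    }


# Made-up stat lines for two hypothetical fighters.
fighter_a = {"td_acc": 0.45, "td_def": 0.70, "slpm": 4.2, "sapm": 3.1}
fighter_b = {"td_acc": 0.30, "td_def": 0.60, "slpm": 3.5, "sapm": 4.0}

features = gap_features(fighter_a, fighter_b)
print({name: round(value, 2) for name, value in features.items()})
```

A positive gap suggests the fighter's offense outpaces what the opponent usually stops or absorbs; a negative gap suggests the opposite.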
​
Step 3 – Machine Learning Predictions
We test a range of algorithms (logistic regression, random forests, gradient boosting) and select the top performer based on backtested accuracy.
- Current model: Gradient-boosted decision trees (XGBoost) with a logistic objective (predicts win probability)
- Why XGBoost: Strongest performance on tabular sports data, handles nonlinear interactions and missing values, minimal preprocessing needed
- Output: A win probability for each fighter
- Performance: Backtested on 6,000+ historical events at ~70.1% accuracy (with ongoing monitoring to detect drift)
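To illustrate the selection step, the sketch below scores each candidate's predicted win probabilities against held-out results and keeps the most accurate one. It assumes a prediction counts as correct when the fighter given a probability of at least 0.5 wins, and that the held-out fights come later in time than the training fights; the model names and numbers are made up for the example.

```python
def accuracy(probs, outcomes, threshold=0.5):
    """Share of fights where the predicted side (p >= threshold) matched the result."""
    correct = sum((p >= threshold) == bool(y) for p, y in zip(probs, outcomes))
    return correct / len(outcomes)


def pick_best(candidates, outcomes):
    """Return the most accurate candidate's name and every candidate's score."""
    scores = {name: accuracy(probs, outcomes) for name, probs in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores


# 1 = the fighter listed first won, 0 = they lost (toy hold-out set).
outcomes = [1, 0, 1, 1, 0]
candidates = {
    "logistic_regression": [0.60, 0.55, 0.40, 0.70, 0.30],
    "gradient_boosting": [0.80, 0.35, 0.60, 0.45, 0.20],
}

best, scores = pick_best(candidates, outcomes)
print(best, scores)
```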
​
Step 4 – Betting Edge Analysis
Predictions are only useful if they beat the market. We compare our model’s win probability against the probability implied by sportsbook odds:
- Edge formula: Edge = model probability − implied probability
- Action rule: Only flag bets where Edge > 0 (positive expected value)
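The edge formula and action rule can be sketched as follows. This example assumes American odds as the input format and uses the raw implied probability without removing the sportsbook's margin; both are assumptions made for illustration, and the function names are hypothetical.

```python
def american_to_implied(odds: int) -> float:
    """Convert American odds to the implied win probability (margin included)."""
    if odds < 0:
        return -odds / (-odds + 100)
    return 100 / (odds + 100)


def edge(model_prob: float, odds: int) -> float:
    """Edge = model probability - implied probability."""
    return model_prob - american_to_implied(odds)


def should_flag(model_prob: float, odds: int) -> bool:
    """Action rule: only flag bets with a positive edge."""
    return edge(model_prob, odds) > 0


print(american_to_implied(-150))      # 0.6
print(round(edge(0.65, -150), 3))     # 0.05
print(should_flag(0.65, -150))        # True
print(should_flag(0.40, 130))         # False
```

Because the implied probabilities on both sides of a fight add up to more than 1, an edge measured this way is slightly conservative compared with one measured against margin-free probabilities.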
We then apply the Kelly Criterion, a risk management formula, to suggest how much to stake relative to bankroll. All of this serves as a guideline for how to approach betting, not a guarantee.
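For reference, the standard Kelly formula for a single bet is f = (b·p − q) / b, where p is the model's win probability, q = 1 − p, and b is the net payout per unit staked (decimal odds minus 1). The sketch below implements it; the `fraction` multiplier for staking less than full Kelly is an illustrative parameter, not something the methodology specifies.

```python
def kelly_fraction(model_prob: float, decimal_odds: float, fraction: float = 1.0) -> float:
    """Suggested stake as a share of bankroll; 0 when the bet has no edge."""
    if decimal_odds <= 1.0:
        raise ValueError("decimal odds must be greater than 1")
    b = decimal_odds - 1.0   # net profit per unit staked on a win
    q = 1.0 - model_prob     # probability of losing
    full_kelly = (b * model_prob - q) / b
    # Never suggest a negative stake; scale down if a fractional Kelly is used.
    return max(full_kelly, 0.0) * fraction


print(round(kelly_fraction(0.55, 2.0), 3))                 # 0.1
print(kelly_fraction(0.40, 2.0))                           # 0.0
print(round(kelly_fraction(0.55, 2.0, fraction=0.5), 3))   # 0.05
```

With a 55% win probability at even money, full Kelly suggests staking 10% of bankroll, while a bet with a negative edge gets no stake at all.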
​​
​​​
Step 5 – Transparency & Tracking
The model guides us, but we also apply tape analysis to account for stylistic dynamics that stats may miss.
​
We report wins, losses, ROI, and units so followers can judge performance directly.
- BetMMA: StrikeVision’s full, verified betting record can be found on BetMMA.
- Performance dashboard: Overall model prediction performance can be found in the performance tracking view.
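The reported figures (wins, losses, ROI, and units) can be derived from a simple bet log. The sketch below assumes stakes are recorded in units and odds in decimal format, and it ignores draws, no contests, and voided bets; the `Bet` record and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Bet:
    stake_units: float
    decimal_odds: float
    won: bool


def summarize(bets):
    """Aggregate a bet log into wins, losses, net units, and ROI."""
    wins = sum(1 for b in bets if b.won)
    losses = len(bets) - wins
    staked = sum(b.stake_units for b in bets)
    # A win returns stake * (odds - 1) in profit; a loss costs the stake.
    net_units = sum(
        b.stake_units * (b.decimal_odds - 1) if b.won else -b.stake_units
        for b in bets
    )
    roi = net_units / staked if staked else 0.0
    return {"wins": wins, "losses": losses, "units": net_units, "roi": roi}


log = [Bet(1, 2.0, True), Bet(1, 1.5, False), Bet(2, 2.5, True)]
print(summarize(log))
```

Here ROI is net units won divided by total units staked, so three units of profit on four units staked is a 75% ROI.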
​​​​
​
Roadmap & Limitations
Current limitations
- MMA carries a high degree of variance: small moments such as cuts, slips, or referee decisions can dramatically swing outcomes.
- Our model faces blind spots with hidden information like undisclosed injuries, severe weight cuts, or camp and motivation issues that aren’t captured in the data.
- Confidence is reduced when dealing with low-sample fighters such as prospects, debutants, or short-notice replacements, since limited statistics make outcomes harder to predict.
- Odds can shift sharply with late-breaking news, creating market timing challenges that may outpace our snapshots.
- While we clean and normalize extensively, public data still carries risks of gaps or name mismatches that can introduce residual noise.
​
What we’re improving next
- Feature experimentation: Creating and testing new features.
- Coverage expansion: Adding DWCS/TUF/Road to UFC data to strengthen generalization and handle newcomers.
- Low-data handling: More robust priors and market-aware fallbacks for debut/short-notice scenarios.
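One possible shape for the low-data handling item, offered purely as an illustration since the approach is still on the roadmap: shrink a fighter's observed stat toward a prior (for example, a weight-class average) when few fights are available, and lean on the market probability until the fighter has a track record. The functions, the `prior_strength` pseudo-count, and the `full_trust_at` cutoff are all hypothetical choices.

```python
def shrink_stat(observed_mean, n_fights, prior_mean, prior_strength=5.0):
    """Weighted average of observed stat and prior; prior_strength acts as pseudo-fights."""
    return (n_fights * observed_mean + prior_strength * prior_mean) / (
        n_fights + prior_strength
    )


def blend_with_market(model_prob, market_prob, n_fights, full_trust_at=10):
    """Trust the model more as the fighter's sample grows; fall back to the market otherwise."""
    weight = min(n_fights / full_trust_at, 1.0)
    return weight * model_prob + (1.0 - weight) * market_prob


# A debutant with one fight: the stat moves only slightly away from the prior.
print(round(shrink_stat(6.0, 1, 4.0), 2))          # 4.33
# No fights on record: use the market probability as-is.
print(blend_with_market(0.70, 0.50, n_fights=0))   # 0.5
```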



