The ADCC Championship happens every two years and is considered the Olympics of submission grappling. The 2019 event concluded with a bang in September and crowned some new and some veteran champions. As an exercise in data science, I thought it would be fun to take the results and see how accurately a machine learning model could predict whether a match would end in a win or a loss. Since we already have the winners, I won't be predicting a future outcome, but instead exploring which characteristics of the matches and the fighters factor into the results. I'll explain the process as I go.
What is the basic layout of the event?
A total of 98 fighters competed at ADCC this year. There were 5 weight divisions for the men (-66 kg, -77 kg, -88 kg, -99 kg, +99 kg), plus one Absolute division, and two weight divisions for the women (-60 kg and +60 kg). Sixteen men competed in each weight division and 8 women in each of the two women's divisions. The event concluded with a Super Fight between the returning Super Fight champion and the 2017 Absolute champion as the challenger. All champions of the 2017 ADCC received a ticket to compete. 38 fighters won ADCC Trials events across the world to earn their entry, and 58 additional fighters were invited to participate. Invitations were extended based on past performances at ADCC events, trials, or overall world ranking, which made for an interesting mix of fighters across all brackets.
Gordon Ryan and Marcus Buchecha each fought 8 matches, 4 in their weight class and 4 in the absolute. Most fighters had 4 matches, either making it to the finals of their division, fighting for third place, or having a mix of weight and absolute matches.
There were 113 matches in total: 15 regular matches in each men's weight division for gold plus one for third place, 16 for the Absolute, 8 for each of the two women's divisions, and the Super Fight. Nearly half (48%) of the matches were won by scoring points, nearly a third (31%) by submission, 19% by referee's decision when the score was tied or there was no score, and 3% by injury forfeit, mostly in the consolation round for third place. Each male fighter had to win 4 matches to come away with gold, and each female fighter had to win 3.
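The match count and the number of wins needed for gold can be checked with a little arithmetic. This is a minimal sketch: the function and constant names are made up for illustration, and it assumes the 113th match is the Super Fight, since the bracket counts given in the text add up to 112.

```python
# Bracket sizes as described in the article. Names are illustrative only.
MEN_WEIGHT_DIVISIONS = 5
WOMEN_WEIGHT_DIVISIONS = 2


def bracket_matches(fighters: int) -> int:
    """Matches in a single-elimination bracket plus one third-place match."""
    return (fighters - 1) + 1


def wins_for_gold(fighters: int) -> int:
    """Rounds a fighter must win in a power-of-two single-elimination bracket."""
    return fighters.bit_length() - 1


total_matches = (
    MEN_WEIGHT_DIVISIONS * bracket_matches(16)       # men's weight divisions
    + bracket_matches(16)                            # Absolute
    + WOMEN_WEIGHT_DIVISIONS * bracket_matches(8)    # women's divisions
    + 1                                              # assumption: the Super Fight
)

print(total_matches)       # 113
print(wins_for_gold(16))   # 4
print(wins_for_gold(8))    # 3
```

A 16-person bracket halves four times before a champion is left, which is where the 4 wins for the men and 3 for the women come from.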
The charts show that fighters who had previously placed at an ADCC tournament were more likely to win a match than lose, and the more ADCC medals they had, the more likely they were to win. However, previous ADCC experience without a placement did not correlate with wins; in fact, it resulted in more losses. In addition, previous experience at an ADCC Trials event and the number of medals from Trials events resulted in more losses than wins this year. In other words, winning a Trials event did not translate into a placement in 2019, but a previous ADCC tournament medal did.
In the women's -60kg weight class, most matches were won by score or submission, in contrast to the women's +60kg division, where most were won by score or referee's decision. The men's -88kg was also won predominantly by score or submission, and the men's -66kg largely by scoring. The Absolute and the men's -88kg had the most submission wins, whereas the men's +99kg was evenly distributed between submissions, scoring, referee's decision, and forfeit.
Research Question: Can machine learning predict who wins and loses a match at ADCC?
This is a two-part series. In this first article, I explore the available data for relationships or associations between the different characteristics. In data science, this is the stage where I clean the dataset and explore the statistics. It helps me understand what the data says about the topic and uncovers characteristics that are related.
So, what data do we have to look at? I have collected match data, including fighter name, gender, weight class, match round, points scored, time of match, day, match number, how the match was won, win/loss result, and the medal earned.
Looking at the available match data, we can see that there were far more male competitors than female, that half of the fighters were eliminated as each round moved forward, that most matches ended scoreless with zero points, and that more matches took place on Saturday. If you followed the event, there are no surprises here. It will be more interesting to see whether any of these features have a relationship to wins and losses and can therefore help us predict the outcome of a match.
Next: Add Additional Competitor Data
To make the dataset more robust, I collected data on each fighter at ADCC 2019, including the team/gym, age, belt rank, time as a black belt, and previous ADCC experience. I added the number of ADCC events at which each fighter competed, the years they competed, total medals won at all ADCC events, medal place, the total number of ADCC trials events, the years they competed at each trials, total medals won at all trials events, and trials medal place.
Next, let's take a look at which teams had the most fighters and brought home medals at this year's event.
Alliance competed in the most matches at ADCC 2019, followed by Atos, Renzo Gracie, and Checkmat. The best performing teams included Alliance, which came away with 2 gold medals, 2 silvers, and a bronze. Atos brought home 3 gold medals and 2 bronze (one by forfeit; the count could be 4 golds if you include Bianca Basilio, who normally represents Atos at IBJJF tournaments). Renzo Gracie won 2 gold, 1 silver, and 1 bronze, including the most coveted Absolute gold medal, won by Gordon Ryan.
If you exclude female fighters and just look at the males, also excluding the Super Fight, Renzo Gracie won 2 gold, 1 silver, and 1 bronze. Atos won 2 gold and 2 bronze and Alliance won 1 gold and 2 silver.
Fighter age and experience had a positive correlation.
Most fighters fell between the ages of 25 and 30. The tournament saw two fighters under the age of 20, and the oldest fighter was 39. As for years of experience as a black belt, a handful of fighters were blue, purple, or brown belts, and the belt rank was missing for several fighters who were primarily MMA fighters or wrestlers. However, fighters predominantly fell between 2.5 and 5 years of experience at the black belt level. As it relates to our data science question, I found a positive relationship between age and experience in terms of matches resulting in a win.
On that note, I looked at the length of matches. Most matches lasted either 10 or 15 minutes and ended in a win by score or referee's decision. When I compared the length of a match with the fighter's age, I found a slight negative correlation: as the age of the fighter went down, the length of the match went up.
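For readers who want to see what a correlation like this measures, here is a minimal sketch of Pearson's r in plain Python. The function name and the numbers are made up for illustration; they are not the ADCC data, and this is not necessarily how the original analysis was computed.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and of squared deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up ages and match lengths in minutes, NOT the ADCC data.
ages = [22, 25, 27, 30, 34, 38]
match_minutes = [15, 10, 15, 10, 6, 4]

r = pearson_r(ages, match_minutes)
print(round(r, 2))  # negative: in this toy sample, younger fighters have longer matches
```

A value near 0 means no linear relationship, while values toward -1 or 1 mean the two columns move together more tightly, in opposite or in the same direction.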
Previous ADCC experience was very interesting to analyze in relation to matches ending in wins or losses.
In fact, here is the win/loss ratio for the 2017 returning champions compared to the trials winners and invitees. Invitees outperformed the trials winners, and returning champions performed the best.
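A comparison like this boils down to grouping matches by how the fighter qualified and counting wins. The sketch below computes a win rate (wins divided by matches) per group. The group labels and the rows are made up for illustration and are not the ADCC 2019 results.

```python
from collections import defaultdict

# Made-up rows for illustration only; these are NOT the ADCC 2019 results.
# Each tuple is (entry_route, result) for one fighter in one match.
matches = [
    ("returning_champion", "win"),
    ("returning_champion", "win"),
    ("returning_champion", "loss"),
    ("invitee", "win"),
    ("invitee", "loss"),
    ("trials_winner", "win"),
    ("trials_winner", "loss"),
    ("trials_winner", "loss"),
]


def win_rates(rows):
    """Return {group: wins / total matches} for each group in rows."""
    wins = defaultdict(int)
    totals = defaultdict(int)
    for group, result in rows:
        totals[group] += 1
        if result == "win":
            wins[group] += 1
    return {group: wins[group] / totals[group] for group in totals}


rates = win_rates(matches)
# Print groups from best to worst win rate.
for group, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
    print(f"{group}: {rate:.0%}")
```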
Finally: Assign Numbers to Categorical Data
In the last section of exploring the data, several characteristics were categories that I assigned a numeric value solely for the purpose of visualizing the data.
The first thing that drew my attention was the length of a match and the round number.
To understand the chart, the Round of 16 is RoundNo 1, Quarter-Finals is 2, Semi-Finals is 3, Finals is 4, the third place consolation match is round 5, and the Super Fight is round 6. We can see that in round 4, the finals had the longest match times, at 20 and 30 minutes. Round 5, the consolation match, had a number of matches at the zero-minute mark where fighters forfeited, some due to injury. Round 1 had the most matches altogether, represented by the large number of dots in the chart, many lasting 10 or 15 minutes. A handful of matches in the Quarter-Finals, round 2, lasted less than 5 minutes and ended in a submission. Overall, shorter matches become less frequent as the rounds move forward and the level of difficulty increases.
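The round numbering described above amounts to a simple lookup table. Here is a minimal sketch of that encoding. The numbers come from the text, but the exact label strings and the function name are assumptions made for illustration.

```python
# Round labels mapped to the RoundNo values listed in the article.
# The label spellings are assumptions; only the numbers come from the text.
ROUND_NUMBERS = {
    "Round of 16": 1,
    "Quarter-Finals": 2,
    "Semi-Finals": 3,
    "Finals": 4,
    "Third Place": 5,
    "Super Fight": 6,
}


def encode_round(label: str) -> int:
    """Return the numeric RoundNo for a round label; raise on unknown labels."""
    try:
        return ROUND_NUMBERS[label]
    except KeyError:
        raise ValueError(f"unknown round label: {label!r}") from None


# Hypothetical path through a bracket for one fighter.
rounds = ["Round of 16", "Quarter-Finals", "Semi-Finals", "Third Place"]
print([encode_round(r) for r in rounds])  # [1, 2, 3, 5]
```

Note that these numbers are labels for plotting, not a true scale: round 5 (third place) is not "later" than round 4 (finals) in any meaningful sense.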
I segmented the male fighters by those who made it to the semi-finals in each weight class and the quarter-finals in the absolute. All of their match data was included, wins and losses. A forfeit received a score of 0, a referee's decision received 25 for a win and -25 for a loss, a points result received 75 for a win and -75 for a loss, and a submission received 100 for a win and -100 for a loss.
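The scoring scheme above can be written as a small function. This is a sketch under assumptions: the category names and the function name are made up for illustration, while the score values are the ones given in the text.

```python
# Base score for each way a match can end, using the values from the article.
# The key names are illustrative assumptions.
METHOD_SCORES = {
    "forfeit": 0,
    "referee_decision": 25,
    "points": 75,
    "submission": 100,
}


def match_score(method: str, result: str) -> int:
    """Signed match score: positive for a win, negative for a loss."""
    if method not in METHOD_SCORES:
        raise ValueError(f"unknown method: {method!r}")
    base = METHOD_SCORES[method]
    if result == "win":
        return base
    if result == "loss":
        return -base
    raise ValueError(f"unknown result: {result!r}")


print(match_score("submission", "win"))         # 100
print(match_score("points", "loss"))            # -75
print(match_score("referee_decision", "win"))   # 25
print(match_score("forfeit", "loss"))           # 0
```

Putting every match on one signed scale is what lets a fighter's whole tournament be plotted on a single axis, from losing by submission at -100 to winning by submission at 100.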
Key Highlights:
- Most of Gordon Ryan's matches ended in submissions with a score of 100 and most of JT Torres' ended by scoring for a match score of 75. These two were the most consistent in their wins, most notably Ryan, as he fought and won both his division and the absolute.
- Young Tye Ruotolo was never submitted throughout the tournament but also did not get any submissions.
- Most of Lachlan Giles' matches ended in submissions, with very little else in between.
- All of Paulo Miyao's wins came by points, and his only loss was by referee's decision.
- Most other male fighters had matches end all over the spectrum.
- Both Ffion Davies and Beatriz Mesquita had submissions and were themselves submitted.
- Elvira Karppinen had no submissions but was also not submitted throughout the day.
- Bianca Basilio's performance was the most consistent, with all of her matches ending in submission except one, which was decided on points.
Finally, the last task I performed to explore the data was to create two heat maps. The first heat map, in green, represents numerical characteristics using Pearson's r correlation coefficient. The second heat map, in pink, uses Cramér's V to compare categories only. Each coefficient measures the statistical association between a pair of features. (Yes, I really nerded out here on the data!) I'm not going to go over every relationship, but I will help you read the heat maps. On the green heat map, the scale is -1 to 1. If the number in the box is 0, there is no linear relationship. A negative number, like -0.3, indicates a slight negative relationship, and a positive number, like 0.6, indicates a positive correlation. On the pink map, the scale is 0 to 1, so 0 is no association and 1 is a perfect association. Overall, not many features have a strong correlation with the Result (wins and losses).
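To make the pink heat map less of a black box, here is a minimal sketch of Cramér's V for two categorical columns in plain Python. It uses the basic, uncorrected formula; the original analysis may have used a library or a bias-corrected variant, so treat this only as an illustration. The function name and toy data are made up and are not the ADCC dataset.

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equal-length categorical sequences."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(xs)
    pair_counts = Counter(zip(xs, ys))   # observed count per (x, y) cell
    x_counts = Counter(xs)               # row totals
    y_counts = Counter(ys)               # column totals
    k = min(len(x_counts), len(y_counts))
    if k < 2:
        raise ValueError("each variable needs at least two categories")
    # Chi-squared statistic: compare observed cells with independence expectation.
    chi2 = 0.0
    for x, x_total in x_counts.items():
        for y, y_total in y_counts.items():
            expected = x_total * y_total / n
            observed = pair_counts.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * (k - 1)))


# Made-up categories for illustration, NOT the ADCC data.
gender = ["M", "M", "M", "M", "F", "F", "F", "F"]
result = ["win", "loss", "win", "loss", "win", "loss", "win", "loss"]
print(cramers_v(gender, result))  # 0.0: result is independent of gender in this toy sample
```

Unlike Pearson's r, Cramér's V has no sign, because categories such as weight class or team have no natural direction.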
That concludes my analysis of the dataset. I explored other relationships, but not all of them were interesting or relevant to my core question, which is to uncover which factors play a role in predicting whether a match ends in a win or a loss.
Stay tuned for Part 2 where I will run various machine learning models to see which features have the greatest ability to help predict a win or loss, or if the features in the dataset can even predict an outcome at all.
Methodology: The dataset was collected manually (yes, there is room for human error, but most of it was cleaned up in the exploratory data analysis phase). The fighter list and match data are attributed to FloGrappling.com, and the competitor data is attributed mainly to the BJJHeroes.com and UAEJJF.com websites, among others for lesser-known fighters.
Limitations: This is a small dataset and it does not fully represent how matches are won or lost. The dataset does not encompass all aspects going into a tournament and I am only able to examine the things that I can gather data on, like limited match and fighter details. This article should be viewed in the context of using data science on a fun topic like submission grappling and to gain more insights into the sport than what was previously available, although it is not all encompassing. Please excuse any typos and errors.