
Saturday, November 21, 2020

IPL and Overfitting

After a long wait, the thirteenth Indian Premier League (IPL) has been completed, with Mumbai Indians crowned as the undisputed champions.


However, this post is not another cricket analysis; I am not a cricket expert anyway.

In an earlier post, we discussed overfitting using the US and Sri Lankan presidential elections; this post looks at overfitting from a different perspective. Overfitting is when your machine learning or predictive model is too accurate on the data it has already seen. You might wonder how high accuracy can be a problem, since we always try to increase the accuracy of predictive models. The catch is that a model that fits past data perfectly can still fail on the very next data point. Overfitting can occur for two reasons.
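To make this concrete, here is a minimal Python sketch of a "rule" learned from a handful of past outcomes. It uses MI's title years as given in this post (2013, 2015, 2017 and 2019, followed by 2020); the function names are hypothetical and chosen only for illustration. The rule "MI only win in odd years" is 100% accurate on the past titles and still wrong about the next one.

```python
# MI title years as given in this post; 2020 plays the role of the "next data point".
PAST_TITLES = [2013, 2015, 2017, 2019]
NEXT_TITLE = 2020


def odd_year_rule(year: int) -> bool:
    """Hypothetical rule 'learned' from PAST_TITLES: MI only win in odd years."""
    return year % 2 == 1


def accuracy(rule, years: list[int]) -> float:
    """Fraction of title years for which the rule agrees that MI won."""
    return sum(rule(y) for y in years) / len(years)


print("accuracy on past titles:", accuracy(odd_year_rule, PAST_TITLES))
print("predicts a title in 2020:", odd_year_rule(NEXT_TITLE))
```

The first line prints 1.0 and the second prints False: perfect on history, wrong on the next season.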

  1. Too little data - When you have too little data, the model is very likely to show high accuracy, but it can go wrong on the next data point. Further, when you combine many attributes, you shrink the amount of data behind each combination.
  2. Unnecessary data is collected - We can end up predicting with uncorrelated data. For example, "no left-handed US president has been divorced" and "no Sri Lankan president with a moustache has been re-elected" are classic examples of overfitting.
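The shrinking effect of combinations can be put into numbers. The sketch below assumes, purely for illustration, that each yes/no condition is true in about half of the seasons, independently of the others. Under that assumption, the expected number of seasons (out of 13) matching k combined conditions is 13 × 0.5^k, and the chance that no season matches at all is (1 - 0.5^k)^13. The function names are hypothetical.

```python
SEASONS = 13  # IPL editions 2008-2020 covered in this post


def expected_matches(n_seasons: int, k: int, p: float = 0.5) -> float:
    """Expected number of seasons satisfying k independent yes/no conditions,
    each assumed true with probability p (an illustrative assumption)."""
    return n_seasons * p ** k


def prob_no_match(n_seasons: int, k: int, p: float = 0.5) -> float:
    """Probability that no season satisfies all k conditions, i.e. that a
    'this has never happened' rule exists purely by chance."""
    return (1 - p ** k) ** n_seasons


for k in range(1, 6):
    print(f"{k} condition(s): expected matches {expected_matches(SEASONS, k):.3f}, "
          f"P(no match) {prob_no_match(SEASONS, k):.3f}")
```

With four combined conditions, fewer than one of the 13 seasons is expected to match, and there is roughly a 43% chance that none does, so a "never happened" rule can appear without any real pattern behind it.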

Now let us look at overfitting criteria for the IPL champions over the last thirteen seasons.

2008
Conclusion: No team has won an IPL title.
Event: This held until Rajasthan Royals won the championship.
Reason: If predicting with little data is bad, what about predicting with no data at all?

2009
Conclusion: No team placed 4th in the preliminary round has won the IPL title.
Event: This held until Deccan Chargers, who were ranked 4th in the preliminary round, won the championship.
Reason: Predicting with little data is bad.

2010
Conclusion: No team led by an Indian captain has won the championship. The Man of the Series never loses the final.
Event: This held until Dhoni's Chennai Super Kings won the championship; previously the title had been won by teams captained by Shane Warne and Adam Gilchrist. In the previous editions, Shane Watson (RR) and Adam Gilchrist (DC) were the Man of the Series and were from the winning team. In 2010, Sachin Tendulkar was the Man of the Series, and he was not part of Chennai Super Kings.
Reason: Though the captain plays a huge role in winning, it makes no sense that it should matter whether he is Indian or foreign.
2011
Conclusion: No team has won the championship in consecutive years.
Event: Chennai Super Kings won the championship in 2010 and 2011, becoming the first team to win the title as defending champions.
Reason: With four data points, what you can predict is very limited, even though your historical accuracy is 100%.

2012
Conclusion: The team ranked 4th in the preliminary round has never been beaten in the final.
Event: Kolkata Knight Riders won the championship by beating Chennai Super Kings, who were 4th in the preliminary round.
Reason: Again, this prediction held until 2012, but only once before had a 4th-ranked team made the IPL final.

2013
Conclusion: Chennai Super Kings have never lost two finals in a row. MI have never beaten CSK in a final.
Event: In 2012 and 2013, CSK were beaten in the final by KKR and MI respectively. MI and CSK had met once before in a final, in 2010, when CSK won.
Reason: When you predict with combinations of events, your accuracy will obviously be very high, because only a handful of events match.
2014
Conclusion: The team of the tournament's highest run-scorer has never won the IPL title.
Event: Robin Uthappa of KKR was the highest run-scorer of the tournament and a member of the eventual winners, KKR. On three previous occasions, the highest run-scorer played for the runner-up but never for the champions.
Reason: Again, there were only six events before, and so little data does not make for good predictions.

2015
Conclusion: Chennai Super Kings have never lost two finals in a row. MI have never beaten CSK in a final.
Event: In 2012 and 2013, CSK were beaten in the final by KKR and MI respectively. MI and CSK had met once before in a final, in 2010, when CSK won.
Reason: When you predict with combinations of events, your accuracy will obviously be very high, because only a handful of events match.

2016
Conclusion: The team ranked third in the preliminary stage has never beaten the second-ranked team in the final.
Event: Sunrisers Hyderabad beat Royal Challengers Bangalore in the final, the first time a third-ranked team beat the second-ranked team in the final. In 2010, CSK were the third-ranked team and became champions by beating the first-ranked team.
Reason: Too many combinations on too little data should not be used for predictions.
2017
Conclusion: Every MI title has come in a year when the highest wicket-taker was not an Indian bowler. The number one ranked team has never beaten the second-ranked team in the final.
Event: MI won the championship in 2013 and 2015, and in both years the highest wicket-taker was Bravo from the West Indies. In 2017, the highest wicket-taker was Bhuvneshwar Kumar, an Indian.
Reason: Well, this is a combination of too little data and uncorrelated data.

2018
Conclusion: CSK have never won the title by chasing.
Event: In 2010 and 2011, CSK won by defending a total. This was the first time they won the championship by chasing.
Reason: Again, by combining the team and the method of winning, you reduce the data.

2019
Conclusion: MI have never beaten CSK in a final when MI were ranked first.
Event: MI became champions in 2017 when they were ranked first in the preliminary stage. However, when they beat CSK in 2013, they were the second-ranked team. In 2019, they were ranked first and defeated second-ranked CSK.
Reason: Again, by combining multiple teams and rankings, you reduce the data.
2020
Conclusion: MI have never won in an even year. MI have never won by chasing. The highest six-hitter has never been in the championship-winning team. The Fair Play team has never won the title.
Event: MI won the championship in 2013, 2015, 2017 and 2019; this is the first time they have won in an even year and the first time they have won while chasing. Ishan Kishan (MI) was the highest six-hitter, the first time the highest six-hitter was part of the championship-winning team. The Fair Play award was introduced in 2012, and this is the first time the Fair Play award winner has won the championship.
Reason: Though these are somewhat valid predictions, their 100% accuracy comes only from having so few data points.

So, it is important to collect adequate data for prediction. But even with a large dataset, when you predict on combinations of attributes, you shrink the data behind each prediction. Further, having data does not mean it is correlated with the outcome; attributes such as the Fair Play award winner or the captain's nationality are not correlated with winning the title.
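The second reason, uncorrelated data, can be simulated too. The sketch below is not IPL data: it gives each of 13 hypothetical champions 30 random coin-flip attributes that have nothing to do with winning, then searches every pair of attributes for a "no champion has ever had both" rule. The attribute count, the seed and the function names are assumptions chosen for illustration.

```python
import itertools
import random

SEASONS = 13       # one hypothetical champion per IPL edition, 2008-2020
N_ATTRIBUTES = 30  # hypothetical yes/no facts per champion, pure noise


def random_champion_facts(seed: int = 42) -> list[list[bool]]:
    """Synthetic data: each champion gets N_ATTRIBUTES coin-flip facts
    that are, by construction, unrelated to winning."""
    rng = random.Random(seed)
    return [[rng.random() < 0.5 for _ in range(N_ATTRIBUTES)] for _ in range(SEASONS)]


def never_happened_rules(champions: list[list[bool]]) -> list[tuple[int, int]]:
    """Return every attribute pair (a, b) that no champion had together.
    Each pair is a 'no champion has ever had A and B' rule that is
    100% accurate on the history."""
    rules = []
    for a, b in itertools.combinations(range(N_ATTRIBUTES), 2):
        if not any(c[a] and c[b] for c in champions):
            rules.append((a, b))
    return rules


champions = random_champion_facts()
rules = never_happened_rules(champions)
print(f"{len(rules)} 'never happened' rules found in pure noise")
```

Under these assumptions each pair has about a 2.4% chance ((3/4)^13) of never occurring together among 13 champions, so across the 435 pairs roughly ten spurious rules are expected, each with 100% historical accuracy and no predictive value.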
