Data is everywhere, but?: IPL and Overfitting

Saturday, November 21, 2020

IPL and Overfitting

After a long wait, the Indian Premier League (IPL) was held for the thirteenth time, with Mumbai Indians crowned the undisputed winners.

However, this post is not another cricket analysis, which I am not an expert in anyway.

We previously discussed overfitting using the USA and Sri Lankan presidential elections, and this post looks at overfitting from a different perspective. Overfitting is when your machine learning or prediction model is too accurate on the data it has already seen. You might wonder how high accuracy can be a problem, as we always try to increase the accuracy of prediction models. The problem is that a model fitted too closely to past data can fail on new data. Overfitting can occur for two reasons.

Too little data - When you have too little data, the model is very likely to show high accuracy on what it has seen, but it can go wrong on the very next data point. Further, the more attributes you combine, the fewer data points are left for each combination.
Unnecessary data - We can end up predicting from uncorrelated data. For example, "No US President has been both left-handed and divorced" or "No Sri Lankan President with a moustache has been re-elected" are classic examples of overfitting.
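The first reason can be illustrated with a small simulation. This is only a sketch under made-up assumptions: a hypothetical team wins each season on a coin flip, and each "attribute" is another coin flip that has nothing to do with the result. The function name `count_perfect_rules` and its parameters are invented for this illustration. It counts how many of these meaningless attributes still produce a rule of the form "a team with this attribute has never won" with a perfect record.

```python
import random

def count_perfect_rules(n_seasons, n_attributes, seed=0):
    """Count meaningless yes/no attributes that 'never' coincided with a win."""
    rng = random.Random(seed)
    # Hypothetical team that wins each season with probability 1/2.
    wins = [rng.random() < 0.5 for _ in range(n_seasons)]
    perfect = 0
    for _ in range(n_attributes):
        # Coin-flip attribute, generated independently of the results.
        has_attr = [rng.random() < 0.5 for _ in range(n_seasons)]
        # The rule "never won when the attribute was present" holds so far
        # if no season has both the attribute and a win.
        if not any(a and w for a, w in zip(has_attr, wins)):
            perfect += 1
    return perfect

if __name__ == "__main__":
    for n in (5, 13, 100):
        print(n, "seasons:", count_perfect_rules(n, n_attributes=1000))
```

With only a few seasons, many of the thousand meaningless rules tend to keep a perfect record. As the number of seasons grows, almost none survive.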

Now let us look at overfitting rules about the IPL champions over the last thirteen years.

Year	2008	2009	2010
Conclusion	No team has won an IPL title.	No team placed 4th in the preliminary round has won the IPL title.	No team led by an Indian player has won the championship. The Man of the Series never loses the final.
Event	Until Rajasthan Royals won the championship.	Until Deccan Chargers, who were ranked 4th in the preliminary round, won the championship.	Until Dhoni's Chennai Super Kings won the championship. The previous titles were won by teams captained by Shane Warne and Adam Gilchrist. In the previous editions, the Man of the Series, Shane Watson (RR) and Adam Gilchrist (DC), came from the winning team. In 2010, the Man of the Series was Sachin Tendulkar, who was not part of Chennai Super Kings.
Reason	If predicting with little data is bad, what about predicting with no data?	Predicting with little data is bad.	Though the captain plays a huge role in winning, it makes no sense for the result to depend on whether he is Indian or foreign.

Year	2011	2012	2013
Conclusion	No team has won the championship in consecutive years.	The 4th-ranked team in the preliminary round has never been beaten in the final.	Chennai Super Kings have never been beaten twice in finals. MI have never beaten CSK in a final.
Event	Chennai Super Kings won the championship in 2010 and 2011, becoming the first team to win the title as defending champions.	Kolkata Knight Riders won the championship by beating Chennai Super Kings, who had finished 4th in the preliminary round.	In 2012 and 2013, CSK were beaten in the finals by KKR and MI respectively. MI and CSK had met once before in a final, in 2010, where CSK were the winners.
Reason	With four data points, what you can predict is very limited, even though your historical accuracy is 100%.	Again, this rule was correct until 2012, but only once before had a 4th-ranked team made it to the IPL final.	When you predict from combinations of events, your accuracy will obviously be very high, as there are only a handful of such events.

Year	2014	2015	2016
Conclusion	The team of the highest run-scorer has never won the IPL title.	Chennai Super Kings have never been beaten twice in finals. MI have never beaten CSK in a final.	The third-ranked team in the preliminary stage has never beaten the second-ranked team in the final.
Event	Robin Uthappa of KKR was the highest run-scorer of the tournament and a member of the eventual winners, KKR. On three previous occasions the highest scorer had represented the runner-up, but never the champions.	In 2012 and 2013, CSK were beaten in the finals by KKR and MI respectively. MI and CSK had met once before in a final, in 2010, where CSK were the winners.	Sunrisers Hyderabad beat Royal Challengers Bangalore in the final, the first time a third-ranked team had beaten the second-ranked team in the final. In 2010, CSK were ranked third and became champions by beating the top-ranked team.
Reason	Again, there were only six events before this, and so little data does not make for good predictions.	When you predict from combinations of events, your accuracy will obviously be very high, as there are only a handful of such events.	Too many combinations on too little data should not be used for predictions.

Year	2017	2018	2019
Conclusion	Whenever MI have won, the highest wicket-taker has not been an Indian bowler. The number-one ranked team has never beaten the second-ranked team in the final.	CSK have never won while chasing.	MI have never beaten CSK when MI were ranked first.
Event	MI won the championship in 2013 and 2015. In both those years the highest wicket-taker was Bravo, from the West Indies. In 2017 the highest wicket-taker was Bhuvneshwar Kumar, an Indian.	In 2010 and 2011 CSK won, but by defending a total. This was the first time they won the championship while chasing.	MI became champions in 2017, when they were ranked first in the preliminary stage. However, when they beat CSK in 2013 they were the second-ranked team. In 2019 they were ranked first and defeated second-ranked CSK.
Reason	Well, this is a combination of too little data and uncorrelated data.	Again, by combining the team and the method of winning, you are reducing the data.	Again, by combining multiple teams and rankings, you are reducing the data.

Year	2020	2021	2022
Conclusion	MI have never won in an even year. MI have never won while chasing. The player with the most sixes has never been on the championship-winning team. The Fair Play Award team has never won the title.
Event	MI had won the championship in 2013, 2015, 2017 and 2019, and this was the first time they won while chasing. Ishan Kishan (MI) hit the most sixes, the first time the top six-hitter was part of the champion team. The Fair Play Award was initiated in 2012, and this was the first time its winner also won the championship.
Reason	Though these look like somewhat valid predictions, the accuracy is 100% only because there are so few data points.
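The pattern in these tables, a rule with a perfect record right up to the season that breaks it, can be replayed in a few lines. The sketch below uses only the champions stated or implied in the tables above. The helper `first_break` and the two rule functions are hypothetical names written for this illustration, not part of any library.

```python
# Champions as stated or implied in the tables above (2008-2020).
CHAMPIONS = {
    2008: "RR", 2009: "DC", 2010: "CSK", 2011: "CSK", 2012: "KKR",
    2013: "MI", 2014: "KKR", 2015: "MI", 2016: "SRH", 2017: "MI",
    2018: "CSK", 2019: "MI", 2020: "MI",
}

def first_break(rule, champions=CHAMPIONS):
    """Return the first year in which `rule` is violated, or None."""
    history = {}
    for year in sorted(champions):
        # The rule is only tested once there is some history to base it on.
        if history and not rule(year, champions[year], history):
            return year
        history[year] = champions[year]
    return None

def no_title_defence(year, winner, history):
    # "No team has won the championship in consecutive years."
    return history.get(year - 1) != winner

def mi_never_wins_even_year(year, winner, history):
    # "MI have never won in an even year."
    return not (winner == "MI" and year % 2 == 0)

if __name__ == "__main__":
    print(first_break(no_title_defence))
    print(first_break(mi_never_wins_even_year))
```

Each rule is 100% accurate on every season before the year returned, which is exactly why historical accuracy alone says little about the next season.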

So, it is important to collect adequate data for prediction. But even with a large dataset, when you make predictions for combinations of attributes, you shrink the data available for each combination. Further, having data does not mean it is correlated with the outcome, as the Fair Play Award and foreign-captain rules show.
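How quickly combinations shrink the data can be seen with simple arithmetic. In the sketch below, the function name and the attribute sizes (8 teams, 4 ranking positions, 2 ways of winning) are illustrative assumptions rather than figures from this post. Only the 13 seasons come from the discussion above.

```python
def rows_per_combination(n_rows, levels_per_attribute):
    """Average number of rows available for each combination of attribute values."""
    n_combinations = 1
    for levels in levels_per_attribute:
        n_combinations *= levels
    return n_rows / n_combinations

if __name__ == "__main__":
    # 13 finals described by one attribute, then two, then three (assumed sizes).
    print(rows_per_combination(13, [8]))
    print(rows_per_combination(13, [8, 4]))
    print(rows_per_combination(13, [8, 4, 2]))
```

Once the average drops below one row per combination, most combinations have never occurred at all, so a rule of the form "X has never happened" is almost guaranteed to look perfect.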
