Criticism and opportunities
Kaggle has drawn quite a few criticisms since its start, and whether participating in data science competitions is worthwhile is still debated today, with many people holding strongly negative or positive opinions.
On the side of negative criticism:
- Kaggle provides a false perception of what machine learning really is, since it is focused only on leaderboard dynamics
- Kaggle is just a game of hyperparameter optimization and of ensembling many models to scrape a little more accuracy (while in reality overfitting the test set)
- Kaggle is filled with inexperienced enthusiasts who are ready to try anything under the sun to get a score and a spotlight, in the hope of being spotted by recruiters
- As a further consequence, competition solutions are too complicated, and often too specific to a test set, to be implemented
Many perceive Kaggle, like many other data science competition platforms, as far from what data science is in reality. The points they raise are that business problems do not come out of nowhere and that you seldom have a well-prepared dataset to start with, since you usually build it along the way by refining the business specifications and your understanding of the problem at hand. Moreover, they emphasize that production is not considered either, since a winning solution is not constrained by resource limits or by considerations about technical debt (though this is not true for all competitions).
We cannot help but notice that all such criticism ultimately relates to two things: the fact that Kaggle is a crowdsourcing experience with a purpose (the CTF paradigm), and how Kaggle rankings compare, in the data science world, with data science education and work experience. One persistent myth that fuels this criticism is that Kaggle competitions may help you get a job, or a better job, in data science, or that performing well in Kaggle competitions puts you on another plane with respect to data scientists who do not participate at all.
Our stance on this myth is that it is a misleading belief that Kaggle rankings have an automatic value beyond the Kaggle community. In a job search, for instance, Kaggle can provide you with some very useful competencies in modeling data and problems and in effective model testing. It can also expose you to many techniques and to different data and business problems (even beyond your actual experience and comfort zone), but it cannot supply you with everything you need to successfully place yourself as a data scientist in a company.
You can use Kaggle for learning and for differentiating yourself from other candidates in a job search; however, how much weight this carries will vary considerably from company to company. In any case, what you learn on Kaggle will invariably prove useful throughout your career and will provide you with a hedge when you have to solve complex and unusual problems with data modeling, because by participating in Kaggle competitions you build up strong competencies in modeling and validating. You also network with other data scientists, which can get you a reference for a job more easily and give you another way to handle difficult problems beyond your skills, because you will have access to others' competencies and opinions.
Hence, our opinion is that Kaggle helps your career as a data scientist mostly indirectly, and that it can do so in different ways. Of course, sometimes Kaggle will help you directly, when you are contacted as a job candidate on the basis of your competition successes, but more often Kaggle will help by providing you with the intellectual skills and experience you need to succeed, first as a candidate and then as a practitioner. In fact, after playing with data and models on Kaggle for a while, you will have seen enough different datasets, problems, and ways to deal with them under time pressure that, when faced with similar problems in real settings, you will be quite skilled at finding solutions quickly and effectively.
This latter opportunity for a skill upgrade is what motivated us to write this book in the first place, and it is what this book is actually about. You won't find here a guide that is only about how to win or score high in Kaggle competitions (there are online resources that can enlighten you on that), but you will find a guide about how to compete better on Kaggle and how to get the most out of your competition experience.
Use Kaggle and other competition platforms in a smart way. Kaggle is not a passepartout; being first in a competition won't assure you a highly paid job or glory beyond the Kaggle community. However, consistent participation in competitions is a card to be played smartly: it shows interest and passion in your data science job search, and it improves specific skills that can differentiate you as a data scientist and keep you from becoming obsolete in the face of AutoML solutions.
If you follow us through this book, we will show you how.