Computational resources
Some competitions do pose limitations in order to make production-feasible solutions available. For instance, the Bosch Production Line Performance competition (https://www.kaggle.com/c/bosch-production-line-performance) set strict limits on execution time, model file output, and memory usage for your solution. Kernel-based competitions that require both training and inference to be executed on Kernels do not pose a problem as far as resources are concerned, because Kaggle provides all the resources you need (this is also intended as a way to put all participants on an equal footing for a better competition result).
Problems arise when a Kernel competition only restricts inference: you can then train your models on your own machine, and the only limit at test time is the number and complexity of the models you produce. Since at the moment most competitions require deep learning solutions, you have to consider that you will need specialized hardware, such as GPUs, in order to achieve interesting results. Even if you participate in one of the now rare tabular competitions, you will soon realize that you need a strong machine with a good number of processors and plenty of memory in order to easily apply feature engineering to data, run experiments, and build models quickly.
Standards change rapidly, so it is difficult to specify a standard hardware setup that you should have in order to compete in the same league as other participants. We can nevertheless get a hint of such a standard by looking at what other competitors use, either as their own machine or as a machine rented on the cloud.
For instance, HP recently launched a program that awarded an HP Z4 or Z8 workstation to a few selected Kaggle participants in exchange for visibility for its brand. A Z8 machine has 56 cores, 3 TB of memory, 48 TB of storage (a good share of it on solid-state drives), and an NVIDIA RTX GPU. We understand that such a machine could be out of reach for many, and even renting a similar machine for a short time on a cloud instance such as Google's GCP or Amazon's AWS is out of the question, given the expense of even moderate usage.
Our suggestion, unless your ambition is to climb to the very top rankings of Kaggle participants, is therefore to use the machines provided for free by Kaggle, the Kaggle Notebooks (previously known as Kaggle Kernels).
Kaggle Notebooks are a versioned computational environment, based on Docker containers running on cloud machines, that allows you to write and execute both scripts and notebooks in the R and Python languages. Kaggle Notebooks are integrated into the Kaggle environment (you can make submissions from them and keep track of which submission refers to which Notebook), they come with most data science packages pre-installed, and they allow some customization (you can download files and install further packages). The basic Kaggle Notebook is CPU-only, but you can have versions boosted by an NVIDIA Tesla P100 GPU or a TPU v3-8 (TPUs are hardware accelerators specialized in deep learning tasks). Though bound by a usage count and a time quota, Kaggle Notebooks provide the computational workhorse for building your baseline solutions in Kaggle competitions:
- A CPU Notebook has 4 CPU cores and 16 GB of memory; you can run up to 10 Notebooks of this kind at a time, and there is no time quota for them
- A GPU Notebook features 2 CPU cores and 13 GB of memory; you can run up to 2 Notebooks of this kind at a time, and you have a 30-hour weekly quota for them
- A TPU Notebook features 4 CPU cores and 16 GB of memory; you can run up to 2 Notebooks of this kind at a time, and you have a 30-hour weekly quota for them
All Notebooks can run for 9 hours at most and have a 20 GB disk allowance to store your models and results, plus an additional scratchpad disk that can exceed 20 GB for temporary usage while the script is running.
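Since these figures can change over time, it can be useful to verify what your session actually provides. The following is a minimal sketch you could run in a Python Notebook to check the resources assigned to it; it assumes the psutil package (pre-installed in the Kaggle image) and the /kaggle/working directory, and the nvidia-smi call only reports something when a GPU has been enabled:

# Minimal sketch: inspect the resources assigned to the current session.
# Assumes psutil is available and /kaggle/working exists (Kaggle image).
import os
import shutil
import subprocess

import psutil

print(f"CPU cores : {os.cpu_count()}")
print(f"RAM (GB)  : {psutil.virtual_memory().total / 1024**3:.1f}")
print(f"Disk (GB) : {shutil.disk_usage('/kaggle/working').total / 1024**3:.1f}")

# Report the GPU model if one is attached to the session
try:
    gpu = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True, text=True, check=True)
    print("GPU       :", gpu.stdout.strip())
except (FileNotFoundError, subprocess.CalledProcessError):
    print("GPU       : none detected")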
In certain cases, the GPU-enhanced machines provided by Kaggle Notebooks may not be enough. For instance, the recent Deepfake Detection Challenge (https://www.kaggle.com/c/deepfake-detection-challenge) required processing about 500 GB of videos. That is demanding, especially because of the weekly usage limit, which at the time of writing is about 30 hours, and because you cannot have more than two machines with a GPU running at the same time (10 machines at a time is the limit for CPU-only instances). Even if you can double your machine time by changing your code to leverage TPUs instead of GPUs (you can find guidance for achieving that easily at https://www.kaggle.com/docs/tpu, and a short sketch follows below), that may still prove insufficient for fast experimentation in a data-heavy competition such as the Deepfake Detection Challenge. That is the reason why, in the chapter devoted to Kaggle Notebooks, we are going to provide you with many tips and tricks for coping successfully with such limitations and still obtaining decent results, without having to buy a high-performance machine. We are also going to show you how to integrate Kaggle Notebooks with Google Cloud Platform (GCP) or simply how to move all your work to another cloud-based solution.
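To give an idea of what such a code change involves, here is a minimal sketch along the lines of the detection pattern described in the Kaggle TPU documentation for TensorFlow: it connects to a TPU when one is attached to the Notebook and falls back to the default CPU/GPU strategy otherwise. The small Keras model inside the scope is just a placeholder, not part of any specific solution:

# Minimal sketch: use a TPU when available, otherwise the default strategy.
import tensorflow as tf

try:
    # On Kaggle, TPUClusterResolver picks up the TPU address from the environment
    tpu = tf.distribute.cluster_resolver.TPUClusterResolver()
    tf.config.experimental_connect_to_cluster(tpu)
    tf.tpu.experimental.initialize_tpu_system(tpu)
    strategy = tf.distribute.TPUStrategy(tpu)
    print("Running on TPU:", tpu.master())
except ValueError:
    # No TPU attached: fall back to the default CPU/GPU strategy
    strategy = tf.distribute.get_strategy()
    print("TPU not found, using", strategy.__class__.__name__)

# Any model built inside strategy.scope() is replicated across the TPU cores;
# this tiny Keras model is only a placeholder for your actual solution
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu", input_shape=(100,)),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")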