Click-through prediction with a decision tree
After working through several examples, it is now time to predict ad click-through with the decision tree algorithm we have just learned and practiced. We will use the dataset from the Kaggle machine learning competition Click-Through Rate Prediction (https://www.kaggle.com/c/avazu-ctr-prediction).
For now, we take only the first 100,000 samples from the train file (unzipped from the train.gz file at https://www.kaggle.com/c/avazu-ctr-prediction/data) to train the decision tree, and the first 100,000 samples from the test file (unzipped from the test.gz file on the same page) for prediction.
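Reading only the first 100,000 rows avoids loading the full training file into memory. The following is a minimal sketch using only Python's standard library, assuming the unzipped files are comma-separated with a header row. The helper name load_first_n is hypothetical and not part of the original text.

```python
import csv
from itertools import islice
from typing import Dict, Iterable, List


def load_first_n(lines: Iterable[str], n: int) -> List[Dict[str, str]]:
    """Return the first n data rows of CSV text as dicts keyed by the header.

    Hypothetical helper: assumes the first line is a header row. Because
    islice stops after n rows, the remainder of a large file is never read.
    """
    reader = csv.DictReader(lines)  # accepts an open file or any iterable of lines
    return list(islice(reader, n))
```

For example, assuming the unzipped file is named train in the working directory, `with open("train", newline="") as f: train_rows = load_first_n(f, 100_000)` reads the training subset, and the same call on the test file reads the prediction subset. All values come back as strings at this stage.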
The data fields are described as follows:
id: ad identifier, such as 1000009418151094273, 10000169349117863715
click: 0 for non-click, 1 for click
hour: in the format of YYMMDDHH, for example, 14102100
C1: anonymized categorical variable, such as 1005, 1002
banner_pos: where a banner is located, 1 and 0
site_id: site identifier, such as 1fbe01fe, fe8cc448, d6137915...
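Since every field arrives as a string, a small parsing step helps before building features. The sketch below is a hypothetical helper, not the book's own code: it decodes hour from YYMMDDHH into a datetime and turns click into an integer label, leaving id, C1, banner_pos, and site_id as strings because they are identifiers or categorical values. The click conversion is guarded on the assumption that unlabeled prediction data may lack that column.

```python
from datetime import datetime
from typing import Dict, Union

Value = Union[int, str, datetime]


def parse_hour(hour: str) -> datetime:
    """Decode an hour field in YYMMDDHH format, e.g. "14102100"."""
    # %y maps two-digit years 00-68 to 2000-2068, so "14" becomes 2014.
    return datetime.strptime(hour, "%y%m%d%H")


def parse_row(row: Dict[str, str]) -> Dict[str, Value]:
    """Convert raw string fields into typed values (hypothetical helper)."""
    parsed: Dict[str, Value] = dict(row)  # categorical fields stay as strings
    if "click" in row:  # assumption: unlabeled rows may have no click column
        parsed["click"] = int(row["click"])
    parsed["hour"] = parse_hour(row["hour"])
    return parsed
```

Keeping banner_pos and C1 as strings treats them as categories rather than quantities, which matches their description as positions and anonymized categorical variables; converting them to integers would be a separate design choice.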