Click-through prediction with decision tree
After several examples, it is now time to predict ad click-through with the decision tree algorithm we have just learned and practiced. We will use the dataset from the Kaggle machine learning competition, Click-Through Rate Prediction (https://www.kaggle.com/c/avazu-ctr-prediction).
For now, we take only the first 100,000 samples from the train file (unzipped from the train.gz file at https://www.kaggle.com/c/avazu-ctr-prediction/data) to train the decision tree, and the first 100,000 samples from the test file (unzipped from the test.gz file on the same page) for prediction.
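As a minimal sketch of taking only the first 100,000 samples, the snippet below reads a CSV file row by row and stops after a given number of rows. The helper name read_first_rows and the file path in the usage comment are assumptions made for illustration. The sketch also assumes the unzipped file is a comma-separated file whose first line is a header.

```python
import csv
from itertools import islice


def read_first_rows(path, n_rows=100_000):
    """Return the first n_rows data rows of a CSV file as a list of dicts.

    read_first_rows is a hypothetical helper, not part of any library.
    """
    with open(path, newline="") as f:
        # DictReader uses the first line of the file as the column names.
        reader = csv.DictReader(f)
        # islice stops after n_rows rows, so the rest of the file is never read.
        return list(islice(reader, n_rows))


# Hypothetical usage, assuming the unzipped file is named "train"
# and sits in the working directory:
# rows = read_first_rows("train", 100_000)
# labels = [int(row["click"]) for row in rows]
```

Reading lazily and stopping early avoids loading the full train file into memory when only a prefix of it is needed.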
The data fields are described as follows:
id: ad identifier, such as 1000009418151094273, 10000169349117863715
click: 0 for non-click, 1 for click
hour: in the format of YYMMDDHH, for example, 14102100
C1: anonymized categorical variable, such as 1005, 1002
banner_pos: where a banner is located, 1 and 0
site_id: site identifier, such as 1fbe01fe, fe8cc448, d6137915...