Chapter 3 – Malware Detection with API Calls and PE Headers
- Load the dataset using the pandas Python library, and this time add the low_memory=False parameter. Search for what that parameter does.
df = pd.read_csv(file_name, low_memory=False)
- Prepare the data that will be used for training.
original_headers = list(df.columns.values)
total_data = df[original_headers[:-1]]
total_data = total_data.to_numpy()  # as_matrix() was removed in pandas 1.0
target_strings = df[original_headers[-1]]
- Split the data with the test_size=0.33 parameter.
train, test, target_train, target_test = train_test_split(total_data, target_strings, test_size=0.33, random_state=int(time.time()))
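To see what the split above produces, here is a minimal sketch on made-up data. The arrays `X` and `y` are hypothetical stand-ins for `total_data` and `target_strings`, and the fixed `random_state=0` is an assumption chosen only so the result is repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 100 samples, 4 features, binary labels
X = np.arange(400).reshape(100, 4)
y = np.array([0, 1] * 50)

# Same call shape as in the exercise, but with a fixed seed
train, test, target_train, target_test = train_test_split(
    X, y, test_size=0.33, random_state=0
)

# Roughly one third of the rows land in the test set
print("train rows:", train.shape[0])
print("test rows:", test.shape[0])
```

Seeding with int(time.time()), as the exercise does, gives a different split on every run. A fixed seed is usually easier to work with when you want to compare classifiers on identical data.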
- Create a set of classifiers that contains DecisionTreeClassifier(), RandomForestClassifier(n_estimators=100), and AdaBoostClassifier():
classifiers = [RandomForestClassifier(n_estimators=100), DecisionTreeClassifier(), AdaBoostClassifier()]
- What is an AdaBoostClassifier()?
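An AdaBoost classifier is a meta-estimator that begins by fitting a classifier on the original dataset and then fits additional copies of the classifier on the same dataset, with the weights of incorrectly classified instances adjusted so that subsequent classifiers focus more on the difficult cases.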
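The sequential nature of AdaBoost can be observed with a short sketch. It assumes a synthetic dataset generated with make_classification (not the chapter's malware data) and an arbitrary choice of 50 boosting rounds. The staged_score method reports the accuracy of the ensemble after each additional weak learner.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Synthetic data used purely for illustration
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0
)

# Up to 50 weak learners are fitted one after another
clf = AdaBoostClassifier(n_estimators=50, random_state=0)
clf.fit(X_train, y_train)

# Test accuracy of the ensemble after each boosting round
staged = list(clf.staged_score(X_test, y_test))

print("weak learners fitted:", len(clf.estimators_))
print("accuracy after first round:", staged[0])
print("accuracy after last round:", staged[-1])
```

The exact accuracies depend on the data and the scikit-learn version, so none are quoted here.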
- Train the model using the three classifiers and print out the metrics of every classifier.
Check the Chapter3-Practice folder for the solution: https://github.com/PacktPublishing/Mastering-Machine-Learning-for-Penetration-Testing/tree/master/Chapter%203/Chapter3-Practice.
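For orientation, the training step could look roughly like the sketch below. This is not the solution from the repository. It assumes synthetic data from make_classification in place of the chapter's dataset, fixed seeds for repeatability, and accuracy plus classification_report as the metrics to print.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in for total_data and target_strings
X, y = make_classification(n_samples=600, n_features=12, random_state=0)
train, test, target_train, target_test = train_test_split(
    X, y, test_size=0.33, random_state=0
)

classifiers = [
    RandomForestClassifier(n_estimators=100, random_state=0),
    DecisionTreeClassifier(random_state=0),
    AdaBoostClassifier(random_state=0),
]

results = {}
for clf in classifiers:
    # Fit on the training split, then evaluate on the held-out split
    clf.fit(train, target_train)
    predictions = clf.predict(test)
    name = type(clf).__name__
    results[name] = accuracy_score(target_test, predictions)
    print(name)
    print("accuracy:", results[name])
    print(classification_report(target_test, predictions))
```

With the real dataset, replace X and y with total_data and target_strings from the earlier steps.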