Machine learning malware detection using PE headers
Many publicly available sources provide malware datasets that data scientists and malware analysts can use to train machine learning models. For example, the following websites let security researchers and machine learning enthusiasts download many different malware samples:
- Malware-Traffic-Analysis: https://www.malware-traffic-analysis.net/
- Kaggle Malware Families: https://www.kaggle.com/c/malware-classification
- VX Heaven: http://83.133.184.251/virensimulation.org/index.html
- VirusTotal: https://www.virustotal.com
- VirusShare: https://virusshare.com
To work with PE files, I highly recommend using an amazing Python library called pefile. pefile gives you the ability to inspect headers, analyze sections, and retrieve data, in addition to other capabilities, like packer detection and PEiD signature generation. You can check out the GitHub project at https://github.com/erocarrera/pefile.
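To make "inspecting headers" concrete, here is a minimal sketch that reads a few COFF file header fields using only the Python standard library. It is not pefile's API or implementation. The names parse_coff_header, build_minimal_header, and MACHINE_NAMES are hypothetical helpers made up for this illustration, and the sample bytes are synthetic rather than taken from a real binary.

```python
import struct

# Small subset of the machine type codes defined by the PE/COFF format.
MACHINE_NAMES = {0x014C: "x86", 0x8664: "x64"}

IMAGE_FILE_DLL = 0x2000  # Characteristics flag that marks a DLL


def parse_coff_header(data: bytes) -> dict:
    """Return a few COFF file header fields from the raw bytes of a PE file."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("missing DOS 'MZ' signature")
    # e_lfanew: 4-byte little-endian offset of the PE signature, stored at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    if len(data) < pe_offset + 24:
        raise ValueError("truncated COFF file header")
    # The 20-byte COFF file header follows the 4-byte PE signature.
    (machine, n_sections, timestamp, _sym_ptr, _n_syms,
     opt_size, characteristics) = struct.unpack_from("<HHIIIHH", data, pe_offset + 4)
    return {
        "machine": MACHINE_NAMES.get(machine, hex(machine)),
        "number_of_sections": n_sections,
        "time_date_stamp": timestamp,
        "size_of_optional_header": opt_size,
        "is_dll": bool(characteristics & IMAGE_FILE_DLL),
    }


def build_minimal_header(machine: int = 0x014C, n_sections: int = 3) -> bytes:
    """Build synthetic header bytes for demonstration (not a loadable executable)."""
    dos = bytearray(0x40)
    dos[:2] = b"MZ"
    struct.pack_into("<I", dos, 0x3C, 0x40)  # PE signature directly after the DOS header
    # 0x0102 = executable image + 32-bit machine; the DLL flag is not set.
    coff = struct.pack("<HHIIIHH", machine, n_sections, 0, 0, 0, 0xE0, 0x0102)
    return bytes(dos) + b"PE\x00\x00" + coff
```

Calling parse_coff_header(build_minimal_header()) returns a dictionary of the decoded fields. On a real sample you would pass the bytes read from the file instead. pefile exposes these same fields as attributes such as pe.FILE_HEADER.Machine and pe.FILE_HEADER.NumberOfSections, and it also handles the optional header, sections, and imports that this sketch skips.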
You can also install it with PIP,...