The Collection Process
Once the intelligence requirements have been defined, we can proceed to collect the raw data needed to fulfill them. For this process, we can consult two types of sources: internal sources (such as the network and endpoints) and external sources (such as blogs, threat intelligence feeds, threat reports, public databases and forums).
To carry out the collection process as effectively as possible, it is important to use a Collection Management Framework (CMF). A CMF allows you to identify data sources and easily track the type of information you are gathering from each of them. It can also be used to rate the data obtained from each source, record how long that data is stored, and track how trustworthy and complete the source is. It is advisable to use the CMF to track not only external sources but also internal ones. Here’s an example of what one would look like:

Dragos analysts Lee, Miller and Stacey wrote an interesting paper (https://dragos.com/wp-content/uploads/CMF_For_ICS.pdf) about CMFs that explores different methodologies and examples. Another great resource for designing an advanced collection process is the Collection Management Implementation Framework (https://studylib.net/doc/13115770/collection-management-implementation-framework-what-does-...), designed by the Software Engineering Institute.
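A CMF is usually kept as a spreadsheet, but the same idea can be sketched as structured data so that it can be queried, for example to find which sources provide a given type of data or which required data types no source covers. The minimal Python sketch below is only an illustration: the fields (origin, data types, retention, reliability and completeness) mirror the attributes described above, the A to F reliability scale is an assumption borrowed from the Admiralty Code, and the source names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CollectionSource:
    """One row of a Collection Management Framework (illustrative fields only)."""
    name: str
    origin: str            # "internal" or "external"
    data_types: set        # kinds of information gathered from this source
    retention_days: int    # how long the source keeps the data
    reliability: str       # assumed A (reliable) to F (cannot be judged) scale
    completeness: str      # e.g. "high", "medium", "low"


def sources_for(cmf, data_type):
    """Return the names of the sources that provide a given type of data."""
    return [source.name for source in cmf if data_type in source.data_types]


def coverage_gaps(cmf, required_types):
    """Return the required data types that no source in the CMF provides."""
    covered = set().union(*(source.data_types for source in cmf))
    return sorted(set(required_types) - covered)


# Hypothetical CMF with two internal sources and one external source.
cmf = [
    CollectionSource("Firewall logs", "internal", {"ip", "port"}, 90, "A", "high"),
    CollectionSource("EDR telemetry", "internal", {"hash", "process", "path"}, 30, "A", "medium"),
    CollectionSource("Public threat feed", "external", {"ip", "domain", "url"}, 365, "C", "low"),
]
print(sources_for(cmf, "ip"))                                   # ['Firewall logs', 'Public threat feed']
print(coverage_gaps(cmf, ["hash", "domain", "registry_key"]))   # ['registry_key']
```

A gap such as `registry_key` in this example is exactly the kind of finding a CMF is meant to surface: a requirement that none of the current sources can satisfy.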
Indicators of Compromise
So far, we have talked about defining the intelligence requirements and using a Collection Management Framework. But what data are we going to collect?
As its name suggests, an Indicator of Compromise (IOC) is an artifact observed on a network or in an operating system that indicates, with high confidence, that the system has been compromised. This forensic data is used to understand what happened but, if collected properly, it can also be used to prevent or detect ongoing breaches.
Typical IOCs include hashes of malicious files, URLs, domains, IP addresses, file paths, file names, registry keys and the malware files themselves.
Remember that, to be truly useful, the IOCs collected need context. In this respect, we can follow the mantra “quality over quantity”: a huge number of IOCs does not always mean better data.
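Most of these IOC types can be told apart by their format alone, which is useful when normalizing raw data from many sources. The following is a minimal Python sketch under simplifying assumptions: hashes are recognized only by hex length, the domain pattern is deliberately loose, and the list of file extensions is a small, hypothetical sample. Real data is ambiguous (`invoice.zip` could be a file name or a domain), so treat the result as a first guess.

```python
import ipaddress
import re
from urllib.parse import urlparse

# Assumption: hex digests are recognized by length alone.
HASH_LENGTHS = {32: "md5", 40: "sha1", 64: "sha256"}
HEX_RE = re.compile(r"^[0-9a-fA-F]+$")
# Simplified domain pattern: dot-separated labels ending in an alphabetic TLD.
DOMAIN_RE = re.compile(r"^(?=.{1,253}$)(?:[a-zA-Z0-9-]{1,63}\.)+[a-zA-Z]{2,63}$")
REGISTRY_HIVES = ("HKLM\\", "HKCU\\", "HKEY_LOCAL_MACHINE\\", "HKEY_CURRENT_USER\\")
# Hypothetical, illustrative list of extensions treated as file names, not domains.
FILE_EXTENSIONS = {"exe", "dll", "ps1", "bat", "vbs", "scr", "docm", "xlsm"}


def classify_ioc(value: str) -> str:
    """Guess the IOC type of a raw string; more specific checks run first."""
    v = value.strip()
    if HEX_RE.match(v) and len(v) in HASH_LENGTHS:
        return HASH_LENGTHS[len(v)]
    try:
        ipaddress.ip_address(v)
        return "ip"
    except ValueError:
        pass
    parsed = urlparse(v)
    if parsed.scheme in ("http", "https", "ftp") and parsed.netloc:
        return "url"
    if v.upper().startswith(REGISTRY_HIVES):
        return "registry_key"
    if "\\" in v or "/" in v:
        return "path"
    if "." in v and v.rsplit(".", 1)[1].lower() in FILE_EXTENSIONS:
        return "file_name"
    if DOMAIN_RE.match(v):
        return "domain"
    return "unknown"


for raw in ["44d88612fea8a8f36de82e1278abb02f", "203.0.113.7", "bad.example.com"]:
    print(raw, "->", classify_ioc(raw))
```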
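One way to put “quality over quantity” into practice is to store every IOC together with its context and to merge duplicates reported by different sources instead of piling them up. The Python sketch below is a hedged illustration: the 0 to 100 confidence scale, the merge rule (keep the highest confidence) and the minimum confidence of 50 are all assumptions, and the feed names are hypothetical.

```python
from dataclasses import dataclass, field

HASH_TYPES = {"md5", "sha1", "sha256"}


@dataclass
class Indicator:
    value: str
    ioc_type: str
    confidence: int                               # assumed 0-100 analyst-assigned scale
    sources: set = field(default_factory=set)     # where the IOC was reported
    context: list = field(default_factory=list)   # e.g. campaign, malware family, role


def normalize(value, ioc_type):
    """Canonical form used to detect duplicates."""
    v = value.strip()
    if ioc_type in HASH_TYPES:
        return v.lower()
    if ioc_type == "domain":
        return v.lower().rstrip(".")  # domains are case-insensitive; drop trailing root dot
    return v


def merge_indicators(raw, min_confidence=50):
    """Deduplicate IOCs, merge their sources and context, drop low-confidence ones."""
    merged = {}
    for ind in raw:
        key = (ind.ioc_type, normalize(ind.value, ind.ioc_type))
        if key not in merged:
            merged[key] = Indicator(key[1], ind.ioc_type, ind.confidence)
        kept = merged[key]
        kept.confidence = max(kept.confidence, ind.confidence)
        kept.sources |= ind.sources
        for item in ind.context:
            if item not in kept.context:
                kept.context.append(item)
    return [i for i in merged.values() if i.confidence >= min_confidence]


raw = [
    Indicator("Bad.Example.com.", "domain", 80, {"feed-a"}, ["phishing"]),
    Indicator("bad.example.com", "domain", 60, {"feed-b"}, ["phishing", "campaign-x"]),
    Indicator("203.0.113.9", "ip", 20, {"feed-c"}),
]
for ind in merge_indicators(raw):
    print(ind)
```

Three raw entries become a single, better-documented indicator; the low-confidence IP is discarded rather than kept “just in case”.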
Understanding Malware
Malware, short for malicious software, is not everything, but it can be an incredibly valuable source of information. Before getting into the different types of malware, it is important to understand how malware usually works, and for that we need to introduce two concepts: the dropper and Command and Control (C2 or C2C).
A dropper is a special type of software designed to install a piece of malware. We talk about a single-stage or a two-stage dropper depending on whether or not the malware code is contained in the dropper. When the malicious code is not contained within the dropper, it is downloaded to the victim’s device from an external source. Some security researchers call this type of two-stage dropper a downloader, and reserve the term two-stage dropper for one that must take further steps (such as decompressing or executing different pieces of code) to assemble a final malware that is already embedded in it.
Command and Control (C2) is an attacker-controlled server used to send commands to the malware running on the victim’s systems. It is the way the malware communicates with its “owner”. There are multiple ways to establish a C2 channel, and the complexity of the commands and communication varies depending on the malware’s capabilities. For example, threat actors have been seen using cloud-based services, emails, blog comments, GitHub repositories and DNS queries, among many others, for C2 communication.
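From the defender’s side, a common heuristic for spotting C2 traffic (not described in the text above) is that many implants “beacon” to their server at fairly regular intervals, while human-driven traffic is irregular. The Python sketch below scores that regularity with the coefficient of variation of the gaps between connections. It assumes you already have the timestamps, in seconds, of connections from one host to one destination; the 0.1 threshold is arbitrary, and many C2 frameworks add random jitter specifically to defeat checks like this one.

```python
from statistics import mean, pstdev


def beacon_score(timestamps):
    """Coefficient of variation (std dev / mean) of the gaps between connections.

    Values close to 0 mean very regular, machine-like timing.
    """
    ts = sorted(timestamps)
    if len(ts) < 3:
        return None  # too few connections to judge regularity
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    avg = mean(gaps)
    if avg == 0:
        return None  # all connections at the same instant
    return pstdev(gaps) / avg


def looks_like_beaconing(timestamps, max_cv=0.1):
    """Flag a host/destination pair whose timing is suspiciously regular (assumed threshold)."""
    score = beacon_score(timestamps)
    return score is not None and score <= max_cv


# Hypothetical connection times (seconds) from one host to one external IP.
regular = [0, 60, 120, 181, 240, 300]
human = [0, 5, 300, 310, 1200]
print(looks_like_beaconing(regular))  # True
print(looks_like_beaconing(human))    # False
```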
Malware can be classified into different types according to its capabilities, and a single piece of malware can sometimes fall into more than one type. The following is a list of the most common ones:
- Worm: An autonomous program capable of replicating and propagating itself through the network.
- Trojan: A program that appears to serve a legitimate purpose but also has a hidden malicious capability that bypasses security mechanisms by abusing the authorizations granted to it.
- Rootkit: A set of software tools with administrator privileges, designed to hide the presence of other tools and their activities.
- Ransomware: A computer program designed to deny access to a system or its information until a ransom is paid.
- Keylogger: Software or hardware that records keyboard events without the user’s knowledge.
- Adware: Malware that displays specific advertising to the user.
- Spyware: Software that is installed on a system without the knowledge of the owner or user, typically to gather information about them.
- Scareware: Malware that uses scare tactics, such as fake security alerts, to trick computer users into visiting compromised websites.
- Backdoor: A method by which someone can obtain administrator-level access to a computer system, a network or a software application, bypassing the normal authentication.
- Wiper: Malware that erases the hard drive of the computer it infects.
- Exploit kit: A package used to deliver malware. When a victim visits a compromised website, the exploit kit evaluates the victim’s system for vulnerabilities in order to exploit them.
We use the term malware family to refer to a group of malicious software with common characteristics and, most likely, the same author. Sometimes a malware family can be directly linked to a specific threat actor; other times, a piece of malware (or tool) is shared among different groups.
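Attributing a sample to a family normally relies on much deeper analysis, but the intuition of “common characteristics” can be illustrated by comparing feature sets. The Python sketch below uses Jaccard similarity and a simple greedy grouping; the features (mutex names, C2 domains, strings), the sample names and the 0.5 threshold are hypothetical, and real clustering would use richer features and a proper clustering algorithm.

```python
def jaccard(a, b):
    """Jaccard similarity of two feature sets: |A ∩ B| / |A ∪ B|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_into_families(samples, threshold=0.5):
    """Greedy grouping: each sample joins the first existing group that has a member
    at least `threshold` similar to it; otherwise it starts a new group."""
    families = []
    for name, features in samples.items():
        for family in families:
            if any(jaccard(features, samples[member]) >= threshold for member in family):
                family.append(name)
                break
        else:
            families.append([name])
    return families


# Hypothetical features extracted beforehand from three samples.
samples = {
    "sample_a": {"mutex:Global\\xyz", "c2:bad.example.com", "str:cmd.exe /c"},
    "sample_b": {"mutex:Global\\xyz", "c2:bad.example.com", "str:powershell -enc"},
    "sample_c": {"str:hello", "c2:other.example.net"},
}
print(group_into_families(samples))  # [['sample_a', 'sample_b'], ['sample_c']]
```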
Using Public Sources for Collection: OSINT
OSINT (Open Source Intelligence) is the collection of publicly available data. The most common sources that come to mind when talking about OSINT are social media, blogs, news and the dark web. Essentially, any publicly available data can be used for OSINT purposes.
Important note
There are many great resources for anyone looking to start collecting information. VirusTotal (https://www.virustotal.com/), CCSS Forum (https://www.ccssforum.org/) and URLHaus (https://urlhaus.abuse.ch/) are three great places to get started with the collection process.
Also, take a look at OSINTCurio.us (https://osintcurio.us/) to learn more about OSINT resources and techniques.
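Many of these sources can also be queried programmatically. As a hedged sketch, the Python code below builds a file report request for the VirusTotal API v3 (the `/api/v3/files/{hash}` endpoint with an `x-apikey` header) and extracts the engine verdict counts from the `last_analysis_stats` field of the response. The endpoint, header and response layout reflect our understanding of the public v3 API and should be checked against the current VirusTotal documentation; a personal API key is required, and the key used here is a placeholder.

```python
import json
import urllib.request

# Endpoint as we understand the VirusTotal API v3 (verify against current docs).
VT_FILE_REPORT_URL = "https://www.virustotal.com/api/v3/files/{}"


def build_file_report_request(file_hash, api_key):
    """Build, but do not send, a request for the report of a file identified by its hash."""
    return urllib.request.Request(
        VT_FILE_REPORT_URL.format(file_hash),
        headers={"x-apikey": api_key},
        method="GET",
    )


def detection_summary(report):
    """Extract engine verdict counts from a decoded report (assumed response layout)."""
    stats = report["data"]["attributes"]["last_analysis_stats"]
    return {key: stats.get(key, 0) for key in ("malicious", "suspicious", "undetected", "harmless")}


def lookup_hash(file_hash, api_key):
    """Send the request and summarize it (needs network access and a valid API key)."""
    with urllib.request.urlopen(build_file_report_request(file_hash, api_key), timeout=30) as resp:
        return detection_summary(json.load(resp))


# Example (placeholder key, not executed here):
# print(lookup_hash("44d88612fea8a8f36de82e1278abb02f", "YOUR_API_KEY"))
```

Keeping request building and response parsing in separate functions makes both easy to test offline and to reuse with the rate limits of a free API key in mind.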
Honeypots
A honeypot is a decoy system that imitates possible targets of attacks. A honeypot can be set up to detect, deflect or counteract an attacker. Because a honeypot has no legitimate users, all the traffic it receives is considered malicious, and every interaction with it can be used to study the attacker’s techniques.
There are many types of honeypots (an interesting list can be found here: https://hack2interesting.com/honeypots-lets-collect-it-all/), but they are mostly divided into three categories: low interaction, medium interaction and high interaction.
Low interaction honeypots simulate the transport layer and provide very limited access to the operating system. Medium interaction honeypots simulate the application layer in order to lure the attacker into sending the payload. Finally, high interaction honeypots usually involve real operating systems and applications; these are better suited to uncovering the exploitation of unknown vulnerabilities.
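At its simplest, a low interaction honeypot is a socket that accepts connections, records who connected and what they sent, and then hangs up, without offering any real service behind it. The Python sketch below illustrates that idea only; the default port 2323 is an arbitrary choice, and in practice you would run maintained honeypot software on an isolated host rather than a script like this.

```python
import socket
from datetime import datetime, timezone


def make_listener(host="0.0.0.0", port=2323):
    """Open the TCP socket that acts as the decoy (port 2323 is an arbitrary choice)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen()
    return srv


def serve(srv, events, max_connections=None, max_bytes=1024, timeout=5.0):
    """Accept connections, record the source and first bytes sent, then close."""
    handled = 0
    while max_connections is None or handled < max_connections:
        conn, (peer_ip, peer_port) = srv.accept()
        with conn:
            conn.settimeout(timeout)
            data = b""
            try:
                while len(data) < max_bytes:
                    chunk = conn.recv(max_bytes - len(data))
                    if not chunk:  # the client closed the connection
                        break
                    data += chunk
            except OSError:  # timeout (an OSError subclass) or connection reset
                pass
            events.append({
                "time": datetime.now(timezone.utc).isoformat(),
                "src_ip": peer_ip,
                "src_port": peer_port,
                "payload": data,
            })
        handled += 1


# Usage: events = []; serve(make_listener(), events)  # runs until interrupted
```

Every entry in `events` is, by definition, unsolicited traffic, which is what makes even such a basic decoy a useful source of IOCs such as scanning IP addresses and attempted credentials.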
Malware Analysis and Sandboxing
Malware analysis is the study of the functionality of malicious software. Typically, we distinguish between two types of malware analysis: static and dynamic.
Static malware analysis refers to analyzing the software’s code without executing it. This process is often referred to as reverse engineering, or reversing, and it is done using a disassembler such as IDA or Ghidra, the more recent tool released by the NSA, among others.
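Before opening a sample in a disassembler, analysts commonly start with basic static triage: computing the file’s hashes (which are IOCs in their own right) and extracting its printable strings, all without executing it. The Python sketch below shows these two steps on raw bytes; the minimum string length of 4 is a conventional default, and the sample bytes in the usage comment are hypothetical.

```python
import hashlib
import re


def file_hashes(data: bytes) -> dict:
    """Compute the hashes most often shared as IOCs."""
    return {name: hashlib.new(name, data).hexdigest() for name in ("md5", "sha1", "sha256")}


def printable_strings(data: bytes, min_len: int = 4) -> list:
    """Extract runs of printable ASCII, similar in spirit to the Unix `strings` utility."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.decode("ascii") for match in pattern.findall(data)]


# Only read the sample's bytes; never execute it:
# with open("sample.bin", "rb") as fh:
#     data = fh.read()
# print(file_hashes(data))
# print(printable_strings(data))
```

Strings such as URLs, domains, file paths or commands found this way often feed straight back into the IOC collection described earlier.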
Dynamic malware analysis is performed by observing the behavior of the malware piece after executing it. This type of analysis is usually performed in a controlled environment to avoid infecting production systems.
A sandbox is an isolated, controlled environment used in malware analysis to analyze malware pieces dynamically and automatically. In a sandbox, the suspected malware piece is executed and its behavior is recorded.
Of course, things are not always so simple: malware developers implement techniques to prevent their malware from being sandboxed. At the same time, security researchers develop their own techniques to bypass the threat actors’ anti-sandbox techniques. Despite this game of cat and mouse, sandboxing systems are still a crucial part of the malware analysis process.
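Production sandboxes hook API calls, capture network traffic and record much more, but one small piece of the idea, recording what a sample changed on disk, can be sketched by snapshotting a directory before and after the sample runs inside the isolated machine. The Python sketch below only covers the snapshot and comparison; it does not execute anything itself, and hashing every file under a large directory tree would be slow, so the directory to watch is an assumption you would choose per analysis.

```python
import hashlib
import os


def snapshot(root):
    """Map every file under `root` (relative path) to the SHA-256 of its contents."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fh:
                state[os.path.relpath(path, root)] = hashlib.sha256(fh.read()).hexdigest()
    return state


def diff_snapshots(before, after):
    """Report the files created, deleted and modified between two snapshots."""
    return {
        "created": sorted(after.keys() - before.keys()),
        "deleted": sorted(before.keys() - after.keys()),
        "modified": sorted(p for p in before.keys() & after.keys() if before[p] != after[p]),
    }


# Inside the isolated analysis VM (hypothetical workflow):
# before = snapshot(r"C:\Users\analyst\AppData")
# ... run the sample and wait ...
# print(diff_snapshots(before, snapshot(r"C:\Users\analyst\AppData")))
```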
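On the analyst’s side, one cheap way to spot the anti-sandbox checks mentioned above is to look for tell-tale names among a sample’s extracted strings, such as virtualization guest tools, sandbox DLLs or analysis tools that evasive malware is commonly reported to look for. The Python sketch below uses a short, illustrative and far from complete list; a match is only a hint worth a closer look in the disassembler, not proof of evasion.

```python
# Illustrative artifact names that evasive samples are commonly reported to check for.
SANDBOX_ARTIFACTS = {
    "vboxservice.exe": "VirtualBox guest service",
    "vboxtray.exe": "VirtualBox guest tray process",
    "vmtoolsd.exe": "VMware Tools daemon",
    "sbiedll.dll": "Sandboxie DLL",
    "procmon.exe": "Sysinternals Process Monitor",
    "wireshark.exe": "Wireshark network analyzer",
}


def find_evasion_hints(strings):
    """Return the known artifacts mentioned (case-insensitively) in a sample's strings."""
    hits = {}
    for s in strings:
        lowered = s.lower()
        for artifact, description in SANDBOX_ARTIFACTS.items():
            if artifact in lowered:
                hits[artifact] = description
    return hits


print(find_evasion_hints(["kernel32.dll", "SbieDll.dll"]))  # {'sbiedll.dll': 'Sandboxie DLL'}
```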
Tip
There are some great online sandboxing solutions, such as Any Run (any.run) and Hybrid Analysis (https://www.hybrid-analysis.com/). Cuckoo Sandbox (https://cuckoosandbox.org/) is also an open source, offline sandboxing system for Windows, Linux, macOS and Android.