Defining the toolkit
Splunk describes its machine learning offering as a three-tier architecture:
- Tier 1: Core platform searching features
- Tier 2: Packaged solutions and apps offered on Splunkbase
- Tier 3: The Splunk Machine Learning Toolkit
Since tier 1 and tier 2 should be self-explanatory to you at this point, let's have a closer look at tier 3.
To define the Machine Learning Toolkit, we will start with a typical machine learning project to understand the kind of work most data scientists carry out. The typical tasks are:
- Collect (data)
- Clean and transform (data)
- Explore and visualize (data)
- Model (data)
- Evaluate (the results of the model)
- Deploy (once the predictions are made, how can they be put to use?)
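To make the tasks above concrete, here is a minimal sketch of the same workflow in plain Python. It is not Splunk code and does not use the Machine Learning Toolkit. The sample data, the column names (cpu_load, response_ms), and every function name are hypothetical, chosen only for illustration, and a simple least-squares line stands in for whatever model a real project would use.

```python
import csv
import io
import statistics

# Hypothetical raw data (an assumption for illustration): two rows are dirty.
RAW_CSV = """cpu_load,response_ms
10,120
20,140
,150
30,abc
40,180
50,200
"""


def collect(text):
    """Collect: read raw rows from a CSV source."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Clean and transform: drop rows with missing or non-numeric values."""
    points = []
    for row in rows:
        try:
            points.append((float(row["cpu_load"]), float(row["response_ms"])))
        except (TypeError, ValueError):
            continue  # skip rows that cannot be converted to numbers
    return points


def explore(points):
    """Explore: summarize the target column before modeling."""
    ys = [y for _, y in points]
    return {"count": len(ys), "mean": statistics.mean(ys),
            "min": min(ys), "max": max(ys)}


def fit(points):
    """Model: fit y = slope * x + intercept by ordinary least squares."""
    mean_x = statistics.mean(x for x, _ in points)
    mean_y = statistics.mean(y for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, x):
    """Deploy: apply the fitted model to a new value."""
    slope, intercept = model
    return slope * x + intercept


def evaluate(points, model):
    """Evaluate: mean absolute error of the model on the given points."""
    return statistics.mean(abs(y - predict(model, x)) for x, y in points)


if __name__ == "__main__":
    points = clean(collect(RAW_CSV))
    print("summary:", explore(points))
    model = fit(points)
    print("mean absolute error:", evaluate(points, model))
    print("prediction for cpu_load=60:", predict(model, 60))
```

Each function maps to one task in the list. Note that the cleaning step is where the malformed rows are handled, before any modeling takes place.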
Time well spent
Of the tasks listed above, popular opinion among data scientists in the field holds that up to 60% of project time is spent cleaning and transforming data, while almost 20% is spent collecting it.
The Splunk...