Chapter 7 – When and How to Use Artificial Intelligence
The questions will focus on the hyperparameters.
1. The number of k clusters is not that important. (Yes | No)
The answer is no. The number of clusters requires careful selection, possibly a trial-and-error approach. Each project will lead to different clusters.
2. Mini-batches and batches contain the same amount of data. (Yes | No)
The answer is no. "Batch" generally refers to the dataset and "mini-batch" represents a "subset" of data.
3. K-means can run without mini-batches. (Yes | No)
The answer is yes and no. If the volume of data remains small, then the training epochs can run on the whole dataset. If the data volume exceeds a reasonable amount of computer power (CPU or GPU), mini-batches must be created to optimize training computation.
4. Must centroids be optimized for result acceptance? (Yes | No)
The answer is yes and no. Suppose you want to put a key in a keyhole. The keyhole represents the centroid of your visual cluster. You must be precise. If you are simply throwing a piece of paper in your garbage can, you do not need to aim at the perfect center (centroid) of the cluster (marked by the rim of the garbage can) to attain that goal. Centroid precision depends on what is asked of the algorithm.
5. It does not take long to optimize hyperparameters. (Yes | No)
The answer is yes and no. If it's a simple project, it will not take long. If you are facing a large dataset, it will take some time to find the optimal hyperparameters.
6. Can it sometimes take weeks to train a large dataset? (Yes | No)
The answer is yes. Media hype and hard work are two different worlds. Machine learning and deep learning are are still tough projects to implement.
7. AWS SageMaker only offers a k-means algorithm. (Yes | No)
The answer is no. AWS Sagemaker offers a complete range of algorithms in the job creation interface. You can also use your algorithms choosing Custom
.

AWS Sagemaker algorithms