Building loss functions
As discussed in the background subsection, the problem of neural style transfer revolves around loss functions for content and style. In this subsection, we will discuss and define the required loss functions.
Content loss
In any CNN-based model, activations from the top layers contain more global and abstract information about the image (for example, high-level structures such as a face), while the bottom layers contain local information (for example, low-level structures such as eyes, noses, edges, and corners). To capture the right representation for the content of an image, we therefore want to leverage a top layer of the CNN. Hence, for the content loss, given that we will be using the pretrained VGG-16 model, we can define our loss function as the L2 norm (the scaled, squared Euclidean distance) between the activations of a top layer (giving feature representations) computed over the target image, and the activations of the same layer computed over the generated image. Assuming...
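The content loss described above can be sketched as a small function. This is a minimal illustration, assuming the activations of the chosen deep VGG-16 layer have already been extracted as arrays for both the content (target) image and the generated image; the function name and the use of NumPy are choices for this sketch, not the only possible implementation.

```python
import numpy as np

def content_loss(base_features, generated_features):
    """Squared Euclidean distance between the feature activations of the
    content image and those of the generated image, summed over all
    elements of the chosen layer's activation tensor."""
    return np.sum(np.square(generated_features - base_features))
```

In practice, these feature arrays would come from running both images through the pretrained VGG-16 network and reading off the activations of the same deep layer; minimizing this quantity pushes the generated image toward the same high-level content as the target.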