Implementing a caption generation model
First, let's read the dataset and transform it into the form we need. Import the os library and declare the directory that contains the dataset's text annotations, as shown in the following code:
import os

annotation_dir = 'Flickr8k_text'
Next, define a function that opens a file and returns its lines as a list:
def read_file(file_name):
    # Open in text mode so the lines come back as strings rather than bytes
    with open(os.path.join(annotation_dir, file_name), 'r') as file_handle:
        file_lines = file_handle.read().splitlines()
    return file_lines
Read the image paths of the training and testing datasets, followed by the captions file:
train_image_paths = read_file('Flickr_8k.trainImages.txt')
test_image_paths = read_file('Flickr_8k.testImages.txt')
captions = read_file('Flickr8k.token.txt')

print(len(train_image_paths))
print(len(test_image_paths))
print(len(captions))
This should print the following:
6000
1000
40460
Next, the image-to-caption map has to be generated, so that the captions belonging to any training image can be looked up easily...
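As a guide, here is a minimal sketch of how such a map could be built. It assumes the standard format of Flickr8k.token.txt, in which each line holds an image name suffixed with a caption index (for example, image.jpg#0), a tab character, and then the caption text; the names image_caption_map, image_id, and image_name are illustrative choices, not taken from the original code:

# Minimal sketch: build a dictionary mapping each image file name
# to the list of its captions
image_caption_map = {}
for line in captions:
    # Assumed line format: '<image>.jpg#<caption_index>\t<caption text>'
    image_id, caption = line.split('\t')
    image_name = image_id.split('#')[0]
    image_caption_map.setdefault(image_name, []).append(caption)

print(len(image_caption_map))         # number of distinct images
print(image_caption_map[image_name])  # all captions of one image

Since Flickr8k provides five captions per image, each value in the map should be a list of five strings, and the number of keys times five should match the 40460 caption lines read earlier.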