Gathering the data
Apart from the legal aspects (see the last section of this chapter), there is no real limit on the kind of content you can store in datasets: tabular data, images, or text. As long as you fit within the size requirements, you can store it. This includes data harvested from other sources: tweets by hashtag or topic are among the popular datasets at the time of writing.

A discussion of the different frameworks for harvesting data from social media (Twitter, Reddit, etc.) is outside the scope of this book.