Useful datasets
One of the best data sources is the UCI Machine Learning Repository. When we go to the web page at https://archive.ics.uci.edu/ml/datasets.html, we see the following list:

For example, if we click the first dataset (Abalone), we see the following. To save space, only the top part is shown:

From the web page, users can download the dataset and find definitions of variables and even citations. The code that follows can be used to download a related R dataset:
dataSet<-"UCIdatasets"
path<-"http://canisius.edu/~yany/RData/"
con<-paste(path,dataSet,".RData",sep='')
load(url(con))
dim(.UCIdatasets)
head(.UCIdatasets)
The related output is shown here:

From the preceding output, we know that the dataset has 427 observations (datasets). For each dataset, we have 7 related features: Name, Data_Types, Default_Task, Attribute_Types, N_Instances (number of instances), N_Attributes (number of attributes), and Year. The variable called Default_Task could be interpreted...
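Because the loaded object is called .UCIdatasets, its name starts with a dot, so a plain ls() call will not list it. The following is a minimal sketch, not part of the original code, of one way to load such a file into its own environment and then count the datasets by Default_Task. The helper names load_into_env and count_by are hypothetical, and the commented usage lines assume that the file is still available at the address given earlier:

```r
# Load every object stored in an .RData file into a fresh environment,
# so that nothing in the global workspace is overwritten.
load_into_env <- function(path) {
  env <- new.env()
  load(path, envir = env)
  env
}

# Count how many rows fall under each value of a column, largest count first.
count_by <- function(df, column) {
  sort(table(df[[column]]), decreasing = TRUE)
}

# Usage sketch (needs network access; the URL is the one used above):
# tmp <- tempfile(fileext = ".RData")
# download.file("http://canisius.edu/~yany/RData/UCIdatasets.RData", tmp, mode = "wb")
# env <- load_into_env(tmp)
# ls(env, all.names = TRUE)   # all.names = TRUE also lists dot-prefixed names
# uci <- get(".UCIdatasets", envir = env)
# count_by(uci, "Default_Task")
```

Loading into a separate environment is a design choice rather than a requirement: it keeps the workspace clean and makes it easy to see exactly which objects the file contains.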