NumPy and pandas
NumPy is a fairly low-level array-manipulation library, and many other Python libraries, particularly those for scientific computing and data analysis, are built on top of it.
One of these libraries is pandas, which is a high-level data-manipulation library. When you are exploring a dataset, you usually perform operations such as calculating descriptive statistics, grouping by a certain characteristic, and merging. The pandas library provides convenient functions for each of these operations.
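As a rough illustration of the three operations just mentioned, here is a minimal sketch. The two small tables and their column names (patient_id, sex, bmi, blood_pressure) are hypothetical and invented only for this example; describe, groupby, and merge are standard pandas methods.

```python
import pandas as pd

# Hypothetical toy data, invented purely for illustration.
patients = pd.DataFrame({
    "patient_id": [1, 2, 3, 4],
    "sex": ["F", "M", "F", "M"],
    "bmi": [22.0, 30.0, 26.0, 28.0],
})
visits = pd.DataFrame({
    "patient_id": [1, 2, 3, 4],
    "blood_pressure": [80.0, 95.0, 85.0, 90.0],
})

# Descriptive statistics (count, mean, std, quartiles) for one numeric column.
bmi_summary = patients["bmi"].describe()

# Group rows by a characteristic and aggregate within each group.
mean_bmi_by_sex = patients.groupby("sex")["bmi"].mean()

# Merge the two tables on their shared key column.
merged = patients.merge(visits, on="patient_id")

print(bmi_summary)
print(mean_bmi_by_sex)
print(merged)
```

Each call returns a new pandas object (a Series or a DataFrame), so the results can be chained or inspected further without modifying the original tables.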
Let's use a diabetes dataset in this example. The diabetes dataset in sklearn.datasets is standardized so that each feature column has zero mean and unit L2 norm.
The dataset contains 442 records with 10 features: age, sex, body mass index, average blood pressure, and six blood serum measurements.
The target represents the disease progression after these baseline measures are taken. You can look at the data description at https://www4.stat.ncsu.edu/~boos/var.select/diabetes.html and a related paper at http://web.stanford.edu...
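One possible way to load this dataset into pandas and check the properties described above is sketched below. It assumes scikit-learn, pandas, and NumPy are installed; the variable names and the "target" column name are choices made for this sketch, not something the dataset prescribes.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_diabetes

# load_diabetes returns a Bunch with the feature matrix, target, and feature names.
diabetes = load_diabetes()

# Wrap the feature matrix in a DataFrame and append the target as an extra column.
# The column name "target" is an arbitrary choice for this sketch.
df = pd.DataFrame(diabetes.data, columns=diabetes.feature_names)
df["target"] = diabetes.target

# 442 records; 10 feature columns plus the target column.
print(df.shape)
print(df.columns.tolist())

# Check the standardization: per-feature mean and L2 norm.
features = df[diabetes.feature_names]
col_means = features.mean()
col_norms = np.sqrt((features ** 2).sum())
print(col_means.abs().max())  # should be very close to 0
print(col_norms)              # each value should be very close to 1
```

Because the features are already scaled, the values will not look like raw ages or blood pressure readings; only the target column is left in its original units.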