Introduction
Data analysis and statistical processing are very important applications for modern programming languages. The subject area is vast. The Python ecosystem includes a number of add-on packages that provide sophisticated data exploration, analysis, and decision-making features.
We'll look at some basic statistical calculations that we can do with Python's built-in libraries and data structures. We'll look at the question of correlation and how to create a regression model.
We'll also look at questions of randomness and the null hypothesis. It's essential to be sure that there really is a measurable statistical effect in a set of data. We can waste a lot of compute cycles analyzing insignificant noise if we're not careful.
We'll also look at a common optimization technique that helps to produce results quickly. A poorly designed algorithm applied to a very large set of data can be an unproductive waste of time.