Index
A
- A/B test
- about / Experimental
- Adam Optimizer
- about / Tensorflow and neural networks
- addition rule / The addition rule
- alternative hypothesis
- about / Hypothesis tests
- anomaly detection / Basic structure
- ARIMA / Going beyond with this example
- arithmetic mean / Measures of center
- about / Measures of center
- arithmetic symbols
- about / Arithmetic symbols
- summation / Summation
- proportional / Proportional
- dot product / Dot product
B
- back-propagation / Basic structure
- bar charts
- about / Bar charts
- basic Python, example
- about / Example of basic Python
- single Tweet, parsing / Example – parsing a single tweet
- Bayes formula
- about / Proportional
- Bayes theorem
- about / Bayes theorem
- examples / Bayes theorem
- applications / More applications of Bayes theorem
- titanic dataset / Example – Titanic
- medical studies example / Example – medical studies
- bi-modal
- about / Sampling distributions
- bias/variance tradeoff
- extreme cases / Two extreme cases of bias/variance tradeoff
- working, with error functions / How bias/variance play into error functions
- bias variance tradeoff
- about / The bias variance tradeoff
- error, due to bias / Error due to bias
- error, due to variance / Error due to variance
- big data
- about / Some more terminology
- binary classifier / A bit deeper
- binomial random variable
- about / Binomial random variables
- restaurant openings example / Binomial random variables
- blood types example / Binomial random variables
- box plots
- about / Box plots
- creating / Box plots
C
- Cartesian graph / Graphs
- causation
- versus correlation / Correlation versus causation
- central limit theorem
- about / Sampling distributions
- centroid
- about / K-means clustering
- chi-square goodness of fit test
- about / Chi-square goodness of fit test
- assumptions / Assumptions of the chi-square goodness of fit test
- example / Example of a chi-square test for goodness of fit
- chi-square test for association/independence
- about / Chi-square test for association/independence
- assumptions / Assumptions of the chi-square independence test
- classification
- about / Classification
- classification tree
- fitting / How does a computer fit a classification tree?
- cluster
- about / K-means clustering
- clustering
- about / Unsupervised learning
- coefficient of variation
- about / Definition
- employee salaries example / Example – employee salaries
- collectively exhaustive / Collectively exhaustive events
- collectively exhaustive events
- about / Collectively exhaustive events
- examples / Collectively exhaustive events
- communication
- about / Why does communication matter?
- complementary events / Complementary events
- compound events
- about / Compound events
- example / Compound events
- conditional probability
- about / Conditional probability
- confidence
- about / Confidence intervals
- confidence intervals
- about / Confidence intervals
- confounding factor / Random sampling
- confusion matrix / A bit deeper
- continuous data
- about / Digging deeper
- example / Digging deeper
- continuous random variable
- about / Continuous random variables
- correlation
- versus causation / Correlation versus causation
- correlation coefficients
- about / The insightful part – correlations in data
- cross validation error
- versus training error visualization / Visualizing training error versus cross-validation error
- CSV (comma-separated values) / Example – world alcohol consumption data
D
- data
- organized data / Basic terminology
- unorganized data / Basic terminology
- types / Flavors of data, Why look at these distinctions?
- levels / The four levels of data
- data, obtaining
- about / Obtaining data
- observational / Observational
- experimental / Experimental
- data exploration
- about / Explore the data
- basic questions / Basic questions for data exploration
- yelp dataset / Dataset 1 – Yelp
- titanic dataset / Dataset 2 – titanic
- data mining
- about / Some more terminology
- data model
- about / The math
- data points
- about / Illustrative example – data points
- data preprocessing
- example / Example of data preprocessing
- word/phrase counts / Word/phrase counts
- relative length of text / Relative length of text
- topics, picking / Picking out topics
- data sampling
- about / Sampling data
- probability sampling / Probability sampling
- random sampling / Random sampling
- unequal probability sampling / Unequal probability sampling
- data science
- about / What is data science?, Basic terminology, The data science Venn diagram, Introduction to data science
- need for / Why data science?
- Sigma Technologies example / Example – Sigma Technologies
- steps / Overview of the five steps
- interesting question, asking / Ask an interesting question
- data, obtaining / Obtain the data
- data, exploring / Explore the data
- data, modeling / Model the data
- results, communicating / Communicate and visualize the results
- results, visualizing / Communicate and visualize the results
- data science, case studies
- about / Data science case studies
- government paper pushing automation / Case study – automating government paper pushing, Fire all humans, right?
- marketing dollars / Case study – marketing dollars
- job descriptions / Case study – what's in a job description?
- data science Venn diagram
- about / The data science Venn diagram
- math/statistics / The data science Venn diagram, The math
- computer programming / The data science Venn diagram, Computer programming
- domain knowledge / The data science Venn diagram, Domain knowledge
- decision trees
- about / Decision trees
- versus random forests / Comparing Random forests with decision trees
- Deep Neural Network Classifier (DNNClassifier)
- about / Tensorflow and neural networks
- dimension reduction
- about / Unsupervised learning
- cons / Feature extraction and principal component analysis
- discrete data
- about / Digging deeper
- example / Digging deeper
- discrete random variables
- about / Discrete random variables
- types / Types of discrete random variables
- binomial random variable / Binomial random variables
- geometric random variable / Binomial random variables
- Poisson random variable / Poisson random variable
- continuous random variable / Continuous random variables
- domain knowledge / Domain knowledge
- Domain Knowledge / The data science Venn diagram
- dot product
- about / Dot product
- dummy variables
- about / Dummy variables
E
- Empirical rule
- about / The Empirical rule
- example / The Empirical rule
- ensembling techniques / Ensembling techniques
- entity movement / Basic structure
- entropy
- about / How does a computer build a regression tree?
- error functions
- about / How bias/variance play into error functions
- Euler's number / Logistic regression
- event
- about / Basic definitions
- exploration tips, for qualitative data
- about / Exploration tips for qualitative data
- nominal level columns / Nominal level columns
- filtering / Filtering in Pandas
- ordinal level columns / Ordinal level columns
- exploratory data analysis (EDA)
- about / Some more terminology
- exponent
- about / Logarithms/exponents
- examples / Logarithms/exponents
- extra-marital affairs case study
- about / Case study 2 – why do some people cheat on their spouses?
- extreme cases, bias/variance tradeoff
- underfitting / Underfitting
- overfitting / Overfitting
F
- false negative
- about / Type I and type II errors
- false negatives / A bit deeper
- false positive
- about / Type I and type II errors
- false positives / A bit deeper
- feature extraction / The Silhouette Coefficient, Feature extraction and principal component analysis
- about / Feature extraction and principal component analysis
- pros / Feature extraction and principal component analysis
- feature selection
- about / Feature extraction and principal component analysis
- filtering
- about / Filtering in Pandas
- Frequentist approach
- about / Frequentist approach
- marketing stats example / Frequentist approach
- law of large numbers / The law of large numbers
G
- geometric random variable
- about / Binomial random variables
- weather example / Binomial random variables
- Gini index
- about / How does a computer build a regression tree?
- global score / The Silhouette Coefficient
- graphs
- about / Graphs
- Cartesian graph / Graphs
- scatter plots / Scatter plots
- line graphs / Line graphs
- bar charts / Bar charts
- histograms / Histograms
- box plots / Box plots
- grid searching
- about / Grid searching
H
- histograms
- about / Histograms
- plotting / Histograms
- hypothesis test
- about / Hypothesis tests
- conducting / Conducting a hypothesis test
- one sample t-tests / One sample t-tests
- type II error / Type I and type II errors
- type I error / Type I and type II errors
- hypothesis test, for categorical values
- about / Hypothesis test for categorical variables
- chi-square goodness of fit test / Chi-square goodness of fit test
- chi-square test for association/independence / Chi-square test for association/independence
I
- independent events
- examples / Independence
- intersection / Set theory
- interval level, of data
- about / The interval level
- example / Example
- mathematical operations / Mathematical operations allowed
- measures of center / Measures of center
- measures of variation / Measures of variation
J
- Jaccard measure / Set theory
K
- k-fold cross validation
- about / Case study 3 – using tensorflow
- K-means clustering
- about / K-means clustering
- example / Illustrative example – beer!
- K-Nearest Neighbors (KNN) algorithm
- about / How bias/variance play into error functions
- K folds cross validation
- about / K folds cross-validation
- features / K folds cross-validation
- KPI (key performance indicator)
- about / Verbal communication
L
- labeled data / Supervised learning
- levels, data
- nominal / The nominal level
- ordinal / The ordinal level
- interval / The interval level
- ratio / The ratio level
- likelihood
- about / Naïve Bayes classification
- Likert scale
- about / Discrete random variables
- linear algebra
- about / Dot product, Linear algebra
- matrix multiplication / Matrix multiplication
- linear regression
- about / Linear regression
- predictors, adding / Adding more predictors
- line graphs
- about / Line graphs
- logarithm
- about / Logarithms/exponents
- examples / Logarithms/exponents
- logistic regression
- about / Logistic regression, The math of logistic regression
- log odds
- about / Probability, odds, and log odds
M
- machine learning
- about / The data science Venn diagram, Example – spawner-recruit models, Some more terminology, What is machine learning?
- facial recognition example / What is machine learning?
- limitations / Machine learning isn't perfect
- working / How does machine learning work?
- types / Types of machine learning
- supervised learning / Supervised learning
- unsupervised learning / Unsupervised learning
- overview / Overview of the types of machine learning
- magnitude / Set theory
- margin of error
- about / Confidence intervals
- Math & Statistics Knowledge base / The data science Venn diagram
- mathematics
- about / Mathematics as a discipline
- matrices
- multiplying / How to multiply matrices
- matrix
- about / Vectors and matrices
- matrix multiplication
- about / Matrix multiplication
- measures of center / Measures of center
- measures of relative standing
- about / Measures of relative standing
- correlations, in data / The insightful part – correlations in data
- measures of variation
- about / Measures of variation, Measures of relative standing
- median / Measures of center
- model coefficients
- about / Linear regression
- models
- about / The math
- multilayer perceptrons (MLP) / Basic structure
- multiplication rule / The multiplication rule
- mutual exclusivity / Mutual exclusivity
- mutually exhaustive / Collectively exhaustive events
N
- Naïve Bayes classification
- about / Naïve Bayes classification
- neural networks
- about / Neural networks
- basic structure / Basic structure
- advantage / Basic structure
- nominal level, of data
- about / The nominal level, What data is like at the nominal level
- mathematical operations / Mathematical operations allowed
- measure of center / Measures of center
- normalizing constant
- about / Naïve Bayes classification
- notation / Probability
- null hypothesis
- about / Hypothesis tests
- null model
- about / Regression metrics
- null set / Set theory
O
- odds
- about / Probability, odds, and log odds
- one-tailed test / Assumptions of the one sample t-tests
- one sample t-test
- about / One sample t-tests
- example / Example of a one sample t-test
- assumptions / Assumptions of the one sample t-tests
- optimal number
- selecting, for cluster validation / Choosing an optimal number for K and cluster validation
- ordinal level, of data
- about / The ordinal level, Quick recap and check
- examples / Examples
- mathematical operations / Mathematical operations allowed
- measures of center / Measures of center
- organized data
- about / Basic terminology
- overfitting
- about / Regression metrics
P
- p-value
- about / Hypothesis tests
- parameter
- about / What are statistics?
- pattern recognition / Basic structure
- perceptron / Basic structure
- perceptrons / Basic structure
- point estimates
- about / Point estimates
- Poisson distribution
- about / Point estimates
- Poisson random variable
- about / Poisson random variable, Point estimates
- examples / Poisson random variable
- call center example / Poisson random variable
- population
- about / What are statistics?
- posterior
- about / Naïve Bayes classification
- prediction / Supervised learning
- predictive analytics models / Supervised learning
- presentation, to formal audience
- tips / On the more formal side of things
- Principal Component Analysis (PCA) / Feature extraction and principal component analysis
- about / Feature extraction and principal component analysis
- prior probability
- about / Naïve Bayes classification
- probabilistic model
- about / Some more terminology
- probability
- about / Probability, Probability, odds, and log odds
- probability, rules
- about / The rules of probability
- addition rule / The addition rule
- mutual exclusivity / Mutual exclusivity
- multiplication rule / The multiplication rule
- independence / Independence
- complementary events / Complementary events
- probability density function (PDF)
- about / Continuous random variables
- probability mass function (PMF) / Binomial random variables
- probability mass functions (PMF)
- about / Discrete random variables
- probability sampling / Probability sampling
- procedure
- about / Basic definitions
- proportional
- about / Proportional
- Python
- need for / Why Python?
- practices / Python practices
Q
- qualitative data
- about / Quantitative versus qualitative data
- qualitative data, versus quantitative data
- about / Quantitative versus qualitative data
- coffee shop data example / Example – coffee shop data
- world alcohol consumption data example / Example – world alcohol consumption data
- quantitative data
- about / Quantitative versus qualitative data
- discrete data / Digging deeper
- continuous data / Digging deeper
R
- random forests
- about / Random forests
- versus decision trees / Comparing Random forests with decision trees
- advantages / Comparing Random forests with decision trees
- disadvantages / Comparing Random forests with decision trees
- random sampling / Random sampling
- random variables
- about / Random variables
- discrete random variable / Discrete random variables
- ratio level, of data
- about / The ratio level
- examples / Examples
- measures of center / Measures of center
- issues / Problems with the ratio level
- regression
- about / Regression
- regression metrics
- about / Regression metrics
- regression tree
- building / How does a computer build a regression tree?
- reinforcement learning
- about / Reinforcement learning, Overview of the types of machine learning
- pros / Overview of the types of machine learning
- cons / Overview of the types of machine learning
- relative frequency
- about / Frequentist approach
- relative length
- about / Relative length of text
S
- sample
- about / What are statistics?
- sample space
- about / Basic definitions
- sampling bias / Random sampling
- sampling distributions
- about / Sampling distributions
- scalar
- about / Dot product
- scatter plot
- about / Scatter plots
- set / Set theory
- set theory / Set theory
- Silhouette Coefficient
- about / The Silhouette Coefficient
- Simpson's paradox / Simpson's paradox
- slope
- about / Graphs
- spawner-recruit models
- example / Example – spawner-recruit models
- square matrix
- about / Vectors and matrices
- standard deviation / Standard deviation
- about / Measures of variation
- standard normal distribution
- about / Continuous random variables
- statistical model
- about / Some more terminology
- statistical modeling
- about / How does statistical modeling fit into all of this?
- statistics
- about / What are statistics?
- statistics, measuring
- about / How do we measure statistics?
- measures of center / Measures of center
- measures of variation / Measures of variation
- measure of relative standing / Measures of relative standing
- stock prices prediction based on social media case study
- about / Case study 1 – predicting stock prices based on social media
- text sentiment analysis / Text sentiment analysis
- exploratory data analysis / Exploratory data analysis
- regression route / Regression route
- classification route / Classification route
- example / Going beyond with this example
- structured data
- about / Structured versus unstructured data
- versus unstructured data / Structured versus unstructured data
- subset / Set theory
- Substantive Expertise / The data science Venn diagram
- summation
- about / Summation
- sum of squared residuals
- about / Linear regression
- supervised learning
- about / Supervised learning
- working / Supervised learning
- example / Supervised learning
- predictions / It's not only about predictions
- types / Types of supervised learning
- regression / Regression
- classification / Classification
- pros / Overview of the types of machine learning
- cons / Overview of the types of machine learning
T
- tensorflow case study
- about / Case study 3 – using tensorflow
- neural networks, creating / Tensorflow and neural networks
- test statistic
- about / Assumptions of the one sample t-tests
- titanic dataset / Dataset 2 – titanic
- training error visualization
- versus cross validation error / Visualizing training error versus cross-validation error
- true negatives / A bit deeper
- true positives / A bit deeper
- Type I error / A bit deeper
- type I error / Type I and type II errors
- type II error / Type I and type II errors
- Type II error / A bit deeper
U
- underfitting / Underfitting
- unequal probability sampling / Unequal probability sampling
- union / Set theory
- unorganized data
- about / Basic terminology
- unstructured data
- about / Structured versus unstructured data
- versus structured data / Structured versus unstructured data
- unsupervised learning
- about / Unsupervised learning, Overview of the types of machine learning
- reinforcement learning / Reinforcement learning
- pros / Overview of the types of machine learning
- cons / Overview of the types of machine learning
- using / When to use unsupervised learning
V
- vector
- about / Vectors and matrices
- verbal communication
- about / Verbal communication
- storytelling / It's about telling a story
W
- why/how/what strategy, of presentation
- about / The why/how/what strategy of presenting
Y
- yelp dataset
- Dataframe / Dataframes
- Series object / Series
Z
- z-score / Measures of relative standing