Principles of Data Science: Mathematical techniques and theory to succeed in data-driven industries

Product type: Paperback
Published: Dec 2016
Publisher: Packt
ISBN-13: 9781785887918
Length: 388 pages
Edition: 1st Edition

Table of Contents (20 chapters)

Principles of Data Science
Credits
About the Author
About the Reviewers
www.PacktPub.com
Preface
1. How to Sound Like a Data Scientist
2. Types of Data
3. The Five Steps of Data Science
4. Basic Mathematics
5. Impossible or Improbable – A Gentle Introduction to Probability
6. Advanced Probability
7. Basic Statistics
8. Advanced Statistics
9. Communicating Data
10. How to Tell If Your Toaster Is Learning – Machine Learning Essentials
11. Predictions Don't Grow on Trees – or Do They?
12. Beyond the Essentials
13. Case Studies
Index

Index

A

  • A/B test
    • about / Experimental
  • Adam Optimizer
    • about / Tensorflow and neural networks
  • addition rule / The addition rule
  • alternative hypothesis
    • about / Hypothesis tests
  • anomaly detection / Basic structure
  • ARIMA / Going beyond with this example
  • arithmetic mean / Measures of center
    • about / Measures of center
  • arithmetic symbols
    • about / Arithmetic symbols
    • summation / Summation
    • proportional / Proportional
    • dot product / Dot product

B

  • back-propagation / Basic structure
  • bar charts
    • about / Bar charts
  • basic Python, example
    • about / Example of basic Python
    • single Tweet, parsing / Example – parsing a single tweet
  • Bayes formula
    • about / Proportional
  • Bayes theorem
    • about / Bayes theorem
    • examples / Bayes theorem
    • applications / More applications of Bayes theorem
    • titanic dataset / Example – Titanic
    • medical studies example / Example – medical studies
  • bi-modal
    • about / Sampling distributions
  • bias/variance tradeoff
    • extreme cases / Two extreme cases of bias/variance tradeoff
    • working, with error functions / How bias/variance play into error functions
  • bias variance tradeoff
    • about / The bias variance tradeoff
    • error, due to bias / Error due to bias
    • error, due to variance / Error due to variance
  • big data
    • about / Some more terminology
  • binary classifier / A bit deeper
  • binomial random variable
    • about / Binomial random variables
    • restaurant openings example / Binomial random variables
    • blood types example / Binomial random variables
  • box plots
    • about / Box plots
    • creating / Box plots

C

  • Cartesian graph / Graphs
  • causation
    • versus correlation / Correlation versus causation
  • central limit theorem
    • about / Sampling distributions
  • centroid
    • about / K-means clustering
  • chi-square goodness of fit test
    • about / Chi-square goodness of fit test
    • assumptions / Assumptions of the chi-square goodness of fit test
    • example / Example of a chi-square test for goodness of fit
  • chi-square test for association/independence
    • about / Chi-square test for association/independence
    • assumptions / Assumptions of the chi-square independence test
  • classification
    • about / Classification
  • classification tree
    • fitting / How does a computer fit a classification tree?
  • cluster
    • about / K-means clustering
  • clustering
    • about / Unsupervised learning
  • coefficient of variation
    • about / Definition
    • employee salaries example / Example – employee salaries
  • collectively exhaustive / Collectively exhaustive events
  • collectively exhaustive events
    • about / Collectively exhaustive events
    • examples / Collectively exhaustive events
  • communication
    • about / Why does communication matter?
  • complementary events / Complementary events
  • compound events
    • about / Compound events
    • example / Compound events
  • conditional probability
    • about / Conditional probability
  • confidence
    • about / Confidence intervals
  • confidence intervals
    • about / Confidence intervals
  • confounding factor / Random sampling
  • confusion matrix / A bit deeper
  • continuous data
    • about / Digging deeper
    • example / Digging deeper
  • continuous random variable
    • about / Continuous random variables
  • correlation
    • versus causation / Correlation versus causation
  • correlation coefficients
    • about / The insightful part – correlations in data
  • cross validation error
    • versus training error visualization / Visualizing training error versus cross-validation error
  • CSV (comma separated value) / Example – world alcohol consumption data

D

  • data
    • organized data / Basic terminology
    • unorganized data / Basic terminology
    • types / Flavors of data, Why look at these distinctions?
    • levels / The four levels of data
  • data, obtaining
    • about / Obtaining data
    • observational / Observational
    • experimental / Experimental
  • data exploration
    • about / Explore the data
    • basic questions / Basic questions for data exploration
    • yelp dataset / Dataset 1 – Yelp
    • titanic dataset / Dataset 2 – titanic
  • data mining
    • about / Some more terminology
  • data model
    • about / The math
  • data points
    • about / Illustrative example – data points
  • data preprocessing
    • example / Example of data preprocessing
    • word/phrase counts / Word/phrase counts
    • relative length of text / Relative length of text
    • topics, picking / Picking out topics
  • data sampling
    • about / Sampling data
    • probability sampling / Probability sampling
    • random sampling / Random sampling
    • unequal probability sampling / Unequal probability sampling
  • data science
    • about / What is data science?, Basic terminology, The data science Venn diagram, Introduction to data science
    • need for / Why data science?
    • Sigma Technologies example / Example – Sigma Technologies
    • steps / Overview of the five steps
    • interesting question, asking / Ask an interesting question
    • data, obtaining / Obtain the data
    • data, exploring / Explore the data
    • data, modeling / Model the data
    • results, communicating / Communicate and visualize the results
    • results, visualizing / Communicate and visualize the results
  • data science, case studies
    • about / Data science case studies
    • government paper pushing automation / Case study – automating government paper pushing, Fire all humans, right?
    • marketing dollars / Case study – marketing dollars
    • job descriptions / Case study – what's in a job description?
  • data science Venn diagram
    • about / The data science Venn diagram
    • math/statistics / The data science Venn diagram, The math
    • computer programming / The data science Venn diagram, Computer programming
    • domain knowledge / The data science Venn diagram, Domain knowledge
  • decision trees
    • about / Decision trees
    • versus random forests / Comparing Random forests with decision trees
  • Deep Neural Network Classifier (DNNClassifier)
    • about / Tensorflow and neural networks
  • dimension reduction
    • about / Unsupervised learning
    • cons / Feature extraction and principal component analysis
  • discrete data
    • about / Digging deeper
    • example / Digging deeper
  • discrete random variables
    • about / Discrete random variables
    • types / Types of discrete random variables
    • binomial random variable / Binomial random variables
    • geometric random variable / Binomial random variables
    • Poisson random variable / Poisson random variable
    • continuous random variable / Continuous random variables
  • domain knowledge / Domain knowledge
  • Domain Knowledge / The data science Venn diagram
  • dot product
    • about / Dot product
  • dummy variables
    • about / Dummy variables

E

  • Empirical rule
    • about / The Empirical rule
    • example / The Empirical rule
  • ensembling techniques / Ensembling techniques
  • entity movement / Basic structure
  • entropy
    • about / How does a computer build a regression tree?
  • error functions
    • about / How bias/variance play into error functions
  • Euler's number / Logistic regression
  • event
    • about / Basic definitions
  • exploration tips, for qualitative data
    • about / Exploration tips for qualitative data
    • nominal level columns / Nominal level columns
    • filtering / Filtering in Pandas
    • ordinal level columns / Ordinal level columns
  • exploratory data analysis (EDA)
    • about / Some more terminology
  • exponent
    • about / Logarithms/exponents
    • examples / Logarithms/exponents
  • extra-marital affairs case study
    • about / Case study 2 – why do some people cheat on their spouses?
  • extreme cases, bias/variance tradeoff
    • underfitting / Underfitting
    • overfitting / Overfitting

F

  • false negative
    • about / Type I and type II errors
  • false negatives / A bit deeper
  • false positive
    • about / Type I and type II errors
  • false positives / A bit deeper
  • feature extraction / The Silhouette Coefficient, Feature extraction and principal component analysis
    • about / Feature extraction and principal component analysis
    • pros / Feature extraction and principal component analysis
  • feature selection
    • about / Feature extraction and principal component analysis
  • filtering
    • about / Filtering in Pandas
  • Frequentist approach
    • about / Frequentist approach
    • marketing stats example / Frequentist approach
    • law of large numbers / The law of large numbers

G

  • geometric random variable
    • about / Binomial random variables
    • weather example / Binomial random variables
  • gini index
    • about / How does a computer build a regression tree?
  • global score / The Silhouette Coefficient
  • graphs
    • about / Graphs
    • Cartesian graph / Graphs
    • scatter plots / Scatter plots
    • line graphs / Line graphs
    • bar charts / Bar charts
    • histograms / Histograms
    • box plots / Box plots
  • grid searching
    • about / Grid searching

H

  • histograms
    • about / Histograms
    • plotting / Histograms
  • hypothesis test
    • about / Hypothesis tests
    • conducting / Conducting a hypothesis test
    • one sample t-tests / One sample t-tests
    • type II error / Type I and type II errors
    • type I error / Type I and type II errors
  • hypothesis test, for categorical values
    • about / Hypothesis test for categorical variables
    • chi-square goodness of fit test / Chi-square goodness of fit test
    • chi-square test for association/independence / Chi-square test for association/independence

I

  • independent events
    • examples / Independence
  • intersection / Set theory
  • interval level, of data
    • about / The interval level
    • example / Example
    • mathematical operations / Mathematical operations allowed
    • measures of center / Measures of center
    • measures of variation / Measures of variation

J

  • jaccard measure / Set theory

K

  • k-fold cross validation
    • about / Case study 3 – using tensorflow
  • K-means clustering
    • about / K-means clustering
    • example / Illustrative example – beer!
  • K-Nearest Neighbors (KNN) algorithm
    • about / How bias/variance play into error functions
  • K folds cross validation
    • about / K folds cross-validation
    • features / K folds cross-validation
  • KPI (key performance indicator)
    • about / Verbal communication

L

  • labeled data / Supervised learning
  • levels, data
    • nominal / The nominal level
    • ordinal / The ordinal level
    • interval / The interval level
    • ratio / The ratio level
  • likelihood
    • about / Naïve Bayes classification
  • likert scale
    • about / Discrete random variables
  • linear algebra
    • about / Dot product, Linear algebra
    • matrix multiplication / Matrix multiplication
  • linear regression
    • about / Linear regression
    • predictors, adding / Adding more predictors
  • line graphs
    • about / Line graphs
  • logarithm
    • about / Logarithms/exponents
    • examples / Logarithms/exponents
  • logistic regression
    • about / Logistic regression, The math of logistic regression
  • log odds
    • about / Probability, odds, and log odds

M

  • machine learning
    • about / The data science Venn diagram, Example – spawner-recruit models, Some more terminology, What is machine learning?
    • facial recognition example / What is machine learning?
    • limitations / Machine learning isn't perfect
    • working / How does machine learning work?
    • types / Types of machine learning
    • supervised learning / Supervised learning
    • unsupervised learning / Unsupervised learning
    • overview / Overview of the types of machine learning
  • magnitude / Set theory
  • margin of error
    • about / Confidence intervals
  • Math & Statistics Knowledge base / The data science Venn diagram
  • mathematics
    • about / Mathematics as a discipline
  • matrices
    • multiplying / How to multiply matrices
  • matrix
    • about / Vectors and matrices
  • matrix multiplication
    • about / Matrix multiplication
  • measures of center / Measures of center
  • measures of relative standing
    • about / Measures of relative standing
    • correlations, in data / The insightful part – correlations in data
  • measures of variation
    • about / Measures of variation, Measures of relative standing
  • median / Measures of center, Measures of center
  • model coefficients
    • about / Linear regression
  • models
    • about / The math
  • multilayer perceptrons (MLP) / Basic structure
  • multiplication rule / The multiplication rule
  • mutual exclusivity / Mutual exclusivity
  • mutually exhaustive / Collectively exhaustive events

N

  • Naïve Bayes classification
    • about / Naïve Bayes classification
  • neural networks
    • about / Neural networks
    • basic structure / Basic structure
    • advantage / Basic structure
  • nominal level, of data
    • about / The nominal level, What data is like at the nominal level
    • mathematical operations / Mathematical operations allowed
    • measure of center / Measures of center
  • normalizing constant
    • about / Naïve Bayes classification
  • notation / Probability
  • null hypothesis
    • about / Hypothesis tests
  • null model
    • about / Regression metrics
  • null set / Set theory

O

  • odds
    • about / Probability, odds, and log odds
  • one-tailed test / Assumptions of the one sample t-tests
  • one sample t-test
    • about / One sample t-tests
    • example / Example of a one sample t-tests
    • assumptions / Assumptions of the one sample t-tests
  • optimal number
    • selecting, for cluster validation / Choosing an optimal number for K and cluster validation
  • ordinal level, of data
    • about / The ordinal level, Quick recap and check
    • examples / Examples
    • mathematical operations / Mathematical operations allowed
    • measures of center / Measures of center
  • organized data
    • about / Basic terminology
  • overfitting / Overfitting
    • about / Regression metrics

P

  • p-value
    • about / Hypothesis tests
  • parameter
    • about / What are statistics?
  • pattern recognition / Basic structure
  • perceptron / Basic structure
  • perceptrons / Basic structure
  • point estimates
    • about / Point estimates
  • Poisson distribution
    • about / Point estimates
  • Poisson random variable
    • about / Poisson random variable, Point estimates
    • examples / Poisson random variable
    • call center example / Poisson random variable
  • population
    • about / What are statistics?
  • posterior
    • about / Naïve Bayes classification
  • prediction / Supervised learning
  • predictive analytics models / Supervised learning
  • presentation, to formal audience
    • tips / On the more formal side of things
  • Principal Component Analysis (PCA) / Feature extraction and principal component analysis
    • about / Feature extraction and principal component analysis
  • prior probability
    • about / Naïve Bayes classification
  • probabilistic model
    • about / Some more terminology
  • probability
    • about / Probability, Probability, odds, and log odds
  • probability, rules
    • about / The rules of probability
    • addition rule / The addition rule
    • mutual exclusivity / Mutual exclusivity
    • multiplication rule / The multiplication rule
    • independence / Independence
    • complementary events / Complementary events
  • probability density function (PDF)
    • about / Continuous random variables
  • probability mass function (PMF) / Binomial random variables
  • probability mass functions (PMF)
    • about / Discrete random variables
  • probability sampling / Probability sampling
  • procedure
    • about / Basic definitions
  • proportional
    • about / Proportional
  • Python
    • need for / Why Python?
    • practices / Python practices

Q

  • qualitative data
    • about / Quantitative versus qualitative data
  • qualitative data, versus quantitative data
    • about / Quantitative versus qualitative data
    • coffee shop data example / Example – coffee shop data
    • world alcohol consumption data example / Example – world alcohol consumption data
  • quantitative data
    • about / Quantitative versus qualitative data
    • discrete data / Digging deeper
    • continuous data / Digging deeper

R

  • random forests
    • about / Random forests
    • versus decision trees / Comparing Random forests with decision trees
    • advantages / Comparing Random forests with decision trees
    • disadvantages / Comparing Random forests with decision trees
  • random sampling / Random sampling
  • random variables
    • about / Random variables
    • discrete random variable / Discrete random variables
  • ratio level, of data
    • about / The ratio level
    • examples / Examples
    • measures of center / Measures of center
    • issues / Problems with the ratio level
  • regression
    • about / Regression
  • regression metrics
    • about / Regression metrics
  • regression tree
    • building / How does a computer build a regression tree?
  • reinforcement learning
    • about / Reinforcement learning, Overview of the types of machine learning
    • pros / Overview of the types of machine learning
    • cons / Overview of the types of machine learning
  • relative frequency
    • about / Frequentist approach
  • relative length
    • about / Relative length of text

S

  • sample
    • about / What are statistics?
  • sample space
    • about / Basic definitions
  • sampling bias / Random sampling
  • sampling distributions
    • about / Sampling distributions
  • scalar
    • about / Dot product
  • scatter plot
    • about / Scatter plots
  • set / Set theory
  • set theory / Set theory
  • Silhouette Coefficient
    • about / The Silhouette Coefficient
  • Simpson's paradox / Simpson's paradox
  • slope
    • about / Graphs
  • spawner-recruit models
    • example / Example – spawner-recruit models
  • square matrix
    • about / Vectors and matrices
  • standard deviation / Standard deviation
    • about / Measures of variation
  • standard normal distribution
    • about / Continuous random variables
  • statistical model
    • about / Some more terminology
  • statistical modeling
    • about / How does statistical modeling fit into all of this?
  • statistics
    • about / What are statistics?
  • statistics, measuring
    • about / How do we measure statistics?
    • measures of center / Measures of center
    • measures of variation / Measures of variation
    • measure of relative standing / Measures of relative standing
  • stock prices prediction based on social media case study
    • about / Case study 1 – predicting stock prices based on social media
    • text sentiment analysis / Text sentiment analysis
    • exploratory data analysis / Exploratory data analysis
    • regression route / Regression route
    • classification route / Classification route
    • example / Going beyond with this example
  • structured data
    • about / Structured versus unstructured data
    • versus unstructured data / Structured versus unstructured data
  • subset / Set theory
  • Substantive Expertise / The data science Venn diagram
  • summation
    • about / Summation
  • sum of squared residuals
    • about / Linear regression
  • supervised learning
    • about / Supervised learning
    • working / Supervised learning
    • example / Supervised learning
    • predictions / It's not only about predictions
    • types / Types of supervised learning
    • regression / Regression
    • classification / Classification
    • pros / Overview of the types of machine learning
    • cons / Overview of the types of machine learning

T

  • tensorflow case study
    • about / Case study 3 – using tensorflow
    • neural networks, creating / Tensorflow and neural networks
  • test statistic
    • about / Assumptions of the one sample t-tests
  • titanic dataset / Dataset 2 – titanic
  • training error visualization
    • versus cross validation error / Visualizing training error versus cross-validation error
  • true negatives / A bit deeper
  • true positives / A bit deeper
  • Type I error / A bit deeper
  • type I error / Type I and type II errors
  • type II error / Type I and type II errors
  • Type II error / A bit deeper

U

  • underfitting / Underfitting
  • unequal probability sampling / Unequal probability sampling
  • union / Set theory
  • unorganized data
    • about / Basic terminology
  • unstructured data
    • about / Structured versus unstructured data
    • versus structured data / Structured versus unstructured data
  • unsupervised learning
    • about / Unsupervised learning, Overview of the types of machine learning, Unsupervised learning
    • reinforcement learning / Reinforcement learning
    • pros / Overview of the types of machine learning
    • cons / Overview of the types of machine learning
    • using / When to use unsupervised learning

V

  • vector
    • about / Vectors and matrices
  • verbal communication
    • about / Verbal communication
    • story telling / It's about telling a story

W

  • why/how/what strategy, of presentation
    • about / The why/how/what strategy of presenting

Y

  • yelp dataset
    • Dataframe / Dataframes
    • Series object / Series

Z

  • z-score / Measures of relative standing