Thoughtful Machine Learning. A Test-Driven Approach - Helion
ISBN: 978-14-493-7409-9
Pages: 236, Format: ebook
Publication date: 2014-09-26
Bookstore: Helion
Book price: 92,65 zł (previously: 107,73 zł)
You save: 14% (-15,08 zł)
Learn how to apply test-driven development (TDD) to machine-learning algorithms—and catch mistakes that could sink your analysis. In this practical guide, author Matthew Kirk takes you through the principles of TDD and machine learning, and shows you how to apply TDD to several machine-learning algorithms, including Naive Bayesian classifiers and Neural Networks.
Machine-learning algorithms often have tests baked in, but they can’t account for human errors in coding. Rather than blindly rely on machine-learning results as many researchers have, you can mitigate the risk of errors with TDD and write clean, stable machine-learning code. If you’re familiar with Ruby 2.1, you’re ready to start.
- Apply TDD to write and run tests before you start coding
- Learn the best uses and tradeoffs of eight machine learning algorithms
- Use real-world examples to test each algorithm through engaging, hands-on exercises
- Understand the similarities between TDD and the scientific method for validating solutions
- Be aware of the risks of machine learning, such as underfitting and overfitting data
- Explore techniques for improving your machine-learning models or data extraction
Table of contents
- Preface
- What to Expect from This Book
- How to Read This Book
- Who This Book Is For
- How to Contact Me
- Conventions Used in This Book
- Using Code Examples
- Safari Books Online
- How to Contact Us
- Acknowledgments
- 1. Test-Driven Machine Learning
- History of Test-Driven Development
- TDD and the Scientific Method
- TDD Makes a Logical Proposition of Validity
- Example: Proof through axioms and functional tests
- Example: Proof through sufficient conditions, unit tests, and integration tests
- TDD Involves Writing Your Assumptions Down on Paper or in Code
- TDD and Scientific Method Work in Feedback Loops
- Example: Peer review
- Risks with Machine Learning
- Unstable Data
- Underfitting
- Overfitting
- Unpredictable Future
- What to Test for to Reduce Risks
- Mitigate Unstable Data with Seam Testing
- Example: Seam testing a neural network
- Check Fit by Cross-Validating
- Example: Cross-validating a model
- Reduce Overfitting Risk by Testing the Speed of Training
- Example: Benchmark testing
- Monitor for Future Shifts with Precision and Recall
- Conclusion
- 2. A Quick Introduction to Machine Learning
- What Is Machine Learning?
- Supervised Learning
- Unsupervised Learning
- Reinforcement Learning
- What Can Machine Learning Accomplish?
- Mathematical Notation Used Throughout the Book
- Conclusion
- 3. K-Nearest Neighbors Classification
- History of K-Nearest Neighbors Classification
- House Happiness Based on a Neighborhood
- How Do You Pick K?
- Guessing K
- Heuristics for Picking K
- Use coprime class and K combinations
- Choose a K that is greater or equal to the number of classes + 1
- Choose a K that is low enough to avoid noise
- Algorithms for Picking K
- What Makes a Neighbor Near?
- Minkowski Distance
- Mahalanobis Distance
- Determining Classes
- Beard and Glasses Detection Using KNN and OpenCV
- The Class Diagram
- Raw Image to Avatar
- The Face Class
- Testing the Face class
- The Neighborhood Class
- Bootstrapping the neighborhood with faces
- Cross-validation and finding K
- Conclusion
- 4. Naive Bayesian Classification
- Using Bayes' Theorem to Find Fraudulent Orders
- Conditional Probabilities
- Inverse Conditional Probability (aka Bayes' Theorem)
- Naive Bayesian Classifier
- The Chain Rule
- Naivety in Bayesian Reasoning
- Pseudocount
- Spam Filter
- The Class Diagram
- Data Source
- Email Class
- Tokenization and Context
- The SpamTrainer
- Storing training data
- Building the Bayesian classifier
- Calculating a classification
- Error Minimization Through Cross-Validation
- Minimizing false positives
- Building the two folds
- Cross-validation and error measuring
- Conclusion
- 5. Hidden Markov Models
- Tracking User Behavior Using State Machines
- Emissions/Observations of Underlying States
- Simplification through the Markov Assumption
- Using Markov Chains Instead of a Finite State Machine
- Hidden Markov Model
- Evaluation: Forward-Backward Algorithm
- Using User Behavior
- The Decoding Problem through the Viterbi Algorithm
- The Learning Problem
- Part-of-Speech Tagging with the Brown Corpus
- The Seam of Our Part-of-Speech Tagger: CorpusParser
- Writing the Part-of-Speech Tagger
- Cross-Validating to Get Confidence in the Model
- How to Make This Model Better
- Conclusion
- 6. Support Vector Machines
- Solving the Loyalty Mapping Problem
- Derivation of SVM
- Nonlinear Data
- The Kernel Trick
- Homogeneous polynomial
- Heterogeneous polynomial
- Radial basis functions
- When should you use each kernel?
- Soft Margins
- Optimizing with slack
- Trading off margin maximization with slack variable minimization using C
- Using SVM to Determine Sentiment
- The Class Diagram
- Corpus Class
- Tokenization of text
- Sentiment leaning, :positive or :negative
- Sentiment codes for :positive and :negative
- Return a Unique Set of Words from the Corpus
- The CorpusSet Class
- Zip two corpus objects
- Build a sparse vector that ties into SentimentClassifier
- The SentimentClassifier Class
- Refactoring the interaction with CorpusSet
- Library to handle Support Vector Machines: LibSVM
- Training data
- Cross-validating with the movie review data
- Improving Results Over Time
- Conclusion
- 7. Neural Networks
- History of Neural Networks
- What Is an Artificial Neural Network?
- Input Layer
- Standard inputs
- Symmetric inputs
- Hidden Layers
- Neurons
- Activation functions
- Output Layer
- Training Algorithms
- The delta rule
- Back Propagation
- QuickProp
- RProp
- Building Neural Networks
- How Many Hidden Layers?
- How Many Neurons for Each Layer?
- Tolerance for Error and Max Epochs
- Using a Neural Network to Classify a Language
- Writing the Seam Test for Language
- Cross-Validating Our Way to a Network Class
- Tuning the Neural Network
- Convergence Testing
- Precision and Recall for Neural Networks
- Wrap-Up of Example
- Conclusion
- 8. Clustering
- User Cohorts
- K-Means Clustering
- The K-Means Algorithm
- The Downside of K-Means Clustering
- Expectation Maximization (EM) Clustering
- The Impossibility Theorem
- Categorizing Music
- Gathering the Data
- Analyzing the Data with K-Means
- EM Clustering
- EM Jazz Clustering Results
- Conclusion
- 9. Kernel Ridge Regression
- Collaborative Filtering
- Linear Regression Applied to Collaborative Filtering
- Introducing Regularization, or Ridge Regression
- Kernel Ridge Regression
- Wrap-Up of Theory
- Collaborative Filtering with Beer Styles
- Data Set
- The Tools We Will Need
- Reviewer
- Writing the Code to Figure Out Someone's Preference
- Collaborative Filtering with User Preferences
- Conclusion
- 10. Improving Models and Data Extraction
- The Problem with the Curse of Dimensionality
- Feature Selection
- Feature Transformation
- Principal Component Analysis (PCA)
- Independent Component Analysis (ICA)
- Monitoring Machine Learning Algorithms
- Precision and Recall: Spam Filter
- The Confusion Matrix
- Mean Squared Error
- The Wilds of Production Environments
- Conclusion
- 11. Putting It All Together
- Machine Learning Algorithms Revisited
- How to Use This Information for Solving Problems
- What's Next for You?
- Index