Essential Math for AI - Helion

Author: Hala Nelson
ISBN: 9781098107581
Pages: 604, Format: ebook
Publication date: 2023-01-04
Bookstore: Helion

Price: 271.15 zł (previously 319.00 zł)
You save: 15% (-47.85 zł)

Companies are scrambling to integrate AI into their systems and operations. But to build truly successful solutions, you need a firm grasp of the underlying mathematics. This accessible guide walks you through the math necessary to thrive in the AI field, focusing on real-world applications rather than dense academic theory.

Engineers, data scientists, and students alike will examine mathematical topics critical for AI, including regression, neural networks, optimization, backpropagation, convolution, Markov chains, and more, through popular applications such as computer vision, natural language processing, and automated systems. Supplementary Jupyter notebooks shed light on examples with Python code and visualizations. Whether you're just beginning your career or have years of experience, this book gives you the foundation necessary to dive deeper into the field.

Understand the underlying mathematics powering AI systems, including generative adversarial networks, random graphs, large random matrices, mathematical logic, optimal control, and more
Learn how to adapt mathematical methods to different applications from completely different fields
Gain the mathematical fluency to interpret and explain how AI systems arrive at their decisions

Table of Contents

Preface
- Why I Wrote This Book
- Who Is This Book For?
- Who Is This Book Not For?
- How Will the Math Be Presented in This Book?
- Infographic
- What Math Background Is Expected from You to Be Able to Read This Book?
- Overview of the Chapters
- My Favorite Books on AI
- Conventions Used in This Book
- Using Code Examples
- O'Reilly Online Learning
- How to Contact Us
- Acknowledgments
1. Why Learn the Mathematics of AI?
- What Is AI?
- Why Is AI So Popular Now?
- What Is AI Able to Do?
  - An AI Agent's Specific Tasks
- What Are AI's Limitations?
- What Happens When AI Systems Fail?
- Where Is AI Headed?
- Who Are the Current Main Contributors to the AI Field?
- What Math Is Typically Involved in AI?
- Summary and Looking Ahead
2. Data, Data, Data
- Data for AI
- Real Data Versus Simulated Data
- Mathematical Models: Linear Versus Nonlinear
- An Example of Real Data
- An Example of Simulated Data
- Mathematical Models: Simulations and AI
- Where Do We Get Our Data From?
- The Vocabulary of Data Distributions, Probability, and Statistics
  - Random Variables
  - Probability Distributions
  - Marginal Probabilities
  - The Uniform and the Normal Distributions
  - Conditional Probabilities and Bayes' Theorem
  - Conditional Probabilities and Joint Distributions
  - Prior Distribution, Posterior Distribution, and Likelihood Function
  - Mixtures of Distributions
  - Sums and Products of Random Variables
  - Using Graphs to Represent Joint Probability Distributions
  - Expectation, Mean, Variance, and Uncertainty
  - Covariance and Correlation
  - Markov Process
  - Normalizing, Scaling, and/or Standardizing a Random Variable or Data Set
  - Common Examples
- Continuous Distributions Versus Discrete Distributions (Density Versus Mass)
- The Power of the Joint Probability Density Function
- Distribution of Data: The Uniform Distribution
- Distribution of Data: The Bell-Shaped Normal (Gaussian) Distribution
- Distribution of Data: Other Important and Commonly Used Distributions
- The Various Uses of the Word Distribution
- A/B Testing
- Summary and Looking Ahead
3. Fitting Functions to Data
- Traditional and Very Useful Machine Learning Models
- Numerical Solutions Versus Analytical Solutions
- Regression: Predict a Numerical Value
  - Training Function
  - Loss Function
    - The predicted value versus the true value
    - The absolute value distance versus the squared distance
    - Functions with singularities
    - For linear regression, the loss function is the mean squared error
    - Notation: Vectors in this book are always column vectors
    - The training, validation, and test subsets
    - Recap
    - When the training data has highly correlated features
  - Optimization
    - Convex landscapes versus nonconvex landscapes
    - How do we locate minimizers of functions?
    - Calculus in a nutshell
    - A one-dimensional optimization example
    - Derivatives of linear algebra expressions that we use all the time
    - Minimizing the mean squared error loss function
- Logistic Regression: Classify into Two Classes
  - Training Function
  - Loss Function
  - Optimization
- Softmax Regression: Classify into Multiple Classes
  - Training Function
  - Loss Function
  - Optimization
- Incorporating These Models into the Last Layer of a Neural Network
- Other Popular Machine Learning Techniques and Ensembles of Techniques
  - Support Vector Machines
    - Training function
    - Loss function
    - Optimization
    - The kernel trick
  - Decision Trees
    - Entropy and Gini impurity
    - Entropy and information gain
      - Binary output
      - Multi-class output
    - Gini impurity
    - Regression decision trees
    - Shortcomings of decision trees
  - Random Forests
  - k-means Clustering
- Performance Measures for Classification Models
- Summary and Looking Ahead
4. Optimization for Neural Networks
- The Brain Cortex and Artificial Neural Networks
- Training Function: Fully Connected, or Dense, Feed Forward Neural Networks
  - A Neural Network Is a Computational Graph Representation of the Training Function
  - Linearly Combine, Add Bias, Then Activate
    - The weights
    - A linear combination plus bias
    - Pass the result through a nonlinear activation function
    - Notation overview
  - Common Activation Functions
  - Universal Function Approximation
    - Example 1: Approximating irrational numbers with rational numbers
    - Example 2: Approximating continuous functions with polynomials
    - Statement of the universal approximation theorem for neural networks
  - Approximation Theory for Deep Learning
- Loss Functions
- Optimization
  - Mathematics and the Mysterious Success of Neural Networks
  - Gradient Descent $\vec{\omega}^{i+1} = \vec{\omega}^{i} - \eta \nabla L(\vec{\omega}^{i})$
  - Explaining the Role of the Learning Rate Hyperparameter
    - The scale of the features affects the performance of the gradient descent
    - Near the minima (local and/or global), flat regions, or saddle points of the loss function's landscape, the gradient descent method crawls
  - Convex Versus Nonconvex Landscapes
  - Stochastic Gradient Descent
  - Initializing the Weights $\vec{\omega}^{0}$ for the Optimization Process
- Regularization Techniques
  - Dropout
  - Early Stopping
  - Batch Normalization of Each Layer
  - Control the Size of the Weights by Penalizing Their Norm
    - Commonly used weight decay regularizations
    - When do we use plain linear regression, ridge, lasso, or elastic net?
  - Penalizing the $l^2$ Norm Versus Penalizing the $l^1$ Norm
  - Explaining the Role of the Regularization Hyperparameter
- Hyperparameter Examples That Appear in Machine Learning
- Chain Rule and Backpropagation: Calculating $\nabla L(\vec{\omega}^{i})$
  - Backpropagation Is Not Too Different from How Our Brain Learns
  - Why Is It Better to Backpropagate?
  - Backpropagation in Detail
- Assessing the Significance of the Input Data Features
- Summary and Looking Ahead
5. Convolutional Neural Networks and Computer Vision
- Convolution and Cross-Correlation
  - Translation Invariance and Translation Equivariance
  - Convolution in Usual Space Is a Product in Frequency Space
- Convolution from a Systems Design Perspective
  - Convolution and Impulse Response for Linear and Translation Invariant Systems
- Convolution and One-Dimensional Discrete Signals
- Convolution and Two-Dimensional Discrete Signals
  - Filtering Images
  - Feature Maps
- Linear Algebra Notation
  - The One-Dimensional Case: Multiplication by a Toeplitz Matrix
  - The Two-Dimensional Case: Multiplication by a Doubly Block Circulant Matrix
- Pooling
- A Convolutional Neural Network for Image Classification
- Summary and Looking Ahead
6. Singular Value Decomposition: Image Processing, Natural Language Processing, and Social Media
- Matrix Factorization
- Diagonal Matrices
- Matrices as Linear Transformations Acting on Space
  - Action of A on the Right Singular Vectors
  - Action of A on the Standard Unit Vectors and the Unit Square Determined by Them
  - Action of A on the Unit Circle
  - Breaking Down the Circle-to-Ellipse Transformation According to the Singular Value Decomposition
  - Rotation and Reflection Matrices
    - Rotation matrix
    - Reflection matrix
  - Action of A on a General Vector x
- Three Ways to Multiply Matrices
- The Big Picture
  - The Condition Number and Computational Stability
- The Ingredients of the Singular Value Decomposition
- Singular Value Decomposition Versus the Eigenvalue Decomposition
- Computation of the Singular Value Decomposition
  - Computing an Eigenvector Numerically
- The Pseudoinverse
- Applying the Singular Value Decomposition to Images
- Principal Component Analysis and Dimension Reduction
- Principal Component Analysis and Clustering
- A Social Media Application
- Latent Semantic Analysis
- Randomized Singular Value Decomposition
- Summary and Looking Ahead
7. Natural Language and Finance AI: Vectorization and Time Series
- Natural Language AI
- Preparing Natural Language Data for Machine Processing
- Statistical Models and the log Function
- Zipf's Law for Term Counts
- Various Vector Representations for Natural Language Documents
  - Term Frequency Vector Representation of a Document or Bag of Words
  - Term Frequency-Inverse Document Frequency Vector Representation of a Document
  - Topic Vector Representation of a Document Determined by Latent Semantic Analysis
    - Topic selection and dimension reduction
    - Shortcomings of latent semantic analysis
  - Topic Vector Representation of a Document Determined by Latent Dirichlet Allocation
  - Topic Vector Representation of a Document Determined by Latent Discriminant Analysis
  - Meaning Vector Representations of Words and of Documents Determined by Neural Network Embeddings
    - Word2vec vector representation of individual terms by incorporating continuous-ness attributes
    - How to visualize vectors representing words
    - Facebook's fastText vector representation of individual n-character grams
    - Doc2vec or par2vec vector representation of a document
    - Global vector or GloVe vector representation of words
- Cosine Similarity
- Natural Language Processing Applications
  - Sentiment Analysis
  - Spam Filter
  - Search and Information Retrieval
  - Machine Translation
  - Image Captioning
  - Chatbots
  - Other Applications
- Transformers and Attention Models
  - The Transformer Architecture
  - The Attention Mechanism
  - Transformers Are Far from Perfect
- Convolutional Neural Networks for Time Series Data
- Recurrent Neural Networks for Time Series Data
  - How Do Recurrent Neural Networks Work?
  - Gated Recurrent Units and Long Short-Term Memory Units
- An Example of Natural Language Data
- Finance AI
- Summary and Looking Ahead
8. Probabilistic Generative Models
- What Are Generative Models Useful For?
- The Typical Mathematics of Generative Models
- Shifting Our Brain from Deterministic Thinking to Probabilistic Thinking
- Maximum Likelihood Estimation
- Explicit and Implicit Density Models
- Explicit Density-Tractable: Fully Visible Belief Networks
  - Example: Generating Images via PixelCNN and Machine Audio via WaveNet
- Explicit Density-Tractable: Change of Variables Nonlinear Independent Component Analysis
- Explicit Density-Intractable: Variational Autoencoders Approximation via Variational Methods
- Explicit Density-Intractable: Boltzmann Machine Approximation via Markov Chain
- Implicit Density-Markov Chain: Generative Stochastic Network
- Implicit Density-Direct: Generative Adversarial Networks
  - How Do Generative Adversarial Networks Work?
- Example: Machine Learning and Generative Networks for High Energy Physics
- Other Generative Models
  - Naive Bayes Classification Model
  - Gaussian Mixture Model
- The Evolution of Generative Models
  - Hopfield Nets
  - Boltzmann Machine
  - Restricted Boltzmann Machine (Explicit Density and Intractable)
    - Conditional independence
    - Universal approximation
  - The Original Autoencoder
- Probabilistic Language Modeling
- Summary and Looking Ahead
9. Graph Models
- Graphs: Nodes, Edges, and Features for Each
- Example: PageRank Algorithm
- Inverting Matrices Using Graphs
- Cayley Graphs of Groups: Pure Algebra and Parallel Computing
- Message Passing Within a Graph
- The Limitless Applications of Graphs
  - Brain Networks
  - Spread of Disease
  - Spread of Information
  - Detecting and Tracking Fake News Propagation
  - Web-Scale Recommendation Systems
  - Fighting Cancer
  - Biochemical Graphs
  - Molecular Graph Generation for Drug and Protein Structure Discovery
  - Citation Networks
  - Social Media Networks and Social Influence Prediction
  - Sociological Structures
  - Bayesian Networks
  - Traffic Forecasting
  - Logistics and Operations Research
  - Language Models
  - Graph Structure of the Web
  - Automatically Analyzing Computer Programs
  - Data Structures in Computer Science
  - Load Balancing in Distributed Networks
  - Artificial Neural Networks
- Random Walks on Graphs
- Node Representation Learning
- Tasks for Graph Neural Networks
  - Node Classification
  - Graph Classification
  - Clustering and Community Detection
  - Graph Generation
  - Influence Maximization
  - Link Prediction
- Dynamic Graph Models
- Bayesian Networks
  - A Bayesian Network Represents a Compactified Conditional Probability Table
  - Making Predictions Using a Bayesian Network
  - Bayesian Networks Are Belief Networks, Not Causal Networks
  - Keep This in Mind About Bayesian Networks
  - Chains, Forks, and Colliders
  - Given a Data Set, How Do We Set Up a Bayesian Network for the Involved Variables?
- Graph Diagrams for Probabilistic Causal Modeling
- A Brief History of Graph Theory
- Main Considerations in Graph Theory
  - Spanning Trees and Shortest Spanning Trees
  - Cut Sets and Cut Vertices
  - Planarity
  - Graphs as Vector Spaces
  - Realizability
  - Coloring and Matching
  - Enumeration
- Algorithms and Computational Aspects of Graphs
- Summary and Looking Ahead
10. Operations Research
- No Free Lunch
- Complexity Analysis and O() Notation
- Optimization: The Heart of Operations Research
- Thinking About Optimization
  - Optimization: Finite Dimensions, Unconstrained
  - Optimization: Finite Dimensions, Constrained Lagrange Multipliers
    - The meaning of Lagrange multipliers
  - Optimization: Infinite Dimensions, Calculus of Variations
    - Analogy between optimizing functions and optimizing functionals
    - Example 1: Harmonic functions, the Dirichlet energy, and the heat equation
      - A harmonic function minimizes the Dirichlet energy
      - The heat equation does gradient descent for the Dirichlet energy functional
    - Example 2: The shortest path between two points is along the straight line connecting them
    - Other introductory examples to the calculus of variations
- Optimization on Networks
  - Traveling Salesman Problem
  - Minimum Spanning Tree
  - Shortest Path
  - Max-Flow Min-Cut
  - Max-Flow Min-Cost
  - The Critical Path Method for Project Design
- The n-Queens Problem
- Linear Optimization
  - The General Form and the Standard Form
  - Visualizing a Linear Optimization Problem in Two Dimensions
  - Convex to Linear
  - The Geometry of Linear Optimization
    - The interplay of algebra and geometry
  - The Simplex Method
    - The main idea of the simplex method
    - The simplex method hops around the corners of the polyhedron
    - Steps of the simplex method
    - Notes on the simplex method
    - The revised simplex method
    - The full tableau implementation of the simplex method
  - Transportation and Assignment Problems
  - Duality, Lagrange Relaxation, Shadow Prices, Max-Min, Min-Max, and All That
    - Motivation for duality-Lagrange multipliers
    - Finding the dual linear optimization problem from the primal linear optimization problem
    - Derivation for the dual of a linear optimization problem in standard form
    - Dual simplex method
    - Example: Networks, linear optimization, and duality
    - Example: Two-person zero-sum games, linear optimization, and duality
    - Quadratic optimization with linear constraints, Lagrangian, min-max theorem, and duality
  - Sensitivity
- Game Theory and Multiagents
- Queuing
- Inventory
- Machine Learning for Operations Research
- Hamilton-Jacobi-Bellman Equation
- Operations Research for AI
- Summary and Looking Ahead
11. Probability
- Where Did Probability Appear in This Book?
- What More Do We Need to Know That Is Essential for AI?
- Causal Modeling and the Do Calculus
  - An Alternative: The Do Calculus
    - The adjustment formula
    - The backdoor criterion, or controlling for confounders
    - Controlling for confounders
    - Are there more rules that eliminate the do operator?
- Paradoxes and Diagram Interpretations
  - Monty Hall Problem
  - Berkson's Paradox
  - Simpson's Paradox
- Large Random Matrices
  - Examples of Random Vectors and Random Matrices
    - Quantitative finance
    - Neuroscience
    - Mathematical physics: Wigner matrices
    - Multivariate statistics: Wishart matrices and covariance
    - Dynamical systems
    - Other equally important examples
  - Main Considerations in Random Matrix Theory
  - Random Matrix Ensembles
  - Eigenvalue Density of the Sum of Two Large Random Matrices
  - Essential Math for Large Random Matrices
- Stochastic Processes
  - Bernoulli Process
  - Poisson Process
  - Random Walk
  - Wiener Process or Brownian Motion
  - Martingale
  - Lévy Process
  - Branching Process
  - Markov Chain
  - Itô's Lemma
- Markov Decision Processes and Reinforcement Learning
  - Examples of Reinforcement Learning
  - Reinforcement Learning as a Markov Decision Process
  - Reinforcement Learning in the Context of Optimal Control and Nonlinear Dynamics
  - Python Library for Reinforcement Learning
- Theoretical and Rigorous Grounds
  - Which Events Have a Probability?
  - Can We Talk About a Wider Range of Random Variables?
  - A Probability Triple (Sample Space, Sigma Algebra, Probability Measure)
  - Where Is the Difficulty?
  - Random Variable, Expectation, and Integration
  - Distribution of a Random Variable and the Change of Variable Theorem
  - Next Steps in Rigorous Probability Theory
    - Limit theorems
  - The Universality Theorem for Neural Networks
- Summary and Looking Ahead
12. Mathematical Logic
- Various Logic Frameworks
- Propositional Logic
  - From Few Axioms to a Whole Theory
  - Codifying Logic Within an Agent
  - How Do Deterministic and Probabilistic Machine Learning Fit In?
- First-Order Logic
  - Relationships Between For All and There Exist
- Probabilistic Logic
- Fuzzy Logic
- Temporal Logic
- Comparison with Human Natural Language
- Machines and Complex Mathematical Reasoning
- Summary and Looking Ahead
13. Artificial Intelligence and Partial Differential Equations
- What Is a Partial Differential Equation?
- Modeling with Differential Equations
  - Models at Different Scales
  - The Parameters of a PDE
  - Changing One Thing in a PDE Can Be a Big Deal
  - Can AI Step In?
- Numerical Solutions Are Very Valuable
  - Continuous Functions Versus Discrete Functions
  - PDE Themes from My Ph.D. Thesis
    - Discretize right away and do a computer simulation
    - The curse of dimensionality
    - The geometry of the problem
    - Model things that you care for
  - Discretization and the Curse of Dimensionality
  - Finite Differences
    - Example: Solve $y'(x) = 1$ on $[0,1]$, with boundary conditions $y(0) = -1$ and $y(1) = 0$
    - Example: Discretize the one-dimensional heat equation $u_t = u_{xx}$ in the interior of the interval $x \in (0,1)$
  - Finite Elements
  - Variational or Energy Methods
  - Monte Carlo Methods
- Some Statistical Mechanics: The Wonderful Master Equation
- Solutions as Expectations of Underlying Random Processes
- Transforming the PDE
  - Fourier Transform
  - Laplace Transform
- Solution Operators
  - Example Using the Heat Equation
  - Example Using the Poisson Equation
  - Fixed Point Iteration
    - How does it work?
    - How do we use it to solve ODEs and PDEs?
    - Simple but very informative example
    - Where is the complication?
    - Recent successes!
    - Setting the stage for deep learning for PDEs
    - Mesh independence and different resolutions
- AI for PDEs
  - Deep Learning to Learn Physical Parameter Values
  - Deep Learning to Learn Meshes
    - Deep learning for three-dimensional meshes
  - Deep Learning to Approximate Solution Operators of PDEs
    - Neural operator networks to learn the solution operators that we derived
    - The important questions
    - Fourier neural network
    - Statement of the universal approximation theorem for operators
    - How do we branch out and dive into the more technical details?
  - Numerical Solutions of High-Dimensional Differential Equations
  - Simulating Natural Phenomena Directly from Data
- Hamilton-Jacobi-Bellman PDE for Dynamic Programming
  - Bellman's equation in deterministic and stochastic settings
  - The big picture
  - Hamilton-Jacobi-Bellman PDE
  - Solving the Hamilton-Jacobi-Bellman PDE
  - Dynamic programming and reinforcement learning
- PDEs for AI?
- Other Considerations in Partial Differential Equations
- Summary and Looking Ahead
14. Artificial Intelligence, Ethics, Mathematics, Law, and Policy
- Good AI
- Policy Matters
- What Could Go Wrong?
  - From Math to Weapons
  - Chemical Warfare Agents
  - AI and Politics
  - Unintended Outcomes of Generative Models
- How to Fix It?
  - Addressing Underrepresentation in Training Data
  - Addressing Bias in Word Vectors
  - Addressing Privacy
  - Addressing Fairness
  - Injecting Morality into AI
  - Democratization and Accessibility of AI to Nonexperts
  - Prioritizing High Quality Data
- Distinguishing Bias from Discrimination
- The Hype
- Final Thoughts
Index