AI Engineering
ebook
Author: Chip Huyen
ISBN: 9781098166267
Pages: 534, Format: ebook
Publication date: 2024-12-04
Bookstore: Helion

Price: 237.15 zł (previously: 296.44 zł)
You save: 20% (-59.29 zł)

Add to cart: AI Engineering

Recent breakthroughs in AI have not only increased demand for AI products but also lowered the barriers to entry for those who want to build them. The model-as-a-service approach has transformed AI from an esoteric discipline into a powerful development tool that anyone can use. Everyone, including those with minimal or no prior AI experience, can now leverage AI models to build applications. In this book, author Chip Huyen discusses AI engineering: the process of building applications with readily available foundation models.

The book starts with an overview of AI engineering, explaining how it differs from traditional ML engineering and discussing the new AI stack. The more AI is used, the more opportunities there are for catastrophic failures, and therefore the more important evaluation becomes. This book discusses different approaches to evaluating open-ended models, including the rapidly growing AI-as-a-judge approach.

AI application developers will discover how to navigate the AI landscape, including models, datasets, evaluation benchmarks, and the seemingly infinite number of use cases and application patterns. You'll learn a framework for developing an AI application, starting with simple techniques and progressing toward more sophisticated methods, and discover how to efficiently deploy these applications.

  • Understand what AI engineering is and how it differs from traditional machine learning engineering
  • Learn the process for developing an AI application, the challenges at each step, and approaches to address them
  • Explore various model adaptation techniques, including prompt engineering, RAG, fine-tuning, agents, and dataset engineering, and understand how and why they work
  • Examine the bottlenecks for latency and cost when serving foundation models and learn how to overcome them
  • Choose the right model, dataset, evaluation benchmarks, and metrics for your needs

Chip Huyen works to accelerate data analytics on GPUs at Voltron Data. Previously, she was with Snorkel AI and NVIDIA, founded an AI infrastructure startup, and taught Machine Learning Systems Design at Stanford. She's the author of the book Designing Machine Learning Systems, an Amazon bestseller in AI.

AI Engineering builds upon and is complementary to Designing Machine Learning Systems (O'Reilly).

Customers who bought "AI Engineering" also chose:

  • Windows Media Center. Domowe centrum rozrywki
  • Ruby on Rails. Ćwiczenia
  • Przywództwo w świecie VUCA. Jak być skutecznym liderem w niepewnym środowisku
  • Scrum. O zwinnym zarządzaniu projektami. Wydanie II rozszerzone
  • Od hierarchii do turkusu, czyli jak zarządzać w XXI wieku

Table of Contents

AI Engineering. Building Applications with Foundation Models eBook -- table of contents

  • Preface
    • What This Book Is About
    • What This Book Is Not
    • Who This Book Is For
    • Navigating This Book
    • Conventions Used in This Book
    • Using Code Examples
    • O'Reilly Online Learning
    • How to Contact Us
    • Acknowledgments
  • 1. Introduction to Building AI Applications with Foundation Models
    • The Rise of AI Engineering
      • From Language Models to Large Language Models
        • Language models
        • Self-supervision
      • From Large Language Models to Foundation Models
      • From Foundation Models to AI Engineering
    • Foundation Model Use Cases
      • Coding
      • Image and Video Production
      • Writing
      • Education
      • Conversational Bots
      • Information Aggregation
      • Data Organization
      • Workflow Automation
    • Planning AI Applications
      • Use Case Evaluation
        • The role of AI and humans in the application
        • AI product defensibility
      • Setting Expectations
      • Milestone Planning
      • Maintenance
    • The AI Engineering Stack
      • Three Layers of the AI Stack
      • AI Engineering Versus ML Engineering
        • Model development
          • Modeling and training
          • Dataset engineering
          • Inference optimization
        • Application development
          • Evaluation
          • Prompt engineering and context construction
          • AI interface
      • AI Engineering Versus Full-Stack Engineering
    • Summary
  • 2. Understanding Foundation Models
    • Training Data
      • Multilingual Models
      • Domain-Specific Models
    • Modeling
      • Model Architecture
        • Transformer architecture
          • Attention mechanism
          • Transformer block
        • Other model architectures
      • Model Size
        • Scaling law: Building compute-optimal models
        • Scaling extrapolation
        • Scaling bottlenecks
    • Post-Training
      • Supervised Finetuning
      • Preference Finetuning
        • Reward model
        • Finetuning using the reward model
    • Sampling
      • Sampling Fundamentals
      • Sampling Strategies
        • Temperature
        • Top-k
        • Top-p
        • Stopping condition
      • Test Time Compute
      • Structured Outputs
        • Prompting
        • Post-processing
        • Constrained sampling
        • Finetuning
      • The Probabilistic Nature of AI
        • Inconsistency
        • Hallucination
    • Summary
  • 3. Evaluation Methodology
    • Challenges of Evaluating Foundation Models
    • Understanding Language Modeling Metrics
      • Entropy
      • Cross Entropy
      • Bits-per-Character and Bits-per-Byte
      • Perplexity
      • Perplexity Interpretation and Use Cases
    • Exact Evaluation
      • Functional Correctness
      • Similarity Measurements Against Reference Data
        • Exact match
        • Lexical similarity
        • Semantic similarity
      • Introduction to Embedding
    • AI as a Judge
      • Why AI as a Judge?
      • How to Use AI as a Judge
      • Limitations of AI as a Judge
        • Inconsistency
        • Criteria ambiguity
        • Increased costs and latency
        • Biases of AI as a judge
      • What Models Can Act as Judges?
    • Ranking Models with Comparative Evaluation
      • Challenges of Comparative Evaluation
        • Scalability bottlenecks
        • Lack of standardization and quality control
        • From comparative performance to absolute performance
      • The Future of Comparative Evaluation
    • Summary
  • 4. Evaluate AI Systems
    • Evaluation Criteria
      • Domain-Specific Capability
      • Generation Capability
        • Factual consistency
        • Safety
      • Instruction-Following Capability
        • Instruction-following criteria
        • Roleplaying
      • Cost and Latency
    • Model Selection
      • Model Selection Workflow
      • Model Build Versus Buy
        • Open source, open weight, and model licenses
        • Open source models versus model APIs
          • Data privacy
          • Data lineage and copyright
          • Performance
          • Functionality
          • API cost versus engineering cost
          • Control, access, and transparency
          • On-device deployment
      • Navigate Public Benchmarks
        • Benchmark selection and aggregation
          • Public leaderboards
          • Custom leaderboards with public benchmarks
        • Data contamination with public benchmarks
          • How data contamination happens
          • Handling data contamination
    • Design Your Evaluation Pipeline
      • Step 1. Evaluate All Components in a System
      • Step 2. Create an Evaluation Guideline
        • Define evaluation criteria
        • Create scoring rubrics with examples
        • Tie evaluation metrics to business metrics
      • Step 3. Define Evaluation Methods and Data
        • Select evaluation methods
        • Annotate evaluation data
        • Evaluate your evaluation pipeline
        • Iterate
    • Summary
  • 5. Prompt Engineering
    • Introduction to Prompting
      • In-Context Learning: Zero-Shot and Few-Shot
      • System Prompt and User Prompt
      • Context Length and Context Efficiency
    • Prompt Engineering Best Practices
      • Write Clear and Explicit Instructions
        • Explain, without ambiguity, what you want the model to do
        • Ask the model to adopt a persona
        • Provide examples
        • Specify the output format
      • Provide Sufficient Context
      • Break Complex Tasks into Simpler Subtasks
      • Give the Model Time to Think
      • Iterate on Your Prompts
      • Evaluate Prompt Engineering Tools
      • Organize and Version Prompts
    • Defensive Prompt Engineering
      • Proprietary Prompts and Reverse Prompt Engineering
      • Jailbreaking and Prompt Injection
        • Direct manual prompt hacking
        • Automated attacks
        • Indirect prompt injection
      • Information Extraction
      • Defenses Against Prompt Attacks
        • Model-level defense
        • Prompt-level defense
        • System-level defense
    • Summary
  • 6. RAG and Agents
    • RAG
      • RAG Architecture
      • Retrieval Algorithms
        • Term-based retrieval
        • Embedding-based retrieval
        • Comparing retrieval algorithms
        • Combining retrieval algorithms
      • Retrieval Optimization
        • Chunking strategy
        • Reranking
        • Query rewriting
        • Contextual retrieval
      • RAG Beyond Texts
        • Multimodal RAG
        • RAG with tabular data
    • Agents
      • Agent Overview
      • Tools
        • Knowledge augmentation
        • Capability extension
        • Write actions
      • Planning
        • Planning overview
        • Foundation models as planners
        • Plan generation
          • Function calling
          • Planning granularity
          • Complex plans
        • Reflection and error correction
        • Tool selection
      • Agent Failure Modes and Evaluation
        • Planning failures
        • Tool failures
        • Efficiency
    • Memory
    • Summary
  • 7. Finetuning
    • Finetuning Overview
    • When to Finetune
      • Reasons to Finetune
      • Reasons Not to Finetune
      • Finetuning and RAG
    • Memory Bottlenecks
      • Backpropagation and Trainable Parameters
      • Memory Math
        • Memory needed for inference
        • Memory needed for training
      • Numerical Representations
      • Quantization
        • Inference quantization
        • Training quantization
    • Finetuning Techniques
      • Parameter-Efficient Finetuning
        • PEFT techniques
        • LoRA
          • Why does LoRA work?
          • LoRA configurations
          • Serving LoRA adapters
          • Quantized LoRA
      • Model Merging and Multi-Task Finetuning
        • Summing
          • Linear combination
          • Spherical linear interpolation (SLERP)
          • Pruning redundant task-specific parameters
        • Layer stacking
        • Concatenation
      • Finetuning Tactics
        • Finetuning frameworks and base models
          • Base models
          • Finetuning methods
          • Finetuning frameworks
        • Finetuning hyperparameters
          • Learning rate
          • Batch size
          • Number of epochs
          • Prompt loss weight
    • Summary
  • 8. Dataset Engineering
    • Data Curation
      • Data Quality
      • Data Coverage
      • Data Quantity
      • Data Acquisition and Annotation
    • Data Augmentation and Synthesis
      • Why Data Synthesis
      • Traditional Data Synthesis Techniques
        • Rule-based data synthesis
        • Simulation
      • AI-Powered Data Synthesis
        • Instruction data synthesis
        • Data verification
        • Limitations to AI-generated data
          • Quality control
          • Superficial imitation
          • Potential model collapse
          • Obscure data lineage
      • Model Distillation
    • Data Processing
      • Inspect Data
      • Deduplicate Data
      • Clean and Filter Data
      • Format Data
    • Summary
  • 9. Inference Optimization
    • Understanding Inference Optimization
      • Inference Overview
        • Computational bottlenecks
        • Online and batch inference APIs
      • Inference Performance Metrics
        • Latency, TTFT, and TPOT
        • Throughput and goodput
        • Utilization, MFU, and MBU
      • AI Accelerators
        • What's an accelerator?
        • Computational capabilities
        • Memory size and bandwidth
        • Power consumption
    • Inference Optimization
      • Model Optimization
        • Model compression
        • Overcoming the autoregressive decoding bottleneck
          • Speculative decoding
          • Inference with reference
          • Parallel decoding
        • Attention mechanism optimization
          • Redesigning the attention mechanism
          • Optimizing the KV cache size
          • Writing kernels for attention computation
        • Kernels and compilers
      • Inference Service Optimization
        • Batching
        • Decoupling prefill and decode
        • Prompt caching
        • Parallelism
    • Summary
  • 10. AI Engineering Architecture and User Feedback
    • AI Engineering Architecture
      • Step 1. Enhance Context
      • Step 2. Put in Guardrails
        • Input guardrails
        • Output guardrails
        • Guardrail implementation
      • Step 3. Add Model Router and Gateway
        • Router
        • Gateway
      • Step 4. Reduce Latency with Caches
        • Exact caching
        • Semantic caching
      • Step 5. Add Agent Patterns
      • Monitoring and Observability
        • Metrics
        • Logs and traces
        • Drift detection
      • AI Pipeline Orchestration
    • User Feedback
      • Extracting Conversational Feedback
        • Natural language feedback
          • Early termination
          • Error correction
          • Complaints
          • Sentiment
        • Other conversational feedback
          • Regeneration
          • Conversation organization
          • Conversation length
          • Dialogue diversity
      • Feedback Design
        • When to collect feedback
          • In the beginning
          • When something bad happens
          • When the model has low confidence
        • How to collect feedback
      • Feedback Limitations
        • Biases
        • Degenerate feedback loop
    • Summary
  • Epilogue
  • Index
