
Hands-On Generative AI with Transformers and Diffusion Models
ebook
Authors: Omar Sanseviero, Pedro Cuenca, Apolinário Passos, Jonathan Whitaker
ISBN: 9781098149208
Pages: 418, Format: ebook
Publication date: 2024-11-22
Bookstore: Helion

Book price: 254.15 zł (previously: 299.00 zł)
You save: 15% (-44.85 zł)


Learn to use generative AI techniques to create novel text, images, audio, and even music with this practical, hands-on book. Readers will understand how state-of-the-art generative models work, how to fine-tune and adapt them to their needs, and how to combine existing building blocks to create new models and creative applications in different domains.

This go-to book introduces theoretical concepts followed by guided practical applications, with extensive code samples and easy-to-understand illustrations. You'll learn how to use open source libraries to work with transformers and diffusion models, explore the accompanying code, and study several existing projects to help guide your work.

  • Build and customize models that can generate text and images
  • Explore trade-offs between using a pretrained model and fine-tuning your own model
  • Create and utilize models that can generate, edit, and modify images in any style
  • Customize transformers and diffusion models for multiple creative purposes
  • Train models that can reflect your own unique style
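
As a taste of the workflow the book teaches, the short sketch below is not taken from the book; it is a minimal, illustrative example of text and image generation with the open source Hugging Face transformers and diffusers libraries the book builds on. The model checkpoints named here are common public ones chosen purely for illustration.

    # Illustrative sketch only: generate text and an image with open source libraries.
    # Assumes: pip install transformers diffusers torch, plus public demo checkpoints.
    import torch
    from transformers import pipeline
    from diffusers import StableDiffusionPipeline

    # Text generation with a small pretrained language model.
    generator = pipeline("text-generation", model="gpt2")
    print(generator("Generative AI can", max_new_tokens=30)[0]["generated_text"])

    # Image generation with a pretrained Stable Diffusion checkpoint.
    pipe = StableDiffusionPipeline.from_pretrained(
        "stable-diffusion-v1-5/stable-diffusion-v1-5",
        torch_dtype=torch.float16,
    ).to("cuda")  # assumes a CUDA GPU; on CPU, drop torch_dtype and use .to("cpu")
    image = pipe("a watercolor robot reading a book").images[0]
    image.save("robot.png")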


People who bought "Hands-On Generative AI with Transformers and Diffusion Models" also chose:

  • Windows Media Center. Domowe centrum rozrywki
  • Ruby on Rails. Ćwiczenia
  • Przywództwo w świecie VUCA. Jak być skutecznym liderem w niepewnym środowisku
  • Scrum. O zwinnym zarządzaniu projektami. Wydanie II rozszerzone
  • Od hierarchii do turkusu, czyli jak zarządzać w XXI wieku


Table of Contents

Hands-On Generative AI with Transformers and Diffusion Models eBook -- table of contents

  • Preface
    • Who Should Read This Book
    • Prerequisites
    • What You Will Learn
    • How to Read This Book
    • Software and Hardware Requirements
    • Conventions Used in This Book
    • Using Code Examples
    • How to Contact Us
    • State of the Art: A Moving Target
    • Acknowledgments
      • Jonathan
      • Apolinário
      • Pedro
      • Omar
  • I. Leveraging Open Models
  • 1. An Introduction to Generative Media
    • Generating Images
    • Generating Text
    • Generating Sound Clips
    • Ethical and Societal Implications
    • Where We've Been and Where Things Stand
    • How Are Generative AI Models Created?
    • Summary
  • 2. Transformers
    • A Language Model in Action
      • Tokenizing Text
      • Predicting Probabilities
      • Generating Text
      • Zero-Shot Generalization
      • Few-Shot Generalization
    • A Transformer Block
    • Transformer Model Genealogy
      • Sequence-to-Sequence Tasks
      • Encoder-Only Models
    • The Power of Pretraining
    • Transformers Recap
      • Limitations
      • Beyond Text
    • Project Time: Using LMs to Generate Text
    • Summary
    • Exercises
    • Challenges
    • References
  • 3. Compressing and Representing Information
    • AutoEncoders
      • Preparing the Data
      • Modeling the Encoder
      • Decoder
      • Training
      • Exploring the Latent Space
      • Visualizing the Latent Space
    • Variational AutoEncoders
      • VAE Encoders and Decoders
      • Sampling from the Encoder Distribution
      • Training the VAE
      • VAEs for Generative Modeling
    • CLIP
      • Contrastive Loss
      • Using CLIP, Step-by-Step
      • Zero-Shot Image Classification with CLIP
      • Zero-Shot Image-Classification Pipeline
      • CLIP Use Cases
    • Alternatives to CLIP
    • Project Time: Semantic Image Search
    • Summary
    • Exercises
    • Challenges
    • References
  • 4. Diffusion Models
    • The Key Insight: Iterative Refinement
    • Training a Diffusion Model
      • The Data
      • Adding Noise
      • The UNet
      • Training
      • Sampling
      • Evaluation
    • In Depth: Noise Schedules
      • Why Add Noise?
      • Starting Simple
      • The Math
      • Effect of Input Resolution and Scaling
    • In Depth: UNets and Alternatives
      • A Simple UNet
      • Improving the UNet
      • Alternative Architectures
    • In Depth: Diffusion Objectives
    • Project Time: Train Your Diffusion Model
    • Summary
    • Exercises
    • Challenges
    • References
  • 5. Stable Diffusion and Conditional Generation
    • Adding Control: Conditional Diffusion Models
      • Preparing the Data
      • Creating a Class-Conditioned Model
      • Training the Model
      • Sampling
    • Improving Efficiency: Latent Diffusion
    • Stable Diffusion: Components in Depth
      • The Text Encoder
      • The Variational AutoEncoder
      • The UNet
      • Stable Diffusion XL
      • FLUX, SD3, and Video
      • Classifier-Free Guidance
    • Putting It All Together: Annotated Sampling Loop
    • Open Data, Open Models
      • Challenges and the Sunset of LAION-5B
      • Alternatives
      • Fair and Commercial Use
    • Project Time: Build an Interactive ML Demo with Gradio
    • Summary
    • Exercises
    • Challenge
    • References
  • II. Transfer Learning for Generative Models
  • 6. Fine-Tuning Language Models
    • Classifying Text
      • Identify a Dataset
      • Define Which Model Type to Use
      • Select a Good Base Model
      • Preprocess the Dataset
      • Define Evaluation Metrics
      • Train the Model
      • Still Relevant?
    • Generating Text
      • Picking the Right Generative Model
      • Training a Generative Model
    • Instructions
    • A Quick Introduction to Adapters
    • A Light Introduction to Quantization
    • Putting It All Together
    • A Deeper Dive into Evaluation
    • Project Time: Retrieval-Augmented Generation
    • Summary
    • Exercises
    • Challenge
    • References
  • 7. Fine-Tuning Stable Diffusion
    • Full Stable Diffusion Fine-Tuning
      • Preparing the Dataset
      • Fine-Tuning the Model
      • Inference
    • DreamBooth
      • Preparing the Dataset
      • Prior Preservation
      • DreamBoothing the Model
      • Inference
    • Training LoRAs
    • Giving Stable Diffusion New Capabilities
      • Inpainting
      • Additional Inputs for Special Conditionings
    • Project Time: Train an SDXL DreamBooth LoRA by Yourself
    • Summary
    • Exercises
    • Challenge
    • References
  • III. Going Further
  • 8. Creative Applications of Text-to-Image Models
    • Image to Image
    • Inpainting
    • Prompt Weighting and Image Editing
      • Prompt Weighting and Merging
      • Editing Diffusion Images with Semantic Guidance
    • Real Image Editing via Inversion
      • Editing with LEDITS++
      • Real Image Editing via Instruction Fine-Tuning
    • ControlNet
    • Image Prompting and Image Variations
      • Image Variations
      • Image Prompting
        • Style transfer
        • Additional controls
    • Project Time: Your Creative Canvas
    • Summary
    • Exercises
    • References
  • 9. Generating Audio
    • Audio Data
      • Waveforms
      • Spectrograms
    • Speech to Text with Transformer-Based Architectures
      • Encoder-Based Techniques
      • Encoder-Decoder Techniques
      • From Model to Pipeline
      • Evaluation
    • From Text to Speech to Generative Audio
      • Generating Audio with Sequence-to-Sequence Models
      • Going Beyond Speech with Bark
      • AudioLM and MusicLM
      • AudioGen and MusicGen
      • Audio Diffusion and Riffusion
      • Dance Diffusion
      • More on Diffusion Models for Generative Audio
    • Evaluating Audio-Generation Systems
    • What's Next?
    • Project Time: End-to-End Conversational System
    • Summary
    • Exercises
    • Challenges
    • References
  • 10. Rapidly Advancing Areas in Generative AI
    • Preference Optimization
    • Long Contexts
    • Mixture of Experts
    • Optimizations and Quantizations
    • Data
    • One Model to Rule Them All
    • Computer Vision
    • 3D Computer Vision
    • Video Generation
    • Multimodality
    • Community
  • A. Open Source Tools
    • The Hugging Face Stack
    • Data
    • Wrappers
    • Local Inference
    • Deployment Tools
  • B. LLM Memory Requirements
    • Inference Memory Requirements
    • Training Memory Requirements
    • Further Reading
  • C. End-to-End Retrieval-Augmented Generation
    • Processing the Data
    • Embedding the Documents
    • Retrieval
    • Generation
    • Production-Level RAG
  • Index
