Building Generative AI Services with FastAPI - Helion

ISBN: 9781098160265
Pages: 530, Format: ebook
Publication date: 2025-04-15
Bookstore: Helion
Book price: 203,15 zł (previously: 236,22 zł)
You save: 14% (-33,07 zł)
Ready to build production-grade applications with generative AI? This practical guide takes you through designing and deploying AI services using the FastAPI web framework. Learn how to integrate models that process text, images, audio, and video while seamlessly interacting with databases, filesystems, websites, and APIs. Whether you're a web developer, data scientist, or DevOps engineer, this book equips you with the tools to build scalable, real-time AI applications.
Author Alireza Parandeh provides clear explanations and hands-on examples covering authentication, concurrency, caching, and retrieval-augmented generation (RAG) with vector databases. You'll also explore best practices for testing AI outputs, optimizing performance, and securing microservices. With containerized deployment using Docker, you'll be ready to launch AI-powered applications confidently in the cloud.
Build generative AI services that interact with databases, filesystems, websites, and APIs
Manage concurrency in AI workloads and handle long-running tasks
Stream AI-generated outputs in real time via WebSocket and server-sent events
Secure services with authentication, content filtering, throttling, and rate limiting
Optimize AI performance with caching, batch processing, and fine-tuning techniques
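The bullets above mention streaming model outputs via server-sent events (SSE). As a quick illustration of the underlying wire format (this sketch is not code from the book, and the helper names `sse_frame` and `stream_tokens` are hypothetical), an SSE stream is plain text made of `data:` frames separated by blank lines:

```python
def sse_frame(data: str, event: str = "") -> str:
    """Format one server-sent-events frame (the wire format SSE clients parse)."""
    lines = []
    if event:
        lines.append(f"event: {event}")
    # A multi-line payload becomes multiple `data:` lines within a single frame.
    for part in data.splitlines() or [""]:
        lines.append(f"data: {part}")
    # A blank line terminates the frame.
    return "\n".join(lines) + "\n\n"


def stream_tokens(tokens):
    """Yield each model token as an SSE frame, then a terminating event."""
    for tok in tokens:
        yield sse_frame(tok)
    yield sse_frame("[DONE]", event="end")


# Example: streaming three hypothetical LLM tokens as SSE frames.
frames = "".join(stream_tokens(["Hel", "lo", "!"]))
```

In a FastAPI endpoint, a generator like this would typically be wrapped in a `StreamingResponse` with `media_type="text/event-stream"`; the book's chapters on real-time communication cover the full implementation.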
Table of Contents
Building Generative AI Services with FastAPI: A Practical Approach to Developing Context-Rich Generative AI Applications (ebook)
- Foreword
- Preface
- Objective and Approach
- Prerequisites
- Book Structure
- How to Read This Book
- Hardware and Software Requirements
- Conventions Used in This Book
- Using Code Examples
- O'Reilly Online Learning
- How to Contact Us
- Acknowledgments
- I. Developing AI Services
- 1. Introduction
- What Is Generative AI?
- Why Generative AI Services Will Power Future Applications
- Facilitating the Creative Process
- Suggesting Contextually Relevant Solutions
- Personalizing the User Experience
- Minimizing Delay in Resolving Customer Queries
- Acting as an Interface to Complex Systems
- Automating Manual Administrative Tasks
- Scaling and Democratizing Content Generation
- How to Build a Generative AI Service
- Why Build Generative AI Services with FastAPI?
- What Prevents the Adoption of Generative AI Services
- Overview of the Capstone Project
- Summary
- 2. Getting Started with FastAPI
- Introduction to FastAPI
- Setting Up Your Development Environment
- Installing Python, FastAPI, and Required Packages
- Creating a Simple FastAPI Web Server
- FastAPI Features and Advantages
- Inspired by Flask Routing Pattern
- Handling Asynchronous and Synchronous Operations
- Built-In Support for Background Tasks
- Custom Middleware and CORS Support
- Freedom to Customize Any Service Layer
- Data Validation and Serialization
- Rich Ecosystem of Plug-Ins
- Automatic Documentation
- Dependency Injection System
- Lifespan Events
- Security and Authentication Components
- Bidirectional WebSocket, GraphQL, and Custom Response Support
- Modern Python and IDE Integration with Sensible Defaults
- FastAPI Project Structures
- Flat Structure
- Nested Structure
- Modular Structure
- Progressive Reorganization of Your FastAPI Project
- Onion/Layered Application Design Pattern
- Comparing FastAPI to Other Python Web Frameworks
- FastAPI Limitations
- Inefficient Model Memory Management
- Limited Number of Threads
- Restricted to Global Interpreter Lock
- Lack of Support for Micro-Batch Processing Inference Requests
- Cannot Efficiently Split AI Workloads Between CPU and GPU
- Dependency Conflicts
- Lack of Support for Resource-Intensive AI Workloads
- Setting Up a Managed Python Environment and Tooling
- Summary
- 3. AI Integration and Model Serving
- Serving Generative Models
- Language Models
- Transformers versus recurrent neural networks
- Tokenization and embedding
- Training transformers
- Positional encoding
- Autoregressive prediction
- Integrating a language model into your application
- Connecting FastAPI with Streamlit UI generator
- Audio Models
- Vision Models
- Video Models
- OpenAI Sora
- 3D Models
- OpenAI Shap-E
- Strategies for Serving Generative AI Models
- Be Model Agnostic: Swap Models on Every Request
- Be Compute Efficient: Preload Models with the FastAPI Lifespan
- Be Lean: Serve Models Externally
- Cloud providers
- BentoML
- Model providers
- The Role of Middleware in Service Monitoring
- Summary
- Additional References
- 4. Implementing Type-Safe AI Services
- Introduction to Type Safety
- Implementing Type Safety
- Type Annotations
- Using Annotated
- Dataclasses
- Pydantic Models
- How to Use Pydantic
- Compound Pydantic Models
- Field Constraints and Validators
- Custom Field and Model Validators
- Computed Fields
- Model Export and Serialization
- Parsing Environment Variables with Pydantic
- Dataclasses or Pydantic Models in FastAPI
- Summary
- II. Communicating with External Systems
- 5. Achieving Concurrency in AI Workloads
- Optimizing GenAI Services for Multiple Users
- Optimizing for I/O Tasks with Asynchronous Programming
- Synchronous Versus Asynchronous (Async) Execution
- Async Programming with Model Provider APIs
- Event Loop and Thread Pool in FastAPI
- Blocking the Main Server
- Project: Talk to the Web (Web Scraper)
- Project: Talk to Documents (RAG)
- Optimizing Model Serving for Memory- and Compute-Bound AI Inference Tasks
- Compute-Bound Operations
- Externalizing Model Serving
- Request batching and continuous batching
- Paged attention
- Managing Long-Running AI Inference Tasks
- Summary
- Additional References
- 6. Real-Time Communication with Generative Models
- Web Communication Mechanisms
- Regular/Short Polling
- Long Polling
- Server-Sent Events
- WebSocket
- Comparing Communication Mechanisms
- Implementing SSE Endpoints
- SSE with GET Request
- Cross-origin resource sharing
- Streaming LLM outputs from Hugging Face models
- SSE with POST Request
- Implementing WS Endpoints
- Streaming LLM Outputs with WebSocket
- Handling WebSocket Exceptions
- Designing APIs for Streaming
- Summary
- 7. Integrating Databases into AI Services
- The Role of a Database
- Database Systems
- Project: Storing User Conversations with an LLM in a Relational Database
- Defining ORM Models
- Creating a Database Engine and Session Management
- Implementing CRUD Endpoints
- Repository and Services Design Pattern
- Managing Database Schema Changes
- Storing Data When Working with Real-Time Streams
- Summary
- III. Securing, Optimizing, Testing, and Deploying AI Services
- 8. Authentication and Authorization
- Authentication and Authorization
- Authentication Methods
- Basic Authentication
- JSON Web Tokens (JWT) Authentication
- What is JWT?
- Getting started with JWT authentication
- Hashing and salting
- Authentication flows
- Implementing OAuth Authentication
- OAuth Authentication with GitHub
- OAuth2 Flow Types
- Authorization code flow
- Implicit flow
- Client credentials flow
- Resource owner password credentials flow
- Device authorization flow
- Authorization
- Authorization Models
- Role-Based Access Control
- Relationship-Based Access Control
- Attribute-Based Access Control
- Hybrid Authorization Models
- Summary
- 9. Securing AI Services
- Usage Moderation and Abuse Protection
- Guardrails
- Input Guardrails
- Output Guardrails
- Guardrail Thresholds
- Implementing a Moderation Guardrail
- API Rate Limiting and Throttling
- Implementing Rate Limits in FastAPI
- User-based rate limits
- Rate limits across instances in production
- Limiting WebSocket connections
- Throttling Real-Time Streams
- Summary
- 10. Optimizing AI Services
- Optimization Techniques
- Batch Processing
- Caching
- Keyword caching
- Semantic caching
- Building a semantic caching service from scratch
- Semantic caching with GPT cache
- Similarity threshold
- Eviction policies
- Context/prompt caching
- Model Quantization
- Precision versus quality trade-off
- Floating-point numbers
- How to quantize pretrained LLMs
- Structured Outputs
- Prompt Engineering
- Prompt templates
- Advanced prompting techniques
- In-context learning
- Thought generation
- Decomposition
- Ensembling
- Self-criticism
- Agentic
- Fine-Tuning
- When should you consider fine-tuning?
- How to fine-tune a pretrained model
- Summary
- 11. Testing AI Services
- The Importance of Testing
- Software Testing
- Types of Tests
- The Biggest Challenge in Testing Software
- Planning Tests
- Test Dimensions
- Test Data
- Test Phases
- Test Environments
- Testing Strategies
- Challenges of Testing GenAI Services
- Variability of Outputs (Flakiness)
- Performance and Resource Constraints (Slow and Expensive)
- Regression
- Bias
- Adversarial Attacks
- Unbound Testing Coverage
- Project: Implementing Tests for a RAG System
- Unit Tests
- Installing and configuring pytest
- Fixtures and scope
- Parameterization
- Conftest module
- Setup and teardown
- Handling asynchronous tests
- Mocking and patching
- Fakes
- Dummies
- Stubs
- Spies
- Mocks
- Implementing test doubles with pytest-mock
- Integration Testing
- Context precision and recall
- Behavioral testing
- Minimum functionality tests (MFTs)
- Invariance tests (ITs)
- Directional expectation tests (DETs)
- Auto-evaluation tests
- End-to-End Testing
- Vertical E2E tests
- Horizontal E2E tests
- Summary
- 12. Deployment of AI Services
- Deployment Options
- Deploying to Virtual Machines
- Deploying to Serverless Functions
- Deploying to Managed App Platforms
- Deploying with Containers
- Containerization with Docker
- Docker Architecture
- Building Docker Images
- Container Registries
- Container Filesystem and Docker Layers
- Docker Storage
- Docker volumes
- Bind mounts
- Temporary mounts (tmpfs)
- Handling filesystem permissions
- Docker Networking
- Bridge network driver
- Configure user-defined bridge networks
- Embedded DNS
- Publishing ports
- Host network driver
- None network driver
- Enabling GPU Driver
- Docker Compose
- Enabling GPU Access in Docker Compose
- Optimizing Docker Images
- Use minimal base image
- Avoid GPU inference runtimes
- Externalize application data
- Layer ordering and caching
- Layer ordering to avoid frequent cache invalidation
- Minimize layers
- Keep build context small
- Use cache and bind mounts
- Use external cache
- Multi-stage builds
- docker init
- Summary
- Afterword
- Index