Building Generative AI Services with FastAPI - Helion

ISBN: 9781098160265
Pages: 530, Format: ebook
Publication date: 2025-04-15
Bookstore: Helion
Book price: 203,15 zł (previously: 236,22 zł)
You save: 14% (-33,07 zł)
Ready to build production-grade applications with generative AI? This practical guide takes you through designing and deploying AI services using the FastAPI web framework. Learn how to integrate models that process text, images, audio, and video while seamlessly interacting with databases, filesystems, websites, and APIs. Whether you're a web developer, data scientist, or DevOps engineer, this book equips you with the tools to build scalable, real-time AI applications.
Author Alireza Parandeh provides clear explanations and hands-on examples covering authentication, concurrency, caching, and retrieval-augmented generation (RAG) with vector databases. You'll also explore best practices for testing AI outputs, optimizing performance, and securing microservices. With containerized deployment using Docker, you'll be ready to launch AI-powered applications confidently in the cloud.
Build generative AI services that interact with databases, filesystems, websites, and APIs
Manage concurrency in AI workloads and handle long-running tasks
Stream AI-generated outputs in real time via WebSocket and server-sent events
Secure services with authentication, content filtering, throttling, and rate limiting
Optimize AI performance with caching, batch processing, and fine-tuning techniques
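The bullets above mention streaming model outputs via server-sent events (SSE). As a quick illustration of the underlying wire format (this sketch is not code from the book, and the helper names `sse_frame` and `stream_tokens` are hypothetical), an SSE stream is plain text made of `data:` frames separated by blank lines:

```python
def sse_frame(data: str, event: str = "") -> str:
    """Format one server-sent-events frame (the wire format SSE clients parse)."""
    lines = []
    if event:
        lines.append(f"event: {event}")
    # A multi-line payload becomes multiple `data:` lines within a single frame.
    for part in data.splitlines() or [""]:
        lines.append(f"data: {part}")
    # A blank line terminates the frame.
    return "\n".join(lines) + "\n\n"


def stream_tokens(tokens):
    """Yield each model token as an SSE frame, then a terminating event."""
    for tok in tokens:
        yield sse_frame(tok)
    yield sse_frame("[DONE]", event="end")


# Example: streaming three hypothetical LLM tokens as SSE frames.
frames = "".join(stream_tokens(["Hel", "lo", "!"]))
```

In a FastAPI endpoint, a generator like this would typically be wrapped in a `StreamingResponse` with `media_type="text/event-stream"`; the book's chapters on real-time communication cover the full implementation.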
Table of Contents
Building Generative AI Services with FastAPI: A Practical Approach to Developing Context-Rich Generative AI Applications (ebook)
- Foreword
- Preface
- Objective and Approach
- Prerequisites
- Book Structure
- How to Read This Book
- Hardware and Software Requirements
- Conventions Used in This Book
- Using Code Examples
- O'Reilly Online Learning
- How to Contact Us
- Acknowledgments
- I. Developing AI Services
- 1. Introduction
- What Is Generative AI?
- Why Generative AI Services Will Power Future Applications
- Facilitating the Creative Process
- Suggesting Contextually Relevant Solutions
- Personalizing the User Experience
- Minimizing Delay in Resolving Customer Queries
- Acting as an Interface to Complex Systems
- Automating Manual Administrative Tasks
- Scaling and Democratizing Content Generation
- How to Build a Generative AI Service
- Why Build Generative AI Services with FastAPI?
- What Prevents the Adoption of Generative AI Services
- Overview of the Capstone Project
- Summary
- 2. Getting Started with FastAPI
- Introduction to FastAPI
- Setting Up Your Development Environment
- Installing Python, FastAPI, and Required Packages
- Creating a Simple FastAPI Web Server
- FastAPI Features and Advantages
- Inspired by Flask Routing Pattern
- Handling Asynchronous and Synchronous Operations
- Built-In Support for Background Tasks
- Custom Middleware and CORS Support
- Freedom to Customize Any Service Layer
- Data Validation and Serialization
- Rich Ecosystem of Plug-Ins
- Automatic Documentation
- Dependency Injection System
- Lifespan Events
- Security and Authentication Components
- Bidirectional WebSocket, GraphQL, and Custom Response Support
- Modern Python and IDE Integration with Sensible Defaults
- FastAPI Project Structures
- Flat Structure
- Nested Structure
- Modular Structure
- Progressive Reorganization of Your FastAPI Project
- Onion/Layered Application Design Pattern
- Comparing FastAPI to Other Python Web Frameworks
- FastAPI Limitations
- Inefficient Model Memory Management
- Limited Number of Threads
- Restricted to Global Interpreter Lock
- Lack of Support for Micro-Batch Processing Inference Requests
- Cannot Efficiently Split AI Workloads Between CPU and GPU
- Dependency Conflicts
- Lack of Support for Resource-Intensive AI Workloads
- Setting Up a Managed Python Environment and Tooling
- Summary
- 3. AI Integration and Model Serving
- Serving Generative Models
- Language Models
- Transformers versus recurrent neural networks
- Tokenization and embedding
- Training transformers
- Positional encoding
- Autoregressive prediction
- Integrating a language model into your application
- Connecting FastAPI with Streamlit UI generator
- Audio Models
- Vision Models
- Video Models
- OpenAI Sora
- 3D Models
- OpenAI Shap-E
- Strategies for Serving Generative AI Models
- Be Model Agnostic: Swap Models on Every Request
- Be Compute Efficient: Preload Models with the FastAPI Lifespan
- Be Lean: Serve Models Externally
- Cloud providers
- BentoML
- Model providers
- The Role of Middleware in Service Monitoring
- Summary
- Additional References
- 4. Implementing Type-Safe AI Services
- Introduction to Type Safety
- Implementing Type Safety
- Type Annotations
- Using Annotated
- Dataclasses
- Pydantic Models
- How to Use Pydantic
- Compound Pydantic Models
- Field Constraints and Validators
- Custom Field and Model Validators
- Computed Fields
- Model Export and Serialization
- Parsing Environment Variables with Pydantic
- Dataclasses or Pydantic Models in FastAPI
- Summary
- II. Communicating with External Systems
- 5. Achieving Concurrency in AI Workloads
- Optimizing GenAI Services for Multiple Users
- Optimizing for I/O Tasks with Asynchronous Programming
- Synchronous Versus Asynchronous (Async) Execution
- Async Programming with Model Provider APIs
- Event Loop and Thread Pool in FastAPI
- Blocking the Main Server
- Project: Talk to the Web (Web Scraper)
- Project: Talk to Documents (RAG)
- Optimizing Model Serving for Memory- and Compute-Bound AI Inference Tasks
- Compute-Bound Operations
- Externalizing Model Serving
- Request batching and continuous batching
- Paged attention
- Managing Long-Running AI Inference Tasks
- Summary
- Additional References
- 6. Real-Time Communication with Generative Models
- Web Communication Mechanisms
- Regular/Short Polling
- Long Polling
- Server-Sent Events
- WebSocket
- Comparing Communication Mechanisms
- Implementing SSE Endpoints
- SSE with GET Request
- Cross-origin resource sharing
- Streaming LLM outputs from Hugging Face models
- SSE with POST Request
- Implementing WS Endpoints
- Streaming LLM Outputs with WebSocket
- Handling WebSocket Exceptions
- Designing APIs for Streaming
- Summary
- 7. Integrating Databases into AI Services
- The Role of a Database
- Database Systems
- Project: Storing User Conversations with an LLM in a Relational Database
- Defining ORM Models
- Creating a Database Engine and Session Management
- Implementing CRUD Endpoints
- Repository and Services Design Pattern
- Managing Database Schema Changes
- Storing Data When Working with Real-Time Streams
- Summary
- III. Securing, Optimizing, Testing, and Deploying AI Services
- 8. Authentication and Authorization
- Authentication and Authorization
- Authentication Methods
- Basic Authentication
- JSON Web Tokens (JWT) Authentication
- What is JWT?
- Getting started with JWT authentication
- Hashing and salting
- Authentication flows
- Implementing OAuth Authentication
- OAuth Authentication with GitHub
- OAuth2 Flow Types
- Authorization code flow
- Implicit flow
- Client credentials flow
- Resource owner password credentials flow
- Device authorization flow
- Authorization
- Authorization Models
- Role-Based Access Control
- Relationship-Based Access Control
- Attribute-Based Access Control
- Hybrid Authorization Models
- Summary
- 9. Securing AI Services
- Usage Moderation and Abuse Protection
- Guardrails
- Input Guardrails
- Output Guardrails
- Guardrail Thresholds
- Implementing a Moderation Guardrail
- API Rate Limiting and Throttling
- Implementing Rate Limits in FastAPI
- User-based rate limits
- Rate limits across instances in production
- Limiting WebSocket connections
- Throttling Real-Time Streams
- Summary
- 10. Optimizing AI Services
- Optimization Techniques
- Batch Processing
- Caching
- Keyword caching
- Semantic caching
- Building a semantic caching service from scratch
- Semantic caching with GPT cache
- Similarity threshold
- Eviction policies
- Context/prompt caching
- Model Quantization
- Precision versus quality trade-off
- Floating-point numbers
- How to quantize pretrained LLMs
- Structured Outputs
- Prompt Engineering
- Prompt templates
- Advanced prompting techniques
- In-context learning
- Thought generation
- Decomposition
- Ensembling
- Self-criticism
- Agentic
- Fine-Tuning
- When should you consider fine-tuning?
- How to fine-tune a pretrained model
- Summary
- 11. Testing AI Services
- The Importance of Testing
- Software Testing
- Types of Tests
- The Biggest Challenge in Testing Software
- Planning Tests
- Test Dimensions
- Test Data
- Test Phases
- Test Environments
- Testing Strategies
- Challenges of Testing GenAI Services
- Variability of Outputs (Flakiness)
- Performance and Resource Constraints (Slow and Expensive)
- Regression
- Bias
- Adversarial Attacks
- Unbound Testing Coverage
- Project: Implementing Tests for a RAG System
- Unit Tests
- Installing and configuring pytest
- Fixtures and scope
- Parameterization
- Conftest module
- Setup and teardown
- Handling asynchronous tests
- Mocking and patching
- Fakes
- Dummies
- Stubs
- Spies
- Mocks
- Implementing test doubles with pytest-mock
- Integration Testing
- Context precision and recall
- Behavioral testing
- Minimum functionality tests (MFTs)
- Invariance tests (ITs)
- Directional expectation tests (DETs)
- Auto-evaluation tests
- End-to-End Testing
- Vertical E2E tests
- Horizontal E2E tests
- Summary
- 12. Deployment of AI Services
- Deployment Options
- Deploying to Virtual Machines
- Deploying to Serverless Functions
- Deploying to Managed App Platforms
- Deploying with Containers
- Containerization with Docker
- Docker Architecture
- Building Docker Images
- Container Registries
- Container Filesystem and Docker Layers
- Docker Storage
- Docker volumes
- Bind mounts
- Temporary mounts (tmpfs)
- Handling filesystem permissions
- Docker Networking
- Bridge network driver
- Configure user-defined bridge networks
- Embedded DNS
- Publishing ports
- Host network driver
- None network driver
- Enabling GPU Driver
- Docker Compose
- Enabling GPU Access in Docker Compose
- Optimizing Docker Images
- Use minimal base image
- Avoid GPU inference runtimes
- Externalize application data
- Layer ordering and caching
- Layer ordering to avoid frequent cache invalidation
- Minimize layers
- Keep build context small
- Use cache and bind mounts
- Use external cache
- Multi-stage builds
- docker init
- Summary
- Afterword
- Index