Genomics in the Azure Cloud

Author: Colby T. Ford
ISBN: 9781098139001
Pages: 300, Format: ebook
Publication date: 2022-11-14
Bookstore: Helion

This practical guide bridges the gap between general cloud computing architecture in Microsoft Azure and scientific computing for bioinformatics and genomics. You'll get a solid understanding of the architecture patterns and services offered in Azure and how they might be used in your bioinformatics practice. You'll also get code examples that you can reuse for your specific needs, along with plenty of concrete examples that illustrate how a given service is used in a bioinformatics context.

You'll also get valuable advice on how to:

- Use enterprise platform services to easily scale your bioinformatics workloads
- Organize, query, and analyze genomic data at scale
- Build a genomics data lake and accompanying data warehouse
- Use Azure Machine Learning to scale your model training, track model performance, and deploy winning models
- Orchestrate and automate processing pipelines using Azure Data Factory and Databricks
- Cloudify your organization's existing bioinformatics pipelines by moving your workflows to Azure high-performance compute services
- And more

Table of Contents

Preface
- Who Should Read This Book
- How the Book Is Organized
- Software and Hardware Requirements
- Code Conventions and Downloads
- Conventions Used in This Book
- Using Code Examples
- O'Reilly Online Learning
- How to Contact Us
- Acknowledgments
1. Essentials of Cloud Architecture
- Cloud Horsepower
  - Considerations for the Cloud
    - I have to move everything to the cloud at once.
    - The cloud is always cheaper/more expensive.
    - Our IT security team can manage security better.
  - Three Benefits of the Cloud
    - Collaboration
    - Scalability
    - Automation
- Types of Cloud Services
  - Infrastructure Services
    - Example: Genomics Data Science Virtual Machine
  - Platform Services
    - Example: Azure Database for PostgreSQL
  - Software Services
- Azure Environment Organization
- Getting an Azure Account
- Welcome to the Azure Portal
  - Setting Up a Resource Group
  - Creating Resources
  - Free Services
- Basics of the Bioinformatics Workflow
  - Primary Analysis
    - FASTA
    - FASTQ
  - Secondary Analysis
    - SAM (and BAM)
    - VCF
  - Tertiary Analysis
  - Other Analyses
  - Other File Formats
    - GEN (and BGEN)
    - GFF
    - PDB
2. Organizing Genomics Data with Data Lakes
- Organizing Your Genomics Data
  - Going for Bronze, Silver, and Gold
    - Bronze (raw)
    - Silver (staging/intermediate)
    - Gold (curated)
  - Letting Your Bioinformatics Workflow Dictate Your Data Lake Organization
  - Planning for -omics and Non-omics Data Together
    - Study, subject, and sample directories
- Creating a Data Lake with Azure Storage
  - Blob Storage Versus Data Lake Storage
    - About the hierarchical namespace
- Balancing Costs Versus Performance in Data Storage
  - The Goldilocks Method of Storage Tiers
    - Cost breakdown
  - Genomics Data Lifecycle
    - Using Azure Storage Explorer
    - Lifecycle rules
- Managing Access Inside the Lake
  - Role-Based Access Control
  - Access-Control Lists
- Azure Open Datasets for Genomics
3. Querying Variant Data in SQL
- Building a Genomics Data Warehouse
  - Example: Lab Results
  - Data Warehouse Architecture for Genomics
    - Variant data warehouse
- Azure Synapse Analytics
  - Creating an Azure Synapse Analytics Workspace
  - Registering Services in Subscriptions
  - Getting to Work in the Synapse Workspace
  - Using Open Row Sets
  - Creating External Tables
  - Did Someone Say Pool Party?
    - Serverless SQL pools
    - Dedicated SQL pools
    - Serverless Spark pools
    - Pool cost considerations
- Connecting to More Data Sources
- Azure SQL DB
  - Creating a Database in Azure SQL DB
    - Elastic pools
    - Provisioned and Serverless compute
- Relaxing at Your Genomics Data Lakehouse
  - Efficient File Formats
    - Parquet to the floor
    - ACIDity
    - Changing the tides with Delta
4. Orchestrating Data Movement and Transformation
- Creating Your Data Factory
- Getting Started with Data Movement
  - Getting Data into Your Data Lake Using the Copy Data Tool
  - Linking to NCBI's FTP Server
  - Transforming Data Using Data Flows
    - Parsing a VCF file with a data flow
  - Building and Triggering Pipelines for Automation
5. Azure Databricks (and Apache Spark)
- Introduction to Apache Spark and Databricks
- Setting Up an Azure Databricks Workspace
  - Connecting Databricks to Your Data Lake
- Processing Variant Data with the Glow Package
  - Exploring DataFrames
  - Filtering to Chromosome Coordinates
    - Count of variants by chromosome
- Automating Variant Data Processing
  - Orchestrating a Databricks Notebook from Data Factory
    - Access tokens
    - Creating the pipeline
  - A Brief Interlude About Distributed File Formats
    - Parquet
    - Delta
- Using Other Tools in Databricks
  - Single-Node Bioinformatics Tools
  - Koalas
    - Pandas on Spark
  - Hail
6. Azure Machine Learning
- How to Scale Machine Learning Tasks
- Creating an Azure Machine Learning Workspace
- Training a Drug Sensitivity Model
  - Creating a Compute Instance in Azure Machine Learning Studio
  - Datastores and Datasets
    - About the data
  - Experimenting with Cluster-Based Training
- Automating Model Training with AutoML
  - Explainable Machine Learning
- Using Azure Machine Learning Not for Machine Learning
  - Performing Alignment in a Notebook
  - Custom Docker Images for Bioinformatics
7. High-Performance Computing and Other Compute Services
- Bring Your Own Pipeline (BYOP)
  - Why Azure for HPC?
- Azure Batch
  - Scaling Workloads with Cromwell
    - Running your first workflow
- Azure CycleCloud
  - Setting Up CycleCloud Clusters
    - Creating a cluster for bioinformatics
- Microsoft Genomics
  - Alignment and Variant Calling with the msgen Package
8. Deployment, Security, Compliance, and Potpourri
- Automating the Deployment of Cloud Resources
  - Dev, Staging, and Prod
  - Lifting Your Deployment with ARMs and Biceps
    - Bicep
- Security Planning
  - Azure Active Directory
    - Roles and groups
  - Role-Based Access Controls and Access-Control Lists
- Compliance
  - HIPAA, HITECH, and HITRUST
    - GDPR
  - Azure Blueprints
- Cost Considerations
  - Azure Pricing Calculator
  - Retail Pricing Versus Enterprise Agreements
  - Budgeting Examples
- Quota Problems
  - Please, Sir, Can I Have Some More (vCPUs)?
- Getting General Support
Conclusion
- Looking Backward
  - Baby Azure
- What Else?
  - Using Other Web-Based Bioinformatics Platforms
- Looking Forward
  - Cheaper Sequencing = More Data
Index