Genomics in the Azure Cloud - Helion
ISBN: 9781098139001
Pages: 300, Format: ebook
Release date: 2022-11-14
Bookstore: Helion
Book price: 228.65 zł (previously: 265.87 zł)
You save: 14% (-37.22 zł)
This practical guide bridges the gap between general cloud computing architecture in Microsoft Azure and scientific computing for bioinformatics and genomics. You'll get a solid understanding of the architecture patterns and services that are offered in Azure and how they might be used in your bioinformatics practice. You'll get code examples that you can reuse for your specific needs. And you'll get plenty of concrete examples to illustrate how a given service is used in a bioinformatics context.
You'll also get valuable advice on how to:
- Use enterprise platform services to easily scale your bioinformatics workloads
- Organize, query, and analyze genomic data at scale
- Build a genomics data lake and accompanying data warehouse
- Use Azure Machine Learning to scale your model training, track model performance, and deploy winning models
- Orchestrate and automate processing pipelines using Azure Data Factory and Databricks
- Cloudify your organization's existing bioinformatics pipelines by moving your workflows to Azure high-performance compute services
- And more
Genomics in the Azure Cloud eBook: Table of Contents
- Preface
- Who Should Read This Book
- How the Book Is Organized
- Software and Hardware Requirements
- Code Conventions and Downloads
- Conventions Used in This Book
- Using Code Examples
- O'Reilly Online Learning
- How to Contact Us
- Acknowledgments
- 1. Essentials of Cloud Architecture
- Cloud Horsepower
- Considerations for the Cloud
- I have to move everything to the cloud at once.
- The cloud is always cheaper/more expensive.
- Our IT security team can manage security better.
- Three Benefits of the Cloud
- Collaboration
- Scalability
- Automation
- Types of Cloud Services
- Infrastructure Services
- Example: Genomics Data Science Virtual Machine
- Platform Services
- Example: Azure Database for PostgreSQL
- Software Services
- Azure Environment Organization
- Getting an Azure Account
- Welcome to the Azure Portal
- Setting Up a Resource Group
- Creating Resources
- Free Services
- Basics of the Bioinformatics Workflow
- Primary Analysis
- FASTA
- FASTQ
- Secondary Analysis
- SAM (and BAM)
- VCF
- Tertiary Analysis
- Other Analyses
- Other File Formats
- GEN (and BGEN)
- GFF
- PDB
- 2. Organizing Genomics Data with Data Lakes
- Organizing Your Genomics Data
- Going for Bronze, Silver, and Gold
- Bronze (raw)
- Silver (staging/intermediate)
- Gold (curated)
- Letting Your Bioinformatics Workflow Dictate Your Data Lake Organization
- Planning for -omics and Non-omics Data Together
- Study, subject, and sample directories
- Creating a Data Lake with Azure Storage
- Blob Storage Versus Data Lake Storage
- About the hierarchical namespace
- Balancing Costs Versus Performance in Data Storage
- The Goldilocks Method of Storage Tiers
- Cost breakdown
- Genomics Data Lifecycle
- Using Azure Storage Explorer
- Lifecycle rules
- Managing Access Inside the Lake
- Role-Based Access Control
- Access-Control Lists
- Azure Open Datasets for Genomics
- 3. Querying Variant Data in SQL
- Building a Genomics Data Warehouse
- Example: Lab Results
- Data Warehouse Architecture for Genomics
- Variant data warehouse
- Azure Synapse Analytics
- Creating an Azure Synapse Analytics Workspace
- Registering Services in Subscriptions
- Getting to Work in the Synapse Workspace
- Using Open Row Sets
- Creating External Tables
- Did Someone Say Pool Party?
- Serverless SQL pools
- Dedicated SQL pools
- Serverless Spark pools
- Pool cost considerations
- Connecting to More Data Sources
- Azure SQL DB
- Creating a Database in Azure SQL DB
- Elastic pools
- Provisioned and Serverless compute
- Relaxing at Your Genomics Data Lakehouse
- Efficient File Formats
- Parquet to the floor
- ACIDity
- Changing the tides with Delta
- 4. Orchestrating Data Movement and Transformation
- Creating Your Data Factory
- Getting Started with Data Movement
- Getting Data into Your Data Lake Using the Copy Data Tool
- Linking to NCBI's FTP Server
- Transforming Data Using Data Flows
- Parsing a VCF file with a data flow
- Building and Triggering Pipelines for Automation
- 5. Azure Databricks (and Apache Spark)
- Introduction to Apache Spark and Databricks
- Setting Up an Azure Databricks Workspace
- Connecting Databricks to Your Data Lake
- Processing Variant Data with the Glow Package
- Exploring DataFrames
- Filtering to Chromosome Coordinates
- Count of variants by chromosome
- Automating Variant Data Processing
- Orchestrating a Databricks Notebook from Data Factory
- Access tokens
- Creating the pipeline
- A Brief Interlude About Distributed File Formats
- Parquet
- Delta
- Using Other Tools in Databricks
- Single-Node Bioinformatics Tools
- Koalas
- Pandas on Spark
- Hail
- 6. Azure Machine Learning
- How to Scale Machine Learning Tasks
- Creating an Azure Machine Learning Workspace
- Training a Drug Sensitivity Model
- Creating a Compute Instance in Azure Machine Learning Studio
- Datastores and Datasets
- About the data
- Experimenting with Cluster-Based Training
- Automating Model Training with AutoML
- Explainable Machine Learning
- Using Azure Machine Learning Not for Machine Learning
- Performing Alignment in a Notebook
- Custom Docker Images for Bioinformatics
- 7. High-Performance Computing and Other Compute Services
- Bring Your Own Pipeline (BYOP)
- Why Azure for HPC?
- Azure Batch
- Scaling Workloads with Cromwell
- Running your first workflow
- Azure CycleCloud
- Setting Up CycleCloud Clusters
- Creating a cluster for bioinformatics
- Microsoft Genomics
- Alignment and Variant Calling with the msgen Package
- 8. Deployment, Security, Compliance, and Potpourri
- Automating the Deployment of Cloud Resources
- Dev, Staging, and Prod
- Lifting Your Deployment with ARMs and Biceps
- Bicep
- Security Planning
- Azure Active Directory
- Roles and groups
- Role-Based Access Controls and Access-Control Lists
- Compliance
- HIPAA, HITECH, and HITRUST
- GDPR
- Azure Blueprints
- Cost Considerations
- Azure Pricing Calculator
- Retail Pricing Versus Enterprise Agreements
- Budgeting Examples
- Quota Problems
- Please, Sir, Can I Have Some More (vCPUs)?
- Getting General Support
- Conclusion
- Looking Backward
- Baby Azure
- What Else?
- Using Other Web-Based Bioinformatics Platforms
- Looking Forward
- Cheaper Sequencing = More Data
- Index