Mastering Python for Bioinformatics - Helion

ebook

Author: Ken Youens-Clark
ISBN: 9781098100834
Pages: 456, Format: ebook
Publication date: 2021-05-05
Bookstore: Helion

Life scientists today urgently need training in bioinformatics skills. Too many bioinformatics programs are poorly written and barely maintained, usually by students and researchers who've never learned basic programming skills. This practical guide shows postdoc bioinformatics professionals and students how to exploit the best parts of Python to solve problems in biology while creating documented, tested, reproducible software.

Ken Youens-Clark, author of Tiny Python Projects (Manning), demonstrates not only how to write effective Python code but also how to use tests to write and refactor scientific programs. You'll learn the latest Python features and tools (including linters, formatters, type checkers, and tests) to create documented and tested programs. You'll also tackle 14 challenges in Rosalind, a problem-solving platform for learning bioinformatics and programming.

Create command-line Python programs to document and validate parameters
Write tests to verify and refactor programs and confirm they're correct
Address bioinformatics ideas using Python data structures and modules such as Biopython
Create reproducible shortcuts and workflows using makefiles
Parse essential bioinformatics file formats such as FASTA and FASTQ
Find patterns of text using regular expressions
Use higher-order functions in Python like filter(), map(), and reduce()
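
As a rough illustration of the last item in the list above, here is a minimal sketch of filter(), map(), and reduce() applied to DNA strings. It is not taken from the book: the function names gc_content and hamming are hypothetical, and the code assumes plain uppercase or lowercase nucleotide strings with no ambiguity codes.

```python
from functools import reduce
from operator import add


def gc_content(seq: str) -> float:
    """Return the fraction of bases in seq that are G or C."""
    if not seq:
        return 0.0
    # filter() keeps only the G and C bases
    gc = list(filter(lambda base: base in "GC", seq.upper()))
    return len(gc) / len(seq)


def hamming(seq1: str, seq2: str) -> int:
    """Count positions where two equal-length sequences differ."""
    # map() yields 1 for each mismatched pair; reduce() adds them up
    mismatches = map(lambda a, b: int(a != b), seq1, seq2)
    return reduce(add, mismatches, 0)


print(gc_content("AGCTATAG"))  # 3 of 8 bases are G or C -> 0.375
print(hamming("GAGCCTACTAACGGGAT", "CATCGTAATGACGGCCT"))  # -> 7
```

Note that map() over two strings stops at the shorter one, so this hamming() sketch is only meaningful for sequences of equal length.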

Table of Contents

Preface
- Who Should Read This?
- Programming Style: Why I Avoid OOP and Exceptions
- Structure
- Test-Driven Development
- Using the Command Line and Installing Python
- Getting the Code and Tests
- Installing Modules
- Installing the new.py Program
- Why Did I Write This Book?
- Conventions Used in This Book
- Using Code Examples
- O'Reilly Online Learning
- How to Contact Us
- Acknowledgments
I. The Rosalind.info Challenges
1. Tetranucleotide Frequency: Counting Things
- Getting Started
  - Creating the Program Using new.py
  - Using argparse
  - Tools for Finding Errors in the Code
  - Introducing Named Tuples
  - Adding Types to Named Tuples
  - Representing the Arguments with a NamedTuple
  - Reading Input from the Command Line or a File
  - Testing Your Program
  - Running the Program to Test the Output
- Solution 1: Iterating and Counting the Characters in a String
  - Counting the Nucleotides
  - Writing and Verifying a Solution
- Additional Solutions
  - Solution 2: Creating a count() Function and Adding a Unit Test
  - Solution 3: Using str.count()
  - Solution 4: Using a Dictionary to Count All the Characters
  - Solution 5: Counting Only the Desired Bases
  - Solution 6: Using collections.defaultdict()
  - Solution 7: Using collections.Counter()
- Going Further
- Review
2. Transcribing DNA into mRNA: Mutating Strings, Reading and Writing Files
- Getting Started
  - Defining the Program's Parameters
  - Defining an Optional Parameter
  - Defining One or More Required Positional Parameters
  - Using nargs to Define the Number of Arguments
  - Using argparse.FileType() to Validate File Arguments
  - Defining the Args Class
  - Outlining the Program Using Pseudocode
  - Iterating the Input Files
  - Creating the Output Filenames
  - Opening the Output Files
  - Writing the Output Sequences
  - Printing the Status Report
  - Using the Test Suite
- Solutions
  - Solution 1: Using str.replace()
  - Solution 2: Using re.sub()
- Benchmarking
- Going Further
- Review
3. Reverse Complement of DNA: String Manipulation
- Getting Started
  - Iterating Over a Reversed String
  - Creating a Decision Tree
  - Refactoring
- Solutions
  - Solution 1: Using a for Loop and Decision Tree
  - Solution 2: Using a Dictionary Lookup
  - Solution 3: Using a List Comprehension
  - Solution 4: Using str.translate()
  - Solution 5: Using Bio.Seq
- Review
4. Creating the Fibonacci Sequence: Writing, Testing, and Benchmarking Algorithms
- Getting Started
  - An Imperative Approach
- Solutions
  - Solution 1: An Imperative Solution Using a List as a Stack
  - Solution 2: Creating a Generator Function
  - Solution 3: Using Recursion and Memoization
- Benchmarking the Solutions
- Testing the Good, the Bad, and the Ugly
- Running the Test Suite on All the Solutions
- Going Further
- Review
5. Computing GC Content: Parsing FASTA and Analyzing Sequences
- Getting Started
  - Get Parsing FASTA Using Biopython
  - Iterating the Sequences Using a for Loop
- Solutions
  - Solution 1: Using a List
  - Solution 2: Type Annotations and Unit Tests
  - Solution 3: Keeping a Running Max Variable
  - Solution 4: Using a List Comprehension with a Guard
  - Solution 5: Using the filter() Function
  - Solution 6: Using the map() Function and Summing Booleans
  - Solution 7: Using Regular Expressions to Find Patterns
  - Solution 8: A More Complex find_gc() Function
- Benchmarking
- Going Further
- Review
6. Finding the Hamming Distance: Counting Point Mutations
- Getting Started
  - Iterating the Characters of Two Strings
- Solutions
  - Solution 1: Iterating and Counting
  - Solution 2: Creating a Unit Test
  - Solution 3: Using the zip() Function
  - Solution 4: Using the zip_longest() Function
  - Solution 5: Using a List Comprehension
  - Solution 6: Using the filter() Function
  - Solution 7: Using the map() Function with zip_longest()
  - Solution 8: Using the starmap() and operator.ne() Functions
- Going Further
- Review
7. Translating mRNA into Protein: More Functional Programming
- Getting Started
  - K-mers and Codons
  - Translating Codons
- Solutions
  - Solution 1: Using a for Loop
  - Solution 2: Adding Unit Tests
  - Solution 3: Another Function and a List Comprehension
  - Solution 4: Functional Programming with the map(), partial(), and takewhile() Functions
  - Solution 5: Using Bio.Seq.translate()
- Benchmarking
- Going Further
- Review
8. Find a Motif in DNA: Exploring Sequence Similarity
- Getting Started
  - Finding Subsequences
- Solutions
  - Solution 1: Using the str.find() Method
  - Solution 2: Using the str.index() Method
  - Solution 3: A Purely Functional Approach
  - Solution 4: Using K-mers
  - Solution 5: Finding Overlapping Patterns Using Regular Expressions
- Benchmarking
- Going Further
- Review
9. Overlap Graphs: Sequence Assembly Using Shared K-mers
- Getting Started
  - Managing Runtime Messages with STDOUT, STDERR, and Logging
  - Finding Overlaps
  - Grouping Sequences by the Overlap
- Solutions
  - Solution 1: Using Set Intersections to Find Overlaps
  - Solution 2: Using a Graph to Find All Paths
- Going Further
- Review
10. Finding the Longest Shared Subsequence: Finding K-mers, Writing Functions, and Using Binary Search
- Getting Started
  - Finding the Shortest Sequence in a FASTA File
  - Extracting K-mers from a Sequence
- Solutions
  - Solution 1: Counting Frequencies of K-mers
  - Solution 2: Speeding Things Up with a Binary Search
- Going Further
- Review
11. Finding a Protein Motif: Fetching Data and Using Regular Expressions
- Getting Started
  - Downloading Sequences Files on the Command Line
  - Downloading Sequences Files with Python
  - Writing a Regular Expression to Find the Motif
- Solutions
  - Solution 1: Using a Regular Expression
  - Solution 2: Writing a Manual Solution
- Going Further
- Review
12. Inferring mRNA from Protein: Products and Reductions of Lists
- Getting Started
  - Creating the Product of Lists
  - Avoiding Overflow with Modular Multiplication
- Solutions
  - Solution 1: Using a Dictionary for the RNA Codon Table
  - Solution 2: Turn the Beat Around
  - Solution 3: Encoding the Minimal Information
- Going Further
- Review
13. Locating Restriction Sites: Using, Testing, and Sharing Code
- Getting Started
  - Finding All Subsequences Using K-mers
  - Finding All Reverse Complements
  - Putting It All Together
- Solutions
  - Solution 1: Using the zip() and enumerate() Functions
  - Solution 2: Using the operator.eq() Function
  - Solution 3: Writing a revp() Function
- Testing the Program
- Going Further
- Review
14. Finding Open Reading Frames
- Getting Started
  - Translating Proteins Inside Each Frame
  - Finding the ORFs in a Protein Sequence
- Solutions
  - Solution 1: Using the str.index() Function
  - Solution 2: Using the str.partition() Function
  - Solution 3: Using a Regular Expression
- Going Further
- Review
II. Other Programs
15. Seqmagique: Creating and Formatting Reports
- Using Seqmagick to Analyze Sequence Files
- Checking Files Using MD5 Hashes
- Getting Started
  - Formatting Text Tables Using tabulate()
- Solutions
  - Solution 1: Formatting with tabulate()
  - Solution 2: Formatting with rich
- Going Further
- Review
16. FASTX grep: Creating a Utility Program to Select Sequences
- Finding Lines in a File Using grep
- The Structure of a FASTQ Record
- Getting Started
  - Guessing the File Format
- Solution
- Going Further
- Review
17. DNA Synthesizer: Creating Synthetic Data with Markov Chains
- Understanding Markov Chains
- Getting Started
  - Understanding Random Seeds
  - Reading the Training Files
  - Generating the Sequences
  - Structuring the Program
- Solution
- Going Further
- Review
18. FASTX Sampler: Randomly Subsampling Sequence Files
- Getting Started
  - Reviewing the Program Parameters
  - Defining the Parameters
  - Nondeterministic Sampling
  - Structuring the Program
- Solutions
  - Solution 1: Reading Regular Files
  - Solution 2: Reading a Large Number of Compressed Files
- Going Further
- Review
19. Blastomatic: Parsing Delimited Text Files
- Introduction to BLAST
- Using csvkit and csvchk
- Getting Started
  - Defining the Arguments
  - Parsing Delimited Text Files Using the csv Module
  - Parsing Delimited Text Files Using the pandas Module
- Solutions
  - Solution 1: Manually Joining the Tables Using Dictionaries
  - Solution 2: Writing the Output File with csv.DictWriter()
  - Solution 3: Reading and Writing Files Using pandas
  - Solution 4: Joining Files Using pandas
- Going Further
- Review
A. Documenting Commands and Creating Workflows with make
- Makefiles Are Recipes
- Running a Specific Target
- Running with No Target
- Makefiles Create DAGs
- Using make to Compile a C Program
- Using make for a Shortcut
- Defining Variables
- Writing a Workflow
- Other Workflow Managers
- Further Reading
B. Understanding $PATH and Installing Command-Line Programs
Epilogue
Index