Mastering Python for Bioinformatics - Helion
ISBN: 9781098100834
Pages: 456, Format: ebook
Release date: 2021-05-05
Bookstore: Helion
Price: 271.15 zł (previously: 319.00 zł)
You save: 15% (-47.85 zł)
Life scientists today urgently need training in bioinformatics skills. Too many bioinformatics programs are poorly written and barely maintained, usually by students and researchers who've never learned basic programming skills. This practical guide shows postdoc bioinformatics professionals and students how to exploit the best parts of Python to solve problems in biology while creating documented, tested, reproducible software.
Ken Youens-Clark, author of Tiny Python Projects (Manning), demonstrates not only how to write effective Python code but also how to use tests to write and refactor scientific programs. You'll learn the latest Python features and tools (including linters, formatters, type checkers, and tests) to create documented and tested programs. You'll also tackle 14 challenges in Rosalind, a problem-solving platform for learning bioinformatics and programming.
- Create command-line Python programs to document and validate parameters
- Write tests to verify refactored programs and confirm they're correct
- Address bioinformatics ideas using Python data structures and modules such as Biopython
- Create reproducible shortcuts and workflows using makefiles
- Parse essential bioinformatics file formats such as FASTA and FASTQ
- Find patterns of text using regular expressions
- Use higher-order functions in Python like filter(), map(), and reduce()
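To give a flavor of the techniques listed above, here is a minimal, standard-library-only sketch. It is not taken from the book, and every function name in it (`read_fasta`, `gc_content`, `hamming`, `count_bases`, `find_motif`) is hypothetical. The hand-rolled FASTA reader assumes simple, well-formed input; judging by its table of contents, the book parses FASTA with Biopython instead.

```python
import re
from functools import reduce
from operator import ne
from typing import Dict, Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA lines (hypothetical helper)."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith('>'):
            if header is not None:
                yield header, ''.join(chunks)  # emit the previous record
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # a sequence may wrap over several lines
    if header is not None:
        yield header, ''.join(chunks)  # emit the final record


def gc_content(seq: str) -> float:
    """Percentage of G/C bases, using filter() to keep only those bases."""
    if not seq:
        return 0.0
    gc = len(list(filter(lambda base: base in 'GC', seq.upper())))
    return gc * 100 / len(seq)


def hamming(seq1: str, seq2: str) -> int:
    """Count mismatched positions; assumes equal-length sequences."""
    # map() calls operator.ne on each pair of characters; True sums as 1
    return sum(map(ne, seq1, seq2))


def count_bases(seq: str) -> Dict[str, int]:
    """Tally each character by folding the sequence into a dict with reduce()."""
    def tally(counts: Dict[str, int], base: str) -> Dict[str, int]:
        counts[base] = counts.get(base, 0) + 1
        return counts
    return reduce(tally, seq, {})


def find_motif(seq: str, motif: str) -> List[int]:
    """1-based start positions of motif, including overlapping matches."""
    # A zero-width lookahead consumes no characters, so overlaps are found
    pattern = f'(?={re.escape(motif)})'
    return [match.start() + 1 for match in re.finditer(pattern, seq)]


if __name__ == '__main__':
    fasta = ['>seq1', 'GATATATGC', 'ATATACTT']
    for name, seq in read_fasta(fasta):
        print(name, f'{gc_content(seq):.2f}', find_motif(seq, 'ATAT'))
```

For example, `hamming('GAGCCTACTAACGGGAT', 'CATCGTAATGACGGCCT')` returns 7, and `find_motif('GATATATGCATATACTT', 'ATAT')` returns `[2, 4, 10]`. Because `map()` with two iterables stops at the shorter one, this `hamming` silently ignores any length difference; a variant built on `itertools.zip_longest()` would count the overhang as mismatches.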
Table of Contents
- Preface
- Who Should Read This?
- Programming Style: Why I Avoid OOP and Exceptions
- Structure
- Test-Driven Development
- Using the Command Line and Installing Python
- Getting the Code and Tests
- Installing Modules
- Installing the new.py Program
- Why Did I Write This Book?
- Conventions Used in This Book
- Using Code Examples
- O'Reilly Online Learning
- How to Contact Us
- Acknowledgments
- I. The Rosalind.info Challenges
- 1. Tetranucleotide Frequency: Counting Things
- Getting Started
- Creating the Program Using new.py
- Using argparse
- Tools for Finding Errors in the Code
- Introducing Named Tuples
- Adding Types to Named Tuples
- Representing the Arguments with a NamedTuple
- Reading Input from the Command Line or a File
- Testing Your Program
- Running the Program to Test the Output
- Solution 1: Iterating and Counting the Characters in a String
- Counting the Nucleotides
- Writing and Verifying a Solution
- Additional Solutions
- Solution 2: Creating a count() Function and Adding a Unit Test
- Solution 3: Using str.count()
- Solution 4: Using a Dictionary to Count All the Characters
- Solution 5: Counting Only the Desired Bases
- Solution 6: Using collections.defaultdict()
- Solution 7: Using collections.Counter()
- Going Further
- Review
- 2. Transcribing DNA into mRNA: Mutating Strings, Reading and Writing Files
- Getting Started
- Defining the Program's Parameters
- Defining an Optional Parameter
- Defining One or More Required Positional Parameters
- Using nargs to Define the Number of Arguments
- Using argparse.FileType() to Validate File Arguments
- Defining the Args Class
- Outlining the Program Using Pseudocode
- Iterating the Input Files
- Creating the Output Filenames
- Opening the Output Files
- Writing the Output Sequences
- Printing the Status Report
- Using the Test Suite
- Solutions
- Solution 1: Using str.replace()
- Solution 2: Using re.sub()
- Benchmarking
- Going Further
- Review
- 3. Reverse Complement of DNA: String Manipulation
- Getting Started
- Iterating Over a Reversed String
- Creating a Decision Tree
- Refactoring
- Solutions
- Solution 1: Using a for Loop and Decision Tree
- Solution 2: Using a Dictionary Lookup
- Solution 3: Using a List Comprehension
- Solution 4: Using str.translate()
- Solution 5: Using Bio.Seq
- Review
- 4. Creating the Fibonacci Sequence: Writing, Testing, and Benchmarking Algorithms
- Getting Started
- An Imperative Approach
- Solutions
- Solution 1: An Imperative Solution Using a List as a Stack
- Solution 2: Creating a Generator Function
- Solution 3: Using Recursion and Memoization
- Benchmarking the Solutions
- Testing the Good, the Bad, and the Ugly
- Running the Test Suite on All the Solutions
- Going Further
- Review
- 5. Computing GC Content: Parsing FASTA and Analyzing Sequences
- Getting Started
- Get Parsing FASTA Using Biopython
- Iterating the Sequences Using a for Loop
- Solutions
- Solution 1: Using a List
- Solution 2: Type Annotations and Unit Tests
- Solution 3: Keeping a Running Max Variable
- Solution 4: Using a List Comprehension with a Guard
- Solution 5: Using the filter() Function
- Solution 6: Using the map() Function and Summing Booleans
- Solution 7: Using Regular Expressions to Find Patterns
- Solution 8: A More Complex find_gc() Function
- Benchmarking
- Going Further
- Review
- 6. Finding the Hamming Distance: Counting Point Mutations
- Getting Started
- Iterating the Characters of Two Strings
- Solutions
- Solution 1: Iterating and Counting
- Solution 2: Creating a Unit Test
- Solution 3: Using the zip() Function
- Solution 4: Using the zip_longest() Function
- Solution 5: Using a List Comprehension
- Solution 6: Using the filter() Function
- Solution 7: Using the map() Function with zip_longest()
- Solution 8: Using the starmap() and operator.ne() Functions
- Going Further
- Review
- 7. Translating mRNA into Protein: More Functional Programming
- Getting Started
- K-mers and Codons
- Translating Codons
- Solutions
- Solution 1: Using a for Loop
- Solution 2: Adding Unit Tests
- Solution 3: Another Function and a List Comprehension
- Solution 4: Functional Programming with the map(), partial(), and takewhile() Functions
- Solution 5: Using Bio.Seq.translate()
- Benchmarking
- Going Further
- Review
- 8. Find a Motif in DNA: Exploring Sequence Similarity
- Getting Started
- Finding Subsequences
- Solutions
- Solution 1: Using the str.find() Method
- Solution 2: Using the str.index() Method
- Solution 3: A Purely Functional Approach
- Solution 4: Using K-mers
- Solution 5: Finding Overlapping Patterns Using Regular Expressions
- Benchmarking
- Going Further
- Review
- 9. Overlap Graphs: Sequence Assembly Using Shared K-mers
- Getting Started
- Managing Runtime Messages with STDOUT, STDERR, and Logging
- Finding Overlaps
- Grouping Sequences by the Overlap
- Solutions
- Solution 1: Using Set Intersections to Find Overlaps
- Solution 2: Using a Graph to Find All Paths
- Going Further
- Review
- 10. Finding the Longest Shared Subsequence: Finding K-mers, Writing Functions, and Using Binary Search
- Getting Started
- Finding the Shortest Sequence in a FASTA File
- Extracting K-mers from a Sequence
- Solutions
- Solution 1: Counting Frequencies of K-mers
- Solution 2: Speeding Things Up with a Binary Search
- Going Further
- Review
- 11. Finding a Protein Motif: Fetching Data and Using Regular Expressions
- Getting Started
- Downloading Sequences Files on the Command Line
- Downloading Sequences Files with Python
- Writing a Regular Expression to Find the Motif
- Solutions
- Solution 1: Using a Regular Expression
- Solution 2: Writing a Manual Solution
- Going Further
- Review
- 12. Inferring mRNA from Protein: Products and Reductions of Lists
- Getting Started
- Creating the Product of Lists
- Avoiding Overflow with Modular Multiplication
- Solutions
- Solution 1: Using a Dictionary for the RNA Codon Table
- Solution 2: Turn the Beat Around
- Solution 3: Encoding the Minimal Information
- Going Further
- Review
- 13. Location Restriction Sites: Using, Testing, and Sharing Code
- Getting Started
- Finding All Subsequences Using K-mers
- Finding All Reverse Complements
- Putting It All Together
- Solutions
- Solution 1: Using the zip() and enumerate() Functions
- Solution 2: Using the operator.eq() Function
- Solution 3: Writing a revp() Function
- Testing the Program
- Going Further
- Review
- 14. Finding Open Reading Frames
- Getting Started
- Translating Proteins Inside Each Frame
- Finding the ORFs in a Protein Sequence
- Solutions
- Solution 1: Using the str.index() Function
- Solution 2: Using the str.partition() Function
- Solution 3: Using a Regular Expression
- Going Further
- Review
- II. Other Programs
- 15. Seqmagique: Creating and Formatting Reports
- Using Seqmagick to Analyze Sequence Files
- Checking Files Using MD5 Hashes
- Getting Started
- Formatting Text Tables Using tabulate()
- Solutions
- Solution 1: Formatting with tabulate()
- Solution 2: Formatting with rich
- Going Further
- Review
- 16. FASTX grep: Creating a Utility Program to Select Sequences
- Finding Lines in a File Using grep
- The Structure of a FASTQ Record
- Getting Started
- Guessing the File Format
- Solution
- Going Further
- Review
- 17. DNA Synthesizer: Creating Synthetic Data with Markov Chains
- Understanding Markov Chains
- Getting Started
- Understanding Random Seeds
- Reading the Training Files
- Generating the Sequences
- Structuring the Program
- Solution
- Going Further
- Review
- 18. FASTX Sampler: Randomly Subsampling Sequence Files
- Getting Started
- Reviewing the Program Parameters
- Defining the Parameters
- Nondeterministic Sampling
- Structuring the Program
- Solutions
- Solution 1: Reading Regular Files
- Solution 2: Reading a Large Number of Compressed Files
- Going Further
- Review
- 19. Blastomatic: Parsing Delimited Text Files
- Introduction to BLAST
- Using csvkit and csvchk
- Getting Started
- Defining the Arguments
- Parsing Delimited Text Files Using the csv Module
- Parsing Delimited Text Files Using the pandas Module
- Solutions
- Solution 1: Manually Joining the Tables Using Dictionaries
- Solution 2: Writing the Output File with csv.DictWriter()
- Solution 3: Reading and Writing Files Using pandas
- Solution 4: Joining Files Using pandas
- Going Further
- Review
- A. Documenting Commands and Creating Workflows with make
- Makefiles Are Recipes
- Running a Specific Target
- Running with No Target
- Makefiles Create DAGs
- Using make to Compile a C Program
- Using make for a Shortcut
- Defining Variables
- Writing a Workflow
- Other Workflow Managers
- Further Reading
- B. Understanding $PATH and Installing Command-Line Programs
- Epilogue
- Index