Applied Text Analysis with Python. Enabling Language-Aware Data Products with Machine Learning
ISBN: 978-14-919-6299-2
Pages: 332, Format: ebook
Publication date: 2018-06-11
Bookstore: Helion
Price: 186,15 zł (previously: 216,45 zł)
You save: 14% (-30,30 zł)
From news and speeches to informal chatter on social media, natural language is one of the richest and most underutilized sources of data. Not only does it come in a constant stream, always changing and adapting in context; it also contains information that is not conveyed by traditional data sources. The key to unlocking natural language is the creative application of text analytics. This practical book presents a data scientist’s approach to building language-aware products with applied machine learning.
You’ll learn robust, repeatable, and scalable techniques for text analysis with Python, including contextual and linguistic feature engineering, vectorization, classification, topic modeling, entity resolution, graph analysis, and visual steering. By the end of the book, you’ll be equipped with practical methods to solve any number of complex real-world problems.
- Preprocess and vectorize text into high-dimensional feature representations
- Perform document classification and topic modeling
- Steer the model selection process with visual diagnostics
- Extract key phrases, named entities, and graph structures to reason about data in text
- Build a dialog framework to enable chatbots and language-driven interaction
- Use Spark to scale processing power and neural networks to scale model complexity
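The first item in the list above, turning text into feature vectors, can be illustrated with a small sketch. This is not code from the book: it uses only the Python standard library, and the function names `tokenize` and `tfidf_vectors` are assumptions chosen for illustration. The table of contents indicates that the book covers the same idea (frequency and TF-IDF vectors) with NLTK, Scikit-Learn, and Gensim.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def tfidf_vectors(documents):
    """Return (vocabulary, vectors), with one TF-IDF vector per document.

    Hypothetical helper for illustration: term frequency is the relative
    count of a term in a document, and inverse document frequency is
    log(N / df), where df is the number of documents containing the term.
    """
    tokenized = [tokenize(doc) for doc in documents]
    vocabulary = sorted({tok for doc in tokenized for tok in doc})

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(tok for doc in tokenized for tok in set(doc))
    n_docs = len(documents)

    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero for empty documents
        vectors.append([
            (counts[term] / total) * math.log(n_docs / doc_freq[term])
            for term in vocabulary
        ])
    return vocabulary, vectors
```

Calling `tfidf_vectors(["the cat sat", "the dog sat", "the bird flew"])` yields one vector per document over a shared vocabulary. A term that occurs in every document, such as "the", receives a weight of zero, while rarer terms are weighted more heavily.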
Table of contents
- Preface
- Computational Challenges of Natural Language
- Linguistic Data: Tokens and Words
- Enter Machine Learning
- Tools for Text Analysis
- What to Expect from This Book
- Who This Book Is For
- Code Examples and GitHub Repository
- Conventions Used in This Book
- Using Code Examples
- O'Reilly Safari
- How to Contact Us
- Acknowledgments
- 1. Language and Computation
- The Data Science Paradigm
- Language-Aware Data Products
- The Data Product Pipeline
- The model selection triple
- Language as Data
- A Computational Model of Language
- Language Features
- Contextual Features
- Structural Features
- Conclusion
- 2. Building a Custom Corpus
- What Is a Corpus?
- Domain-Specific Corpora
- The Baleen Ingestion Engine
- Corpus Data Management
- Corpus Disk Structure
- The Baleen disk structure
- Corpus Readers
- Streaming Data Access with NLTK
- Reading an HTML Corpus
- Corpus monitoring
- Reading a Corpus from a Database
- Conclusion
- 3. Corpus Preprocessing and Wrangling
- Breaking Down Documents
- Identifying and Extracting Core Content
- Deconstructing Documents into Paragraphs
- Segmentation: Breaking Out Sentences
- Tokenization: Identifying Individual Tokens
- Part-of-Speech Tagging
- Intermediate Corpus Analytics
- Corpus Transformation
- Intermediate Preprocessing and Storage
- Writing to pickle
- Reading the Processed Corpus
- Conclusion
- 4. Text Vectorization and Transformation Pipelines
- Words in Space
- Frequency Vectors
- With NLTK
- In Scikit-Learn
- The Gensim way
- One-Hot Encoding
- With NLTK
- In Scikit-Learn
- The Gensim way
- Term Frequency-Inverse Document Frequency
- With NLTK
- In Scikit-Learn
- The Gensim way
- Distributed Representation
- The Gensim way
- The Scikit-Learn API
- The BaseEstimator Interface
- Extending TransformerMixin
- Creating a custom Gensim vectorization transformer
- Creating a custom text normalization transformer
- Pipelines
- Pipeline Basics
- Grid Search for Hyperparameter Optimization
- Enriching Feature Extraction with Feature Unions
- Conclusion
- 5. Classification for Text Analysis
- Text Classification
- Identifying Classification Problems
- Classifier Models
- Building a Text Classification Application
- Cross-Validation
- Streaming access to k splits
- Model Construction
- Model Evaluation
- Model Operationalization
- Conclusion
- 6. Clustering for Text Similarity
- Unsupervised Learning on Text
- Clustering by Document Similarity
- Distance Metrics
- Partitive Clustering
- k-means clustering
- Optimizing k-means
- Handling uneven geometries
- Hierarchical Clustering
- Agglomerative clustering
- Modeling Document Topics
- Latent Dirichlet Allocation
- In Scikit-Learn
- The Gensim way
- Visualizing topics
- Latent Semantic Analysis
- In Scikit-Learn
- The Gensim way
- Non-Negative Matrix Factorization
- In Scikit-Learn
- Conclusion
- 7. Context-Aware Text Analysis
- Grammar-Based Feature Extraction
- Context-Free Grammars
- Syntactic Parsers
- Extracting Keyphrases
- Extracting Entities
- n-Gram Feature Extraction
- An n-Gram-Aware CorpusReader
- Choosing the Right n-Gram Window
- Significant Collocations
- n-Gram Language Models
- Frequency and Conditional Frequency
- Estimating Maximum Likelihood
- Unknown Words: Back-off and Smoothing
- Language Generation
- Conclusion
- 8. Text Visualization
- Visualizing Feature Space
- Visual Feature Analysis
- n-gram viewer
- Network visualization
- Co-occurrence plots
- Text x-rays and dispersion plots
- Guided Feature Engineering
- Part-of-speech tagging
- Most informative features
- Model Diagnostics
- Visualizing Clusters
- Visualizing Classes
- Diagnosing Classification Error
- Classification report heatmaps
- Confusion matrices
- Visual Steering
- Silhouette Scores and Elbow Curves
- Silhouette scores
- Elbow curves
- Conclusion
- 9. Graph Analysis of Text
- Graph Computation and Analysis
- Creating a Graph-Based Thesaurus
- Analyzing Graph Structure
- Visual Analysis of Graphs
- Extracting Graphs from Text
- Creating a Social Graph
- Finding entity pairs
- Property graphs
- Implementing the graph extraction
- Insights from the Social Graph
- Centrality
- Structural analysis
- Entity Resolution
- Entity Resolution on a Graph
- Blocking with Structure
- Fuzzy Blocking
- Conclusion
- 10. Chatbots
- Fundamentals of Conversation
- Dialog: A Brief Exchange
- Maintaining a Conversation
- Rules for Polite Conversation
- Greetings and Salutations
- Handling Miscommunication
- Entertaining Questions
- Dependency Parsing
- Constituency Parsing
- Question Detection
- From Tablespoons to Grams
- Learning to Help
- Being Neighborly
- Offering Recommendations
- Conclusion
- 11. Scaling Text Analytics with Multiprocessing and Spark
- Python Multiprocessing
- Running Tasks in Parallel
- Process Pools and Queues
- Parallel Corpus Preprocessing
- Cluster Computing with Spark
- Anatomy of a Spark Job
- Distributing the Corpus
- RDD Operations
- NLP with Spark
- From Scikit-Learn to MLLib
- Feature extraction
- Text clustering with MLLib
- Text classification with MLLib
- Local fit, global evaluation
- Conclusion
- 12. Deep Learning and Beyond
- Applied Neural Networks
- Neural Language Models
- Artificial Neural Networks
- Training a multilayer perceptron
- Deep Learning Architectures
- TensorFlow: A framework for deep learning
- Keras: An API for deep learning
- Sentiment Analysis
- Deep Structure Analysis
- Predicting sentiment with a bag-of-keyphrases
- The Future Is (Almost) Here
- Glossary
- Index