Hands-On Entity Resolution - Helion

Hands-On Entity Resolution
ebook
Author: Michael Shearer
ISBN: 9781098148447
Pages: 198, Format: ebook
Publication date: 2024-02-01
Bookstore: Helion

Book price: 211.65 zł (previously: 246.10 zł)
You save: 14% (-34.45 zł)


Entity resolution is a key analytic technique that enables you to identify multiple data records that refer to the same real-world entity. With this hands-on guide, product managers, data analysts, and data scientists will learn how to add value to data by cleansing, analyzing, and resolving datasets using open source Python libraries and cloud APIs.

Author Michael Shearer shows you how to scale up your data matching processes and improve the accuracy of your reconciliations. You'll be able to remove duplicate entries within a single source and join disparate data sources together when common keys aren't available. Using real-world data examples, this book helps you gain practical understanding to accelerate the delivery of real business value.

With entity resolution, you'll build rich and comprehensive data assets that reveal relationships for marketing and risk management purposes, key to harnessing the full potential of ML and AI. This book covers:

  • Challenges in deduplicating and joining datasets
  • Extracting, cleansing, and preparing datasets for matching
  • Text matching algorithms to identify equivalent entities
  • Techniques for deduplicating and joining datasets at scale
  • Matching datasets containing persons and organizations
  • Evaluating data matches
  • Optimizing and tuning data matching algorithms
  • Entity resolution using cloud APIs
  • Matching using privacy-enhancing technologies


Customers who bought "Hands-On Entity Resolution" also chose:

  • Windows Media Center. Domowe centrum rozrywki
  • Ruby on Rails. Ćwiczenia
  • Przywództwo w świecie VUCA. Jak być skutecznym liderem w niepewnym środowisku
  • Scrum. O zwinnym zarządzaniu projektami. Wydanie II rozszerzone
  • Od hierarchii do turkusu, czyli jak zarządzać w XXI wieku


Table of contents

Hands-On Entity Resolution eBook -- table of contents

  • Preface
    • Who Should Read This Book
    • Why I Wrote This Book
    • Navigating This Book
    • Conventions Used in This Book
    • Using Code Examples
    • O'Reilly Online Learning
    • How to Contact Us
    • Acknowledgments
  • 1. Introduction to Entity Resolution
    • What Is Entity Resolution?
    • Why Is Entity Resolution Needed?
    • Main Challenges of Entity Resolution
      • Lack of Unique Names
      • Inconsistent Naming Conventions
      • Data Capture Inconsistencies
      • Worked Example
      • Deliberate Obfuscation
      • Match Permutations
      • Blind Matching?
    • The Entity Resolution Process
      • Data Standardization
      • Record Blocking
      • Attribute Comparison
      • Match Classification
      • Clustering
      • Canonicalization
      • Worked Example
    • Measuring Performance
    • Getting Started
  • 2. Data Standardization
    • Sample Problem
    • Environment Setup
    • Acquiring Data
      • Wikipedia Data
      • TheyWorkForYou Data
        • Adding Facebook links
    • Cleansing Data
      • Wikipedia
      • TheyWorkForYou
    • Attribute Comparison
    • Constituency
    • Measuring Performance
    • Sample Calculation
    • Summary
  • 3. Text Matching
    • Edit Distance Matching
      • Levenshtein Distance
      • Jaro Similarity
      • Jaro-Winkler Similarity
    • Phonetic Matching
      • Metaphone
      • Match Rating Approach
    • Comparing the Techniques
    • Sample Problem
    • Full Similarity Comparison
    • Measuring Performance
    • Summary
  • 4. Probabilistic Matching
    • Sample Problem
    • Single Attribute Match Probability
      • First Name Match Probability
      • Last Name Match Probability
    • Multiple Attribute Match Probability
    • Probabilistic Models
      • Bayes' Theorem
      • m Value
      • u Value
      • Lambda (λ) Value
      • Bayes Factor
      • Fellegi-Sunter Model
      • Match Weight
    • Expectation-Maximization Algorithm
      • Iteration 1
      • Iteration 2
      • Iteration 3
    • Introducing Splink
      • Configuring Splink
      • Splink Performance
    • Summary
  • 5. Record Blocking
    • Sample Problem
    • Data Acquisition
      • Wikipedia Data
      • UK Companies House Data
    • Data Standardization
      • Wikipedia Data
      • UK Companies House Data
    • Record Blocking and Attribute Comparison
      • Record Blocking with Splink
      • Attribute Comparison
    • Match Classification
    • Measuring Performance
    • Summary
  • 6. Company Matching
    • Sample Problem
    • Data Acquisition
    • Data Standardization
      • Companies House Data
      • Maritime and Coastguard Agency Data
    • Record Blocking and Attribute Comparison
    • Match Classification
    • Measuring Performance
    • Matching New Entities
    • Summary
  • 7. Clustering
    • Simple Exact Match Clustering
    • Approximate Match Clustering
    • Sample Problem
      • Data Acquisition
      • Data Standardization
    • Record Blocking and Attribute Comparison
      • Data Analysis
      • Expectation-Maximization Blocking Rules
    • Match Classification and Clustering
    • Cluster Visualization
    • Cluster Analysis
    • Summary
  • 8. Scaling Up on Google Cloud
    • Google Cloud Setup
      • Setting Up Project Storage
    • Creating a Dataproc Cluster
    • Configuring a Dataproc Cluster
    • Entity Resolution on Spark
    • Measuring Performance
    • Tidy Up!
    • Summary
  • 9. Cloud Entity Resolution Services
    • Introduction to BigQuery
    • Enterprise Knowledge Graph API
      • Schema Mapping
      • Reconciliation Job
      • Result Processing
      • Entity Reconciliation Python Client
    • Measuring Performance
    • Summary
  • 10. Privacy-Preserving Record Linkage
    • An Introduction to Private Set Intersection
    • How PSI Works
    • PSI Protocol Based on ECDH
      • Bloom Filters
        • Bloom filter example
      • Golomb-Coded Sets
        • GCS example
    • Example: Using the PSI Process
      • Environment Setup
        • Google Cloud setup
        • Option 1: Prebuilt PSI package
        • Option 2: Build PSI package
        • Server install
      • Server Code
      • Client Code
        • Using raw encrypted server values
        • Using Bloom filter-encoded encrypted server values
        • Using GCS-encoded encrypted server values
      • Full MCA and Companies House Sample Example
    • Summary
  • 11. Further Considerations
    • Data Considerations
      • Unstructured Data
      • Data Quality
      • Temporal Equivalence
    • Attribute Comparison
      • Set Matching
      • Geocoding Location Matching
      • Aggregating Comparisons
    • Post Processing
    • Graphical Representation
    • Real-Time Considerations
    • Performance Evaluation
      • Pairwise Approach
      • Cluster-Based Approach
    • Future of Entity Resolution
  • Index

