Sharing Big Data Safely. Managing Data Security - Helion
ISBN: 978-14-919-5363-1
Pages: 96, Format: ebook
Publication date: 2015-09-15
Bookstore: Helion
Many big data-driven companies today are moving to protect certain types of data against intrusion, leaks, or unauthorized eyes. But how do you lock down data while granting access to people who need to see it? In this practical book, authors Ted Dunning and Ellen Friedman offer two novel and practical solutions that you can implement right away.
Ideal for both technical and non-technical decision makers, group leaders, developers, and data scientists, this book shows you how to:
- Share original data in a controlled way so that different groups within your organization only see part of the whole. You’ll learn how to do this with the new open source SQL query engine Apache Drill.
- Provide synthetic data that emulates the behavior of sensitive data. This approach enables external advisors to work with you on projects involving data that you can't show them.
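The first approach in the list above, showing each group only part of the data, rests on the ordinary SQL notion of a view. The following is a minimal sketch of that idea using Python's built-in `sqlite3` module rather than Apache Drill; the table, column names, and rows are hypothetical and exist only for illustration. SQLite has no per-user permissions, so the sketch shows only how a view narrows the visible columns and rows, not how access to the view itself is granted.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a sensitive table and a restricted view.

    All names and values here are made up for illustration.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers ("
        "id INTEGER, name TEXT, region TEXT, card_number TEXT)"
    )
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?, ?)",
        [
            (1, "Alice", "EU", "4111-0000-0000-0001"),
            (2, "Bob", "US", "4111-0000-0000-0002"),
            (3, "Carol", "EU", "4111-0000-0000-0003"),
        ],
    )
    # The view leaves out the sensitive column and keeps only EU rows.
    conn.execute(
        "CREATE VIEW eu_customers AS "
        "SELECT id, name, region FROM customers WHERE region = 'EU'"
    )
    return conn


conn = build_demo_db()
cursor = conn.execute("SELECT * FROM eu_customers ORDER BY id")
view_columns = [col[0] for col in cursor.description]
view_rows = cursor.fetchall()
print(view_columns)
print(view_rows)
```

A reader of `eu_customers` never sees `card_number` or the non-EU rows, which is the "part of the whole" idea in miniature.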
If you’re intrigued by the synthetic data solution, explore the log-synth program that Ted Dunning developed as open source code (available on GitHub), along with how-to instructions and tips for best practice. You’ll also get a collection of use cases.
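To give a rough feel for what synthetic data means in practice, here is a small Python sketch that generates fake user records from simple random distributions. It is not log-synth and does not use its schema format; the field names, value ranges, and distributions are assumptions chosen purely for illustration.

```python
import random


def synth_users(n, seed=42):
    """Return n fake user records; no real data is involved.

    The fields and distributions are hypothetical stand-ins for whatever
    properties of the real data a collaborator would need to reproduce.
    """
    rng = random.Random(seed)  # seeded so the output is reproducible
    regions = ["EU", "US", "APAC"]
    rows = []
    for user_id in range(1, n + 1):
        rows.append(
            {
                "user_id": user_id,
                "region": rng.choice(regions),
                "purchases": rng.randint(0, 20),
                # Log-normal gives a skewed, strictly positive spend value.
                "avg_spend": round(rng.lognormvariate(3.0, 0.5), 2),
            }
        )
    return rows


sample = synth_users(5)
for row in sample:
    print(row)
```

Because the generator is seeded, an outside advisor and the data owner can regenerate the same fake dataset independently, while the real records stay private.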
Providing lock-down security while safely sharing data is a significant challenge for a growing number of organizations. With this book, you’ll discover new options to share data safely without sacrificing security.
Table of contents
- Preface
- Who Should Use This Book
- 1. So Secure It's Lost
- Safe Access in Secure Big Data Systems
- 2. The Challenge: Sharing Data Safely
- Surprising Outcomes with Anonymity
- The Netflix Prize
- Unexpected Results from the Netflix Contest
- Implications of Breaking Anonymity
- Be Alert to the Possibility of Cross-Reference Datasets
- New York Taxicabs: Threats to Privacy
- Sharing Data Safely
- 3. Data on a Need-to-Know Basis
- Views: A Secure Way to Limit What Is Seen
- Why Limit Access?
- Apache Drill Views for Granular Security
- How Views Work
- Summary of Need-to-Know Methods
- 4. Fake Data Gives Real Answers
- The Surprising Thing About Fake Data
- Keep It Simple: log-synth
- Log-synth Use Case 1: Broken Large-Scale Hive Query
- Log-synth Use Case 2: Fraud Detection Model for Common Point of Compromise
- What Thieves Do
- Why Machine Learning Experts Were Consulted
- Using log-synth to Generate Fake User Histories
- Summary: Fake Data and log-synth to Safely Work with Secure Data
- 5. Fixing a Broken Large-Scale Query
- A Description of the Problem
- Determining What the Synthetic Data Needed to Be
- Schema for the Synthetic Data
- Generating the Synthetic Data
- Tips and Caveats
- What to Do from Here?
- 6. Fraud Detection
- What Is Really Important?
- The User Model
- Sampler for the Common Point of Compromise
- How the Breach Model Works
- Results of the Entire System Together
- Handy Tricks
- Summary
- 7. A Detailed Look at log-synth
- Goals
- Maintaining Simplicity: The Role of JSON in log-synth
- Structure
- Sampling Complex Values
- Structuring and De-structuring Samplers
- Extending log-synth
- Using log-synth with Apache Drill
- Choice of Data Generators
- R is for Random
- Benchmark Systems
- Probabilistic Programming
- Differential Privacy Preserving Systems
- Future Directions for log-synth
- 8. Sharing Data Safely: Practical Lessons
- A. Additional Resources
- Log-synth Open Source Software
- Apache Drill and Drill SQL Views
- General Resources and References
- Cheapside Hoard and Treasures
- Codes and Cipher
- Netflix Prize
- Problems with Data Sharing
- Additional O'Reilly Books by Dunning and Friedman