Cassandra: The Definitive Guide - Helion

Author: Eben Hewitt
ISBN: 978-14-493-9664-0
Pages: 332, Format: ebook
Publication date: 2010-11-12
Bookstore: Helion

Price: 139.00 zł

Tags: NoSQL

What could you do with data if scalability wasn't a problem? With this hands-on guide, you'll learn how Apache Cassandra handles hundreds of terabytes of data while remaining highly available across multiple data centers -- capabilities that have attracted Facebook, Twitter, and other data-intensive companies. Cassandra: The Definitive Guide provides the technical details and practical examples you need to assess this database management system and put it to work in a production environment.

Author Eben Hewitt demonstrates the advantages of Cassandra's nonrelational design, and pays special attention to data modeling. If you're a developer, DBA, application architect, or manager looking to solve a database scaling issue or future-proof your application, this guide shows you how to harness Cassandra's speed and flexibility.

Understand the tenets of Cassandra's column-oriented structure
Learn how to write, update, and read Cassandra data
Discover how to add or remove nodes from the cluster as your application requires
Examine a working application that translates from a relational model to Cassandra's data model
Use examples for writing clients in Java, Python, and C#
Use the JMX interface to monitor a cluster's usage, memory patterns, and more
Tune memory settings, data storage, and caching for better performance

Table of Contents

Cassandra: The Definitive Guide
Dedication
SPECIAL OFFER: Upgrade this ebook with O'Reilly
A Note Regarding Supplemental Files
Foreword
Preface
- Why Apache Cassandra?
- Is This Book for You?
- What's in This Book?
- Finding Out More
- Conventions Used in This Book
- Using Code Examples
- Safari Enabled
- How to Contact Us
- Acknowledgments
1. Introducing Cassandra
- What's Wrong with Relational Databases?
- A Quick Review of Relational Databases
  - RDBMS: The Awesome and the Not-So-Much
    - Transactions, ACID-ity, and two-phase commit
    - Schema
    - Sharding and shared-nothing architecture
    - Summary
  - Web Scale
- The Cassandra Elevator Pitch
  - Cassandra in 50 Words or Less
  - Distributed and Decentralized
  - Elastic Scalability
  - High Availability and Fault Tolerance
  - Tuneable Consistency
  - Brewer's CAP Theorem
  - Row-Oriented
  - Schema-Free
  - High Performance
- Where Did Cassandra Come From?
- Use Cases for Cassandra
  - Large Deployments
  - Lots of Writes, Statistics, and Analysis
  - Geographical Distribution
  - Evolving Applications
- Who Is Using Cassandra?
- Summary
2. Installing Cassandra
- Installing the Binary
  - Extracting the Download
  - What's In There?
- Building from Source
  - Additional Build Targets
  - Building with Maven
- Running Cassandra
  - On Windows
  - On Linux
  - Starting the Server
- Running the Command-Line Client Interface
- Basic CLI Commands
  - Help
  - Connecting to a Server
  - Describing the Environment
  - Creating a Keyspace and Column Family
  - Writing and Reading Data
- Summary
3. The Cassandra Data Model
- The Relational Data Model
- A Simple Introduction
- Clusters
- Keyspaces
- Column Families
  - Column Family Options
- Columns
  - Wide Rows, Skinny Rows
  - Column Sorting
- Super Columns
  - Composite Keys
- Design Differences Between RDBMS and Cassandra
  - No Query Language
  - No Referential Integrity
  - Secondary Indexes
  - Sorting Is a Design Decision
  - Denormalization
- Design Patterns
  - Materialized View
  - Valueless Column
  - Aggregate Key
- Some Things to Keep in Mind
- Summary
4. Sample Application
- Data Design
- Hotel App RDBMS Design
- Hotel App Cassandra Design
- Hotel Application Code
  - Creating the Database
    - Loading the schema
  - Data Structures
  - Getting a Connection
  - Prepopulating the Database
  - The Search Application
- Twissandra
- Summary
5. The Cassandra Architecture
- System Keyspace
- Peer-to-Peer
- Gossip and Failure Detection
- Anti-Entropy and Read Repair
- Memtables, SSTables, and Commit Logs
- Hinted Handoff
- Compaction
- Bloom Filters
- Tombstones
- Staged Event-Driven Architecture (SEDA)
- Managers and Services
  - Cassandra Daemon
  - Storage Service
  - Messaging Service
  - Hinted Handoff Manager
- Summary
6. Configuring Cassandra
- Keyspaces
  - Creating a Column Family
  - Transitioning from 0.6 to 0.7
- Replicas
- Replica Placement Strategies
  - Simple Strategy
  - Old Network Topology Strategy
  - Network Topology Strategy
- Replication Factor
  - Increasing the Replication Factor
- Partitioners
  - Random Partitioner
  - Order-Preserving Partitioner
  - Collating Order-Preserving Partitioner
  - Byte-Ordered Partitioner
- Snitches
  - Simple Snitch
  - PropertyFileSnitch
- Creating a Cluster
  - Changing the Cluster Name
  - Adding Nodes to a Cluster
  - Multiple Seed Nodes
- Dynamic Ring Participation
- Security
  - Using SimpleAuthenticator
  - Programmatic Authentication
  - Using MD5 Encryption
  - Providing Your Own Authentication
- Miscellaneous Settings
- Additional Tools
  - Viewing Keys
  - Importing Previous Configurations
- Summary
7. Reading and Writing Data
- Query Differences Between RDBMS and Cassandra
  - No Update Query
  - Record-Level Atomicity on Writes
  - No Server-Side Transaction Support
  - No Duplicate Keys
- Basic Write Properties
- Consistency Levels
- Basic Read Properties
- The API
  - Ranges and Slices
- Setup and Inserting Data
- Using a Simple Get
- Seeding Some Values
- Slice Predicate
  - Getting Particular Column Names with Get Slice
  - Getting a Set of Columns with Slice Range
    - Counts
    - Reversed
  - Getting All Columns in a Row
- Get Range Slices
- Multiget Slice
- Deleting
- Batch Mutates
  - Batch Deletes
  - Range Ghosts
- Programmatically Defining Keyspaces and Column Families
- Summary
8. Clients
- Basic Client API
- Thrift
  - Thrift Support for Java
  - Exceptions
  - Thrift Summary
- Avro
  - Avro Ant Targets
  - Avro Specification
  - Avro Summary
- A Bit of Git
- Connecting Client Nodes
  - Client List
  - Round-Robin DNS
  - Load Balancer
- Cassandra Web Console
- Hector (Java)
  - Features
  - The Hector API
- HectorSharp (C#)
- Chirper
- Chiton (Python)
- Pelops (Java)
- Kundera (Java ORM)
- Fauna (Ruby)
- Summary
9. Monitoring
- Logging
  - Tailing
  - General Tips
    - Following along
    - Warning signs
- Overview of JMX and MBeans
  - MBeans
  - Integrating JMX
- Interacting with Cassandra via JMX
- Cassandra's MBeans
  - org.apache.cassandra.concurrent
  - org.apache.cassandra.db
  - org.apache.cassandra.gms
  - org.apache.cassandra.service
    - StorageService
    - StreamingService
- Custom Cassandra MBeans
- Runtime Analysis Tools
  - Heap Analysis with JMX and JHAT
  - Detecting Thread Problems
- Health Check
- Summary
10. Maintenance
- Getting Ring Information
  - Info
  - Ring
    - Range Tokens
- Getting Statistics
  - Using cfstats
  - Using tpstats
- Basic Maintenance
  - Repair
  - Flush
  - Cleanup
- Snapshots
  - Taking a Snapshot
  - Clearing a Snapshot
- Load-Balancing the Cluster
  - loadbalance and streams
- Decommissioning a Node
- Updating Nodes
  - Removing Tokens
  - Compaction Threshold
  - Changing Column Families in a Working Cluster
- Summary
11. Performance Tuning
- Data Storage
- Reply Timeout
- Commit Logs
- Memtables
- Concurrency
- Caching
- Buffer Sizes
- Using the Python Stress Test
  - Generating the Python Thrift Interfaces
    - Getting Thrift
  - Running the Python Stress Test
- Startup and JVM Settings
  - Tuning the JVM
- Summary
12. Integrating Hadoop
- What Is Hadoop?
- Working with MapReduce
  - Cassandra Hadoop Source Package
- Running the Word Count Example
  - Outputting Data to Cassandra
  - Hadoop Streaming
- Tools Above MapReduce
  - Pig
  - Hive
- Cluster Configuration
- Use Cases
  - Raptr.com: Keith Thornhill
  - Imagini: Dave Gardner
- Summary
A. The Nonrelational Landscape
- Nonrelational Databases
- Object Databases
- XML Databases
  - SoftwareAG Tamino
  - eXist
  - Oracle Berkeley XML DB
  - MarkLogic Server
  - Apache Xindice
  - Summary
- Document-Oriented Databases
  - IBM Lotus
  - Apache CouchDB
  - MongoDB
  - Riak
- Graph Databases
  - FlockDB
  - Neo4J
- Key-Value Stores and Distributed Hashtables
  - Amazon Dynamo
  - Project Voldemort
  - Redis
- Columnar Databases
  - Google Bigtable
  - HBase
  - Hypertable
  - Polyglot Persistence
- Summary
Glossary
Index
About the Author
Colophon
SPECIAL OFFER: Upgrade this ebook with O'Reilly
Copyright