Data Science with Java. Practical Methods for Scientists and Engineers - Helion
ISBN: 978-14-919-3406-7
Pages: 236, Format: ebook
Publication date: 2017-06-06
Bookstore: Helion
Book price: 194.65 zł (previously: 226.34 zł)
You save: 14% (-31.69 zł)
Data Science is booming thanks to R and Python, but Java brings the robustness, convenience, and ability to scale critical to today’s data science applications. With this practical book, Java software engineers looking to add data science skills will take a logical journey through the data science pipeline. Author Michael Brzustowicz explains the basic math theory behind each step of the data science process, as well as how to apply these concepts with Java.
You’ll learn the critical roles that data IO, linear algebra, statistics, data operations, learning and prediction, and Hadoop MapReduce play in the process. Throughout this book, you’ll find code examples you can use in your applications.
- Examine methods for obtaining, cleaning, and arranging data into its purest form
- Understand the matrix structure that your data should take
- Learn basic concepts for testing the origin and validity of data
- Transform your data into stable and usable numerical values
- Understand supervised and unsupervised learning algorithms, and methods for evaluating their success
- Get up and running with MapReduce, using customized components suitable for data science algorithms
Table of contents
- Preface
- Who Should Read This Book
- Why I Wrote This Book
- A Word on Data Science Today
- Navigating This Book
- Conventions Used in This Book
- Using Code Examples
- O'Reilly Safari
- How to Contact Us
- Acknowledgments
- 1. Data I/O
- What Is Data, Anyway?
- Data Models
- Univariate Arrays
- Multivariate Arrays
- Data Objects
- Matrices and Vectors
- JSON
- Dealing with Real Data
- Nulls
- Blank Spaces
- Parse Errors
- Outliers
- Managing Data Files
- Understanding File Contents First
- Reading from a Text File
- Parsing big strings
- Parsing delimited strings
- Parsing JSON strings
- Reading from a JSON File
- Reading from an Image File
- Writing to a Text File
- Mastering Database Operations
- Command-Line Clients
- Structured Query Language
- Create
- Select
- Insert
- Update
- Delete
- Drop
- Java Database Connectivity
- Connections
- Statements
- Prepared statements
- Result sets
- Visualizing Data with Plots
- Creating Simple Plots
- Scatter plots
- Bar charts
- Plotting multiple series
- Basic formatting
- Plotting Mixed Chart Types
- Saving a Plot to a File
- 2. Linear Algebra
- Building Vectors and Matrices
- Array Storage
- Block Storage
- Map Storage
- Accessing Elements
- Working with Submatrices
- Randomization
- Operating on Vectors and Matrices
- Scaling
- Transposing
- Addition and Subtraction
- Length
- Distances
- Multiplication
- Inner Product
- Outer Product
- Entrywise Product
- Compound Operations
- Affine Transformation
- Mapping a Function
- Decomposing Matrices
- Cholesky Decomposition
- LU Decomposition
- QR Decomposition
- Singular Value Decomposition
- Eigen Decomposition
- Determinant
- Inverse
- Solving Linear Systems
- 3. Statistics
- The Probabilistic Origins of Data
- Probability Density
- Cumulative Probability
- Statistical Moments
- Entropy
- Continuous Distributions
- Uniform
- Normal
- Multivariate normal
- Log normal
- Empirical
- Discrete Distributions
- Bernoulli
- Binomial
- Poisson
- Characterizing Datasets
- Calculating Moments
- Sample moments
- Updating moments
- Descriptive Statistics
- Count
- Sum
- Min
- Max
- Mean
- Median
- Mode
- Variance
- Standard deviation
- Error on the mean
- Skewness
- Kurtosis
- Multivariate Statistics
- Covariance and Correlation
- Covariance
- Pearson's correlation
- Regression
- Simple regression
- Multiple regression
- Working with Large Datasets
- Accumulating Statistics
- Merging Statistics
- Regression
- Using Built-in Database Functions
- 4. Data Operations
- Transforming Text Data
- Extracting Tokens from a Document
- Utilizing Dictionaries
- Vectorizing a Document
- Scaling and Regularizing Numeric Data
- Scaling Columns
- Min-max scaling
- Centering the data
- Unit normal scaling
- Scaling Rows
- L1 regularization
- L2 regularization
- Matrix Scaling Operator
- Reducing Data to Principal Components
- Covariance Method
- SVD Method
- Creating Training, Validation, and Test Sets
- Index-Based Resampling
- List-Based Resampling
- Mini-Batches
- Encoding Labels
- A Generic Encoder
- One-Hot Encoding
- 5. Learning and Prediction
- Learning Algorithms
- Iterative Learning Procedure
- Gradient Descent Optimizer
- Evaluating Learning Processes
- Minimizing a Loss Function
- Linear loss
- Quadratic loss
- Cross-entropy loss
- Bernoulli
- Multinomial
- Two-Point
- Minimizing the Sum of Variances
- Silhouette Coefficient
- Log-Likelihood
- Classifier Accuracy
- Unsupervised Learning
- k-Means Clustering
- DBSCAN
- Dealing with outliers
- Optimizing radius of capture and minPoints
- Inference from DBSCAN
- Gaussian Mixtures
- Gaussian mixture model
- Fitting with the EM algorithm
- Optimizing the number of clusters
- Supervised Learning
- Naive Bayes
- Gaussian
- Multinomial
- Bernoulli
- Iris example
- Linear Models
- Linear
- Logistic
- Softmax
- Tanh
- Linear model estimator
- Iris example
- Deep Networks
- A network layer
- Feed forward
- Back propagation
- Deep network estimator
- MNIST example
- 6. Hadoop MapReduce
- Hadoop Distributed File System
- MapReduce Architecture
- Writing MapReduce Applications
- Anatomy of a MapReduce Job
- Hadoop Data Types
- Writable and WritableComparable types
- Custom Writable and WritableComparable types
- Writable
- WritableComparable
- Mappers
- Generic mappers
- Customizing a mapper
- Reducers
- Generic reducers
- Customizing a reducer
- The Simplicity of a JSON String as Text
- Deployment Wizardry
- Running a standalone program
- Deploying a JAR application
- Including dependencies
- Simplifying with a BASH script
- MapReduce Examples
- Word Count
- Custom Word Count
- Sparse Linear Algebra
- A. Datasets
- Anscombe's Quartet
- Sentiment
- Gaussian Mixtures
- Iris
- MNIST
- Index