Financial Data Engineering - Helion
ISBN: 9781098159955
Pages: 506, Format: ebook
Publication date: 2024-10-09
Bookstore: Helion
Book price: 203.15 zł (previously: 247.74 zł)
You save: 18% (-44.59 zł)
Today, investment in financial technology and digital transformation is reshaping the financial landscape and generating many opportunities. Too often, however, engineers and professionals in financial institutions lack a practical and comprehensive understanding of the concepts, problems, techniques, and technologies necessary to build a modern, reliable, and scalable financial data infrastructure. This is where financial data engineering is needed.
A data engineer developing a data infrastructure for a financial product possesses not only technical data engineering skills but also a solid understanding of financial domain-specific challenges, methodologies, data ecosystems, providers, formats, technological constraints, identifiers, entities, standards, regulatory requirements, and governance.
This book offers a comprehensive, practical, domain-driven approach to financial data engineering, featuring real-world use cases, industry practices, and hands-on projects.
You'll learn:
Tamer Khraisha, PhD, is a senior data engineer and scientific author with more than a decade of experience in the financial sector.
Table of contents
- Foreword
- Preface
- Who Should Read This Book?
- Prerequisites
- What to Expect from This Book
- Book Resources and References
- Conventions Used in This Book
- Using Code Examples
- O'Reilly Online Learning
- How to Contact Us
- Acknowledgments
- I. Foundations of Financial Data Engineering
- 1. Financial Data Engineering Clarified
- Defining Financial Data Engineering
- First of All, What Is Finance?
- Finance as an economic function
- Finance as a market
- Finance as a research field
- Finance as a technology
- Defining Data Engineering
- Defining Financial Data Engineering
- Why Financial Data Engineering?
- Volume, Variety, and Velocity of Financial Data
- Volume
- Velocity
- Variety
- Finance-Specific Data Requirements and Problems
- Financial Machine Learning
- Supervised learning
- Unsupervised learning
- Reinforcement learning
- The Disruptive FinTech Landscape
- Regulatory Requirements and Compliance
- The Financial Data Engineer Role
- Description of the Role
- Where Do Financial Data Engineers Work?
- FinTech
- Commercial banks
- Investment banks
- Asset management firms
- Hedge funds
- Regulatory institutions
- Financial data vendors
- Security exchanges
- Big tech firms
- Responsibilities and Activities of a Financial Data Engineer
- Starting with data
- Scaling with data
- Leading with data
- Skills of a Financial Data Engineer
- Financial domain knowledge
- Technical data engineering skills
- Business and soft skills
- Summary
- 2. Financial Data Ecosystem
- Sources of Financial Data
- Public Financial Data
- Regulatory disclosure requirements
- Public institutional and governmental data
- Public research data
- Free stock market APIs
- Security Exchanges
- Commercial Data Vendors, Providers, and Distributors
- Bloomberg
- LSEG Eikon
- FactSet
- S&P Global Market Intelligence
- Wharton Research Data Services
- Survey Data
- Alternative Data
- Confidential and Proprietary Data
- Structures of Financial Data
- Time Series Data
- Cross-Sectional Data
- Panel Data
- Matrix Data
- Graph Data
- Simple graphs
- Directed graphs
- Weighted graphs
- Multipartite graphs
- Temporal graphs
- Multilayer graphs
- Text Data
- Types of Financial Data
- Fundamental Data
- Market Data
- Transaction Data
- Transaction specifications
- Initiation date
- Settlement date
- Settlement method
- Transaction parties
- Analytics Data
- Alternative Data
- Reference Data
- Entity Data
- Benchmark Financial Datasets
- Center for Research in Security Prices
- Compustat Financials
- Trade and Quote Database
- Institutional Brokers Estimate System
- IvyDB OptionMetrics
- Trade Reporting and Compliance Engine
- Orbis Global Database
- SDC Platinum
- Standard & Poor's Dow Jones Indices
- Alternative Datasets
- BitSight Security Ratings
- Global New Vehicle Registrations
- Weather Source
- Patent data
- Summary
- 3. Financial Identification Systems
- Financial Identifiers
- Financial Identifier and Identification System Defined
- The Need for Financial Identifiers
- Who Creates Financial Identification Systems?
- International Organization for Standardization (ISO)
- National Numbering Agencies
- Financial data vendors
- Financial institutions
- Desired Properties of a Financial Identifier
- Uniqueness
- Globality
- Scalability
- Completeness
- Accessibility
- Timeliness
- Authenticity
- Granularity
- Permanence
- Immutability
- Security
- Financial Identification Systems Landscape
- International Securities Identification Number
- Classification of Financial Instruments
- Financial Instrument Short Name
- Committee on Uniform Security Identification Procedures
- Legal Entity Identifier
- Transaction Identifiers
- Stock Exchange Daily Official List
- Ticker Symbols
- Derivative Identifiers
- Option symbol
- CFI, UPI, and OTC ISIN
- Alternative Instrument Identifier
- Financial Instrument Global Identifier
- FactSet Permanent Identifier
- LSEG Permanent Identifier
- Digital Asset Identifiers
- Industry and Sector Identifiers
- Bank Identifiers
- Summary
- 4. Financial Entity Systems
- Financial Entity Defined
- Financial Named Entity Recognition
- Named Entity Recognition Described
- How Does Named Entity Recognition Work?
- Data preprocessing
- Entity extraction
- Entity categorization
- Entity disambiguation
- Evaluation
- Approaches to Named Entity Recognition
- Lexicon/dictionary-based approach
- Rule-based approach
- Feature-engineering machine learning approach
- Deep learning approach
- Large language models
- Wikification
- Knowledge graphs
- Named Entity Recognition Software Libraries
- Financial Entity Resolution
- Entity Resolution Described
- The Importance of Entity Resolution in Finance
- Multiple identifiers
- Missing identifiers
- Data aggregation and integration
- Data deduplication
- How Does Entity Resolution Work?
- Data preprocessing
- Indexing
- Comparison
- Classification
- Evaluation
- Approaches to Entity Resolution
- Deterministic linkage
- Link tables
- Exact matching
- Rule-based matching
- Probabilistic linkage
- Supervised machine learning approach
- Entity Resolution Software Libraries
- Summary
- 5. Financial Data Governance
- Financial Data Governance
- Financial Data Governance Defined
- Financial Data Governance Justified
- Data Quality
- Dimension 1: Data Errors
- Dimension 2: Data Outliers
- Dimension 3: Data Biases
- Dimension 4: Data Granularity
- Dimension 5: Data Duplicates
- Dimension 6: Data Availability and Completeness
- Dimension 7: Data Timeliness
- Dimension 8: Data Constraints
- Dimension 9: Data Relevance
- Data Integrity
- Principle 1: Data Standards
- Principle 2: Data Backups
- Principle 3: Data Archiving
- Principle 4: Data Aggregation
- Principle 5: Data Lineage
- Principle 6: Data Catalogs
- Principle 7: Data Ownership
- Principle 8: Data Contracts
- Principle 9: Data Reconciliation
- Data Security and Privacy
- Data Privacy
- Data Anonymization
- Anonymization strategy
- Anonymization techniques
- Data Encryption
- Access Control
- Summary
- II. The Financial Data Engineering Lifecycle
- 6. Overview of the Financial Data Engineering Lifecycle
- Financial Data Engineering Lifecycle Defined
- Criteria for Building the Financial Data Engineering Stack
- Criterion 1: Open Source Versus Commercial Software
- Criterion 2: Ease of Use Versus Performance
- Criterion 3: Cloud Versus On Premises
- On premises
- Cloud computing
- Criterion 4: Public Versus Private Versus Hybrid Cloud
- Public cloud
- Private cloud
- Hybrid cloud
- Criterion 5: Single Versus Multi-Cloud
- Criterion 6: Monolithic Versus Modular Codebase
- Monolith architecture
- Modular architecture
- Summary
- 7. Data Ingestion Layer
- Data Transmission and Arrival Processes
- Data Transmission Protocols
- Application layer
- Transport layer
- Network layer
- Network access layer
- Data Arrival Processes
- Scheduled data arrival process
- Event-driven data arrival process
- Homogeneous data arrival process
- Heterogeneous data arrival process
- Single-item data arrival process
- Bulk data arrival process
- Data Ingestion Formats
- General-Purpose Formats
- Big Data Formats
- In-Memory Formats
- Standardized Financial Formats
- Financial Information eXchange (FIX)
- eXtensible Business Reporting Language (XBRL)
- Financial products Markup Language (FpML)
- Open Financial Exchange (OFX)
- Universal Financial Industry Message Scheme (ISO 20022)
- Data Ingestion Technologies
- Financial APIs
- Financial Data Feeds
- Secure File Transfer
- Cloud Access
- Web Access
- Specialized Financial Software
- Data Ingestion Best Practices
- Meet Business Requirements
- Design for Change
- Enforce Data Governance
- Perform Benchmarking and Stress Testing
- Summary
- 8. Data Storage Layer
- Principles of Data Storage System Design
- Principle 1: Business Requirements
- Principle 2: Data Modeling
- Principle 3: Transactional Guarantee
- Principle 4: Consistency Tradeoffs
- Principle 5: Scalability
- Principle 6: Security
- Data Storage Modeling
- SQL Versus NoSQL
- Primary Versus Secondary
- Operational Versus Analytical
- Native Versus Non-Native
- Multi-Model Versus Polyglot Persistence
- Data Storage Models
- The Data Lake Model
- Why data lakes?
- Technological implementations of data lakes
- Data modeling with data lakes
- Data governance
- Financial use cases of data lakes
- The Relational Model
- Why relational databases?
- SQL standards
- ACID transactions
- Analytical querying
- Schema enforcement
- Data modeling with relational databases
- Normalization
- Constraints
- Indexing of relational databases
- Technological implementations of relational databases
- Financial use cases of relational databases
- The Document Model
- Why document databases?
- Data modeling with document databases
- Document and collection structure
- Denormalization
- Indexing of document databases
- Technological implementations of document databases
- Financial use cases of document databases
- The Time Series Model
- Why time series databases?
- Data modeling with time series
- Technological implementations of time series databases
- Financial use cases of time series databases
- The Message Broker Model
- Why message brokers?
- Data modeling with message brokers
- Topic modeling
- Message schemas
- Technological implementations of message brokers
- Financial use cases of message brokers
- The Graph Model
- Why a graph model?
- Data modeling with graph databases
- Technological implementations of graph databases
- Financial use cases of graph databases
- The Warehouse Model
- Why data warehouses?
- Data modeling with data warehouses
- Technological implementations of data warehousing
- Financial use cases of data warehouses
- The Blockchain Model
- Summary
- 9. Data Transformation and Delivery Layer
- Data Querying
- Querying Patterns
- Time series queries
- Cross-section queries
- Panel queries
- Analytical queries
- Query Optimization
- Database-side query optimization
- User-side query optimization
- Scenario 1
- Scenario 2
- Scenario 3
- Scenario 4
- Scenario 5
- Scenario 6
- Data Transformation
- Transformation Operations
- Format conversion
- Data cleaning
- Data adjustments
- Data standardization
- Data filtering
- Feature engineering
- Advanced analytical computations
- Transformation Patterns
- Batch versus streaming transformations
- Memory-based versus disk-based transformations
- Full versus incremental data transformations
- Computational Requirements
- Computational performance
- Computational speed
- Throughput
- Computational efficiency
- Scalability
- Computing environments
- Data Delivery
- Data Consumers
- Delivery Mechanisms
- Summary
- 10. The Monitoring Layer
- Metrics, Events, Logs, and Traces
- Metrics
- Events
- Logs
- Traces
- Data Quality Monitoring
- Performance Monitoring
- Cost Monitoring
- Business and Analytical Monitoring
- Data Observability
- Summary
- 11. Financial Data Workflows
- Workflow-Oriented Software Architectures
- What Is a Data Workflow?
- Workflow Management Systems
- Flexibility
- Configurability
- Dependency Management
- Coordination Patterns
- Scalability
- Integration
- Types of Financial Data Workflows
- Extract-Transform-Load Workflows
- Stream Processing Workflows
- Microservice Workflows
- Machine Learning Workflows
- Summary
- 12. Hands-On Projects
- Prerequisites
- Project 1: Designing a Bank Account Management System Database with PostgreSQL
- Conceptual Model: Business Requirements
- Entities
- Relationships
- Constraints
- Logical Model: Entity Relationship Diagram
- Physical Model: Data Definition and Manipulation Language
- Project 1: Local Testing
- Project 1: Clean Up
- Project 1: Summary
- Project 2: Designing a Financial Data ETL Workflow with Mage and Python
- Project 2: Workflow Definition
- Project 2: Database Design
- Project 2: Local Testing
- Project 2: Clean Up
- Project 2: Summary
- Project 3: Designing a Microservice Workflow with Netflix Conductor, PostgreSQL, and Python
- Project 3: Workflow Definition
- Project 3: Database Design
- Project 3: Local Testing
- Project 3: Clean Up
- Project 3: Summary
- Project 4: Designing a Financial Reference Data Store with OpenFIGI, PermID, and GLEIF APIs
- Project 4: Prerequisites
- Project 4: Local Testing
- Project 4: Clean Up
- Project 4: Summary
- Conclusion
- Follow Updates on These Projects
- Report Issues or Ask Questions
- The Path Forward: Trends Shaping Financial Markets
- Financial Integration
- Digitalization of Financial Markets and Cloud Adoption
- Financial Regulation
- Financial Data Sharing and Marketplaces
- Financial Standardization
- Artificial Intelligence and Language Models
- Architectures for Specific Business Domains
- Data Collection
- Speed and Efficiency
- Tokenization, Blockchain, and Digital Currencies
- What Can You Do Next?
- Afterword
- Index