Chapter 1. Financial Data Engineering Clarified
Given all the payments, transfers, trades, and other financial activities that take place daily, can you imagine how much data the global financial sector generates? According to a 2011 report by the McKinsey Global Institute, the banking and investment sector in the US alone stores and manages more than one exabyte of data. To put that in perspective, an exabyte is the equivalent of one billion gigabytes and translates into trillions of digital records. The same report shows that, on average, financial services firms generate and store more data than firms in other sectors. Some statistics are even more astonishing: for instance, JPMorgan Chase, the largest bank in the United States by market capitalization, manages more than 450 petabytes of data, and Bank of New York Mellon, a global financial services company specializing in investment management and investment services, manages over 110 million gigabytes of global financial data.
Naturally, we might extrapolate these figures to tens or even hundreds of exabytes once we take into account the global context and the constantly expanding financial landscape. Data thus sits at the heart of the financial system, serving as both the input to different financial operations and the output generated from them. Importantly, to guarantee a healthy and well-functioning system, a reliable and secure data infrastructure is needed for generating, exchanging, storing, and consuming all kinds of financial data. In addition, this infrastructure must adhere to the financial sector’s specific requirements, constraints, practices, and regulations. This is where financial data engineering enters the picture. To get started, this chapter will introduce you to finance, financial data engineering, and the role and skills of the financial data engineer.
Defining Financial Data Engineering
Data engineering has always been a vibrant and innovative field from both industry and research standpoints. If you are a data engineer, you are likely aware of how many data-related technologies are released and popularized every year. Several factors drive these developments:
- The growing importance of data as a key input in the creation of digital products and services
- Large digital companies, such as LinkedIn, Netflix, Google, Meta, and Airbnb, releasing the data frameworks they developed internally to handle massive volumes of data and traffic as open source projects
- The impressive success of open source alternatives, which has fueled interest from individuals and businesses in developing and evaluating new tools and ideas
As an industry practice, data engineering has undergone several conceptual and technological evolution episodes. Without offering a detailed historical account, I would simply say that data engineering was born with the introduction of Structured Query Language (SQL) and data warehousing in the 1970s and 1980s. Companies like IBM and Oracle were early pioneers in the field, playing a key role in developing and popularizing many of the fundamental principles of data engineering.
Until the early 2000s, data engineering responsibilities were primarily handled by information technology (IT) teams. Roles such as database administrator, database developer, and system administrator were prevalent in the data job market.
With the global rise and adoption of the internet and social media, the so-called big data revolution marked a major step toward contemporary data engineering. Using the release date of Apache Hadoop as a reference, I would say that the big data era started around 2005. Pioneers like Google, Airbnb, Meta, Microsoft, Amazon, and Netflix have popularized a more specialized and advanced version of data engineering. This includes big data frameworks, open source tools, cloud computing, alternative data, and streaming technologies.
The financial sector has actively participated in this dynamic environment as both an observer and an adopter of data technologies. This active involvement stems from the financial industry’s continuous evolution in response to market demands and regulatory changes, which often necessitates the adoption of new technologies. Importantly, data engineering practices in finance are heavily domain driven, given the distinct requirements of the financial sector in terms of security, governance, and regulation, as well as the complex nature of the financial data landscape and financial data management challenges.
Considering these factors, this book will present financial data engineering as a domain-driven field within data engineering, specifically tailored to the financial sector, thereby setting it apart from traditional data engineering. To further justify the need for financial data engineering, the upcoming sections will provide a brief introduction to the finance domain, outline the data-related challenges encountered in financial markets, offer definitions of data engineering and financial data engineering, and provide an overview of the role and responsibilities of a financial data engineer.
First of All, What Is Finance?
Despite the extensive use of the term finance, there can be a lot of confusion about what it really means. This is because finance is a multifaceted concept that can be approached from different angles (see Figure 1-1). To equip you with basic domain knowledge, the next sections present a short conceptual illustration of finance from four main perspectives: economics, market, science, and technology.
Finance as an economic function
In economic theory, finance is an institution that mediates between agents who are in deficit (who need more money than they have) and those in surplus (who have more money than they spend). To secure funds, agents in deficit offer to borrow money from agents with a surplus in exchange for an interest payment.
This perspective highlights the vital role of finance in the economy: it offers individuals a means to invest their savings, allows families to purchase a house through a mortgage, provides businesses with capital to get started, empowers universities to invest their assets and expand their campus, and enables governments to finance public projects to fulfill societal needs.
For economists, finance is one of the primary drivers of economic growth. This is why strong economies tend to have large, efficient, and inclusive financial markets. To ensure the stability and fairness of financial markets, several regulatory agencies and regulations have been established.
A major subject that financial economists often investigate is market equilibrium, which describes a state where demand and supply intersect, resulting in a stable market price. In financial markets, this price is commonly represented by the interest rate, with supply and demand reflecting the quantity of money in circulation. When demand exceeds supply, interest rates typically rise, whereas if supply surpasses demand, interest rates tend to decrease. Entities such as central banks were established to implement monetary policies aimed at maintaining market interest rates as closely aligned with equilibrium as possible.
Finance as a market
To enable individuals and companies to engage efficiently in financial activities, financial markets have emerged, hosting a vast array of financial institutions, products, and services. Nowadays, if we take a well-developed financial sector, we can find a large variety of market players. These may include the following:
- Commercial banks (e.g., HSBC, Bank of America)
- Investment banks (e.g., Morgan Stanley, Goldman Sachs)
- Asset managers (e.g., BlackRock, The Vanguard Group)
- Security exchanges (e.g., New York Stock Exchange [NYSE], London Stock Exchange, Chicago Mercantile Exchange)
- Hedge funds (e.g., Citadel, Renaissance Technologies)
- Mutual funds (e.g., Vanguard Mid-Cap Value Index Fund)
- Insurance companies (e.g., Allianz, AIG)
- Central banks (e.g., Federal Reserve, European Central Bank)
- Government-sponsored enterprises (e.g., Fannie Mae, Freddie Mac)
- Regulators (e.g., Securities and Exchange Commission)
- Industry trade groups (e.g., Securities Industry and Financial Markets Association)
- Credit rating agencies (e.g., S&P Global Ratings, Moody’s)
- Data vendors (e.g., Bloomberg, London Stock Exchange Group [LSEG])
- FinTech companies (e.g., Revolut, Wise, Betterment)
- Big tech companies (e.g., Amazon Cash, Amazon Pay, Apple Pay, Google Pay)
Note
The terms “financial institution,” “financial firm,” “financial company,” and “financial organization” might often be used interchangeably. However, from an economic theory standpoint, “financial institution” may be the most appropriate term to use, as it represents an abstract concept encompassing any company, agency, firm, or organization that serves a specific purpose or function within financial markets. For this reason, I will be mostly using the term “financial institution” throughout this book.
The primary unit of exchange in financial markets is commonly referred to as a financial asset, instrument, or security. Many kinds of financial assets can be bought and sold in financial markets. Here are a few:1
- Shares of companies (e.g., common stocks)
- Fixed income instruments (e.g., corporate bonds, treasury bills)
- Derivatives (e.g., options, futures, swaps, forwards)
- Fund shares (e.g., mutual funds, exchange-traded funds)
Given the large and diverse number of financial instruments and transactions, financial markets are further classified into categories, such as the following:
- Money markets (for liquid, short-term exchanges)
- Capital markets (for long-term exchanges)
- Primary markets (for new issues of instruments)
- Secondary markets (for already issued instruments)
- Foreign exchange markets (for trading currencies)
- Commodity markets (for trading raw materials such as gold and oil)
- Equity markets (for trading stocks)
- Fixed-income markets (for trading bonds)
Finance as a research field
Finance is a well-known and extensive field of academic and empirical research. One major area of investigation is asset pricing theory, which aims to understand and calculate the price of claims to risky (uncertain) assets such as stocks, bonds, and derivatives. Within this theory, low prices often translate into a high rate of return, so we can think of financial asset pricing theory as a way to explain why certain financial assets pay (or should pay) higher average returns than others.
Another major field of financial research is risk management, which focuses on measuring and managing the uncertainty around the future value of a financial asset or a portfolio of assets. Other areas of investigation include portfolio management, corporate finance, financial accounting, credit scoring, financial engineering, stock prediction, and performance evaluation.
To publish financial research findings, a variety of peer-reviewed journals have been established. Some of these journals offer broad coverage, while others are more specialized. Here are some examples:
- The Journal of Finance: Covers theoretical and empirical research on all major areas of finance
- The Review of Financial Studies: Covers theoretical and empirical topics in financial economics
- The Journal of Banking and Finance: Covers theoretical and empirical topics in finance and banking, with a focus on financial institutions and money and capital markets
- Quantitative Finance: Covers theoretical and empirical interdisciplinary research on quantitative methods of finance
- The Journal of Portfolio Management: Covers topics related to finance and investing, such as risk management, portfolio optimization, and performance measurement
- The Journal of Financial Data Science: Covers data-driven research in finance using machine learning, artificial intelligence, and big data analytics
- The Journal of Securities Operations & Custody: Covers topics and issues related to securities trading, clearing, settlement, financial standards, and more
In addition to academic journals, a large number of conferences, events, and summits are regularly held to share and discuss the latest developments in financial research. Examples include the Western Finance Association meetings, the American Finance Association meetings, and the Society for Financial Studies Cavalcades. Furthermore, globally renowned certifications like the Chartered Financial Analyst (CFA) are available to aspiring financial specialists who wish to acquire strong ethical and technical foundations in investment research and portfolio management.
Finance as a technology
Finally, finance can refer to the set of technologies and tools enabling all kinds of financial transactions and activities. Examples include the following:
- Payment systems (mobile, contactless, real-time, digital wallets, gateways, etc.)
- Blockchain and distributed ledger technology (DLT)
- Financial market infrastructures (e.g., Euroclear, Clearstream, Fedwire, T2, CHAPS)
- Trading platforms
- Stock exchanges (e.g., NYSE, NASDAQ, Tokyo Stock Exchange)
- Stock market data systems
- Automated teller machines (ATMs)
- Order management systems (OMSs)
- Risk management systems
- Algorithmic trading and high-frequency trading (HFT) systems
- Smart order routing (SOR) systems
This diverse array of technologies in the financial sector is crucial for maintaining the efficiency and reliability of global financial markets.
Defining Data Engineering
Now that we have a foundational understanding of finance, let’s explore what financial data engineering is. To do this, I’ll first explain traditional data engineering, as it is a widely recognized term in the industry.
If we Google the words “what is data engineering,” we get more than two billion search results. That’s quite a lot, but to be more pragmatic, we can do a more advanced inquiry by searching Google Scholar for all papers and books where the term “data engineering” occurs in the title. Such a query returns a relatively large number of results (around 2,290 scientific publications), as shown in Figure 1-2.
I highly recommend you read some of the publications that Google Scholar returns for data engineering. Interestingly, you will quickly notice that there is quite a high variety of definitions for data engineering. This is expected, as the field of data engineering sits at the intersection between multiple fields, including software engineering, infrastructure engineering, data analysis, networking, software and data architecture, data governance, and other data management-related areas.2
For illustrative purposes, let’s consider the following selected definitions:
Data engineering is the development, implementation, and maintenance of systems and processes that take in raw data and produce high-quality, consistent information that supports downstream use cases, such as analysis and machine learning. Data engineering is the intersection of security, data management, DataOps, data architecture, orchestration, and software engineering.
Joe Reis and Matt Housley, Fundamentals of Data Engineering (O’Reilly, 2022)
Data engineering is all about the movement, manipulation, and management of data.
Lewis Gavin, What Is Data Engineering? (O’Reilly 2019)
Data engineering is the process of designing and building systems that let people collect and analyze raw data from multiple sources and formats. These systems empower people to find practical applications of the data, which businesses can use to thrive.
As you can see, all three definitions are quite different, but if we make an effort to extract the main defining elements, we can infer that data engineering revolves around the design and implementation of an infrastructure that enables an organization to retrieve data from one or more sources, transform it, store it in a target destination, and make it consumable by end users. Naturally, in practice, the complexity of such a process would depend on the technical and business requirements and constraints, which vary on a case-by-case basis. Given this context, I will use the following definition of data engineering throughout this book:
Data engineering is a field of practice and research that focuses on designing and implementing data infrastructure intended to reliably and securely perform tasks such as data ingestion, transformation, storage, and delivery. This infrastructure is tailored to meet varying business requirements, industry practices, and external factors such as regulatory compliance and privacy considerations.
Throughout this book, we’ll focus on the concept of financial data infrastructure as the cornerstone of financial data engineering. Along the way, we will examine the components of a financial data infrastructure, which include physical (hardware) and virtual (software) resources and systems for storing, processing, managing, and transmitting financial data. Furthermore, we will discuss the essential capabilities and features of a financial data infrastructure, such as security, traceability, scalability, observability, and reliability.
With this definition in mind, let’s now proceed to clarify the meaning of financial data engineering.
Defining Financial Data Engineering
Financial data engineering shares most of the traditional data engineering tools, patterns, practices, and technologies. However, when designing and building a financial data infrastructure, relying only on traditional data engineering is not sufficient. You are very likely going to deal with domain-specific issues such as the complex financial data landscape (e.g., a large number of data sources, types, vendors, structures, etc.), the regulatory requirements for reporting and governance, the challenges related to entity and identification systems, the special requirements in terms of speed and volume, and a variety of constraints on delivery, ingestion, storage, and processing.3
Given such domain-driven particularities, financial data engineering deserves to be treated as a specialized field that sits at the intersection between traditional data engineering, financial domain knowledge, and financial data (as illustrated in Figure 1-3). More formally, this book defines financial data engineering as follows:
Financial data engineering is the domain-driven practice of designing, implementing, and maintaining data infrastructure to enable the collection, transformation, storage, consumption, monitoring, and management of financial data coming from mixed sources, with different frequencies, structures, delivery mechanisms, formats, identifiers, and entities, while following secure, compliant, and reliable standards.
Note
Don’t confuse financial data engineering with financial engineering. Financial engineering is an interdisciplinary applied field that uses mathematics, statistics, econometrics, financial theory, and computer science to develop financial investment strategies, financial products, and financial processes.4
Now that you know what financial data engineering is, you may be wondering why it matters to financial institutions and markets and why we should write a book about it. The next section addresses these questions in detail.
Why Financial Data Engineering?
One of the main goals of this book is to illustrate how financial data engineering is unique in terms of the domain-driven elements that characterize it. To understand why the market demands financial data engineering, it is crucial to examine the main factors shaping and driving data-driven needs and trends in the financial sector. The next few sections will provide a detailed account of these factors.
Volume, Variety, and Velocity of Financial Data
One of the primary factors that have been transforming the financial sector is big data. In this book, big data is simply defined as a combination of three attributes: large size (volume), high dimensionality and complexity (variety), and speed of generation (velocity). Let’s explore each of these Vs in detail.
Volume
When referencing big data, it is hard to deny that it is primarily about size. Data can be large in either absolute or relative terms. Data is large in absolute terms if it gets generated in a remarkably enormous and nonlinear quantity. An absolute increase in data size is often the result of socio-technological changes that induce a structural alteration in the data generation process. For example, card payments were once reserved primarily for major purchases and were relatively limited, whereas today people use cards and phones to pay for almost everything, from groceries to electronics. This shift has led to a remarkable absolute increase in the amount of payment data being generated and collected.
In addition, the rapid development and adoption of digital automated technologies, in particular electronic exchange mechanisms, have resulted in an absolute increase in the sheer volume of financial data generated. The emergence of high-frequency trading is a good example. For instance, a single day’s worth of data from the New York Stock Exchange’s high-frequency dataset, Trade and Quotes (TAQ), comprises approximately 2.3 billion records. With the implementation of high-frequency trading technologies, financial data began to be recorded at incredibly fine intervals, including the millisecond (one-thousandth of a second), microsecond (one-millionth of a second), and even nanosecond (one-billionth of a second) levels.
On the other hand, data is considered relatively large if its size is big compared to other existing datasets. Improved data collection is perhaps the main driver behind the relative increase in financial data volumes. This has been facilitated by technological advancements enabling more efficient data collection, regulatory requirements imposing stricter data collection and reporting requirements, the increasing complexity of financial instruments necessitating the collection of data for risk management, and the growing demand for data-driven insights within the financial sector. As an example, the Options Price Reporting Authority (OPRA), which collects and consolidates all the trades and quotes from member option exchanges in the United States, reported an astonishing peak rate of 45.9 million messages per second in February 2024.5
With large volumes of financial data comes a new space of opportunities:
- Overcoming sample selection bias that might exist in small datasets
- Enabling investors and traders to access high-frequency market data
- Capturing patterns and financial activities not represented in small datasets
- Monitoring and detecting fraud, market anomalies, and irregularities
- Enabling the use of advanced machine learning and data mining techniques that can capture complex and nonlinear signals
- Alleviating the problem of high dimensionality in machine learning, where the number of features is significantly high compared to the number of observations
- Facilitating the development of financial data products that are derived from data, improve with data, and produce additional data
However, such opportunities come with technical challenges, mostly related to data engineering:
- Collecting and storing large volumes of financial data from various sources efficiently
- Designing querying systems that enable users to retrieve extensive datasets quickly
- Building a data infrastructure capable of handling any data size seamlessly
- Establishing rules and procedures to ensure data quality and integrity
- Aggregating large volumes of data from multiple sources
- Linking records across multiple high-frequency datasets
The frequency at which data is generated and collected greatly impacts financial data volumes. A process that produces one million records per second generates significantly larger data volumes compared to a process that produces one thousand records per second. This rate of data generation is known as data velocity and will be discussed in the following section.
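To make this comparison concrete, here is a back-of-envelope calculation of how generation rate translates into daily volume. The 100-bytes-per-record size is a hypothetical assumption chosen purely for illustration:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

def daily_volume(records_per_second: int, bytes_per_record: int = 100):
    """Back-of-envelope daily record count and size in terabytes,
    assuming a hypothetical fixed record size."""
    records = records_per_second * SECONDS_PER_DAY
    terabytes = records * bytes_per_record / 1e12
    return records, terabytes

fast = daily_volume(1_000_000)  # one million records per second
slow = daily_volume(1_000)      # one thousand records per second
```

At this assumed record size, one million records per second works out to 86.4 billion records (roughly 8.6 TB) per day, while one thousand records per second yields about 86 million records (under 10 GB) per day, a thousandfold difference in what the infrastructure must absorb.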
Velocity
Data velocity refers to the speed at which data is generated and ingested. Recent years have seen an increase in the velocity of data generation in financial markets. High-frequency trading, financial transactions, financial news feeds, and finance-related social media posts all produce data at high speeds.
With increased financial data velocity, new opportunities emerge:
- Quicker reaction times as data arrives shortly after generation
- Deeper and more immediate insights into intraday dynamics, such as price fluctuations and patterns emerging within an hour, minute, or second
- Enhanced market monitoring
- Development of new trading strategies, including algorithmic trading and high-frequency trading
Crucially, high data velocity introduces critical challenges for data infrastructures:
- Volume: How to build event-driven systems that can handle the arrival of large amounts of data in real time
- Speed: How to build a data infrastructure that can reliably cope with the speed of information transmission in financial markets
- Reaction time: How to build pipelines that can react as quickly as possible to new data arrival yet guarantee quality checks and reliability
- Variety/multistream: How to handle the arrival of many types of data from multiple sources in real time
The exponential increase in financial data volumes and the velocity of data generation doesn’t occur uniformly. Alongside this growth, new data types, formats, and structures have emerged to fulfill various business and technical requirements. The following section will explore this diversity of data in depth.
Variety
The third feature that defines big data is variety, which refers to the presence of many data types, formats, or structures. To better describe this concept, let’s illustrate the three types of structures that data can have:
- Structured data: This data has a clear format and data model, is easy to organize and store, and is ready to analyze. The most common example is tabular data organized as rows and columns.
- Semi-structured data: This type of data lacks a straightforward tabular format but has some structural properties that make it manageable. Often, semi-structured data is parsed and stored in a tabular format for ease of use. Examples include XML and JSON, which store data in a hierarchical, tree-like format.
- Unstructured data: This data lacks any predefined structure or formatting and requires parsing and preprocessing using specialized techniques before analysis. The majority of data worldwide is unstructured, including formats like PDF, HTML, text, video, and audio.
The variety of financial data has significantly increased in recent years. For example, the US Securities and Exchange Commission’s Electronic Data Gathering, Analysis, and Retrieval system (EDGAR) receives and handles about two million filings a year. These filings can be complex documents, many of which contain multiple attachments, scores of pages, and several thousand pieces of information. Another example is alternative data sources such as news, weather, satellite images, social media posts, and web search activities, which have been shown to be highly valuable for financial analysis and product development.6
Increased variety in financial data opens up new opportunities:
- Incorporating new variables into financial analysis for enhanced predictions
- Capturing new economic and financial activities that can’t be analyzed using structured data alone
- Facilitating the development and integration of innovative financial products like news analytics, fraud detection, and financial networks
- Enhancing regulatory capabilities to capture complex market structures for more effective oversight
However, data variety also presents several data engineering challenges:
- Building a data infrastructure capable of efficiently storing and managing diverse types of financial data, including structured, semi-structured, and unstructured formats
- Implementing data aggregation systems to consolidate different data types into a single access point
- Developing methodologies for cleaning and transforming new structures of financial data
- Establishing specialized pipelines to process varied types of financial data, such as natural language processing for text and deep learning for images
- Implementing identification and entity management systems to link entities across a wide range of data sources
Finance-Specific Data Requirements and Problems
The financial industry has always witnessed constant transformation: new players joining and disrupting the competitive landscape, new technologies emerging and revolutionizing the way financial markets function, new data sources expanding the space of opportunities, and new standards and regulations getting released, promoted, and enforced.
Given these dynamics, the financial industry sets itself apart in terms of the issues and challenges that its participants face. A few key ones are listed here:
- There is a lack of standardization in some key areas:
  - Identification systems for financial data
  - Classification systems for financial assets and sectors
  - Financial information exchange
- Lack of established data standards for financial transaction processing
- Dispersed and diverse sources of financial data
- Adoption of multiple data formats by companies, data vendors, providers, and regulators
- Complexity in matching and identifying entities within financial datasets
- Lack of reliable methods to define, store, and manage financial reference data (discussed in Chapter 2)
- Lack of relevant data for understanding and managing various financial problems due to poor data collection processes (e.g., granular data on financial market dependencies and exposures necessary for systemic risk analysis)
- The constant need to adapt data and tech infrastructure to meet new market and regulatory demands (e.g., the EU’s Instant Payments Regulation requires all payment service providers to offer 24/7 euro payments within seconds, necessitating upgrades to legacy systems)
- The constant need to record, store, and share financial data for various regulatory and market purposes (e.g., the EU’s Central Electronic System of Payment Information mandates that payment system providers track cross-border payment data and share it with the tax authorities of EU member states)
- Absence of standardized practices for cleaning and ensuring the quality of financial data
- Difficulty in aggregating data across various silos and divisions within financial institutions
- Technological and market challenges in creating consolidated tapes, which integrate market data from multiple sources, including trade and quote information across various venues
- Balancing innovation and competitiveness with regulatory compliance
- Persisting concerns regarding security, privacy, and performance in cloud migration strategies
- Continued reliance on legacy technological systems due to organizational inertia and risk aversion
Over the years, a number of industry and regulatory initiatives have been proposed to tackle these issues. For example, to facilitate the standardized delivery of financial services and products, the United States established the Accredited Standards Committee X9 (ASC X9) to create, maintain, and promote voluntary consensus standards for the financial industry. In addition to setting national standards in the United States, ASC X9 can submit standards to the International Organization for Standardization (ISO) in Geneva, Switzerland, for consideration as international ISO standards. ASC X9 develops standards for many different areas and technologies, including electronic legal orders for financial institutions, electronic benefits and mobile payments, financial identifiers, fast payment systems, cryptography, payment messages, and more.
Additionally, international agencies such as the Association of National Numbering Agencies (ANNA) were established to coordinate and foster the adoption of ISO-based financial identifiers (covered in Chapter 3). Frameworks such as eXtensible Business Reporting Language (XBRL) (discussed in Chapter 7) were developed to standardize the communication and reporting of business information. Following the financial crisis of 2007–2008, the financial industry realized the need for a standardized identifier for legal entities involved in market transactions, which led to the development of the celebrated Legal Entity Identifier (LEI), discussed in Chapter 3.
Furthermore, financial market players have also been actively contributing solutions to the above-mentioned problems. To give a few examples, Bloomberg is currently promoting its Financial Instrument Global Identifier (FIGI) as an open standard for identifying financial instruments; LSEG released its Permanent Identifier (PermID) to complement existing market identifiers; and financial institutions such as JPMorgan have been pioneers in promoting market practices such as Value at Risk (VaR) in the 1990s and, more recently, the use of financial APIs to support fast, real-time data transactions.
Financial Machine Learning
Machine learning (ML) stands out as one of the most promising investments for shaping the future of the financial industry. To understand what machine learning is, it helps to first understand what artificial intelligence (AI) is. Although there is no universally accepted definition of artificial intelligence, in its simplest form, AI aims to understand the nature of intelligence in order to build systems that can reliably perform tasks that usually require human intelligence, such as speech recognition, visual perception, decision-making, and language understanding. Figure 1-4 illustrates the various fields of inquiry in artificial intelligence.
Machine learning stands out as a highly popular and significant subfield within AI. It focuses on building systems that can discover patterns from data, learn from their mistakes, and make predictions. The key word in machine learning is learning, which the computer scientist Tom Mitchell eloquently illustrates as follows:7
A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
Machine learning scientists and practitioners often develop models based on three types of learning: supervised, unsupervised, and reinforcement learning. Let’s explore each in detail.
Supervised learning
Supervised learning describes a learning approach that relies on an annotated (labeled) dataset consisting of a set of explanatory variables (called features) and a response variable (called a label). In a supervised setting, the model is trained to identify patterns using the explanatory variables. The training process involves showing the model the actual value (label) it should have predicted, hence the term “supervised,” and allowing it to learn from its mistakes (as illustrated in Figure 1-5).
When building a supervised system, modelers start by fitting one or more models on training data, where features and labels are known, via a selected optimization process such as gradient descent. Next, the fitted model(s) are evaluated on a second chunk of the data, called the validation dataset. The validation dataset allows the machine learning expert to fine-tune the so-called model hyperparameters, such as the strength of regularization. Regularization is a technique used to achieve a balance between bias (error caused by overly simple assumptions, which makes the model underfit the training data) and variance (sensitivity to the particulars of the training data, which makes the model generalize poorly to instances unseen during training). Finally, a test dataset is used to evaluate the performance of the model that did best on the validation dataset. Performance metrics include accuracy, precision, root mean square error (RMSE), and mean square error (MSE), to name a few.8
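As a minimal sketch of this train/validate/test workflow, the following example fits a ridge-regularized linear model by gradient descent, tunes the regularization strength on a validation set, and reports the test RMSE. The dataset, model, and hyperparameter grid are synthetic assumptions chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic dataset (hypothetical): three features and a noisy linear response.
n = 300
X = rng.normal(size=(n, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + rng.normal(scale=0.3, size=n)

# Split into training, validation, and test sets.
X_train, y_train = X[:200], y[:200]
X_val, y_val = X[200:250], y[200:250]
X_test, y_test = X[250:], y[250:]

def fit_ridge_gd(X, y, lam, lr=0.1, epochs=500):
    """Fit a ridge-regularized linear model via gradient descent."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y) + 2 * lam * w
        w -= lr * grad
    return w

def rmse(w, X, y):
    """Root mean square error of the linear model w on (X, y)."""
    return float(np.sqrt(np.mean((X @ w - y) ** 2)))

# Hyperparameter tuning: pick the regularization strength (lambda)
# that performs best on the validation set.
best_lam, best_w, best_score = None, None, float("inf")
for lam in [0.0, 0.01, 0.1, 1.0]:
    w = fit_ridge_gd(X_train, y_train, lam)
    score = rmse(w, X_val, y_val)
    if score < best_score:
        best_lam, best_w, best_score = lam, w, score

# Final, unbiased performance estimate on the held-out test set.
test_rmse = rmse(best_w, X_test, y_test)
print(f"best lambda: {best_lam}, test RMSE: {test_rmse:.3f}")
```

In practice, a library such as scikit-learn would handle the splitting, fitting, and tuning, but the division of labor between the three datasets is exactly the one shown here.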
Supervised learning can be divided into two categories: classification, which predicts a class label for a categorical variable, and regression, which predicts a quantity for a numerical variable. Linear regression, autoregressive models, generalized additive models, neural networks, and tree-based models are well-known regression methods. For classification tasks, methods such as logistic regression, support vector machines, linear discriminant analysis, tree models, and artificial neural networks are commonly used.
In finance, supervised learning is extensively employed for both classification and regression tasks. Examples of financial regression problems include stock price forecasting, volatility estimation and prediction, asset pricing, and risk assessment. Classification problems are also plentiful in finance, for example, credit scoring, default prediction, corporate action prediction, fraud detection, and credit risk rating.
Unsupervised learning
Unsupervised learning is used to extract patterns and relationships within data without relying on known target response values (labels). Unlike supervised learning, it does not have a teacher (supervisor) correcting the model based on knowledge of the correct answer (as illustrated in Figure 1-6).
There are two main types of unsupervised learning: clustering, where a model is trained to learn and find groups (clusters) in the data, and density estimation, which tries to summarize the distribution of the data. Examples of clustering techniques include k-means and hierarchical clustering, often applied together with dimensionality reduction methods such as principal component analysis, while the kernel density estimator is perhaps the most common example of density estimation techniques.9
Unsupervised learning applications in finance are still in their early stages, but the future trend is promising. For example, clustering can be used to group similar financial time series, cluster stocks into groups based on sector or risk profile, analyze customer and market segmentation, and find similar firms or customers to assign similar scores or ratings.
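To make the stock-clustering idea concrete, here is a minimal, self-contained k-means sketch that groups hypothetical stocks by two synthetic features, mean daily return and daily volatility. The data and features are assumptions for illustration; a production system would typically use a library implementation such as scikit-learn:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical feature matrix: each row is a stock described by two
# synthetic features, (mean daily return, daily volatility). Two groups
# are simulated: low-risk/low-return and high-risk/high-return stocks.
low_risk = rng.normal(loc=[0.0002, 0.01], scale=0.002, size=(20, 2))
high_risk = rng.normal(loc=[0.001, 0.04], scale=0.002, size=(20, 2))
stocks = np.vstack([low_risk, high_risk])

def kmeans(points, k, iters=50):
    """Minimal k-means with deterministic farthest-point initialization."""
    centroids = [points[0]]
    for _ in range(k - 1):
        # Next centroid: the point farthest from all chosen centroids.
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centroids], axis=0)
        centroids.append(points[int(np.argmax(d))])
    centroids = np.array(centroids)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            if (labels == j).any():
                centroids[j] = points[labels == j].mean(axis=0)
    return labels, centroids

labels, centroids = kmeans(stocks, k=2)
print("cluster labels:", labels)
```

On this well-separated synthetic data, the two recovered clusters coincide with the simulated risk profiles; real market data is far noisier and usually requires feature scaling first.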
Reinforcement learning
In reinforcement learning, an artificial agent is placed in an environment where it can perform a sequence of actions over a state space and learn to make better decisions via a feedback mechanism. The key difference between this technique and supervised learning is that the feedback from the teacher is not about providing the right answer (true label); instead, the agent is given a reward (positive or negative) in order to encourage certain behaviors (actions) and punish others (see Figure 1-7).
As many financial activities entail decision-making by agents, there has been a considerable interest among financial practitioners and researchers in reinforcement learning, which centers on optimal decision-making. Financial applications of reinforcement learning include portfolio selection and optimization, optimal trade execution, and market-making.10
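The reward-driven feedback loop described above can be illustrated with a minimal tabular Q-learning agent on a deliberately simple toy environment (not a financial one); all parameters below are illustrative assumptions:

```python
import numpy as np

# Toy Markov decision process: five states arranged in a line, with
# actions 0 = left and 1 = right. Reaching state 4 pays a reward of +1
# and ends the episode, so the optimal policy is to always move right.
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.5, 0.9, 0.5  # high epsilon keeps the toy agent exploring
rng = np.random.default_rng(0)

def step(state, action):
    """Environment dynamics: move one step left or right along the line."""
    next_state = max(0, state - 1) if action == 0 else min(n_states - 1, state + 1)
    reward = 1.0 if next_state == n_states - 1 else 0.0
    return next_state, reward, next_state == n_states - 1

for _ in range(300):  # episodes
    state = 0
    for _ in range(1000):  # step cap to bound episode length
        # Epsilon-greedy action selection: explore sometimes, exploit otherwise.
        if rng.random() < epsilon:
            action = int(rng.integers(n_actions))
        else:
            action = int(Q[state].argmax())
        next_state, reward, done = step(state, action)
        # Q-learning update: move Q(s, a) toward reward + discounted best future value.
        target = reward + gamma * (0.0 if done else Q[next_state].max())
        Q[state, action] += alpha * (target - Q[state, action])
        state = next_state
        if done:
            break

policy = Q.argmax(axis=1)
print("greedy policy (1 = move right):", policy[:-1])
```

Financial applications replace this toy state space with, say, portfolio holdings or order-book states, and the hand-coded reward with profit-and-loss or execution-cost signals, but the update rule is the same.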
Applied machine learning systems rely on data and computational resources; thus, having access to more data and computing power leads to better and faster predictions. In finance, where computational resources and datasets have grown, financial machine learning has emerged as a promising yet challenging area of research and practice.
According to Marcos López de Prado, a leading hedge fund manager and quantitative analyst, financial machine learning has proven to be very successful and is likely to be a major factor in shaping the future of financial markets, but it also presents major challenges that must be taken into account. Perhaps the most relevant challenge worth mentioning is the problem of false discoveries: finding what appears to be a valid pattern in the data but is, in reality, a spurious relationship.11 Other challenges include the interpretability/explainability of the models, performance, costs, and ethics.
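The false-discovery problem can be demonstrated with a short simulation: generate many purely random "strategies," select the one with the best in-sample Sharpe ratio, and re-evaluate it on fresh data. Any apparent skill is an artifact of selection, since the returns are noise by construction (the sizes and annualization factor below are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate many "strategies" whose daily returns are pure noise, so any
# apparent pattern is spurious by construction (sizes are arbitrary).
n_strategies, n_days = 500, 250
in_sample = rng.normal(size=(n_strategies, n_days))
out_of_sample = rng.normal(size=(n_strategies, n_days))

# Annualized Sharpe ratio of each strategy on the in-sample period.
in_sharpe = in_sample.mean(axis=1) / in_sample.std(axis=1) * np.sqrt(252)

# Select the best in-sample performer, then re-evaluate it out of sample.
best = int(in_sharpe.argmax())
out_sharpe = out_of_sample[best].mean() / out_of_sample[best].std() * np.sqrt(252)

print(f"best in-sample Sharpe: {in_sharpe[best]:.2f}")
print(f"its out-of-sample Sharpe: {out_sharpe:.2f}")
```

The selected strategy looks impressive in sample only because the maximum over many noise draws is large; its out-of-sample performance reverts toward zero, which is exactly the danger of backtest overfitting.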
For financial institutions to effectively invest in and leverage financial machine learning, they must ensure they are machine learning ready. This involves having the right team with expertise in both finance and machine learning, a sufficient quality and quantity of financial data for ML algorithms, a robust data infrastructure, dedicated ML-oriented data pipelines, DevOps (or MLOps) practices for seamless deployment and integration, and monitoring tools. With this foundation, financial data engineering becomes crucial. Financial data engineers collaborate closely with financial ML scientists and ML engineers to define data requirements, automate data transformations, perform quality checks, and structure ML workflows for fast and high-performance computations.
The Disruptive FinTech Landscape
Following the 2007–2008 financial crisis, traditional financial institutions have faced a significant increase in regulatory requirements. Consequently, the focus of market participants has shifted substantially toward compliance. At the same time, as customers became more accustomed to using services online, demand for simple and user-friendly online financial products has increased. These factors paved the way for a new wave of technological innovation in the financial sector, commonly known as FinTech.
The term FinTech has emerged as a market portmanteau to describe both innovative technologies developed for the financial sector and the startup firms that develop these technologies. FinTech firms have attracted particular attention in the media and the market due to their innovative, flexible, and experimental approach. Unencumbered by legacy systems and accumulated regulatory debt, FinTechs have been employing modern and nonconventional approaches to solving and improving a wide range of financial problems, such as payments, lending, investment, fraud detection, and cryptocurrency. Traditional financial institutions often lack this flexibility due to factors such as organizational inertia, regulatory constraints, security concerns, and a lack of innovative culture.
The main distinguishing features of FinTech services are specialization and personalization. As small firms, FinTechs tend to focus on penetrating only specific and niche areas of the financial system. Figure 1-8 illustrates the different areas of specialization of FinTech firms. As the figure illustrates, the FinTech landscape spans all segments of the financial sector, from fundamental functions such as payments and investment to more specialized areas such as regulatory compliance (often called regtech) and analytics.
Moreover, the FinTech business model has demonstrated competitiveness through its customizable and personalized offerings. For example, digital wealth management platforms like Betterment and Wealthfront provide clients with detailed surveys to assess their financial goals and risk preferences, enabling them to offer investment plans tailored to each investor’s unique objectives and expectations.
Overall, the FinTech market has seen rapid growth since its inception. According to a report published by Boston Consulting Group, as of 2023, there were roughly 32,000 FinTech firms globally, securing more than $500 billion in funding. The same report predicts that by 2030, the annual revenue of the FinTech sector will reach $1.5 trillion, with banking FinTech representing 25% of overall banking valuations.
To thrive in this technology-intensive, high-performance, and data-driven landscape, aspiring FinTech companies must prioritize their software and data engineering strategies. To compete with and/or collaborate with incumbent financial institutions, FinTechs must ensure the highest standards of quality, reliability, and security. In this context, financial data engineers play a crucial role by designing efficient and reliable data ingestion, processing, and analysis pipelines that can scale and seamlessly integrate with other solutions.
Regulatory Requirements and Compliance
Financial institutions, and banks in particular, have a special status in the economic system. This is justified by the fact that the financial sector forms a complex network of asset cross-holdings, ownerships, investments, and transactions among financial institutions. As a consequence, a market shock that leads to the failure of one or more financial institutions can trigger a cascade of failures that might destabilize the entire financial system and cause an economic meltdown.12 The global financial crisis of 2007–2008 is the best example of such a scenario.
To avoid costly financial crises, the financial sector has been subjected to a large number of regulations, both national and international. Crucially, a significant part of financial regulatory requirements concerns the way banks should collect, store, aggregate, and report data. For example, following the financial crisis of 2007–2008, the Basel Committee on Banking Supervision noted that banks, and in particular Global Systemically Important Banks (G-SIBs), lacked a data infrastructure that could allow for quick aggregation of risk exposures to identify hidden risks and risk concentrations. To overcome this problem, the Basel Committee issued a set of principles on data governance and infrastructure, known as BCBS 239, that banks need to implement to strengthen their risk data aggregation and reporting capabilities.
Beyond banks, other financial institutions are also considered systemically important. These include Financial Market Infrastructures (FMIs), which facilitate the processing, clearing, settlement, and custody of payments, securities, and transactions. Examples of FMIs are stock exchanges, multilateral trading facilities, central counterparties, central securities depositories, trade repositories, payment systems, clearing houses, securities settlement systems, and custodians. FMIs are critical to the functioning of financial markets and the broader economy, making them subject to extensive regulation.13
Occasionally, regulators may require financial institutions to collect new types of data. For example, the European directive known as the Markets in Financial Instruments Directive, or MiFID, requires firms providing investment services to collect information regarding their clients’ financial knowledge to assess whether their level of financial literacy matches the complexity of the desired investments.
To comply with regulations, financial institutions need dedicated financial data engineering and management teams to design and implement a robust data infrastructure. This infrastructure must capture, process, and aggregate all relevant data and metadata from multiple sources while ensuring high standards of security and operational and financial resilience. It should enable risk and compliance officers to quickly and accurately access the data needed to demonstrate regulatory compliance. Financial data engineers will also be tasked with creating and enforcing a financial data governance framework that guarantees data quality and security, thereby increasing trust among management, stakeholders, and regulators. In Chapter 5, Financial Data Governance, we will explore these topics in detail.
The Financial Data Engineer Role
The financial data engineer is at the core of everything we’ve discussed so far. Working in the financial industry can be a very rewarding and exciting career. A decade ago, the most in-demand roles in finance were analytical, such as financial engineers, quantitative analysts (or quants), and analysts. But with the digital revolution that took place with big data, the cloud, and FinTech, titles such as data engineer, data architect, data manager, and cloud architect have established themselves as primary roles within the financial industry. In this section, I will provide an overview of a financial data engineer’s role, responsibilities, and skills.
Description of the Role
The role of a financial data engineer is in high demand, though the title, required skills, and responsibilities can vary significantly between positions. For example, the title of a financial data engineer could be any of the following:
-
Financial data engineer
-
Data engineer, finance
-
Data engineer, fintech
-
Data engineer, finance products
-
Data engineer, data analytics and financial services
-
Financial applications data engineer
-
Platform data engineer, financial services
-
Software engineer, financial data platform
-
Software engineer, financial ETL pipelines
-
Data management developer, FinTech
-
Data architect, finance platform
In many cases, other titles that don’t include the term “data engineering” involve, to a large extent, practices and skills related to financial data engineering. For example, the role of a machine learning engineer could involve many responsibilities concerning the creation, deployment, and maintenance of reliable analytical data pipelines for machine learning. The role of quantitative developer, common among financial institutions, often involves tasks relating to developing data pipelines, data extraction, and data transformations.
It is important to know that the role of a financial data engineer is neither a closed circle nor a professional lock-in. Even though financial domain knowledge is a major plus for financial data engineering roles, many financial institutions would accept people with data engineering experience who come from different backgrounds. Similarly, working as a financial data engineer would easily allow you to fit into other domains, given the rich variety of technical problems and challenges you might encounter in the financial industry.
Where Do Financial Data Engineers Work?
The demand for financial data engineers primarily arises from financial institutions that generate and store data and are willing or required to invest in data-related technologies. Let’s consider a few examples.
FinTech
FinTech firms are technology oriented and data driven; therefore, they are one of the best places to work as a financial data engineer. One of the main advantages of working for a FinTech is that you get to witness the entire lifecycle of product development. This provides engineers a solid overview of how data, business, and technology are combined to make a successful product. Another advantage is that you get to contribute original ideas and solutions to major infrastructural and software problems (e.g., choosing a database or finding a financial data vendor).
Commercial banks
Commercial banks are financial institutions that accept deposits from individuals and institutions while providing loans to consumers and investors, process a significant volume of daily transactions, and adhere to numerous regulatory requirements. To effectively manage their internal operations and ensure timely reporting, commercial banks typically employ teams of software and data engineers. These are responsible for developing and maintaining database systems, data aggregation and reporting mechanisms, customer analytics infrastructure, and transactional systems for various banking activities such as accounts, transfers, withdrawals, and deposits. Working as a data engineer at a commercial bank offers the opportunity to gain valuable insights into industry standards and best practices related to security, reliability, and compliance.
Interestingly, commercial banks frequently form collaboration agreements with FinTech firms to extend their services to the public. These partnerships necessitate a robust data infrastructure that facilitates secure and efficient server communication, often through financial APIs. Consequently, banks and FinTech firms need to hire financial data engineers to design and implement backends for data collection, transmission, aggregation, and integration.
Investment banks
An investment bank is a financial institution that provides corporate finance and investment services, such as mergers and acquisitions, leveraged buyouts, and initial public offerings (IPOs). Unlike commercial banks, investment banks do not accept deposits or give loans. Sometimes, they invest their own money via proprietary trading.
Investment banks engage in various activities that involve the generation, extraction, transformation, and analysis of financial data. These include building and backtesting investment strategies, asset pricing, company valuation, and market forecasting. This requires frequent and easy access to different types of financial data. Additionally, investment banks must regularly report compliance-related data to regulatory authorities. To facilitate quick and straightforward access to this data, investment banks need a team of financial data engineers to design and maintain systems for data collection, transformation, aggregation, and storage.
Asset management firms
Asset management firms are financial institutions that provide investment and asset management services to customers looking to invest their money. These can be independent entities or divisions within a large financial institution. Typically, asset managers operate on an institutional level, with clients such as mutual funds, pension funds, insurance companies, universities, and sovereign wealth funds.
To provide investment services, asset managers require access to a wide array of financial data to build investment strategies, construct portfolios, analyze financial markets, manage risks, and report on behalf of their clients. To manage such data, asset management firms employ in-house financial data engineers to design and maintain effective data strategies, governance, and infrastructure. Even when using third-party data management solutions, in-house engineers are crucial for overseeing and enhancing the data infrastructure.
Hedge funds
Hedge funds are financial institutions that actively invest a large pool of money in various market positions (buy and sell) and asset classes (equity, fixed income, derivatives, alternative investments) to generate above-market returns. To meet their financial return objectives, hedge funds build and test (backtest) a large number of complex investment strategies and portfolio combinations.
To achieve their goals, hedge funds rely on a large number of heterogeneous financial data sources from various providers. Financial engineers and quantitative developers working at hedge funds need high-quality and timely access to financial data. Moreover, hedge funds may invest in algorithmic and high-frequency strategies, which require robust and efficient data infrastructure for easy data read and write operations. This environment makes hedge funds an ideal workplace for financial data engineers.
Regulatory institutions
A variety of national and international regulatory bodies have been established to oversee financial markets. Examples include national entities like central banks and local market regulators, as well as international bodies such as the Bank for International Settlements, its Committee on Payments and Market Infrastructures, and the Financial Stability Board.
These institutions perform a wide variety of activities that require significant investments in financial data engineering and management. For example, if a regulatory agency establishes mandatory reporting and filing requirements, it requires a scalable data infrastructure capable of processing and storing all the reported data. Additionally, regulatory agencies might provide their members with principles and best practices on financial data infrastructure and governance system design. This requires internal teams of data engineers, data managers, and industry experts who can develop and formulate market recommendations.
Financial data vendors
Data vendors are key players in financial markets, providing subscription-based access to financial data collected from numerous sources. Notable examples include Bloomberg, LSEG, and FactSet. Due to their business model, these companies face various challenges related to data collection, curation, formatting, ingestion, storage, and delivery. Consequently, they offer some of the best opportunities for developing a career in financial data engineering.
Security exchanges
Security exchanges are centralized venues where buyers and sellers of financial securities conduct their transactions. Prominent examples include the New York Stock Exchange, NASDAQ, and the London Stock Exchange.
Exchanges need to record all activities and transactions that they facilitate on a daily basis. Some exchanges offer paid subscriptions to their transaction and quotation data. Additionally, they manage tasks like symbology, i.e., assigning identifiers and tickers to listed securities. All this makes exchanges an ideal place to develop a career as a financial data engineer, especially if you want to be at the heart of the financial center.
Big tech firms
Big tech companies such as Google, Amazon, Meta, and Apple have developed into major platforms for user interactions, transactions, and various online activities. Tech companies rely on two mechanisms to expand their activities: user data and network effects. The more activity happens on an online platform, the more data can be collected. The data is then used to study customer behavior in order to offer new products and services. This, in turn, encourages others to join the platform, which generates yet more data, and so on.
Relying on these self-reinforcing mechanisms, tech giants like Amazon, Apple, Google, and Alibaba have expanded into financial services, offering products like payments, insurance, loans, and money management. This move capitalizes on their extensive customer data, wide-reaching networks, and advanced technology, leading to the creation of user-friendly services such as mobile device payments. Consequently, dedicated teams of data engineers, finance specialists, and machine learning experts are required to support these operations.
Responsibilities and Activities of a Financial Data Engineer
The financial data engineer’s set of tasks and responsibilities will depend on the nature of the job and business problems, the hiring institution, and, most importantly, the firm’s data maturity.
Data maturity is an important concept that relates to data strategy. A data strategy is a long-term plan that describes the roadmap of objectives, people, processes, rules, tools, and technologies required to manage an organization’s data assets. To measure data strategy progress, data maturity approaches are often used. With a data maturity framework, an organization can illustrate the stages of development toward data usability, analytical capabilities, and integration. To further illustrate the concept, I borrow and build on the framework proposed by Joe Reis and Matt Housley in their book, Fundamentals of Data Engineering, which organizes data maturity into three steps: starting with data, scaling with data, and leading with data.
Starting with data
A financial institution that is starting with data is at the very early stage of its data maturity. Note that this doesn’t necessarily mean that the institution is new; old institutions (e.g., traditional banks) might decide to initiate digital transformation plans to automate and modernize their operations (e.g., cloud migration).
When starting with data, the financial data engineer’s responsibilities are likely to be broad and span multiple areas such as data engineering, software engineering, data analytics, infrastructure engineering, and web development. This early phase prioritizes speed and feature expansion over quality and best practices.
Scaling with data
During this stage, the financial institution needs to assess its processes, identify bottlenecks, and determine current and future scaling requirements. With these insights in hand, the institution can proceed to enhance the scalability, reliability, quality, and security of its financial data infrastructure. The primary objective here is to eliminate or work around any technological constraints that may stand in the way of the company’s growth.
During this stage, financial data engineers will be able to focus on adopting best practices for building reliable and secure systems, e.g., codebase quality, DevOps, governance, security, standards, microservices, system design, API and database scalability, deployability, and a well-established financial data engineering lifecycle.
Leading with data
Once a financial institution reaches the stage at which it is able to lead the market with data, it is considered data driven. In this stage, all processes are automated, requiring minimal manual intervention; the product can scale to any number of users; internal processes and governance rules are well established and formalized; and feature requests go through a well-defined development process.
During this stage, financial data engineers can specialize in and focus on specific aspects of the financial data infrastructure. There will always be space for further optimizations via roles and departments like site reliability engineering, platform engineering, data operations, MLOps, FinOps, data contracts, and new integrations.
Skills of a Financial Data Engineer
Financial data engineers bring together three types of skills: financial domain knowledge, technical data engineering skills, and soft and business skills. We’ll briefly illustrate these skillsets in the upcoming sections.
Financial domain knowledge
Having a good understanding of finance, financial markets, and financial data is an essential and competitive asset in any finance-related job, including financial data engineering. Examples of financial domain skills include the following:
-
Understanding the different types of financial instruments (stocks, bonds, derivatives, etc.)
-
Understanding the different players in financial markets (banks, funds, exchanges, regulators, etc.)
-
Understanding the data generation mechanisms in finance (trading, lending, payments, reporting, etc.)
-
Understanding company reports (balance sheet, income statement, prospectus, etc.)
-
Understanding the market for financial data (vendors, providers, distributors, subscriptions, delivery mechanisms, coverage, etc.)
-
Understanding financial variables and measures (price, quote, volume, yield, interest rate, inflation, revenue, assets, liability, capitalization, etc.)
-
Understanding financial theory terms (risk, uncertainty, return, arbitrage, volatility, etc.)
-
Understanding of compliance and privacy concepts (personally identifiable information (PII), anonymization, etc.)
-
Knowledge of financial regulation and data protection laws is a plus (Basel rules, MiFID, Solvency II, the EU’s General Data Protection Regulation (GDPR), etc.)
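As one concrete illustration of the privacy concepts above, the sketch below pseudonymizes a customer identifier (PII) with a keyed hash before it enters an analytics pipeline. HMAC-SHA256 is one common approach among several; the key value, function name, and identifier format are hypothetical:

```python
import hashlib
import hmac

# Hypothetical sketch: pseudonymize a customer identifier (PII) before
# it enters an analytics pipeline. A keyed hash (HMAC-SHA256) yields a
# stable token that is hard to reverse without the secret key, which
# should live in a secrets manager, separate from the data itself.
SECRET_KEY = b"replace-with-a-secret-from-a-vault"  # placeholder value

def pseudonymize(customer_id: str) -> str:
    """Return a stable, non-reversible pseudonym for a customer ID."""
    return hmac.new(SECRET_KEY, customer_id.encode(), hashlib.sha256).hexdigest()

token = pseudonymize("customer-12345")
print(token)
```

Because the mapping is deterministic, the same customer always receives the same token, so analysts can still join records across datasets without ever seeing the raw identifier.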
Technical data engineering skills
Financial data engineering requires strong technical skills, which can vary across financial institutions, depending on their business needs, products, technological stack, and data maturity. Crucially, it’s important to keep in mind that the data engineering landscape is quite dynamic, with new technologies emerging and diffusing every year. For this reason, this book will focus more on immutable and technology-agnostic principles and concepts rather than on tools and technologies. But to give you an illustrative and nonexhaustive overview of the current landscape (as of 2024), expect as a financial data engineer to be asked about your knowledge of the following areas:
- Database query and design
-
Experience with relational database management systems (RDBMSs) and related concepts, in particular Oracle, MySQL, Microsoft SQL Server, and PostgreSQL
-
Solid knowledge of database internals and properties such as transactions, transaction control, ACID (atomicity, consistency, isolation, durability), BASE (basically available, soft state, eventually consistent), locks, concurrency management, WAL (write-ahead logging), and query planning
-
Experience with data modeling and database design
-
Experience with the SQL language, including advanced concepts such as user-defined functions, window functions, indexing, clustering, partitioning, and replication
-
Experience with data warehouses and related concepts and design patterns
-
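To make the window-function bullet concrete, here is a small, self-contained sketch using Python's built-in sqlite3 module (the price data is invented): it computes a three-day moving average of closing prices with an `AVG(...) OVER (...)` window.

```python
import sqlite3

# In-memory database with a toy daily price table (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (trade_date TEXT, close REAL)")
conn.executemany(
    "INSERT INTO prices VALUES (?, ?)",
    [("2024-01-02", 100.0), ("2024-01-03", 102.0),
     ("2024-01-04", 101.0), ("2024-01-05", 103.0)],
)

# Window function: 3-day moving average of the close, ordered by date.
rows = conn.execute(
    """
    SELECT trade_date,
           AVG(close) OVER (
               ORDER BY trade_date
               ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
           ) AS ma3
    FROM prices
    """
).fetchall()
```

The same query runs largely unchanged on PostgreSQL or a cloud warehouse; only the connection layer differs.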
- Cloud skills
  - Experience with cloud providers (Amazon Web Services, Azure, Google Cloud Platform, Databricks, etc.)
  - Experience with cloud data warehousing (Redshift, Snowflake, BigQuery, Cosmos DB, etc.)
  - Experience with serverless computing (lambda functions, AWS Glue, Google Workflows, etc.)
  - Experience with different cloud runtimes (Amazon EC2, AWS Fargate, cloud functions, etc.)
  - Experience with infrastructure as code (IaC) tools such as Terraform
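As a taste of the serverless model, the sketch below mimics the shape of an AWS-Lambda-style Python handler: a stateless function that receives an event and returns a response. The event format and field names here are hypothetical, and no cloud service is actually invoked:

```python
import json

def handler(event, context=None):
    """Minimal Lambda-style handler: validate an incoming payment event
    and return an API-Gateway-shaped response (sketch only)."""
    body = json.loads(event["body"])
    if body.get("amount", 0) <= 0:
        return {"statusCode": 400, "body": json.dumps({"error": "invalid amount"})}
    return {"statusCode": 200, "body": json.dumps({"status": "accepted"})}

# Local invocation, as a unit test would do before deploying:
response = handler({"body": json.dumps({"amount": 99.5, "currency": "USD"})})
```

Because the handler is a plain function, it can be tested locally and then deployed behind an API gateway or event trigger without code changes.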
- Data workflow and frameworks
  - Experience with ETL (extract, transform, load) workflow solutions (AWS Glue, Informatica, Talend, Alooma, SAP Data Services, etc.)
  - Experience with general workflow tools such as Apache Airflow, Prefect, Luigi, AWS Glue, and Mage
  - Experience with messaging and queuing systems such as Apache Kafka and Google Pub/Sub
  - Experience in designing and building highly scalable and reliable data pipelines (dbt, Hadoop, Spark, Hive, Cassandra, etc.)
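The ETL pattern behind many of these tools can be sketched in a few lines of plain Python. The example below is a toy pipeline, with invented trade data and SQLite standing in for a warehouse, not a production design:

```python
import csv
import io
import sqlite3

# Extract: read raw trades from a CSV source (here, an in-memory string).
RAW = "symbol,price,qty\nAAPL,189.5,10\nMSFT,410.2,-5\nAAPL,190.1,3\n"

def extract(raw: str) -> list[dict]:
    return list(csv.DictReader(io.StringIO(raw)))

def transform(rows: list[dict]) -> list[dict]:
    # Cast types and derive a notional (price * quantity) column.
    return [
        {"symbol": r["symbol"], "price": float(r["price"]),
         "qty": int(r["qty"]), "notional": float(r["price"]) * int(r["qty"])}
        for r in rows
    ]

def load(rows: list[dict], conn: sqlite3.Connection) -> None:
    conn.execute("CREATE TABLE IF NOT EXISTS trades "
                 "(symbol TEXT, price REAL, qty INTEGER, notional REAL)")
    conn.executemany(
        "INSERT INTO trades VALUES (:symbol, :price, :qty, :notional)", rows)

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW)), conn)
total = conn.execute("SELECT SUM(notional) FROM trades").fetchone()[0]
```

Workflow tools such as Airflow add what this sketch lacks: scheduling, retries, dependency management, and observability for each step.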
- Infrastructure
  - Experience with containers and container orchestration, such as Docker, Kubernetes, AWS Fargate, and Amazon Elastic Kubernetes Service (EKS)
  - Experience with version control using Git, GitHub, GitLab, feature branches, and automated testing
  - Experience with system design and software architecture (distributed systems, batch, streaming, lambda architecture, etc.)
  - Understanding of the Domain Name System (DNS), TCP, firewalls, proxy servers, load balancing, virtual private networks (VPNs), and virtual private clouds (VPCs)
  - Experience building integrations with and reporting datasets for payments, finance, and business systems like Stripe, NetSuite, Adaptive, Anaplan, and Salesforce
  - Experience working in a Linux environment
  - Experience with software architecture diagramming and design tools such as draw.io, Lucidchart, CloudSkew, and Gliffy
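To illustrate the load-balancing concept from the networking bullet, here is a toy round-robin scheduler in Python. Real deployments would use a dedicated load balancer (HAProxy, a cloud application load balancer, etc.), and the backend addresses below are made up:

```python
import itertools

# Hypothetical backend servers behind a single service endpoint.
backends = ["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"]
rotation = itertools.cycle(backends)

def route() -> str:
    """Return the backend that should serve the next request,
    rotating evenly over all backends (round-robin)."""
    return next(rotation)

assigned = [route() for _ in range(6)]
```

Round-robin spreads load evenly but ignores backend health and capacity; production balancers add health checks and weighted or least-connections policies.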
- Programming languages and frameworks
  - Experience with object-oriented programming (OOP)
  - Experience optimizing data infrastructure, codebases, tests, and data quality
  - Experience generating data for reporting purposes
  - Experience working with pandas, PySpark, Polars, and NumPy
  - Experience working with financial vendor APIs and feeds like the Bloomberg Server API, LSEG’s Eikon, FactSet APIs, the OpenFIGI API, and LSEG’s PermID
  - Experience with web development frameworks such as Flask, FastAPI, and Django
  - Understanding of software engineering best practices, Agile methodologies, and DevOps
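As a small OOP illustration in Python, the sketch below models a minimal and entirely hypothetical instrument hierarchy with dataclasses; shared behavior lives in the base class, and subclasses add instrument-specific fields and methods:

```python
from dataclasses import dataclass

@dataclass
class Instrument:
    """Base class for a tradable instrument (illustrative only)."""
    symbol: str

    def market_value(self, price: float, quantity: float) -> float:
        # Behavior shared by all instrument types.
        return price * quantity

@dataclass
class Bond(Instrument):
    """A bond adds fixed-income-specific attributes and behavior."""
    face_value: float
    coupon_rate: float  # annual rate, as a decimal

    def annual_coupon(self) -> float:
        return self.face_value * self.coupon_rate

bond = Bond(symbol="US10Y", face_value=1000.0, coupon_rate=0.04)
```

Dataclasses keep the boilerplate (constructors, repr, equality) out of the way, which is handy when a codebase defines many instrument and record types.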
- Analytical skills
  - Knowledge of data matching and record linkage techniques
  - Knowledge of financial text analysis and its applications, such as entity extraction, fraud detection, and know your customer (KYC)
  - Knowledge of financial data cleaning techniques and quality metrics
  - Experience performing financial data analysis and visualization using tools such as Microsoft Power BI, Apache Superset, D3.js, Tableau, and Amazon QuickSight
  - Basic experience with machine learning algorithms and generative AI
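Record linkage often starts with a string-similarity measure. The sketch below uses Python's standard-library difflib to match entity names across two invented vendor lists; production systems add more robust techniques (blocking, probabilistic matching, identifier-based linkage, etc.):

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Normalized string similarity in [0, 1], after simple cleaning."""
    def norm(s: str) -> str:
        return s.lower().replace(".", "").replace(",", "")
    return SequenceMatcher(None, norm(a), norm(b)).ratio()

# Two hypothetical sources that spell the same entities differently.
vendor_a = ["JPMorgan Chase & Co.", "Bank of New York Mellon"]
vendor_b = ["JP Morgan Chase and Co", "BNY Mellon", "Bank of New York Mellon Corp"]

# For each name in source A, pick the most similar candidate in source B.
matches = {
    name: max(vendor_b, key=lambda cand: similarity(name, cand))
    for name in vendor_a
}
```

Pairwise comparison like this is O(n × m); at scale, blocking keys (e.g., first letters or country codes) are used to prune the candidate pairs first. Entity identifiers such as LEI or FIGI, covered later in this book's context of entity systems, make linkage far more reliable than name matching alone.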
Business and soft skills
For most financial institutions, data represents a valuable asset. Therefore, financial data engineers need to ensure that their work aligns with the data strategy and vision of their institution. To do so, they can complement their technical skills with business and soft skills such as the following:
- Ability to comprehend the technical aspects of the product and technology, to communicate effectively with engineers, and to explain these concepts in simpler terms to finance and business stakeholders
- Understanding the value that financial data and its associated infrastructure generate for the institution
- Collaborating closely with finance and business teams to identify their data requirements
- Staying informed about the evolving financial and data technology landscape
- Establishing policies for company members to access and request new financial data
- Interest in data analysis and machine learning applications that leverage financial data
- Proactively gathering and analyzing high-value financial data needs from business and analyst teams, and clearly communicating deliverables, timelines, and tradeoffs
- Providing guidance and education on financial data engineering, on what to expect from a financial data engineer, and on how to search for, find, and access financial data
- Participating in the assessment of new financial data sources, technologies, products, or applications suitable for the company’s business
Certainly, not every job demands proficiency in all of these skills; instead, a tailored combination is sought based on specific business needs. Throughout this book, you’ll learn about many of the aforementioned skills, diving deeply into some and getting an overview of others, with demonstrations of their importance and practical application within the financial domain.
Summary
This chapter provided an overview of financial data engineering, summarized as follows:
- Defining financial data engineering and outlining its unique challenges
- Justifying the need for financial data engineering and illustrating its applications
- Describing the role and responsibilities of the financial data engineer
Now that you have an idea about financial data engineering, it’s time to learn about the most important asset in this field: financial data. In Chapter 2, you will gain a thorough understanding of financial data, including its sources, types, structures, and distinguishing features. You will also learn about key benchmark financial datasets that are widely used in the financial industry.
1 If you want to learn about financial instruments in more depth, I encourage you to read Investments by Zvi Bodie, Alex Kane, and Alan Marcus (McGraw Hill, 2023).
2 Data management is a broader term than data engineering. It refers to all plans and policies put in place to make sure that data is strategically managed and optimized for business value creation. To read about data management, I highly recommend Data Management at Scale by Piethein Strengholt (O’Reilly, 2023).
3 For a good reference on these challenges, see Antoni Munar, Esteban Chiner, and Ignacio Sales, “A Big Data Financial Information Management Architecture for Global Banking”, presented at the 2014 International Conference on Future Internet of Things and Cloud (IEEE, August 2014): 385–388.
4 To know more about this topic, check this excellent reference: Tanya S. Beder and Cara M. Marshall’s Financial Engineering: The Evolution of a Profession (Wiley, 2011).
5 See the Operating Metrics file in the official OPRA document library page.
6 Alternative datasets and their use cases in finance are discussed in detail in Chapter 2.
7 Tom Mitchell, Machine Learning (McGraw-Hill, 1997), p. 2.
8 For a good reference on performance metrics, I recommend Aurélien Géron’s book, Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow (O’Reilly, 2022).
9 For an overview of clustering techniques, I recommend the official documentation of scikit-learn.
10 An excellent reference on reinforcement learning in finance is the book by Ashwin Rao and Tikhon Jelvis, Foundations of Reinforcement Learning with Applications in Finance (CRC Press, 2022).
11 The problem of false discoveries is well-known in finance. For an introduction to this topic, please refer to the article by Campbell R. Harvey, Yan Liu, and Heqing Zhu, “… and the Cross-Section of Expected Returns”, Review of Financial Studies 29, no. 1 (January 2016): 5–68.
12 To read more about the topic of systemic risk, I recommend Jaime Caruana’s article, “Systemic Risk: How to Deal With It?”, Bank for International Settlements (February 2010).
13 To read more about FMI regulation, see “Principles for Financial Market Infrastructures”, the Bank for International Settlements (April 2012), and “Core Principles for Systemically Important Payment Systems”, the Bank for International Settlements (January 2001).