Big Data for Compliance and Industry Trends
Nadeem Asghar, Field CTO, Financial Services, Hortonworks
RegOps COE – Project Overview
Project Drivers
• Data requirements for controls implementation of regulatory reports
• Centralized repository for all order-related data
• Foundation for CAT regulation
• Single source for all regulatory reporting, compliance reporting and inquiry requests
• Potential for business use cases such as TCA, benchmarking, etc.
High Level Scope
Data Sourcing & Storage
The data sourcing and storage component forms the backbone of RegOps COE. It is composed of a data warehouse that ingests, formats and stores order, trade, transaction and reference data.
Internal & External Reporting
This component will generate ad hoc customized reports as well as canned reports for internal analysis and external regulatory inquiries such as the TMMS Exam.
Regulatory Reporting
This component is the regulatory reporting engine, which will use the data from the core TDP to generate and submit multiple regulatory reports, including but not limited to OATS, BlueSheet, Trace, MSRB, Insite, LOPR and Large Trader.
Control Framework
This component is a structural framework that will enable comprehensive data quality, timeliness and completeness checks to help identify inherent issues in firms' trading data.
Functional Components
Data Sourcing
Ingest different types of data from multiple sources. This data falls into three main categories:
i. Transaction Data (e.g., orders, executions, fills, allocations)
ii. Position Data (e.g., Client/Firm SOD and EOD positions)
iii. Reference Data (e.g., products, accounts, counterparties)
Data Retention
The submitted data must be retained for analysis, investigation, regulatory and legal purposes. At a minimum, the solution will follow the standard data retention requirement of Rule 613, Consolidated Audit Trail (CAT): at least 6 years. Data must be retained in such a way that the raw data from each source is kept logically separate from the raw data of all other sources.
Data Storage
All the data from different sources will be stored in a central repository. The repository is expected to have enough capacity to store extremely large volumes of data. Any regulatory reported data should be stored in a WORM-compliant format.
Control Framework
The control framework is responsible for maintaining the logical integrity and accuracy of the data submitted by the subscribers. Its main functions are:
i. Data quality and completeness controls
ii. Reconciliations
iii. Error handling/Exception management
iv. Audit trails
v. Automatic data quarantine capability
Processing
Data obtained from all the sources needs to be processed according to pre-defined rules. The key processing functions are:
i. Normalization/Linkages
ii. Replays/Reprocessing
iii. Symbology/Cross-Currency Support
iv. Regulatory report generation and submission
v. (Foundation for) Surveillance routine coverage and alert generation
Security and Data Access
Data from a given source should be accessible only to personnel from that source and/or personnel with the proper level of data access privileges. A hierarchy of data access that grants these privileges should be implemented and maintained.
2 | Regulatory Operations Technology | 5/2/2017
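The access rule under Security and Data Access (own-source personnel, or anyone holding a sufficient privilege within a hierarchy) can be illustrated with a minimal Python sketch. The privilege names, the `IMPLIES` hierarchy and the `User` fields are hypothetical and chosen only for illustration; the slides do not specify an access model.

```python
from dataclasses import dataclass

# Hypothetical privilege hierarchy: each privilege implies the ones listed for it.
IMPLIES = {
    "admin": {"cross_source_read"},
    "cross_source_read": set(),
}

@dataclass(frozen=True)
class User:
    name: str
    source: str                        # source system the user belongs to
    privileges: frozenset = frozenset()

def effective_privileges(granted):
    """Expand the granted privileges through the hierarchy."""
    result, stack = set(), list(granted)
    while stack:
        privilege = stack.pop()
        if privilege not in result:
            result.add(privilege)
            stack.extend(IMPLIES.get(privilege, ()))
    return result

def can_read(user, record_source):
    """Allow own-source access, or any source with a cross-source privilege."""
    if user.source == record_source:
        return True
    return "cross_source_read" in effective_privileges(user.privileges)
```

In this sketch a user with no extra privileges can read only records from their own source, while a user granted `admin` inherits `cross_source_read` through the hierarchy.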
RegOps COE – Technical Overview
Technical Components
Data Sourcing
i. Structured data from various systems
ii. Support for multiple types of structured data
iii. High volumes, approx. 200–500 million events/day
iv. Real-time, intraday and EOD batch processing
Reporting
i. Support for all order-based reporting (including global regulatory reports)
ii. Support for micro-batches or EOD reporting (real-time not a must)
iii. Various types of daily, weekly, monthly and quarterly reports
iv. Report-level adjustments, e.g., regulator rejects and mismatches
Processing
i. Micro-batches or EOD (real-time not a must)
ii. Technical key creation
iii. Temporal milestoning/versioning
iv. Trade linkage
v. Symbology/Cross-Currency Support
vi. Replays/Reprocessing
vii. Data consistency
viii. Least/Minimal replication
Data Analysis and UI Tool
i. UI for data query, pivot and mining
ii. Usage of graph (nice to have)
iii. Canned and uncanned [ad hoc] reports
iv. Role-based authentication
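The "temporal milestoning/versioning" item under Processing is not elaborated in the slides. One common reading is that an update never overwrites a record: the current version is closed out with an end timestamp and a new version is appended, so the state at any past moment can be replayed. A minimal Python sketch under that assumption follows; `MilestonedStore`, `Version` and their fields are hypothetical names, and updates are assumed to arrive in time order.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class Version:
    order_id: str
    payload: dict
    valid_from: datetime
    valid_to: Optional[datetime] = None   # None marks the current version

class MilestonedStore:
    def __init__(self):
        self._versions = {}   # order_id -> list of Version, oldest first

    def upsert(self, order_id, payload, at):
        """Close out the current version (if any) and append a new one."""
        history = self._versions.setdefault(order_id, [])
        if history:
            history[-1].valid_to = at
        history.append(Version(order_id, dict(payload), at))

    def as_of(self, order_id, at):
        """Return the payload that was valid at time `at`, or None."""
        for version in self._versions.get(order_id, []):
            ended = version.valid_to is not None and at >= version.valid_to
            if version.valid_from <= at and not ended:
                return version.payload
        return None
```

Calling `as_of` with an earlier timestamp returns the earlier version, which is also what a replay or reprocessing run would need in order to reproduce a past report.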
Workflow Tool
i. Process management
ii. Data correction
Data Quality & Control Framework
i. Data quality and completeness controls
   i. Field, across fields and across rows within a group
ii. Reconciliations
   i. Source to target, target to different target
iii. Error handling/Exception management
iv. Audit trails
v. Automatic data quarantine capability
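The three levels of control listed above (field, across fields, across rows within a group), together with quarantine and source-to-target reconciliation, might be sketched as follows. The specific rules (positive quantity, limit orders need a price, fills must not exceed the order quantity) and the row layout are hypothetical examples, not rules taken from the slides.

```python
from collections import defaultdict

def field_check(row):
    """Single-field rule: quantity must be positive."""
    return [] if row.get("qty", 0) > 0 else ["qty must be positive"]

def cross_field_check(row):
    """Across-fields rule: a limit order must carry a price."""
    if row.get("order_type") == "LIMIT" and row.get("price") is None:
        return ["limit order without price"]
    return []

def overfilled_orders(rows):
    """Across-rows rule: fills for an order must not exceed its quantity."""
    ordered, filled = defaultdict(int), defaultdict(int)
    for row in rows:
        if row["event"] == "ORDER":
            ordered[row["order_id"]] += row["qty"]
        elif row["event"] == "FILL":
            filled[row["order_id"]] += row["qty"]
    return {oid for oid, qty in filled.items() if qty > ordered[oid]}

def run_controls(rows):
    """Split rows into clean rows and quarantined (row, errors) pairs."""
    overfilled = overfilled_orders(rows)
    clean, quarantine = [], []
    for row in rows:
        errors = field_check(row) + cross_field_check(row)
        if row["order_id"] in overfilled:
            errors.append("fills exceed order quantity")
        if errors:
            quarantine.append((row, errors))
        else:
            clean.append(row)
    return clean, quarantine

def reconcile(source_ids, target_ids):
    """Source-to-target reconciliation on record identifiers."""
    source, target = set(source_ids), set(target_ids)
    return {"missing_in_target": source - target,
            "unexpected_in_target": target - source}
```

Keeping the error messages alongside each quarantined row gives the exception-management and audit-trail functions something to work from; the same `reconcile` helper could be reused for target-to-target comparisons.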
Data Storage & Data Retention
i. Support WORM storage [no deletion]
ii. Data usage
   i. Primary usage for up to 5 days of data
   ii. Secondary usage for up to 3 months of data
   iii. Readily available for up to 2 years
   iv. Otherwise up to 6 years
iii. Books & Records for regulatory purposes
iv. Approx. 15 TB of uncompressed data per year
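The data usage tiers above can be read as an age-based classification. The sketch below is one possible Python rendering; the tier names are hypothetical, and "3 months" is approximated as 90 days and a year as 365 days, which are assumptions rather than definitions from the slides.

```python
from datetime import date, timedelta

# (maximum age, tier name), checked in order from youngest to oldest.
TIERS = [
    (timedelta(days=5), "primary"),
    (timedelta(days=90), "secondary"),               # assumption: 3 months ~ 90 days
    (timedelta(days=2 * 365), "readily_available"),
    (timedelta(days=6 * 365), "books_and_records"),
]

def retention_tier(business_date, today):
    """Return the usage tier for data with the given business date."""
    age = today - business_date
    for limit, tier in TIERS:
        if age <= limit:
            return tier
    # Past the 6-year minimum; what happens next is a policy decision,
    # not something this sketch decides.
    return "past_minimum_retention"
```

For example, with `today = date(2017, 5, 2)`, data dated `date(2017, 4, 1)` falls into the secondary tier. Because storage is WORM, moving between tiers would mean changing where data is served from, not rewriting or deleting it.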
Security and Data Access
i. Data security/confidentiality/privacy
Other considerations
i. SDLC compliant
ii. Logging; metrics
iii. Cost
iv. Archiving
v. 2.5X peak volume certified
vi. Scalability, stability and resiliency
vii. BCP/quick recovery
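The sizing figures quoted in this overview (200–500 million events/day, certification at 2.5X peak volume, roughly 15 TB of uncompressed data per year, 6-year retention) can be combined into a rough capacity estimate. The Python sketch below assumes that "2.5X peak volume certified" means the platform must be tested at 2.5 times the peak daily volume, and that the upper end of the daily range is the peak; both are interpretations, not statements from the slides.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def certified_daily_events(peak_daily_events, headroom=2.5):
    """Daily volume the platform should be certified for."""
    return int(peak_daily_events * headroom)

def average_events_per_second(daily_events):
    """Average rate if events were spread evenly over 24 hours."""
    return daily_events / SECONDS_PER_DAY

def retained_terabytes(tb_per_year=15, years=6):
    """Uncompressed volume held over the retention period."""
    return tb_per_year * years

target = certified_daily_events(500_000_000)
print(target)                                    # 1250000000 events/day
print(round(average_events_per_second(target)))  # 14468 events/s on average
print(retained_terabytes())                      # 90 TB uncompressed
```

The per-second figure is only a 24-hour average; order flow concentrates in trading hours, so the instantaneous rate to design for would be higher. The storage figure excludes replication and any growth in annual volume.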
© Hortonworks Inc. 2011 – 2016. All Rights Reserved