Data Lineage: Where did our data come from?

By Johannes Kollross, Head of Product Management, AIM Software.

Understanding where data comes from or answering simple questions such as “who has modified this value and why?” remains a major challenge.

It is widely accepted that data quality underpins an asset manager’s ability to operate its business efficiently whilst also enabling it to meet client reporting and regulatory requirements. But without understanding the origin and evolution of data as it flows through the enterprise, it is impossible to verify the quality of that data. This is where data lineage comes in.

Data lineage is the key to tracking where data comes from, what changes may have been made to that data, by whom, when and why. A solid data lineage solution also enables you to respond to the increasingly frequent requests from regulators, who are seeking deeper transparency through a variety of regulations such as Dodd-Frank, EMIR, IFRS, AIFMD, Solvency II and MiFID II.
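
To make the "what, by whom, when and why" concrete, here is a minimal sketch in Python of what a single lineage record and an append-only log might look like. It is an illustration only: the class names, field names and sample values are all hypothetical and do not describe any particular product.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class LineageEvent:
    """One change to one field of one record (all names are illustrative)."""
    record_id: str              # which record, e.g. an instrument identifier
    field_name: str             # which attribute changed, e.g. "price"
    old_value: Optional[float]  # None for the initial load
    new_value: float
    source: str                 # where the value came from
    changed_by: str             # who made the change
    changed_at: datetime        # when it was made
    reason: str                 # why it was made


class LineageLog:
    """Append-only, in-memory log; a real system would persist this."""

    def __init__(self) -> None:
        self._events: List[LineageEvent] = []

    def record(self, event: LineageEvent) -> None:
        self._events.append(event)

    def history(self, record_id: str, field_name: str) -> List[LineageEvent]:
        """Return every change to one field of one record, oldest first."""
        matches = [
            e for e in self._events
            if e.record_id == record_id and e.field_name == field_name
        ]
        return sorted(matches, key=lambda e: e.changed_at)


# Hypothetical usage: an initial vendor load followed by a manual override.
log = LineageLog()
log.record(LineageEvent(
    "INSTR-001", "price", None, 100.0, "vendor_feed_x", "loader",
    datetime(2024, 1, 2, 9, 0, tzinfo=timezone.utc), "initial load",
))
log.record(LineageEvent(
    "INSTR-001", "price", 100.0, 101.5, "manual override", "analyst_a",
    datetime(2024, 1, 2, 15, 30, tzinfo=timezone.utc), "stale vendor quote",
))
for event in log.history("INSTR-001", "price"):
    print(event.changed_at, event.changed_by, event.new_value, event.reason)
```

With records of this shape, the question "who has modified this value and why?" becomes a simple lookup rather than an investigation across systems.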

Yet 80% of asset managers state that they are still only in the early stages of understanding data lineage, according to the EDM Council. How can firms get to grips faster with the topic and implement a solution that gives them control over their data lineage?

Consider a typical situation in a financial organisation: the auditor calls and asks your firm to justify the prices used for NAV calculation in the previous week. Answering takes significant time and cost, because your analysts have to investigate, starting with the week’s NAV report and then working back through the multiple systems involved: reporting, data warehouse, portfolio management, accounting and the system that collects asset valuations from different pricing sources and selects the correct prices. If there are any exceptions, the analyst may also have to export the suspect records, obtain valuations from the most liquid market and then reimport the corrected values into the systems before reporting back to the firm’s auditor.

The Challenges of Understanding Data Lineage

Understanding data lineage is difficult for three main reasons:

  1. Disparate sources of data in multiple systems spread across the enterprise;
  2. Gaps in ownership of data, standards and enforcement processes, which take strong data governance to close;
  3. Use of spreadsheets, which are widely used but difficult to monitor and control.

A lack of insight into data lineage means more time is spent on data forensics. According to a study by IDC Research, data stewards can spend up to 50% of their time on data forensics when responding to requests from users in their business. Firms must be able to answer those requests quickly to explain the information to clients or portfolio managers.

Regulators also continue to press for more granular transparency on reported data. MiFID II and EMIR revisions include data quality improvement provisions. The Fundamental Review of the Trading Book also requires high-quality granular historical data in internal models, and IFRS 9 has data requirements including loan origination information.

When data sourcing and data quality controls are unknown, several “hidden costs” can also arise:

  • Redundant data control activity: The same controls are performed several times by different departments because there is no shared view of controls previously applied.
  • Incorrect bookings caused by unfit data.
  • Duplicated quality work: Analytics and reporting initiatives that include a data quality workstream may repeat quality controls that have already been carried out.
  • Unnecessarily high data costs due to inefficient sourcing: Different business units acquire data directly from the vendor, even when that data is probably already available to them.
  • Market data usage and compliance risk: Being unable to accurately relate data provenance and usage exposes the firm to difficult contract negotiations with data vendors. It also makes it difficult to assess the impact of a revised commercial model or licence agreement.
  • Lack of accuracy of analytics and models.
  • Lack of credible client reporting.
  • Slowdown of growth and M&A initiatives due to inability to integrate data sets from other entities.

Approaches to Take to Make Progress on Data Lineage

Over the next few years, the focus will shift from how much metadata can be collected to how easily that information can be accessed and integrated.

Four key steps that can help asset managers to make progress on data lineage are:

  • Adopt an end-to-end business process perspective. Seek a solution with end-to-end transparency on the steps in the process that changed a data record. This provides a starting point for analysis of how prices were selected and controlled, and an understanding of changes made to reference data records.
  • Provide access to data lineage information in a meaningful way. Business users must have the independence to reach the information as soon as they need it. This requires applications such as user interface screens that deliver relevant information with the right business perspective. This can also mean exploring the information through visual charts, dashboards and a company glossary of business terms and concepts.
  • Centralise the sourcing and distribution of market and reference data. Having one centralised system that provides a firm-wide view of data request execution and usage makes data lineage information far more accessible, and also facilitates compliance with licensing agreements.
  • Adopt a “data quality firewall” approach. Form a global team to ensure the quality of reference data before it reaches daily operations and systems. Use a central data management platform to enforce data policies and ensure data quality controls. A central platform can systematically record all changes to the data, as well as to policies, quality controls and portfolio parameters.
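
The “data quality firewall” idea in the last step can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a description of any real platform: the `QualityFirewall` class, the two example rules and the record fields (`id`, `currency`, `price`) are all hypothetical. The point it shows is that every record passes through the same central checks before distribution, and that each decision is itself recorded.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]
Rule = Tuple[str, Callable[[Record], bool]]  # (rule name, check function)


@dataclass
class QualityFirewall:
    """Central gate: apply every rule and keep an audit trail of outcomes."""
    rules: List[Rule]
    audit_trail: List[dict] = field(default_factory=list)

    def admit(self, record: Record) -> bool:
        """Return True only if the record passes all rules; log the result."""
        failed = [name for name, check in self.rules if not check(record)]
        self.audit_trail.append({
            "record_id": record.get("id"),
            "failed_rules": failed,
            "admitted": not failed,
        })
        return not failed


# Hypothetical data policies; a real firm would define its own.
EXAMPLE_RULES: List[Rule] = [
    ("has_currency", lambda r: bool(r.get("currency"))),
    ("positive_price",
     lambda r: isinstance(r.get("price"), (int, float)) and r["price"] > 0),
]

firewall = QualityFirewall(rules=EXAMPLE_RULES)
incoming = [
    {"id": "INSTR-001", "currency": "EUR", "price": 101.5},
    {"id": "INSTR-002", "currency": "USD", "price": -3.0},
]
# Only records that pass every rule reach downstream systems.
clean = [r for r in incoming if firewall.admit(r)]
```

Because the audit trail records which rules each record failed, downstream teams can see which controls have already been applied instead of repeating them.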

Regulation will likely continue to change, analytics requirements will evolve and more complex financial products will continue to emerge. The only way firms can stay prepared is to strategically design and plan a consistent way to manage data flows, enforce data governance standards and make lineage information actionable throughout their organisation. Doing so ensures that the technical implementation of the organisation’s data fabric is built intelligently so that it can scale.

The right solution will cherry-pick technical assets and allow different lines of business to add processes, data sets and policies run by the organisation. Enabling customisable views that combine both business and technical information is critical to accessing data lineage information and using it effectively, which is the next step towards establishing data as a trusted asset in the organisation.