Enterprise Data Lake and Business Considerations

Akshey Gupta, Vice President – Chief Data Officer, Deutsche Bank

Data is the lifeline of every enterprise and domain today. Various studies suggest that investment banking stores the most data per firm. Yet it is the velocity of these extremely large and diverse data sets that poses the bigger challenge. To increase return on assets, enterprises need a real-time, 360-degree view of their customers and competitors.

Despite huge investments in the data and analytics space, the industry faces massive challenges: data silos, linking complex financial products, data quality, timely availability of data, and more. In fact, it has been suggested that real-world scenarios are so complex to decode that even the most advanced algorithms and analytical models are valid only for a particular instant in time. The key is the ability to keep learning and to keep feeding these models with real-time data.

Smartphones and the growing availability and coverage of the internet add to these challenges.

Today the term ‘data noise’ has become redundant, because data that is noise to one user or department may be important to another. With organizations ready to take more risks and explore new ways of acquiring customers, and with technology changing at an astronomical pace, a business-driven data lake is seen as a potential solution.

Enterprise Data Lake Maturity Journey

An enterprise data lake is a journey in which the technical architecture evolves with business needs. This journey can be divided into four stages.

Stage 1 is ‘Reactive’: the data lake goes live in response to a sudden need to implement a specific business use case. This is like testing the waters!

Stage 2 is ‘Consolidation’. While implementing stage 1, an enterprise realizes that a data lake brings some inherent benefits. These include lowering overall operational cost by consolidating siloed data warehouses and data stores. Bringing archived data back to life also shortens business response time, and data management costs come down. Most organizations are at this stage.

Stage 3 is ‘Proactive’. By now, the business has seen the benefits of consolidation. The enterprise realizes that it can scale up and start commercializing the design internally. It now also wants to bring in external data from social networking sites, chat windows and Internet of Things platforms, and combine it with internal data to create a 360-degree view. The business wants to focus on reuse, reduce duplication and have a platform for exploratory analytics.

At this stage an enterprise should be mature enough to drop a technology initiative if it does not deliver the required business benefit. The IT department should see this not as a failure but as a learning opportunity on the way to its final goal. It also means the design should be flexible, so that changing any component has minimal impact on the wider business ecosystem.

Stage 4 is about generating efficiency and continuous optimization through feedback: feeding what has been learned back into the system.

ID Administration and Data Security

Data security and privacy are important aspects that are often forgotten when designing a data lake. Quite a few new-age products on the market lack a clear approach to authentication and authorization. Integration with the ID administration applications that already exist in an enterprise has also been an area of concern. Given stringent regulations and the recent penalties imposed on various banks, ignoring this aspect is far more costly. Data lake design needs to consider the business requirements on the amount and type of data that can be shared or made visible across lines of business, and this should be backed by traceability of user navigation.
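To make the last point concrete, here is a minimal sketch of a line-of-business visibility check that records every access attempt for traceability. It is illustrative only: the names `Dataset`, `User`, `AccessGate`, `CROSS_LOB_VISIBLE` and the classification labels are hypothetical, and a real deployment would delegate identity to the enterprise's existing ID administration system rather than an in-memory object.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)
class Dataset:
    name: str
    owner_lob: str       # line of business that owns the data
    classification: str  # hypothetical labels: "public", "internal", "restricted"

@dataclass(frozen=True)
class User:
    user_id: str
    lob: str             # line of business the user belongs to

# Hypothetical policy: classifications visible to users from *other* lines of business.
CROSS_LOB_VISIBLE = {"public", "internal"}

@dataclass
class AccessGate:
    audit_log: list = field(default_factory=list)

    def can_read(self, user: User, ds: Dataset) -> bool:
        if user.lob == ds.owner_lob:
            return True  # users see all data owned by their own line of business
        return ds.classification in CROSS_LOB_VISIBLE

    def read(self, user: User, ds: Dataset) -> bool:
        allowed = self.can_read(user, ds)
        # Every attempt, allowed or denied, is logged to support audits.
        self.audit_log.append({
            "ts": datetime.now(timezone.utc).isoformat(),
            "user": user.user_id,
            "dataset": ds.name,
            "allowed": allowed,
        })
        return allowed
```

A markets user reading restricted retail data is denied, yet the denial still appears in `audit_log`, which is the kind of navigation trail a regulator may ask for.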

Apart from the prevention and regulatory considerations above, it can also help generate new business, such as monetizing consumer data.

The telecom and banking sectors naturally hold enough consumer details to sketch a complete customer profile. They also have deep insight into consumers' transactional, operational and geo-location patterns. This information is highly valuable to merchants, advertisers and business houses. At the same time, regulations exist to ensure that consumer privacy is not sacrificed. To generate revenue from this data under such conditions, it must be published in a way that masks personal information.
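One way to publish such data is sketched below, under stated assumptions: direct identifiers are dropped, the customer ID is replaced by a keyed hash (HMAC-SHA256) so records can still be joined, and geo-location is coarsened. The field names, `SECRET_KEY` and the rounding precision are hypothetical choices. Keyed hashing is pseudonymization rather than full anonymization, so depending on the applicable regulation, aggregation or further generalization may also be required.

```python
import hashlib
import hmac

# Hypothetical secret held only by the data owner; never published with the data.
SECRET_KEY = b"replace-with-a-managed-secret"

# Hypothetical set of direct identifiers that are removed outright.
DIRECT_IDENTIFIERS = {"name", "phone", "email"}

def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Replace an identifier with a truncated keyed hash (HMAC-SHA256)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]

def mask_record(record: dict) -> dict:
    out = {}
    for field_name, value in record.items():
        if field_name == "customer_id":
            # Stable token: the same customer maps to the same token across records.
            out["customer_token"] = pseudonymize(str(value))
        elif field_name in DIRECT_IDENTIFIERS:
            continue  # drop direct identifiers entirely
        elif field_name in ("lat", "lon"):
            # Coarsen location to one decimal place (about 11 km of latitude).
            out[field_name] = round(float(value), 1)
        else:
            out[field_name] = value  # non-identifying attributes pass through
    return out
```

The key design choice is that the token is stable but not reversible without the secret, so a merchant can analyse repeat behaviour without learning who the customer is.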

Need for a Data Librarian

With zettabytes of data in data lakes, there is a strong need for a data librarian. The role is responsible for organizing, indexing and cataloguing data to keep it discoverable. The chosen technology should be able to discover data dynamically and tag it in the required index. This is much like organizing a 5,000-page book on the various laws of a country.
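The dynamic discovery and tagging described above can be sketched as a small catalogue that inspects each dataset's column names when it is registered and files the dataset under business tags in an inverted index. The class `DataCatalog` and the regex rules in `TAG_RULES` are hypothetical; simple name-matching rules like these are naive, and a production catalogue would typically also profile the data values themselves.

```python
import re
from collections import defaultdict

# Hypothetical tagging rules: a regex on the column name maps to a catalogue tag.
TAG_RULES = {
    "pii": re.compile(r"(name|email|phone|ssn|address)", re.IGNORECASE),
    "geo": re.compile(r"(lat|lon|geo|postcode|zip)", re.IGNORECASE),
    "finance": re.compile(r"(amount|balance|price|notional|spend)", re.IGNORECASE),
}

class DataCatalog:
    def __init__(self):
        self.schemas = {}                  # dataset name -> list of columns
        self.tag_index = defaultdict(set)  # tag -> names of datasets carrying it

    def register(self, dataset: str, columns: list) -> set:
        """Record a dataset's schema and tag it based on its column names."""
        self.schemas[dataset] = list(columns)
        tags = {tag for tag, rx in TAG_RULES.items()
                for col in columns if rx.search(col)}
        for tag in tags:
            self.tag_index[tag].add(dataset)
        return tags

    def find(self, tag: str) -> list:
        """Return datasets carrying a tag, sorted for stable output."""
        return sorted(self.tag_index.get(tag, set()))
```

For example, registering a CRM table with `full_name` and `email` columns files it under `pii`, so both discovery and access reviews can start from the tag rather than from a manual search.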

Given the need to manage heterogeneous data sources, validate data quality, track lineage and run a security engine that governs data access, a presentation layer is a must-have to facilitate data discovery, trace access and support audits.

The enterprise data lake enables ‘write-back’ capability and hence lays the foundation for predictive and prescriptive analytics. However, the data lake concept will continue to evolve and to challenge the best practices it has acquired over time.