Lessons I Learned From Building a Self-Service Data Platform | by Manvik Kathuria | Jul, 2022

Is distributed data ownership right for your organisation?

Photo by Kenny Eliason on Unsplash

You walk into your nearest supermarket, grab a basket, fill it with whatever you need, and walk to the checkout counter. You scan the barcodes of your items, pay, bag them, and leave, all without interacting with a single person in the store.

Now imagine that your business stakeholders want to answer a particular question. They bring their data into a data lake or a data warehouse, curate it, and build a BI report on it. Once again, they do all of this without depending on another team to enable them. Welcome to the self-service data platform: “A platform to enable stakeholders to ingest, curate, share and report on the data they own without relying on any other team.”

Traditionally, data platforms have been built, maintained, and iterated on by central teams who manage both the infrastructure and the data. This creates a dependency, and a perception that the central team is a bottleneck for new initiatives and requirements. To solve this challenge, many organisations have embarked on the path of distributed data ownership or, in simpler terms, multitenancy.

Each data owner is responsible for building, maintaining and operating their own data pipelines. The platform team is responsible only for the underlying infrastructure and for delivering new capabilities that make life easier for the data owners. In theory, this is a fantastic idea: you empower your end users to be self-sufficient. In reality, it’s not all roses.

Most of the challenges with self-service are not caused by the technology that underpins the platform. Let’s discuss the noteworthy ones.

Senior management blessing

Moving to a self-service data platform is more a mindset shift than a technology change. Getting buy-in from senior management is imperative to the success of your initiative, because it helps steer the organisation and its use of data in the right direction.

Photo by Anton on Unsplash

Senior leadership is primarily responsible for evangelising the concept of self-service analytics. Unless they see value in it, they won’t be able to convince their counterparts to adopt it.

Lack of relevant skills

One of the most common complaints I hear from data owners is: “We don’t have people skilled in data engineering on our team.” If you think about why the self-service concept exists in the first place, you realise that its whole point is to empower and enable our users.

Photo by Isaac Smith on Unsplash

These users have traditionally been analysts or business users who don’t come from an engineering background, and asking them to do data engineering is no small ask. Hence, teams that hired analysts now need to start hiring data engineers. Although this is doable, hiring, ramping up, and starting to deliver still takes a couple of months.

You might think it is a one-time investment to hire data engineers and build a team that can take care of everything. Now imagine tens of such groups hiring data engineers, and multiply that by the cost of acquiring each of them. You are now looking at tens of skilled engineers scattered across your organisation.

Reinventing the wheel

Every team has its own ways of working, shaped by its backlog, priorities, and objectives. The same applies to distributed data engineering teams, which makes getting them to implement standard ingestion, curation, sharing, and reporting patterns arduous. More often than not, your data platform will suffer from a lack of common data engineering practices.

Photo by Markus Spiske on Unsplash

This is a substantial problem because teams depend on one another when sharing and delivering data to other internal and external teams. Instead, multiple teams implement different patterns in silos to solve the same problem, essentially reinventing the wheel.

Managing access

Handing over control of data and pipelines to multiple business teams poses a new challenge for data access and security. No matter how many rules you apply, there will always be a risk of data proliferation. The more access teams have, the greater the risk of data leakage and inappropriate access across the organisation.

Photo by Kyle Glenn on Unsplash

There is a fine line between restricting access and enabling users to do their job. Too much restriction and nothing gets done; too little, and data becomes more widely accessible than it needs to be. Striking a balance, with enough controls to reduce operational risk while still enabling self-service analytics, is essential.

Socialise

The most potent tool for democratising your data platform is gaining promoters, and the only way to do that is to socialise your idea with them. Showcasing the capabilities and benefits your platform will bring, even before it is developed, will get you off on the right foot.

Start with small-scale self-service

As great as it sounds, asking your business teams to self-serve end to end from day one might not be wise. The goal is complete self-service; however, breaking it down into small steps will only help your cause. Perhaps start by perfecting self-service reporting while the platform team builds standard frameworks and patterns for ingestion and curation. Hand over responsibilities gradually so as not to overwhelm the business teams, balancing where you are today with where you want to go.

Guardrails

If you give a pen and paper to a group of people and ask them to draw a picture, rest assured no two drawings will be the same. The same applies to data engineering. To prevent chaos, implement standard practices and patterns, and train the engineering teams to use them. The aim is to follow an existing path where one exists and carve out a new one only where none does.

Improve collaboration

With distributed teams working towards a common goal, the need for constant collaboration increases. Training sessions, chat groups, and regular workshops to share findings, learnings, processes, and practices will increase adoption of your data platform.

Data platforms have become an essential component of business strategy. Organisations large and small are hungrier for insights than ever before. Investing in a data platform requires careful thought and execution, and the last thing you want is to create something slow and unusable. However you design your platform, give equal priority to how stakeholders will actually use it.

“Build a platform — prepare for the unexpected…you’ll know you’re successful when the platform you’ve built serves you in unexpected ways.” — Pierre Omidyar
