3 productivity-killing data problems and how to solve them

Three data complexities are at the core of every leader’s challenge to gain business advantages from their data.
30 June 2020

Data lakes can take your data out of silos. Source: Shutterstock

With the typical enterprise using hundreds of SaaS solutions, each with its own database, it’s no wonder business leaders complain their data is siloed. Imagine now that a CEO wants to understand the relationship between data in these disparate systems.

All of a sudden, she’s looking at the world’s most confusing dashboard, all the while wondering: Can I trust this information? The CEO placates herself with the knowledge that at least she has data to look at. But in the end, it creates more questions than answers.

If you’re in a competitive industry (which we all are), it’s high time that CEOs take their data analysis to the next level. How to do it? Three data complexities are at the core of every leader’s challenge to gain business advantages from their data:

Siloed data

Do you have trouble seeing your data at all? Are you mentally scanning your systems and realizing just how many databases you have? An enterprise may collect reams of data from its industrial operations yet be unable to derive value from it, because the data is locked in a siloed datacenter database. The data isn’t reaching any dashboard in a meaningful way, which is a common problem. With enterprise data doubling every few years, it takes modern tools and strategies to keep up.

The company can begin solving the problem by defining the business purpose of its industrial data: for example, predicting demand in the coming months to avoid a shortfall. That business purpose, with buy-in at multiple corporate levels, drives the entire engagement and allows the company to keep the technology simple and focus on the outcome. The result is clean, trustworthy, valuable data that has been unlocked from the database and published to a dashboard.

Siloed data takes some elbow grease to access, but it becomes a lot easier if you have a goal in mind for the data. Knowing where you are going cuts through the noise and helps you make decisions more easily.

Untrustworthy data

Do you have trouble trusting your data? You have a dashboard, yet you’re pretty sure the data is wrong, or lots of it is missing. You can’t take action on it, because you hesitate to trust it. Data trustworthiness is a prerequisite for making your data action-oriented. But most data has problems: missing values, invalid dates, duplicate values, and meaningless entries. If you don’t trust the numbers, you’re better off without the data.
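To make those four problem types concrete, here is a minimal sketch that counts them in a small export. Everything in it is an assumption made for illustration: the sample rows, the column names, the YYYY-MM-DD date format, and the list of placeholder strings treated as meaningless. A real audit would use your own fields and rules.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Hypothetical sample export; column names and values are invented for illustration.
SAMPLE_CSV = """order_id,customer_email,order_date,amount
1001,ana@example.com,2020-05-01,40.00
1002,,2020-05-03,15.50
1003,raj@example.com,2020-13-45,22.00
1001,ana@example.com,2020-05-01,40.00
1004,N/A,2020-05-09,9.99
"""

# Assumed set of placeholder strings that carry no real information.
PLACEHOLDERS = {"n/a", "na", "none", "null", "unknown", "-"}


def is_valid_date(text):
    """Return True if text parses as a YYYY-MM-DD calendar date."""
    try:
        datetime.strptime(text, "%Y-%m-%d")
        return True
    except ValueError:
        return False


def audit(rows, date_field="order_date"):
    """Count rows affected by each of the four problem types."""
    issues = Counter()
    seen = set()
    for row in rows:
        values = [(v or "").strip() for v in row.values()]
        key = tuple(values)
        if key in seen:  # exact repeat of an earlier row
            issues["duplicate_rows"] += 1
        seen.add(key)
        if any(v == "" for v in values):
            issues["rows_with_missing_values"] += 1
        if any(v.lower() in PLACEHOLDERS for v in values):
            issues["rows_with_meaningless_entries"] += 1
        if not is_valid_date((row.get(date_field) or "").strip()):
            issues["rows_with_invalid_dates"] += 1
    return dict(issues)


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
report = audit(rows)
print(report)
```

A report like this does not fix anything by itself, but it turns a vague feeling of distrust into counts you can track over time.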

Data is there for you to take action on, so you should be able to trust it. One key strategy is to not bog down your team with maintaining systems, but rather use simple, maintainable cloud-based systems that use modern tools to make your dashboard real.

No data

Often you don’t even have the data you need to make a decision. “No data” comes in many forms:

  • You don’t track it. For example, you’re an e-commerce company that wants to understand how email campaigns can help your sales, but you don’t have a customer email list.
  • You track it but you can’t access it. For example, you start collecting emails from customers, but your email SaaS system doesn’t let you export your emails. Your data is so siloed that it effectively doesn’t exist for analysis.
  • You track it but need to do some calculations before you can use it. For example, you have a full customer email list, a list of product purchases, and you just need to join the two together. This is a great place to be and is where we see the vast majority of companies.

Turning the data you have into answers means finding patterns and insights not just within datasets, but across datasets. This is only possible with a modern, cloud-native data lake.
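As a rough illustration of working across datasets, the sketch below joins a customer email list to a list of product purchases and totals spend per email address. The records, field names, and the `spend_by_email` helper are all hypothetical; in practice the two lists would come from separate systems.

```python
from collections import defaultdict

# Hypothetical records standing in for two separate systems.
customers = [
    {"customer_id": 1, "email": "ana@example.com"},
    {"customer_id": 2, "email": "raj@example.com"},
    {"customer_id": 3, "email": "li@example.com"},
]
purchases = [
    {"customer_id": 1, "product": "kettle", "amount": 40.0},
    {"customer_id": 1, "product": "mug", "amount": 12.5},
    {"customer_id": 2, "product": "teapot", "amount": 55.0},
]


def spend_by_email(customers, purchases):
    """Join the two lists on customer_id and total the spend per email address."""
    email_by_id = {c["customer_id"]: c["email"] for c in customers}
    totals = defaultdict(float)
    for p in purchases:
        email = email_by_id.get(p["customer_id"])
        if email is not None:  # skip purchases with no matching customer
            totals[email] += p["amount"]
    # Customers with no purchases still appear, with a total of 0.0.
    return {email: totals.get(email, 0.0) for email in email_by_id.values()}


totals = spend_by_email(customers, purchases)
print(totals)
```

Neither list answers the question alone; the shared `customer_id` key is what makes the combined view possible.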

Data Lakes

Step one for any data project – today, tomorrow and forever – is to define your business need.

Do you need to understand your customer better? Whether it is click behavior, email campaign engagement, order history, or customer service, your customer generates more data today than ever before, and the data can give you clues as to what she cares about.

Do you need to understand your costs better? Most enterprises have hundreds of SaaS applications generating data from internal operations. Whether it is manufacturing, purchasing, supply chain, finance, engineering, or customer service, your organization is generating data at a rapid pace.

Don’t be overwhelmed. You can cut through the noise by defining your business case.

The second step in your data project is to take that business case and make it real in a cloud-native data lake. Yes, a data lake. I know the term has been abused over the years, but a data lake is very simple; it’s a way to centrally store all (all!) of your organization’s data, cheaply, in open formats that make it easy to access from any direction.

Data lakes used to be expensive, difficult to manage, and bulky. Now, all major cloud providers (AWS, Azure, GCP) have established best practices to keep storage dirt-cheap and data accessible and very flexible to work with. But data lakes are still hard to implement and require specialized, focused knowledge of data architecture.

How does a data lake solve the above problems?

  1. Data lakes de-silo your data. Since the data stored in your data lake is all in the same spot, in open formats like JSON and CSV, there aren’t any technological walls to overcome. You can query everything in your data lake from a single SQL client. If you can’t, then that data is not in your data lake and you should bring it in.
  2. Data lakes give you visibility into data quality. Modern data lakes, often set up with the help of expert consultants, build in a variety of checks for data validation, completeness, lineage, and schema drift. These are all important concepts that together tell you if your data is valuable or garbage. These sorts of patterns work together nicely in a modern, cloud-native data lake.
  3. Data lakes welcome data from anywhere and allow for flexible analysis across your entire data catalog. If you can format your data into CSV, JSON, or XML, then you can put it in your data lake. This solves the problem of “no data.” It is very easy to create the relevant data, either by finding it in your organization, or engineering it by analyzing across your data sets. An example would be joining data from Sales (your CRM) and Customer Service (Zendesk) to find out which product category has the best or worst customer satisfaction scores.
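The Sales and Customer Service example can be sketched in a few lines. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for a data lake's query engine, and the CSV export, the JSON export, and every field name are invented. The point is that once both sources sit in one place, a single SQL query can rank product categories by satisfaction.

```python
import csv
import io
import json
import sqlite3

# Hypothetical exports; field names and values are invented for illustration.
SALES_CSV = """order_id,product_category
1,kettles
2,kettles
3,mugs
4,mugs
"""
TICKETS_JSON = """[
  {"order_id": 1, "satisfaction": 5},
  {"order_id": 2, "satisfaction": 4},
  {"order_id": 3, "satisfaction": 2},
  {"order_id": 4, "satisfaction": 3}
]"""

# SQLite is an assumption here: a stand-in for whatever SQL engine queries the lake.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (order_id INTEGER, product_category TEXT)")
conn.execute("CREATE TABLE tickets (order_id INTEGER, satisfaction INTEGER)")

# Load the CSV export (sales) and the JSON export (support tickets).
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [
        (int(r["order_id"]), r["product_category"])
        for r in csv.DictReader(io.StringIO(SALES_CSV))
    ],
)
conn.executemany(
    "INSERT INTO tickets VALUES (?, ?)",
    [(t["order_id"], t["satisfaction"]) for t in json.loads(TICKETS_JSON)],
)

# One query across both sources: average satisfaction per product category.
scores = conn.execute(
    """
    SELECT s.product_category, AVG(t.satisfaction) AS avg_score
    FROM sales AS s
    JOIN tickets AS t ON t.order_id = s.order_id
    GROUP BY s.product_category
    ORDER BY avg_score DESC
    """
).fetchall()
print(scores)
conn.close()
```

The first row of `scores` is the best-rated category and the last row is the worst.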

If you’re struggling with one of these three core data issues, the solution is to start with a crisp definition of your business need, and then build a data lake to execute on that need. A data lake is just a central repository for flexible and cheap data storage. If you focus on keeping your data lake simple and geared towards the analysis you need for your business, these three core data problems will be a thing of the past.

This article was contributed by Robert Whelan, data engineering & analytics practice manager at 2nd Watch.