Over the years, software companies have given data a permanent seat at the table when it comes to making decisions. Making decisions on product, understanding customer behavior, and breaking down growth activity are significantly easier when you have data to support them. In the early days, startups can play it by ear and use the founders' intuition to make decisions faster.
These start off as simple questions like "Should we continue to run the Facebook ad campaign for another couple of months?" or "How can we guide more users to upgrade to a paid plan sooner?" But very quickly these seemingly trivial questions begin to have a significant impact on the bottom line depending on whether you get them right or wrong, and it makes sense to have some data to back your intuition.
Data is a great decision support system. It separates factual evidence from gut feelings, cause from effect, and confidence from spray-and-pray when teams arrive at deadlocks. From experience, the best data teams do more than wrangle numbers: they seek to understand the business goals and dynamics, help break deadlocks, and support or validate intuition.
For a company to have a data culture, anyone on the team should be able to ask questions of their data. More crucially, not just any questions: the systems built around data should nudge them towards asking the right questions.
How hard it is to ask good questions of your data varies with team size and with who is asking.
Large teams with a dedicated data team usually have the means to answer questions. The difficult bit is aligning that standalone data team with the question: making them really understand why you're asking what you're asking, and what it signifies for the business. There is back-and-forth via tickets between data analysts, engineers, and the asker, which is slow and often resource-intensive, but the end result is usually acceptable.
Needless to say, small teams beg, borrow, and steal whatever they can via spreadsheets, fragmented tooling, and sometimes a lot of manual labor.
Besides team size, whether or not you're an engineer plays a huge role in how good or bad your "Data UX" will be on a team of any size. Engineers can write SQL queries by themselves, but often don't know what they're looking for. Non-engineers are data literate and often know precisely what they want to ask, but are locked out by the technical jargon that sits between them and the data. They rely on embedded dashboards and underestimate how complex simple questions can be.
If you're an engineer, you still have to navigate access to raw data sources, credentials being shared with marketing and sales teams, and aligning with them on what the desired outcome is.
Okay, not that dramatic, but bear in mind that these are just the tip of the iceberg. A huge chunk of the problem is actually figuring out what to ask. The truth is, nobody knows off the top of their head the exact metrics that will be a game-changer for their team. There is a huge element of experimentation involved.
This is great, in my opinion, because it implicitly treats every “metric” or “report” as a hypothesis and curbs biased questions. If you don’t know what to ask, you’re always on the lookout for evidence that you’re measuring the wrong thing.
In reality, having your data readily explorable is the best way to be data-driven. Besides some key metrics (like revenue, churn, etc.), which should be set in stone from day one and not fiddled with unnecessarily, your team needs to be constantly wrangling data and presenting their findings.
In a way, the most effective kind of analytics is "disposable analytics". It allows the "everything is a hypothesis" culture to take root and lets teams constantly set up experiments and metrics and measure how well they did, without the inertia of thinking it will take too long or cost too much to keep changing dashboards and reports. It also lets teams abandon experiments quickly and avoid the sunk-cost fallacy.
Sadly, the state of data tools is abysmal by today's standards. There is no slick UI that gets you from zero to hero (or even close). The modern data stack turned out to be a misnomer for "hosted data tools". The reality is software from 7-8 vendors stitched together, many moving parts, and an overall poor user experience.
To reiterate, the drop in UX is not a nitpick. Modern teams use beautiful tools like Slack, Notion, and Figma, but when it comes to data there are hardly any tools that come close to that level of attention to interface and user experience. Instead, there are many tools that do individual components very well, but are unmistakably designed with developers in mind, which sacrifices speed and resourcefulness.
That said, Python and SQL have emerged as winners as far as languages go. Still, the user experience of tools that enable data exploration on top of them is poor, save for a few new products in the category.
Overall, here is what data teams today have to do:
First, get the data in one place. If your data lives in other SaaS tools like Google Analytics, Stripe, or your CRM of choice, you need to bring it to a single place (usually a data warehouse like Redshift or BigQuery) via an import script, usually a Python snippet that fetches data from their API. There are many considerations here, like API rate limits, authentication, validating the API response, and dealing with periodic changes to the API schema. Alternatively, there are a gazillion tools like Fivetran and Airbyte that take care of this source-to-destination plumbing for a hefty price.
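To make the shape of that work concrete, here is a minimal sketch of such an import script. The endpoint, token, and response fields are hypothetical, and a production connector would also need retries, schema validation, and incremental loading.

```python
# Minimal sketch of a SaaS-to-warehouse import script. The endpoint, token,
# and response shape below are hypothetical placeholders.
import time
import requests

API_URL = "https://api.example-saas.com/v1/invoices"  # hypothetical endpoint
API_TOKEN = "YOUR_API_TOKEN"

def fetch_all(url: str, token: str) -> list[dict]:
    """Page through the API, backing off when rate-limited."""
    rows, page = [], 1
    while True:
        resp = requests.get(
            url,
            params={"page": page},
            headers={"Authorization": f"Bearer {token}"},
        )
        if resp.status_code == 429:  # rate limit hit: wait, then retry this page
            time.sleep(int(resp.headers.get("Retry-After", "30")))
            continue
        resp.raise_for_status()
        batch = resp.json().get("data", [])
        if not batch:
            break
        rows.extend(batch)
        page += 1
    return rows

if __name__ == "__main__":
    invoices = fetch_all(API_URL, API_TOKEN)
    # From here you would validate each row and load it into the warehouse
    # (Redshift, BigQuery, ...) using that warehouse's client library.
    print(f"Fetched {len(invoices)} invoices")
```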
If you want to look at how your users use your product, you must also instrument the actual codebase to emit "events" that are sent to a tool like Segment, Amplitude, Mixpanel, or PostHog. Besides having an actual code footprint, there are many modes of failure here, from events being renamed to multiple events tracking the same thing. And there is no way to backfill historic data if you didn't track a certain event but want to start doing so now.
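For illustration, here is roughly what that instrumentation looks like with Segment's Python library; the write key, user id, event name, and properties are made-up examples.

```python
# Sketch of product instrumentation using Segment's Python library
# (analytics-python). The write key, user id, event name, and properties
# are made-up examples.
import analytics

analytics.write_key = "YOUR_SEGMENT_WRITE_KEY"

def track_plan_upgrade(user_id: str, plan: str) -> None:
    # Event names should be defined once and reused everywhere: renaming
    # "Plan Upgraded" later splits your history, and events you never
    # emitted cannot be backfilled.
    analytics.track(user_id, "Plan Upgraded", {"plan": plan})

track_plan_upgrade("user_123", "pro")
analytics.flush()  # events are batched; flush before the process exits
```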
Second, you want to transform the data into a usable state. Assuming you've somehow managed to get all the raw data in a single place, you'll notice the data in its raw format isn't all that useful. For example, if you're interested in seeing whether your engagement metrics are going up or down, a table full of raw campaign data isn't helpful. This is where "data transformation" comes into play. This is usually done with either pure SQL or a combination of SQL and Python (e.g., dbt or SQLMesh). Bear in mind that you also need an orchestration tool like Airflow or Dagster to run these transformations periodically.
The more complex the question, the more complex this transformation will be. But in the end, you will have a table with usable data in it.
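As a rough illustration of this step, the sketch below turns a hypothetical table of raw product events into a small daily engagement table using BigQuery's Python client. In practice the SQL would live in a dbt or SQLMesh model and be run on a schedule by Airflow or Dagster; the dataset, table, and column names here are made up.

```python
# Sketch of a transformation job: raw events in, a usable engagement table out.
# The dataset, table, and column names are hypothetical.
from google.cloud import bigquery

client = bigquery.Client()

TRANSFORM_SQL = """
CREATE OR REPLACE TABLE analytics.daily_engagement AS
SELECT
  DATE(event_timestamp)                  AS day,
  COUNT(DISTINCT user_id)                AS active_users,
  COUNTIF(event_name = 'Plan Upgraded')  AS upgrades
FROM raw.product_events
GROUP BY day
"""

client.query(TRANSFORM_SQL).result()  # blocks until the job finishes
```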
Finally, after five tools are set up, you get to the part where you can visualize the data. This is arguably the simplest part. You pick your data visualization or business intelligence (which isn't very intelligent, now that you think about it) tool of choice, point it at the cleaned, transformed data, and write some SQL to load the data in (although most BI tools have a no-code interface these days).
You set up the charts and visuals, a nice color scheme, labels, and voilà! You are up and running. At least for that one data source or metric. You need to repeat significant parts of this process if you want to, say, ask another question of a different data source or track a completely different metric that came to mind.
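If you skip the BI tool entirely, this last step can be as small as the sketch below: query the transformed table and chart it with matplotlib. The table name matches the hypothetical one from the transformation sketch above.

```python
# Sketch of the "last mile": read the transformed table and chart it.
# Most teams do this in a BI tool instead; the table name is hypothetical.
import matplotlib.pyplot as plt
from google.cloud import bigquery

client = bigquery.Client()
df = client.query(
    "SELECT day, active_users FROM analytics.daily_engagement ORDER BY day"
).to_dataframe()

df.plot(x="day", y="active_users", title="Daily active users")
plt.savefig("daily_active_users.png")
```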
There are other must-haves in the process (like tracking data lineage to account for incorrect metric values), and this is a gross simplification of what goes on under the hood, but by and large this is what is called "ETL" (or ELT, to be precise).
Now, as you can imagine, all of this requires a lot of fugly code and many moving parts. So much so that most small teams default to Excel sheets, manually pasting numbers from CSV exports, and hacking their way around it. And rightfully so: the point is not to have the best data pipeline, but to make decisions using data.
Setting up a data culture early in your team's life is important. If you kick this can down the road, the likely outcome is that your data culture will lag behind the rate at which your business is growing. To set up a data culture early, your team needs a "data hub" that grows with them: it must be simple and instantly surface important metrics when the team is young and has no time for setting up pipelines, but afford them the flexibility and power to do more sophisticated things as the team and business grow.
As a general rule of thumb, here is what a solid foundation for a data hub must do:
Start simple and take the jargon out of the way. Important metrics like revenue, churn, and activation must be defined once and used everywhere. It is surprisingly common to see teams with multiple definitions and values for the exact same metric, even when their business is mature and growing quickly.
A data "catalog" of your metrics (what they mean, their "health", and how they are calculated) would be the first installation in a data hub.
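To make that concrete, here is one way a single "define once, use everywhere" catalog entry could look; the field names and the churn query are illustrative, not a prescribed schema.

```python
# One possible shape for a metric catalog entry. The fields and the SQL
# below are illustrative only.
from dataclasses import dataclass

@dataclass
class Metric:
    name: str         # the canonical name everyone uses
    description: str  # what it means, in plain language
    owner: str        # who to ask when the number looks wrong
    sql: str          # the single source-of-truth calculation

MONTHLY_CHURN = Metric(
    name="monthly_churn_rate",
    description="Share of paying customers at the start of the month who cancelled during it.",
    owner="finance",
    sql="""
        SELECT COUNTIF(cancelled_in_month) / COUNT(*)
        FROM analytics.customers_by_month
        WHERE was_paying_at_month_start
    """,
)
```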
Allow everyone to experiment and prototype metrics. This is only possible if the learning curve of data tools is brought down; the software should adapt to the user, not the other way round. A huge component of this is letting users go from "hey, we should measure X" to presenting their work in a lightweight notebook environment.
I’m excited to share what we’ve been working on these past couple of months. Say hello to Livedocs!
Livedocs is your team's first data hub. It’s a hosted data platform that brings your data tooling under one roof. We’re building a robust and simple way for teams to explore and present data, with an emphasis on great UX.
We're solving the cold-start problem of data stacks: go from sign-up to sharing your first Livedoc in minutes. Here are a few of the things Livedocs can do for your team, today:
Livedocs connects to 300+ SaaS tools that your team uses, and has 180+ pre-built metrics so you can begin building documents with live data in them instantly.
You can also write custom import scripts in Python to fetch data from your API, or connect directly to your database, and instantly build stunning reports and "live docs" with your data.
Livedocs lets you define custom metrics with SQL and build a gallery of metrics that anyone on your team can use reliably in reports, dashboards, and docs. You can then embed beautiful graphs in your team's Notion or on public websites, or share a link with a few clicks.
You can also schedule reports to be sent to your team via Slack or email, so everyone on your team has access to the right data at the right time.
Here's a demo of me walking through Livedocs:
Take control of your data. Livedocs gives your team data superpowers with just a few clicks.