Data unleashed: A chat with Amplitude’s Chief Architect

In the highly competitive digital landscape, businesses cannot afford to be uninformed about their customers, not even for one second.

Data analytics seeks to make a difference by simplifying the complex and making the seemingly impossible possible. Enterprises gain access to valuable insight in a timely manner.

The challenge is that not everyone has in-house data scientists capable of performing this work. Companies like Amplitude, a product analytics solutions provider, seek to make data analytics accessible to a wider audience. Jeffrey Wang, its co-founder and Chief Architect, recently sat down for a conversation with Frontier Enterprise.

Could you walk us through how Amplitude started?

I have two co-founders, Spenser Skates and Curtis Liu. They were working together on a voice-to-text app called Sonalight before I joined them. They went through Y Combinator, the start-up accelerator in the Bay Area, and that app was quite successful. While building it, they realised two things. First, they needed good analytics to understand whether their product was effective, whether users liked it, and whether they were returning to use it. They were surprised to find that there weren’t many suitable solutions on the market, so they ended up creating some of their own.

Jeffrey Wang, co-founder and Chief Architect, Amplitude. Image courtesy of Amplitude.

The second thing they realised was that better voice recognition technology, specifically the machine learning behind it, was the main obstacle to Sonalight’s success. Since they were more aligned with product and engineering rather than machine learning, they understood that it probably wasn’t the right problem for them to tackle.

In the subsequent years, with the release of Siri, Google’s voice recognition, and Alexa by big tech companies with substantial machine learning budgets, they decided to abandon that direction. Meanwhile, other companies in their YC batch saw the analytics they were building and expressed interest in it. That’s how they got the idea to take the analytics they had created for Sonalight and try to make it more widely available. That’s when I got involved. I had worked at Microsoft, Google, Palantir, and Sumo Logic, focusing on data and analytics. We knew each other through mutual friends, and it kicked off from there about 10 years ago.

Where exactly did Amplitude sit then post-Sonalight, and where does it sit now?

You have your product, which can be a website or a mobile app, and it will typically be instrumented with the Amplitude SDK. From the product, every time a user performs an action, like sending a text message, selecting a contact, purchasing something, or watching a video, it will track an event to Amplitude. For example, this user performed this action, and there’s additional context around it, such as what they did, what they watched, how long they spent on it. That’s the interface between the actual product and Amplitude.
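As a rough illustration of the event tracking described above, here is a minimal Python sketch. It is not the Amplitude SDK: the `Event` record, the `InMemoryTracker` class, its `track` method, and the property names are all hypothetical, chosen only to show the shape of "this user performed this action, with this context".

```python
import time
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Event:
    """One tracked action: who did what, when, and with what context."""
    user_id: str
    event_type: str
    timestamp: float
    properties: Dict[str, Any] = field(default_factory=dict)


class InMemoryTracker:
    """Hypothetical stand-in for an analytics client; it only keeps events in a list."""

    def __init__(self) -> None:
        self.events: List[Event] = []

    def track(
        self,
        user_id: str,
        event_type: str,
        properties: Optional[Dict[str, Any]] = None,
        timestamp: Optional[float] = None,
    ) -> Event:
        # Default to the current time when the caller does not supply one.
        event = Event(
            user_id=user_id,
            event_type=event_type,
            timestamp=time.time() if timestamp is None else timestamp,
            properties=dict(properties or {}),
        )
        self.events.append(event)
        return event


tracker = InMemoryTracker()
tracker.track("user-1", "watch_video", {"video_id": "v42", "seconds_watched": 95}, timestamp=1000.0)
tracker.track("user-1", "purchase", {"item": "premium"}, timestamp=1060.0)
```

A real client would batch these records and send them over the network; the sketch only shows what a single event might carry.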

Amplitude collects all of this data from all of your customers, all the users of your app, and presents charts, dashboards, aggregate views, and analyses to help you understand user behaviour inside your app. It answers questions like, ‘What is the user behaviour inside your app?’

Our aim is to provide these insights to help companies build better products. We believe that by looking at the data and understanding what’s happening, companies can gain valuable insights into how to improve their products.

Could you talk a little bit about Amplitude’s behavioural graph? 

The behavioural graph is a key technology behind what we’re doing, and to understand its value, it’s essential to grasp what makes Amplitude unique as a product. 

Amplitude’s uniqueness lies in our ability to answer complex questions quickly through a user interface that’s quite easy to navigate. Essentially, non-technical people (such as product managers, designers, marketers, or growth personnel) who traditionally don’t spend much time writing SQL or doing analytics can use Amplitude to easily answer in-depth questions like, ‘What is the conversion rate of people through a particular funnel in the product?’

Amplitude can also provide insights into the conversion rate over time, or broken down by country or device. If you delve into what causes people to convert from one step to another, translating those into SQL or the actual math behind the question becomes very complicated. We present it in a way that doesn’t feel complex; it feels like asking natural questions about your product.

Why does the behavioural graph matter? If you try to ask those questions and turn them into SQL running on a traditional database, it would take forever to run. That’s why those questions are challenging, and many people don’t end up asking them. You might end up building data science teams to optimise those queries or rewrite thousands of queries all the time.

Even for a single query, the data warehouse equivalent of an Amplitude query might never finish. The end user, who’s non-technical, isn’t going to think about rewriting the query. That’s why it’s challenging to build product analytics on top of generic databases like Snowflake, Redshift, and BigQuery. Those are great technologies but they don’t solve our problem well, as they either make it impossible or require time to make the query run reasonably. Our users don’t think about that at all.

The behavioural graph is what you get if you take all the high-performance data analytics technologies, like column storage and compression techniques, and build a custom database specifically designed around answering user behaviour questions. We have a good understanding of all of those technologies, and instead of a generic data warehouse-type solution, we build a custom database tailored to this purpose.
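To make "column storage and compression techniques" concrete, here is a minimal Python sketch of two textbook ideas: pivoting rows into columns, then compressing a column with dictionary encoding or run-length encoding. It is a generic illustration under simplifying assumptions (every row has the same keys, and the sample rows are invented); it does not describe how Amplitude's database is actually built.

```python
from itertools import groupby

# Invented sample events; every row is assumed to carry the same keys.
rows = [
    {"user_id": "u1", "event_type": "open_app", "country": "SG"},
    {"user_id": "u1", "event_type": "view_item", "country": "SG"},
    {"user_id": "u1", "event_type": "purchase", "country": "SG"},
    {"user_id": "u2", "event_type": "open_app", "country": "US"},
    {"user_id": "u2", "event_type": "view_item", "country": "US"},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list of values per field."""
    columns = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    return columns


def dictionary_encode(values):
    """Replace repeated values with small integer codes plus a lookup table."""
    dictionary, index, codes = [], {}, []
    for value in values:
        if value not in index:
            index[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index[value])
    return dictionary, codes


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(values)]


columns = to_columns(rows)
event_dictionary, event_codes = dictionary_encode(columns["event_type"])
country_runs = run_length_encode(columns["country"])
```

Because a query about one field only has to read that field's column, and because sorted or repetitive columns shrink well, this layout tends to suit aggregate questions over large event sets.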

When we talk about a funnel, it’s just a progression of people through steps. Representing a funnel query in SQL in a generic data warehouse results in a massive construct that’s extremely hard to compute and even describe or represent, because SQL doesn’t care about funnels. It doesn’t know what a funnel is. While SQL provides a very flexible and powerful language, it becomes very complicated and expensive to run a simple concept like a funnel.

Amplitude, instead of designing a language like SQL, which can do everything but mostly very slowly, has essentially our own language for representing user behaviour queries. This language is reflected in the innermost layer of the database, allowing us to understand a funnel not just as giant SQL but as a very specific concept that lets us compute it extremely efficiently.
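To show what treating a funnel as a first-class concept could look like, here is a minimal Python sketch that counts how many users reach each step, in order, with one pass over each user's time-sorted events. The function name `funnel_counts`, the tuple layout, and the sample data are assumptions for illustration. It is not Amplitude's query language or engine, and it ignores details a real funnel would need, such as conversion windows.

```python
from collections import defaultdict


def funnel_counts(events, steps):
    """Count users reaching each funnel step in order.

    events: iterable of (user_id, event_type, timestamp) tuples.
    steps:  ordered list of event types that define the funnel.
    """
    by_user = defaultdict(list)
    for user_id, event_type, timestamp in events:
        by_user[user_id].append((timestamp, event_type))

    reached = [0] * len(steps)
    for user_events in by_user.values():
        user_events.sort()  # chronological order per user
        next_step = 0
        for _, event_type in user_events:
            # Advance only when the event matches the next expected step.
            if next_step < len(steps) and event_type == steps[next_step]:
                reached[next_step] += 1
                next_step += 1
    return reached


events = [
    ("u1", "open_app", 1), ("u1", "view_item", 2), ("u1", "purchase", 3),
    ("u2", "open_app", 1), ("u2", "view_item", 5),
    ("u3", "view_item", 1), ("u3", "open_app", 2),  # out of order: only step 1 counts
]
steps = ["open_app", "view_item", "purchase"]

counts = funnel_counts(events, steps)
conversion_rate = counts[-1] / counts[0] if counts[0] else 0.0
```

With this sample data, three users enter the funnel, two reach the second step, and one completes it. Expressing the same ordered-step logic in generic SQL typically takes self-joins or window functions per step, which is the complexity the answer above refers to.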

So when a non-technical user asks, ‘What’s the conversion rate through this funnel?’, we can return the result in one second instead of one hour. They can then ask the next 20 questions about that funnel, like, ‘How does that break down? Which step is the most important one? How does this compare between my two releases?’ All of those questions are only unlocked if you can answer the first question really fast, and that’s the whole reason why we built our own database, because we need to do that quickly to enable the idea of Amplitude at all.

How do you see AI affecting Amplitude’s business strategy?

AI has reached a point where it understands language well enough to turn it into concepts and charts, and that’s super cool, so we’re going to take advantage of that.

The AI models and algorithms are somewhat of a commodity, in that they spread very quickly once they’re created. Who benefits from this? It’s the people who have the data at the end of the day. They can take those now-commoditised AI algorithms and apply them to their data. This is particularly interesting for us because we have a unique data set, a first-party behavioural data set. Each of our customers is tracking the valuable actions that people are taking in their product.

That’s why product analytics is such a big deal; that data tells you everything about your business. There’s very little behavioural data out there in the world. The internet is full of text, but it doesn’t have many behavioural data sets, as it’s a personal thing for companies. Facebook, for example, doesn’t publish their behavioural data because they use it to generate revenue. We have that data for several customers, and they rely on us to develop AI techniques and apply what OpenAI has proven with ChatGPT, for instance, to behavioural data. This allows them to gain insights with much less effort.

For our use case, AI is undoubtedly valuable. The data will be the foundational differentiator of why Amplitude will become even more valuable than before, given the rise of AI. 

Part of our goal is to make all products better. We want to create a shared set of techniques to help all products improve. That’s our vision. We don’t think of that as a walled garden, but we’re also not going to open source all of our code.