Data Moats

Over the past decade, AI and Machine Learning have become the buzzwords du jour, with virtually every new startup and most incumbents claiming to employ AI in novel and exciting ways to create economic value. These claims often implicitly or explicitly cite data scale as their key source of competitive advantage. The argument is simple: as we amass more data, our models will get more accurate and will therefore be more valuable to our customers. This effect will compound over time, giving us an enduring edge. Such logic is seductive, but only occasionally correct.

To understand why, we first need a framework for thinking about sources of competitive advantage. Hamilton Helmer’s increasingly popular book 7 Powers provides one; Helmer calls these advantages “power.” In Helmer’s formulation, data scale advantages are a variant of switching costs. This is counterintuitive, as most of us associate switching costs with things like extensive systems integrations and long, painful SAP implementations.

Helmer frames each type of power in terms of a barrier to competition and a benefit to customers. Switching costs may encompass several benefits, but the most relevant to data scale effects is that they allow firms to provide a superior deliverable, generally in the form of more accurate predictions. The barrier is the cost for a competitor to build a superior offering relative to the incremental value that such an offering would provide. These criteria allow us to assess the relative strength of a hypothetical data moat more clearly by asking:

  • How much do customers value incremental improvements in accuracy?
  • How much would it cost a competitor to build a functionally equivalent product?

These questions are more complex than they may seem on the surface. In particular, assessing the accuracy of a machine learning system requires careful consideration of the distribution of outcomes. For many of the most interesting problems to which machine learning is applied, the majority of the economic value is created by long-tail events. This distinction, which is elided by the coarse notion of accuracy, is critical. Indeed, understanding the mapping from event frequencies to economic value is fundamental to informing corporate strategy. Andreessen Horowitz has written about this at length in their excellent article, Taming the Tail. For our purposes, it is sufficient to say that if the ability to accurately predict the outcomes of long-tail events is a significant driver of economic value for customers, then the resulting data moat is likely to be much more robust.
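
To make the gap between accuracy and economic value concrete, here is a minimal Python sketch. Every number in it is a made-up assumption chosen for illustration (two hypothetical event classes, their frequencies, the value of a correct prediction, and a model's accuracy on each); none of it comes from real data. It compares the usual frequency-weighted accuracy with the share of total economic value the model actually captures.

```python
from dataclasses import dataclass


@dataclass
class EventClass:
    name: str
    frequency: float        # share of all events (shares sum to 1)
    value_per_event: float  # hypothetical value of one correct prediction
    accuracy: float         # hypothetical model accuracy on this class


# Illustrative assumption: rare events are worth far more per event,
# and the model is much weaker on them because it has seen fewer examples.
events = [
    EventClass("short head", frequency=0.95, value_per_event=1.0, accuracy=0.99),
    EventClass("long tail", frequency=0.05, value_per_event=100.0, accuracy=0.60),
]


def overall_accuracy(classes):
    """Frequency-weighted accuracy: the coarse number usually reported."""
    return sum(c.frequency * c.accuracy for c in classes)


def value_capture(classes):
    """Share of the total available economic value the model captures."""
    total = sum(c.frequency * c.value_per_event for c in classes)
    captured = sum(c.frequency * c.value_per_event * c.accuracy for c in classes)
    return captured / total


print(f"Overall accuracy: {overall_accuracy(events):.1%}")
print(f"Value captured:   {value_capture(events):.1%}")
```

Under these assumptions the headline accuracy looks excellent while the model captures a much smaller share of the value, because most of the value sits in the tail where the model is weakest. A competitor would need comparable tail data to close that gap, which is what makes such a moat more robust.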

If value creation is dominated by the short head of the event distribution, there is generally diminishing marginal value to data. Machine learning models generally exhibit diminishing marginal gains in accuracy as data scale increases. This is compounded by the fact that, for many domains, customers assign diminishing marginal value to incremental improvements in accuracy. If you’re trying to predict what films I might like, I don’t care much whether your model is 90% accurate or 95% accurate. Functionally equivalent accuracy occurs when the performance of two competing products is close enough that accuracy is no longer a primary driver of purchasing decisions. This happens at a much higher level of absolute accuracy in domains like medicine, autonomous vehicles, or finance.
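
The diminishing returns described above can be sketched with a toy learning curve. The power-law form and the parameters `a` and `b` below are assumptions chosen for illustration, not measurements from any real model; the sketch only shows how each doubling of data buys a smaller accuracy gain than the one before.

```python
def accuracy(n_examples: int, a: float = 0.5, b: float = 0.3) -> float:
    """Hypothetical learning curve: error decays as a power law in data size."""
    error = a * n_examples ** (-b)
    return 1.0 - error


# Double the dataset five times, starting from 1,000 examples.
sizes = [1_000 * 2 ** k for k in range(6)]
accuracies = [accuracy(n) for n in sizes]

# Accuracy gained by each successive doubling of the data.
gains = [later - earlier for earlier, later in zip(accuracies, accuracies[1:])]

for n, acc in zip(sizes, accuracies):
    print(f"{n:>7,} examples -> accuracy {acc:.4f}")
for k, gain in enumerate(gains, start=1):
    print(f"doubling #{k}: +{gain:.4f}")
```

On a curve like this, a late entrant with a fraction of the incumbent's data can land within a point or two of its accuracy, which is why a short-head data advantage tends to erode.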

Finally, it is important to consider the degree to which data is actually portable across customers. In many enterprise use cases, customers’ data is highly proprietary and cannot be pooled to train a single, superior model. In such cases, engineers and data scientists must rely on techniques like federated learning and ensemble models to create data scale effects. These techniques can certainly be effective, but they are also quite expensive and require specialized skill sets. Notably, model training becomes a significant variable cost in these businesses. In the absence of such tactics, these businesses must fall back on scale economies, amortizing the fixed costs of model development across more customers.
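
The cost structure in this paragraph can be illustrated with simple arithmetic. The function name and all dollar figures below are hypothetical; the sketch assumes a one-time model development cost that is amortized across customers, plus a per-customer training cost that is zero when data can be pooled and positive when each customer needs its own training run.

```python
def cost_per_customer(n_customers: int,
                      fixed_dev_cost: float,
                      training_cost_per_customer: float) -> float:
    """Amortized fixed development cost plus per-customer training cost."""
    if n_customers <= 0:
        raise ValueError("n_customers must be positive")
    return fixed_dev_cost / n_customers + training_cost_per_customer


# Hypothetical figures: $1M to develop the model, $20k to train per customer.
for n in (10, 100, 1_000):
    pooled = cost_per_customer(n, 1_000_000, 0)
    siloed = cost_per_customer(n, 1_000_000, 20_000)
    print(f"{n:>5} customers: pooled ${pooled:>9,.0f}  siloed ${siloed:>9,.0f}")
```

With pooled data the unit cost keeps falling as customers are added. With siloed data it flattens out at the per-customer training cost, so scale economies help but never remove that floor.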

The takeaway is that in each case, we must evaluate the relationship between data scale and economic value. Naively accepting that data scale will create a meaningful moat or source of power can lead us to some nasty surprises in the long run.


We tend (perhaps out of wishful thinking) to think of data scale as akin to network effects, but this is wrong. To see why, consider your experience as a customer of XYZ Co, a firm that benefits from strong data scale effects. As a customer, you benefit when XYZ Co acquires more data to train its models. However, you are indifferent to how many other customers XYZ Co has: whether the new data was sourced from 1 firm or 100 makes no difference to you. Thus the value of XYZ Co’s solution does not scale with the number of customers; it is instead driven by the number of datapoints. While these two dimensions are obviously strongly correlated, the distinction is significant.

Written on January 12, 2021