Drowning in data, starved for insight

Worth a billion dollars? The proof of the press release is in the data exhaust.

By Heather Vescent
February 2, 2016
Steam Phase eruption of Castle geyser with double rainbow. (source: Brocken Inaglory on Wikimedia Commons)

Anand Sanwal founded CB Insights in 2010 to understand the global private markets, an area notorious for its lack of accurate data. We talked about sourcing and verifying private company data to understand emerging industries and trends. It became clear that, while CB Insights covers fintech and other technology companies, it also exemplifies one type of fintech organization: one that sniffs out information from a sea of both structured and unstructured data (and combines this with old-fashioned shoe-leather research). Along the way, we had a little reality check on unicorns (billion-dollar startups).

Heather Vescent: What led you to found CB Insights?

Anand Sanwal: When I was running a $50 million innovation fund at American Express in 2006 and 2007, I was dissatisfied with the data we had for the global private markets. There was really good data on the public markets; Bloomberg does that really well. But there wasn’t really good data or sources on the private markets, which are a signal of what’s coming next. So, I saw that opportunity, and that, coupled with the fact that I’ve always wanted to do my own thing, led me to say, “Okay, let’s leave the safe confines of corporate America and start my own thing.”

HV: How do you collect data about private companies?

AS: We get our data two ways. We get about 75% of it programmatically. That means we built machine learning software that crawls unstructured sources, like the Kleiner Perkins portfolio page or an SEC filing or millions of blogs, press release sites, local, national, regional, and international newspapers. This software identifies content that looks like a financing or exit, and extracts the structured data from it.

The other 25% of our data comes from direct submissions from investors, and this share is increasing; I expect it will be about 40% by mid-2016. They submit their data to us for two main reasons: 1) a lot of potential buyers of their companies, the corporate M&A teams, use our product, so they obviously want to be in front of them; and 2) we produce a lot of content. For example, we just did a partnership with the New York Times to rank venture capital partners. As a result, investors have a vested interest in making sure their data is accurate and complete.
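To make the programmatic side concrete, here is a minimal sketch of turning one sentence of unstructured text into a structured financing record. Sanwal describes machine learning software; this example swaps in a simple regular expression purely for illustration, and the pattern, the field names, and the sample company and investor are all hypothetical.

```python
import re

# Hypothetical pattern for headlines such as
# "Acme Robotics raised $12.5 million from Example Ventures".
# A regex stands in here for the machine learning described in the interview.
NAME = r"[A-Z][\w&.-]*(?: [A-Z][\w&.-]*)*"
FINANCING_PATTERN = re.compile(
    rf"(?P<company>{NAME}) (?:raises|raised|secures|secured) "
    r"\$(?P<amount>\d+(?:\.\d+)?) (?P<unit>million|billion)"
    rf"(?: from (?P<investor>{NAME}))?"
)

UNIT_MULTIPLIER = {"million": 1_000_000, "billion": 1_000_000_000}


def extract_financing(text):
    """Return a structured record if the text looks like a financing, else None."""
    match = FINANCING_PATTERN.search(text)
    if match is None:
        return None
    return {
        "company": match.group("company"),
        "amount_usd": float(match.group("amount")) * UNIT_MULTIPLIER[match.group("unit")],
        "investor": match.group("investor"),  # None when no investor is named
    }


record = extract_financing(
    "Acme Robotics raised $12.5 million from Example Ventures in a Series A round."
)
```

A real crawler would need to handle far messier text than this, which is presumably why a learned model is used rather than a fixed pattern.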

HV: How does the data that you collect about private companies differ from data about public companies?

AS: Public companies have to file quarterly financial statements, but private companies don’t have to publicly disclose financials. So, at the most basic level, there’s not as much data, and as a result, understanding the private markets becomes more challenging. You can’t go to the SEC website, pull down financial documents and analyze them.

As a result, you have to get creative.

We realized that in the absence of a profit-and-loss statement, you’d have to look at non-traditional signals. You get data by crawling millions of sources daily, from business journals to the Kleiner Perkins website, looking for data on private companies. Once you find data, the next step is determining who is doing well and who isn’t. You look at everything from sentiment on Twitter to the number of jobs they posted to partner and customer signings to mobile downloads. You’re solving the riddle of how these companies are doing by building this mosaic of data. The data is disparate and dirty, and therein lies the opportunity.

HV: So, you pull data from a combination of sources you crawl as well as data that is submitted to you. How do you verify the quality of both of these kinds of data?

AS: Great question. There are two ways. When somebody submits data to us, we require a source. If you’re an investor saying you invested in this company, that source could be a press release or an SEC filing, or it could be something else that indicates that you’ve actually worked with that company. We have a team that curates and reviews all the data that’s been submitted.

With the crawled data, we look at a veracity score for the source. If it’s from the SEC, we believe it. But, for example, if CB Insights puts out a blog post that says we raised $50 million from Sequoia Capital, the software would look for more credible confirmatory sources. For example, are we listed on Sequoia’s website? Does the SEC filing mention a partner from Sequoia?

When the source is less credible, we have to find a higher credibility source to confirm that data. That said, we’ve only had one instance of outright fraud. It’s usually not in the company’s best interest to put out false information. But we built the technology to make sure that if there was a bad actor out there, we could have some ways to sniff it out.
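The confirmation logic described above can be sketched roughly as follows. This is an illustration under stated assumptions, not CB Insights' actual system: the source types, the credibility scores, the threshold, and every name in the code are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical credibility scores per source type (0.0 to 1.0).
# These values are assumptions for illustration only.
SOURCE_CREDIBILITY = {
    "sec_filing": 1.0,
    "investor_portfolio_page": 0.8,
    "national_newspaper": 0.7,
    "company_blog": 0.3,
}
TRUST_THRESHOLD = 0.75  # assumed cutoff for believing a source on its own


@dataclass(frozen=True)
class Mention:
    source_type: str
    company: str
    investor: str


def is_confirmed(claim, other_mentions):
    """Accept a claim if it, or a matching mention, comes from a trusted source."""
    matching = [
        m for m in other_mentions
        if (m.company, m.investor) == (claim.company, claim.investor)
    ]
    return any(
        SOURCE_CREDIBILITY.get(m.source_type, 0.0) >= TRUST_THRESHOLD
        for m in [claim] + matching
    )


# A company blog post alone is not enough; the investor's own site confirms it.
blog_claim = Mention("company_blog", "CB Insights", "Sequoia Capital")
portfolio = Mention("investor_portfolio_page", "CB Insights", "Sequoia Capital")
```

Here `is_confirmed(blog_claim, [])` is false, while `is_confirmed(blog_claim, [portfolio])` is true, mirroring the "are we listed on Sequoia's website?" check from the interview.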

HV: It’s easy to understand evaluations and analysis of public companies because there is a set of financial standards. But you don’t have that for private companies. And especially with startups, sometimes the vision that is put forth is more aspirational than based in reality. How do you get the reality check to make sure your data is really substantial? A company could have great numbers, but they could also be drinking some Kool-Aid, and there can even be some investors drinking the Kool-Aid as well.

AS: It’s a really good point. In the private markets, especially with startups, it’s narrative driven: “Hey, this sounds like a great idea. The team looks impressive.” Ultimately, there’s a lot of data exhaust out there, and that’s what we track.

If you’re a consumer tech company, and people on Twitter or Facebook aren’t talking about you, that’s probably a bad sign. As you mature and time passes, if your Web traffic’s not trending up or if you’re not ranking in the mobile app stores, at some point, the narrative is no longer sufficient. You have to perform. The data does not lie.

If you’re doing well and you’re growing, we look at what types of jobs you are hiring for. You can get a sense of where a company is in its maturity based on hiring. If you’re hiring a lot of sales people, it probably means you have product market fit and you’re scaling up, while if you’re still growing the engineering team significantly, maybe that’s more an indication that you’re still developing the product.
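As a rough illustration of reading maturity from hiring, the sketch below counts sales versus engineering roles in a list of job titles. The keyword lists and the stage labels are hypothetical simplifications, not a description of how CB Insights classifies postings.

```python
# Hypothetical keyword lists; real job titles are far more varied.
SALES_KEYWORDS = ("sales", "account executive", "business development")
ENGINEERING_KEYWORDS = ("engineer", "developer")


def count_matches(job_titles, keywords):
    """Count titles that contain at least one of the keywords."""
    return sum(any(k in title.lower() for k in keywords) for title in job_titles)


def hiring_signal(job_titles):
    """Guess where a company is in its maturity from the mix of open roles."""
    sales = count_matches(job_titles, SALES_KEYWORDS)
    engineering = count_matches(job_titles, ENGINEERING_KEYWORDS)
    if sales == 0 and engineering == 0:
        return "unknown"
    if sales > engineering:
        return "scaling"  # likely has product-market fit
    if engineering > sales:
        return "building product"
    return "mixed"


signal = hiring_signal(["Account Executive", "Sales Manager", "Backend Engineer"])
```

A title such as "Sales Engineer" counts toward both groups in this sketch, which is one of many places a production system would need more care.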

There’s a lot of data exhaust out there that gives us insights into how a company is doing. We can look at partnerships and customer signings; we’ll look at press and press releases; and if you see two companies that are in the same space and one is signing a lot of deals and hiring a lot and Twitter mentions are significantly up and the other isn’t, you get a good sense, at least relatively, of who’s doing well and who isn’t.
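The relative comparison described above can be sketched as a simple tally: for each signal two companies share, see whose period-over-period growth is higher. The signal names and the numbers are invented for illustration, and equal weighting of signals is an assumption.

```python
def pct_change(previous, current):
    """Period-over-period growth; growth from zero is treated as unbounded."""
    if previous == 0:
        return float("inf") if current > 0 else 0.0
    return (current - previous) / previous


def compare_momentum(company_a, company_b):
    """Count, per shared signal, which company grew faster.

    Each argument maps a signal name to a (previous, current) pair.
    Returns (signals won by A, signals won by B); ties count for neither.
    """
    a_wins = b_wins = 0
    for name in sorted(company_a.keys() & company_b.keys()):
        growth_a = pct_change(*company_a[name])
        growth_b = pct_change(*company_b[name])
        if growth_a > growth_b:
            a_wins += 1
        elif growth_b > growth_a:
            b_wins += 1
    return a_wins, b_wins


# Invented numbers for two hypothetical companies in the same space.
company_a = {"deals": (4, 10), "job_postings": (20, 35), "twitter_mentions": (1000, 2500)}
company_b = {"deals": (5, 5), "job_postings": (18, 17), "twitter_mentions": (900, 950)}
result = compare_momentum(company_a, company_b)
```

The tally says nothing about absolute health; as the interview notes, it only gives a relative sense of who is doing well.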

HV: You have a list of 145 unicorns that are ranked based on your private data collection. Isn’t it a little audacious for these companies to value themselves at this amount? I want a little bit of a reality check.

AS: It’s a list of 145 companies. I don’t want to paint them all with the same brush, but I think there is definitely a lot of this valuation-chasing today. In terms of what determines valuation, it is a negotiation between a buyer and a seller, so if somebody is willing to pay it, that then becomes the valuation. One of the challenges today is that some of these valuations have become a little or a lot disconnected from public markets, and that’s where there’s going to be problems.

Like when you have a private company and a similar public company that are valued at very different metrics and ratios. If the private company keeps growing really, really quickly, that’s fine because it’ll grow into its valuation.

HV: I love the point that you brought up comparing these private valuations to public valuations. It’s not an apples-to-apples comparison. It seems to me that the private valuations are an estimate of potential. Like you said, these companies could grow into the markets, but there are many different variables that need to line up perfectly to hit that billion dollar unicorn number.

How does this myth of the unicorn impact the startup world and these companies?

AS: It’s hard to make a broad generalization. What you get in an environment like this is a lot of people who are, for lack of a better term, “startup tourists.” They think, “Startups are the new hot thing and so I want to be part of a startup.” People who have been working in a big corporation all of a sudden want to be at a startup. So, you end up getting some of the tourist crowd. And that’s not necessarily a bad thing. Eventually, when the market turns, some of those folks might go back to the confines of corporate America, but some of them may have gotten the bug and want to start their own thing.

You alluded to people getting a lot of credit today in their valuations for future growth, and that’s fine as long as that growth materializes. It’s when it doesn’t that you have a problem. I think the people who get “hurt,” and I’d put “hurt” in quotes, are the employees of these proverbial unicorns. The late-stage investors have all these protections on their investments, so from an equity perspective, it’s the employees who are the lowest people on the totem pole.

I’ve read some great commentary on how this obsession with unicorns is bad for entrepreneurship, in that people are focused on the scoreboard and not just playing the game. So, I think culturally, maybe there’s some credence to that point, but these are more philosophical arguments. I’m net bullish on what’s happening now.

If you said, “Hey, you can buy a basket of these 145 unicorns or you could buy HP, Google, or Microsoft,” I’d buy the basket of unicorns. There’s going to be some breakout companies from this group. Of course, there are companies that don’t deserve to have the valuation they have and they will struggle or die. But even with those casualties considered, I’d still be bullish on the overall unicorn herd over the long term.

HV: Among the companies you’re tracking, more than 4,000 are in fintech. What are the most exciting fintech trends?

AS: Fintech as a category is probably the hottest area we’re seeing. Folks talk a lot about disruption, but in reality, there’s a lot of technology companies founded today that complement large financial services companies. Areas like cyber security and predictive analytics are really hot and they’re complementary to existing financial services firms.

On the predictive analytics side, a lot of these big banks, financial services firms, and insurance companies are sitting on immense amounts of data. But many readily acknowledge they under-optimize what they’re doing with their data.

We’ve had a big wave in lending, person-to-person lending, and a lot of companies have gone public. It’s slowed down a bit on the private market side, but I think that’s still a hot area.

Wealth management is another big area. When you think about the baby boomers retiring and taking money out of their 401(k)s, what happens to those firms that managed all this money that’s no longer there because people are withdrawing it? How are those companies that manage 401(k)s, those asset managers, going to get more money in? That’s why many are focusing on millennials and other demographics. Wealthfront and Betterment are actively targeting the millennial segment.

It’s still in its early days, but I think insurance is going to be a big area. There’s all sorts of interesting things like drones being used to survey damage at a site, or being able to do things that fundamentally change how companies do business—satellites to take pictures of places, Internet of Things to send back data from a piece of machinery that they are underwriting.

HV: What’s your advice for banks?

AS: It’s easy to say, “All these startups will fail.” Banks should be looking at these emerging companies and understanding their technologies, business models, how they are marketing, and what their pricing is, because these companies are an indicator of where the world is going.

Look back at 1998 to 2000. A lot of companies were getting funded, a bunch of them failed. Pets.com and Webvan and boo.com and a bunch of others. Focusing on those folks is the wrong thing to do because, as an incumbent, you only need one Amazon to upend your business.

It’s harder to stay on top than ever. It’s going to be interesting to see which incumbents do more than just talk about innovation, because there’s so much useless chatter about embracing innovation and other nonsense right now.

Part of innovation is that you’re going to fail, and you have to be okay with that. So, we’re going to launch some products and they’re not all going to work. We’re going to do some acquisitions, and not all of them are going to work. We’re going to make some investments, and not all of them are going to work. But you have to try. So, it’ll be interesting to see who does it.

Post topics: Data science