New York Startup Showcase reveals what’s needed in the data industry

Collaboration and data security tools win the day.

By Alistair Croll
October 20, 2015
The Lorenz machine was used by the Germans to encrypt high-level teleprinter communications. (source: Ludovic Ferre of Privacy Canada)

At this year’s Startup Showcase at Strata + Hadoop World New York, the judges favored collaboration, awarding the top three spots to tools that help developers share and reuse each other’s work. The audience, meanwhile, had its eyes on security and data privacy.

The audience favorite was BlueTalon, whose data security tool enforces policies governing who can see what within a data repository. The old way to secure data was to make multiple copies of the original during the extract, transform, and load (ETL) stage, each containing a different subset of the data, and then to give each user access to the copy that matched their permissions. This produced multiple independent versions, inconsistencies, and data corruption. BlueTalon instead makes the permission decision at the moment a user requests data, avoiding the inefficiency and complexity that this kind of duplication creates. At the Startup Showcase, BlueTalon launched HDFS-level security enforcement, providing fine-grained data protection for Hadoop. The company plans to launch more tools, specifically ones for regulated industries, that apply consistent security policies across data stores while maintaining a verifiable audit trail.
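To make the contrast concrete, here is a minimal Python sketch of deciding visibility at request time against a single copy of the data, rather than materializing one extract per permission level. The roles, the `Policy` structure, the in-memory audit list, and the `query` function are hypothetical illustrations, not BlueTalon's API or implementation.

```python
from dataclasses import dataclass
from typing import Callable

Row = dict[str, object]


def allow_all(row: Row) -> bool:
    """Default row filter: every row passes."""
    return True


@dataclass(frozen=True)
class Policy:
    """Hypothetical per-role visibility rule: which columns a role may see
    and which rows pass its filter."""
    visible_columns: frozenset[str]
    row_filter: Callable[[Row], bool] = allow_all


# One authoritative copy of the data; no per-role extracts are made.
SOURCE: list[Row] = [
    {"id": 1, "region": "EU", "name": "Ana", "ssn": "123-45-6789"},
    {"id": 2, "region": "US", "name": "Bob", "ssn": "987-65-4321"},
]

# Hypothetical policies, keyed by role.
POLICIES: dict[str, Policy] = {
    "analyst": Policy(frozenset({"id", "region"})),
    "eu_support": Policy(frozenset({"id", "region", "name"}),
                         lambda row: row["region"] == "EU"),
}

# Simplified in-memory record of who received how many rows
# (a real audit trail would need to be durable and tamper-evident).
AUDIT_LOG: list[tuple[str, int]] = []


def query(role: str, columns: list[str]) -> list[Row]:
    """Decide visibility at request time: look up the caller's policy,
    drop rows its filter rejects, and omit columns it may not see."""
    policy = POLICIES.get(role)
    if policy is None:
        raise PermissionError(f"no policy for role {role!r}")
    allowed = [c for c in columns if c in policy.visible_columns]
    result = [{c: row[c] for c in allowed}
              for row in SOURCE if policy.row_filter(row)]
    AUDIT_LOG.append((role, len(result)))
    return result
```

For example, `query("eu_support", ["id", "name", "ssn"])` returns `[{"id": 1, "name": "Ana"}]`: the US row is filtered out and `ssn` is withheld, while `SOURCE` itself is never copied or modified. Changing a policy takes effect on the next request, with no extracts to regenerate.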


The judges’ winner, Algorithmia, is an algorithm-as-a-service marketplace. The platform currently hosts more than 1,600 algorithms and has more than 14,000 developers using it. Its sales team is a mix of businesspeople and algorithm developers, and its focus is on convincing customers to reuse or stitch together existing tool chains rather than reinventing the proverbial wheel. For example, earlier this year the company demonstrated how several algorithms could be combined to detect nudity online.

The second-place finisher offers a platform for managing data science projects that chains tools into a pipeline, putting R, Python, Julia, Spark, Hive, Impala, Presto, and Redshift to work for a team. Much of its emphasis is on reuse and collaboration, particularly on quickly updating a model or a visualization without having to start from scratch.

Timbr, which earned third place, built its technology on work for DARPA, the White House, Twitter, and fintech companies. The company offers a repository of algorithms and data that can be repurposed, with the goal of sharing tools across projects and organizations. Emphasizing collaborative algorithm development, the startup provides “Lego bricks” of data science atop a scalable computing platform.

These winners make it clear that while we have a number of robust algorithms for common problems, what’s needed are ways to discover, access, and adapt these tools for a variety of business models and industry verticals. At the same time, the audience vote reminds us that understanding the value of the data we’re working with, and governing it accordingly, is just as important.
