Data: Emerging Trends and Technologies
How sensors, fast networks, AI, and distributed computing are affecting the data landscape
Cheap Sensors, Fast Networks, and Distributed Computing
The trifecta of cheap sensors, fast networks, and distributed computing is changing how we work with data. But making sense of all that data takes help, which is arriving in the form of machine learning. Here’s one view of how that might play out.
Clouds, edges, fog, and the pendulum of distributed computing
The history of computing has been a constant pendulum, swinging between centralization and distribution.
The first computers filled rooms, and operators worked physically inside them, flipping toggles and turning wheels. Then came mainframes, which were centralized, with dumb terminals.
As the cost of computing dropped and the applications became more democratized, user interfaces mattered more. The smarter clients at the edge became the first personal computers; many broke free of the network entirely. The client got the glory; the server merely handled queries.
Once the web arrived, we centralized again. LAMP (Linux, Apache, MySQL, PHP) stacks sat buried deep inside data centers, with the computer at the other end of the connection relegated to little more than a smart terminal rendering HTML. Load-balancers sprayed traffic across thousands of cheap machines. Eventually, the web turned from static sites to complex software as a service (SaaS) applications.
Then the pendulum swung back to the edge, and the clients got smart again. First with AJAX, Java, and Flash; then in the form of mobile apps where the smartphone or tablet did most of the hard work and the back-end was a communications channel for reporting the results of local action.
Now we’re seeing the first iteration of the Internet of Things (IoT), in which small devices, sipping from their batteries, chatting carefully over Bluetooth LE, are little more than sensors. The preponderance of the work, from data cleaning to aggregation to analysis, has once again moved to the core: the first versions of the Jawbone Up band don’t do much until they send their data to the cloud.
But already we can see how the pendulum will swing back. There’s a renewed interest in computing at the edges—Cisco calls it “fog computing”: small, local clouds that combine tiny sensors with more powerful local computing—and this may move much of the work out to the device or the local network again. Companies like realm.io are building databases that can run on smartphones or even wearables. Foghorn Systems is building platforms on which developers can deploy such multi-tiered architectures. Resin.io calls this “strong devices, weakly connected.”
Systems architects understand well the tension between putting everything at the core and making the edges more important. Centralization gives us power, makes managing changes consistent and easy, and cuts down on costly latency and networking; distribution gives us more compelling user experiences, better protection against central outages or catastrophic failures, and a tiered hierarchy of processing that can scale better. Ultimately, each swing of the pendulum gives us new architectures and new bottlenecks; each rung we climb up the stack brings both abstraction and efficiency.
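To make the tiered-hierarchy idea concrete, here is a minimal sketch of what a "fog" tier might do: buffer raw sensor readings locally and push only compact summaries upstream, trading edge computation for uplink bandwidth and latency. The class and field names (`EdgeNode`, the summary dict) are illustrative assumptions, not any vendor's actual API.

```python
from statistics import mean

class EdgeNode:
    """Hypothetical edge/fog node: buffers raw readings and forwards
    only a compact summary, so the costly uplink carries less data."""

    def __init__(self, window=10):
        self.window = window      # readings per summary
        self.buffer = []          # raw, local-only data
        self.summaries = []       # stand-in for what gets sent to the cloud

    def ingest(self, reading):
        self.buffer.append(reading)
        if len(self.buffer) >= self.window:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        # Only this small summary would cross the network boundary.
        self.summaries.append({
            "count": len(self.buffer),
            "min": min(self.buffer),
            "max": max(self.buffer),
            "mean": mean(self.buffer),
        })
        self.buffer = []

# Example: five temperature readings collapse into one uplink message.
node = EdgeNode(window=5)
for r in [20.1, 20.3, 19.8, 20.0, 20.3]:
    node.ingest(r)
print(node.summaries[0])  # one summary instead of five raw readings
```

The design choice here mirrors the trade-off in the paragraph above: the edge node absorbs the aggregation work, and the center sees a smaller, cleaner stream.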
Transcendence aside, machine learning has come a long way. Deep learning approaches have significantly improved the accuracy of speech recognition, and many of the advances in the field have come from better tools and parallel computing.
Critics charge that deep learning can’t account for changes over time, and as a result its categories are too brittle to use in many applications: just because something hurt yesterday doesn’t mean you should never try it again. But investment in deep learning approaches continues to pay off. And not all of the payoff comes from the fringes of science fiction.
Faced with a torrent of messy data, organizations can use machine-driven approaches to data transformation and cleansing as a good “first pass,” de-duplicating and clarifying information and replacing manual methods.
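As a toy illustration of such a "first pass," the sketch below flags near-duplicate records by comparing normalized strings; the similarity threshold and the sample rows are assumptions for demonstration, not a production cleansing pipeline.

```python
from difflib import SequenceMatcher

def normalize(s):
    """Cheap normalization: lowercase and collapse whitespace."""
    return " ".join(s.lower().split())

def dedupe(records, threshold=0.9):
    """Keep the first record of each cluster of near-duplicates.

    Two records are treated as duplicates when their normalized
    similarity ratio meets the (assumed) threshold.
    """
    kept = []
    for rec in records:
        norm = normalize(rec)
        if any(SequenceMatcher(None, norm, normalize(k)).ratio() >= threshold
               for k in kept):
            continue  # near-duplicate of something already kept
        kept.append(rec)
    return kept

rows = [
    "Acme Corp, 123 Main St",
    "ACME Corp,  123 Main St",   # same record, different casing/spacing
    "Globex Inc, 9 Elm Ave",
]
print(dedupe(rows))  # the casing/spacing variant is dropped
```

A real system would add blocking, field-aware matching, and human review of borderline pairs, but even this crude pass shows how machines can replace a manual de-duplication step.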
What’s more, with many of these tools now available as hosted, pay-as-you-go services, it’s far easier for organizations to experiment cheaply with machine-aided data processing. These are the same economics that took public cloud computing from a fringe tool for early-stage startups to a fundamental building block of enterprise IT. (More on this in “Data as a service”, below.) We’re keenly watching other areas where such technology is taking root in otherwise traditional organizations.