Social Data Normalization

Assume we've solved the data access latency issue by using HTTP POSTs, à la Web Hooks, to handle events across the WAN. This resolves general API access issues, but not the diverse nature of the data itself. XML gives data structure, but it does nothing to give it commonality. Social data aggregation applications today are saddled with a one-off understanding of each social data API they integrate with. The overhead of learning the intricacies of the data structure that comes back from a particular API is high, too high. While protocol-muxing gateways have existed for a long time, generally only strict XML transformation translators exist for consolidating common, normalized data from disparate sources. Unfortunately, strict parsing of data rarely works, because the set of services actually creating the data is so diverse. Their interpretations of the standards, encodings, and escape sequences all vary. In addition, the software creating the XML for consumption inevitably contains bugs, which result in poorly formatted output and further complicate its consumption.
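To make the problem concrete, here is a minimal sketch of lenient normalization in Python. It is an illustration only, not a method described in this chapter: the field names, the FIELD_ALIASES table, the normalize function, and both sample payloads are hypothetical. The sketch tries a strict XML parse first and, when that fails, falls back to the standard library's tolerant HTML tokenizer to salvage whatever tagged text it can, then maps each service's field names onto one common record.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical aliases: each service names the same concept differently.
FIELD_ALIASES = {
    "author": ("author", "user", "screen_name"),
    "text": ("text", "body", "status"),
}


class LenientCollector(HTMLParser):
    """Collect text per tag name without requiring well-formed input."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.current = None   # name of the most recently opened tag
        self.chunks = {}      # tag name -> list of text fragments

    def handle_starttag(self, tag, attrs):
        self.current = tag

    def handle_endtag(self, tag):
        self.current = None

    def handle_data(self, data):
        if self.current is not None:
            self.chunks.setdefault(self.current, []).append(data)


def raw_fields(payload):
    """Return {tag: text}, parsing strictly if possible, leniently if not."""
    try:
        root = ET.fromstring(payload)
        return {
            el.tag.lower(): el.text.strip()
            for el in root.iter()
            if el.text and el.text.strip()
        }
    except ET.ParseError:
        # Strict parsing failed; salvage what we can instead of giving up.
        parser = LenientCollector()
        parser.feed(payload)
        parser.close()
        fields = {}
        for tag, parts in parser.chunks.items():
            text = "".join(parts).strip()
            if text:
                fields[tag] = text
        return fields


def normalize(payload):
    """Map service-specific field names onto one common record."""
    fields = raw_fields(payload)
    return {
        name: next((fields[a] for a in aliases if a in fields), None)
        for name, aliases in FIELD_ALIASES.items()
    }


# Two made-up payloads: one well formed, one with an unescaped "&"
# and unclosed tags, which a strict XML parser rejects outright.
well_formed = "<status><user>alice</user><text>hello</text></status>"
malformed = "<entry><author>bob<body>unescaped & unclosed</entry>"

print(normalize(well_formed))
print(normalize(malformed))
```

Both payloads come out as records with the same two keys, even though the second one is not valid XML. A real aggregator would need far more than this (namespaces, attributes, nested elements, encoding detection), but the shape of the approach is the same: be strict when the input allows it and forgiving when it does not.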

Web browsers that parsed HTML strictly taught us that standards, unfortunately, do not produce perfectly formatted data that standards-compliant software can consume flawlessly. The reality is that standards are interpreted differently, software is buggy, and the most powerful de facto standard is what users are already doing. The power of the people can never be denied.

If you've ever spent ...
