Chapter 25. Expanding Your Data Warehouse with Unstructured Data
In This Chapter
Recognizing the limits of today's data warehousing
Getting data through multimedia
Looking at business intelligence and unstructured data
Going from unstructured information to structured data
Today's data landscape encompasses a dizzying array of new information channels, new sources of data, and new analysis and reporting imperatives. According to analyst groups, roughly 80 to 85 percent of today's data is unstructured, and new information channels such as the Web, e-mail, voice over IP, instant messaging (IM), text messaging, and podcasts are rapidly creating huge stores of nontraditional data. Sooner or later, your users will ask you to integrate data from any of these sources into your data warehouse.
Traditional Data Warehousing Means Analyzing Traditional Data Types
Unless you've used an extraordinary, state-of-the-art data warehouse, your business intelligence functionality has probably been limited to these types of data:
Numbers: Numeric data in the technical form of integers and decimal numbers
Text: Character data, typically fixed-length alphanumeric information that's usually no more than about 255 characters per occurrence, although on rare occasions it might go up to 4,000 characters
Dates and times: Either actual dates and times or, more likely, ranges of dates (such as a month and year for which product sales are grouped and stored)
That's about it.
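To make those three categories concrete, here's a minimal Python sketch of what one row of traditional warehouse data might look like. The class name, field names, and the 255-character check are hypothetical, chosen only to mirror the list above; they don't come from any particular product or schema.

```python
from dataclasses import dataclass
from decimal import Decimal

# Assumption for this sketch: the "about 255 characters" limit from the list above.
MAX_TEXT_LENGTH = 255


@dataclass(frozen=True)
class MonthlySalesRow:
    """Hypothetical row holding only traditional data types."""

    product_name: str   # Text: short alphanumeric data
    units_sold: int     # Numbers: an integer
    revenue: Decimal    # Numbers: a decimal value
    sales_year: int     # Dates and times: a range (year and month),
    sales_month: int    # not an exact timestamp

    def __post_init__(self) -> None:
        # Reject text that is longer than the assumed column limit.
        if len(self.product_name) > MAX_TEXT_LENGTH:
            raise ValueError("product_name exceeds the text length limit")
        # Reject months outside the calendar range.
        if not 1 <= self.sales_month <= 12:
            raise ValueError("sales_month must be between 1 and 12")


row = MonthlySalesRow(
    product_name="Widget",
    units_sold=120,
    revenue=Decimal("2399.40"),
    sales_year=2024,
    sales_month=3,
)
```

Notice what has no home in this row: an e-mail body, a recorded phone call, or a scanned document simply doesn't fit into any of these fields.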
To be fair, data warehousing in its original incarnation, ...