The primary architectural consideration in managing large volumes of text in an unstructured data warehouse is that the actual text should not be placed in the warehouse at all. Instead, only the portions of the text that are most useful in decision making are placed there. Stated differently, the unstructured data warehouse should contain only the distilled data that supports decision making, while the full text remains at its source.
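A minimal sketch of this idea follows, under stated assumptions. The `DistilledRecord` schema, the `distill` function, the `DECISION_TERMS` vocabulary, and the `mail://` source URI are all hypothetical illustrations, not features of any particular product. The warehouse row keeps only the extracted values and a pointer back to where the text lives. The raw text is read during extraction and is not retained.

```python
import re
from dataclasses import dataclass

# Hypothetical schema: the warehouse stores extracted values plus a pointer
# back to the source of the text, never the text itself.
@dataclass(frozen=True)
class DistilledRecord:
    source_uri: str             # where the full text remains (hypothetical URI scheme)
    amounts: tuple[float, ...]  # dollar amounts mentioned in the text
    keywords: tuple[str, ...]   # decision-relevant terms found in the text

# Illustrative vocabulary of terms assumed to matter for decision making.
DECISION_TERMS = ("invoice", "paid", "payment", "overdue", "refund")

# Matches amounts such as $238.18, $1500, or $1,500.00.
AMOUNT_PATTERN = re.compile(r"\$(\d+(?:,\d{3})*(?:\.\d{2})?)")

def distill(source_uri: str, text: str) -> DistilledRecord:
    """Extract decision-relevant data; the raw text is not stored in the record."""
    amounts = tuple(float(m.replace(",", "")) for m in AMOUNT_PATTERN.findall(text))
    lowered = text.lower()
    # Whole-word matching, so that "paid" does not match inside "unpaid".
    keywords = tuple(
        term for term in DECISION_TERMS if re.search(rf"\b{term}\b", lowered)
    )
    return DistilledRecord(source_uri, amounts, keywords)
```

For example, `distill("mail://inbox/0001", "Payment of $1,500.00 is overdue.")` yields a record with `amounts == (1500.0,)` and `keywords == ("payment", "overdue")`, plus the source pointer. The full sentence itself stays in the mail system. A production extractor would likely use richer techniques than a keyword list and a regular expression; the point here is only the shape of what gets stored.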
As a simple example, consider the following email:
I received your invoice for $238.18 yesterday. I will see to it that AM Rogers is paid posthaste. Thank you for the opportunity to do business with you.
The email would remain in ...