11.6 Architecture
In this section, we give a brief overview of the complete hyponymy extraction process.
1. In the first step, the corpus containing the hyponyms (here the German Wikipedia) is parsed by the deep linguistic parser WOCADI [23]. For this, WOCADI makes use of the semantic lexicon HaGenLex [24] and a given knowledge base KB. The output of the WOCADI analysis for a single sentence is a token list, a dependency tree, and a semantic network.
2. Shallow extraction rules (similar to Hearst patterns) are applied to the token list (see the pattern-matching sketch after this list).
3. Deep extraction rules are applied to the semantic network representation.
4. A validation module filters out incorrect hypotheses by inspecting their semantic properties [25].
5. Not all hypotheses that pass this filter are actually correct. Therefore, a support vector machine is additionally applied to validate the accepted hypotheses, as sketched further below. Validation scores are calculated for all hypotheses and stored together with them in the hypotheses knowledge base HKB.
6. The hypotheses with the best scores in the HKB are stored in the knowledge base KB after manual inspection.
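To illustrate step 2, the following Python sketch applies one classic Hearst pattern ("Y such as X") to a token list. It is a minimal illustration only, under stated assumptions: the actual shallow extraction rules, the token-list format produced by WOCADI, and the (German) patterns used on the Wikipedia corpus are not given in the text, so the function name, the token format, and the pattern itself are assumptions.

```python
# A minimal sketch of a shallow (Hearst-style) extraction rule.
# Assumes tokens are plain strings; the system's real token lists
# carry richer WOCADI annotations.

from typing import List, Tuple

def hearst_such_as(tokens: List[str]) -> List[Tuple[str, str]]:
    """Return (hyponym, hypernym) pairs matched by '<hypernym> such as <hyponym>'."""
    pairs = []
    for i in range(len(tokens) - 3):
        # match the literal trigger phrase "such as"
        if tokens[i + 1].lower() == "such" and tokens[i + 2].lower() == "as":
            hypernym = tokens[i]      # noun immediately before the trigger
            hyponym = tokens[i + 3]   # noun immediately after the trigger
            pairs.append((hyponym, hypernym))
    return pairs

# Example: "animals such as dogs ..." -> [('dogs', 'animals')]
print(hearst_such_as("animals such as dogs and cats".split()))
```

A realistic rule set would additionally handle enumerations ("dogs, cats, and horses") and morphological variation; the sketch extracts only the single noun on each side of the trigger phrase.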
The entire validation process is illustrated in Figure 11.2.
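The machine-learning validation of step 5 might look roughly as follows. This is a hedged sketch, not the system's implementation: the feature set (two invented numeric features per hypothesis), the toy training labels, and the use of scikit-learn's SVC are all assumptions; the text states only that an SVM assigns validation scores that are stored with the hypotheses in the HKB.

```python
# Minimal sketch of SVM-based hypothesis validation (step 5).
# Features and data are invented for illustration.
from sklearn.svm import SVC

# Hypothetical training data: one feature vector per manually labeled
# hypothesis (e.g., pattern frequency, parser confidence).
X_train = [[0.9, 3.0], [0.8, 5.0], [0.2, 1.0], [0.1, 2.0]]
y_train = [1, 1, 0, 0]  # 1 = correct hyponymy hypothesis, 0 = incorrect

svm = SVC(kernel="rbf")
svm.fit(X_train, y_train)

# The signed distance to the separating hyperplane serves as the
# validation score, stored together with the hypothesis in the HKB.
hypothesis = ("dog", "animal")  # hypothetical sub0 pair
score = svm.decision_function([[0.7, 4.0]])[0]
hkb_entry = {"hypothesis": hypothesis, "score": float(score)}
print(hkb_entry)
```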
A deep extraction rule consists of a conclusion sub0(a1, a2) (sub0: hyponymy/instance-of/troponymy relation) ...