For this example, we use a dataset called WikiText-2. The WikiText language modeling dataset is a collection of over 100 million tokens extracted from the set of verified Good and Featured articles on Wikipedia. Compared to the preprocessed version of the Penn Treebank (PTB), another widely used dataset, WikiText-2 is over two times larger. The WikiText dataset also features a far larger vocabulary and retains the original case, punctuation, and numbers. Because the dataset contains full articles, it is well suited to models that can take advantage of long-term dependencies.
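The text above does not show how the dataset is loaded, so here is a minimal sketch assuming the legacy torchtext library, whose torchtext.datasets.WikiText2 class bundles this dataset; the field setup, batch size, and BPTT length shown are illustrative choices, not values from the original.

    from torchtext import data, datasets

    # Field describing how raw text is tokenized; WikiText-2 ships
    # pre-tokenized, so splitting on whitespace is enough.
    TEXT = data.Field(tokenize=lambda s: s.split())

    # Download (if needed) and load the train/validation/test splits,
    # each as one long stream of tokens.
    train, valid, test = datasets.WikiText2.splits(TEXT)

    # Build the vocabulary from the training split only.
    TEXT.build_vocab(train)
    print('vocabulary size:', len(TEXT.vocab))

    # BPTTIterator cuts each stream into fixed-length
    # backpropagation-through-time windows for language modeling.
    train_iter, valid_iter, test_iter = data.BPTTIterator.splits(
        (train, valid, test), batch_size=20, bptt_len=35)

    batch = next(iter(train_iter))
    # batch.target is batch.text shifted by one token
    print(batch.text.size(), batch.target.size())

A BPTT iterator is the natural fit here: because WikiText-2 keeps whole articles intact, consecutive windows drawn from the same stream let a recurrent model carry hidden state across batches and exploit those long-term dependencies.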
The dataset was introduced in the paper Pointer Sentinel Mixture Models (https://arxiv.org/abs/1609.07843). The paper discusses solutions ...