Let's start by creating a vocab.go file in the root of our project directory. In it, we define a number of reserved Unicode characters that mark the beginning and end of our sequences, as well as a BLANK character for padding them out.
Note that we do not include our shakespeare.txt input file here. Instead, we build a vocabulary and index, and split up our input corpus into chunks:
package main

import (
	"fmt"
	"strings"
)

const START rune = 0x02
const END rune = 0x03
const BLANK rune = 0x04

// vocab related
var sentences []string
var vocab []rune
var vocabIndex map[rune]int
var maxsent int = 30

func initVocab(ss []string, thresh int) {
	s := strings.Join(ss, " ")
	fmt.Println(s)
	dict := make(map[rune]int)
	...
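Because the listing above is truncated, the following is only a minimal sketch of the two ideas it sets up: building a rune vocabulary with an index, and framing a sequence with the reserved markers. The helper names buildVocab and padSentence are hypothetical and do not come from the original code, and treating thresh as a minimum occurrence count is an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// Reserved markers, mirroring the constants declared in vocab.go.
const (
	START rune = 0x02
	END   rune = 0x03
	BLANK rune = 0x04
)

// buildVocab is a hypothetical helper, not the original initVocab. It counts
// every rune in the corpus, keeps those seen at least thresh times (an
// assumed meaning for thresh), and returns the vocabulary together with a
// rune-to-position index.
func buildVocab(corpus string, thresh int) ([]rune, map[rune]int) {
	counts := make(map[rune]int)
	for _, r := range corpus {
		counts[r]++
	}

	var kept []rune
	for r, c := range counts {
		if c >= thresh && r != START && r != END && r != BLANK {
			kept = append(kept, r)
		}
	}
	// Map iteration order is randomized in Go, so sort for a reproducible vocab.
	sort.Slice(kept, func(i, j int) bool { return kept[i] < kept[j] })

	// The reserved markers come first so their indices stay stable.
	vocab := append([]rune{START, END, BLANK}, kept...)

	index := make(map[rune]int, len(vocab))
	for i, r := range vocab {
		index[r] = i
	}
	return vocab, index
}

// padSentence wraps s in START and END, then appends BLANK until the result
// holds exactly maxLen runes. Longer input is truncated to leave room for
// the two markers. It assumes maxLen >= 2.
func padSentence(s string, maxLen int) []rune {
	body := []rune(s)
	if len(body) > maxLen-2 {
		body = body[:maxLen-2]
	}
	out := make([]rune, 0, maxLen)
	out = append(out, START)
	out = append(out, body...)
	out = append(out, END)
	for len(out) < maxLen {
		out = append(out, BLANK)
	}
	return out
}

func main() {
	vocab, index := buildVocab("to be or not to be", 2)
	fmt.Println(len(vocab), index['o'])
	fmt.Printf("%q\n", padSentence("to be", 10))
}
```

Sorting the kept runes is a design choice of this sketch: without it, the index assigned to each character could change between runs, which would make saved models or logged indices hard to compare.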