Now, let's combine the previously defined functions to form the full Tacotron model.
But first, let's define some extra parameters that characterize the network:
NB_CHARS_MAX = 200  # maximum length of the input text
EMBEDDING_SIZE = 256
K1 = 16  # number of 1-D convolution blocks in the encoder CBHG
K2 = 8  # number of 1-D convolution blocks in the postprocessing CBHG
BATCH_SIZE = 32
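To make the role of NB_CHARS_MAX concrete, here is a minimal sketch of how a sentence could be turned into a fixed-length sequence of integer ids before it reaches the encoder. The vocabulary, the padding and unknown ids, and the `encode_text` helper are assumptions made for illustration only; they are not taken from the model code in this article.

```python
NB_CHARS_MAX = 200  # maximum length of the input text (same value as above)

# Hypothetical vocabulary: id 0 is reserved for padding, id 1 for unknown characters.
VOCAB = "abcdefghijklmnopqrstuvwxyz '.,?!"
PAD_ID, UNK_ID = 0, 1
CHAR_TO_ID = {ch: i + 2 for i, ch in enumerate(VOCAB)}


def encode_text(text, max_len=NB_CHARS_MAX):
    """Map a string to a list of exactly max_len integer ids."""
    # Lowercase, truncate to max_len characters, and look up each character.
    ids = [CHAR_TO_ID.get(ch, UNK_ID) for ch in text.lower()[:max_len]]
    # Right-pad with PAD_ID so every sequence has the same length.
    return ids + [PAD_ID] * (max_len - len(ids))


encoded = encode_text("Hello, world!")
print(len(encoded))  # 200
```

Padding every sentence to the same length is what allows BATCH_SIZE sentences to be stacked into a single array of shape (BATCH_SIZE, NB_CHARS_MAX). Each integer id would then be mapped to a vector of size EMBEDDING_SIZE by an embedding layer.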
The two input objects correspond to the encoder input and the decoder input. The former is expected to be the input text. The latter should be the last mel-spectrogram frame, among the r frames predicted by the decoder before the postprocessing CBHG. The ...