Before the rise of deep learning in TTS tasks, either concatenative or parametric models were used.
To create concatenative models, one needs to record high-quality audio content, split it into small chunks, and then recombine these chunks to form new speech. With parametric models, we have to create the features with signal processing techniques, which requires some extra domain knowledge.
Concatenative models tend to be intelligible but lack naturalness. They require a huge dataset that covers as many human-produced audio units as possible, so they usually take a long time to develop.
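To make the record, split, and recombine idea concrete, here is a minimal toy sketch in Python. It is only an illustration under stated assumptions: the unit inventory `UNITS`, the labels `"HH"` and `"AY"`, the sample rate, and the helper names are all hypothetical, and sine tones stand in for chunks that a real system would cut from recorded speech. Real concatenative systems also choose among many candidate chunks per unit, which this sketch does not attempt.

```python
import math

SAMPLE_RATE = 16_000  # assumed sample rate in Hz


def tone(freq_hz, duration_s):
    """Stand-in for a recorded chunk: a sine tone as a list of float samples."""
    n = round(SAMPLE_RATE * duration_s)
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]


# Hypothetical unit inventory. A real system would store many chunks
# cut from high-quality recordings instead of synthetic tones.
UNITS = {
    "HH": tone(220.0, 0.05),
    "AY": tone(330.0, 0.10),
}


def crossfade_concat(chunks, overlap):
    """Join chunks, linearly crossfading `overlap` samples at each boundary."""
    out = list(chunks[0])
    for chunk in chunks[1:]:
        k = min(overlap, len(out), len(chunk))
        base = len(out) - k  # index where the overlapping region starts
        for i in range(k):
            w = (i + 1) / (k + 1)  # fade-in weight of the incoming chunk
            out[base + i] = (1 - w) * out[base + i] + w * chunk[i]
        out.extend(chunk[k:])
    return out


def synthesize(unit_labels, overlap=80):
    """Look up each unit's chunk and recombine the chunks into one waveform."""
    return crossfade_concat([UNITS[label] for label in unit_labels], overlap)


speech = synthesize(["HH", "AY"])
```

The crossfade is one simple way to soften the audible seams between chunks. Even in this toy form, it hints at why the approach needs so much recorded data: every unit the system may ever need to say must already exist in the inventory.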
In general, parametric models perform worse than concatenative models. ...