Given the graphical model, we can write the joint distribution over the concept hyperparameters () and model weights (, ) as follows:
Two simple but reasonable assumptions are made to make the machinery computationally tractable:
- First, the hidden weights, and , from the last hidden layer to softmax are treated ...