The discriminator network of pix2pix is structured as follows:
A pair of samples (one from each collection) is concatenated along the channel dimension, and this 6-channel image serves as the input to the discriminator network. The discriminator maps the 6-channel 256x256 image to a 1-channel 30x30 image, which is used to compute the discriminator loss. Each element of the 30x30 output is a real/fake prediction for one patch of the input, which is why this discriminator is called a PatchGAN.
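To make the shapes concrete, here is a minimal, self-contained sketch of a PatchGAN discriminator of the kind pix2pix uses as its "basic" netD. The class name PatchDiscriminator and the use of BCEWithLogitsLoss are illustrative stand-ins for the repo's NLayerDiscriminator and GANLoss; ndf=64 and three stride-2 layers are assumed defaults. The point is simply to show the 6-channel concatenation and the resulting 1x30x30 output used for the loss.

```python
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """Sketch of a PatchGAN: 6 x 256 x 256 input -> 1 x 30 x 30 patch logits."""
    def __init__(self, in_channels=6, ndf=64):
        super().__init__()
        layers = [nn.Conv2d(in_channels, ndf, 4, stride=2, padding=1),   # 256 -> 128
                  nn.LeakyReLU(0.2, True)]
        ch = ndf
        for mult in (2, 4):                                              # 128 -> 64 -> 32
            layers += [nn.Conv2d(ch, ndf * mult, 4, stride=2, padding=1),
                       nn.BatchNorm2d(ndf * mult),
                       nn.LeakyReLU(0.2, True)]
            ch = ndf * mult
        layers += [nn.Conv2d(ch, ndf * 8, 4, stride=1, padding=1),       # 32 -> 31
                   nn.BatchNorm2d(ndf * 8),
                   nn.LeakyReLU(0.2, True),
                   nn.Conv2d(ndf * 8, 1, 4, stride=1, padding=1)]        # 31 -> 30
        self.model = nn.Sequential(*layers)

    def forward(self, x):
        return self.model(x)

if __name__ == "__main__":
    netD = PatchDiscriminator()
    real_A = torch.randn(1, 3, 256, 256)        # sample from the first collection
    real_B = torch.randn(1, 3, 256, 256)        # corresponding sample from the second
    pair = torch.cat([real_A, real_B], dim=1)   # 6-channel input to the discriminator
    patch_logits = netD(pair)                   # shape (1, 1, 30, 30)
    # Discriminator loss over the 30x30 patch predictions (here, "real" targets)
    loss = nn.BCEWithLogitsLoss()(patch_logits, torch.ones_like(patch_logits))
    print(pair.shape, patch_logits.shape, loss.item())
```

Running the demo prints the (1, 6, 256, 256) input shape and the (1, 1, 30, 30) output shape, confirming the mapping described above.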
The discriminator network, netD, is created by the models.networks.define_D method. By default, the netD option is "basic", which is defined at line 33 in options/base_options.py ...
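For reference, a hedged sketch of how the training code builds netD through this method; the argument names and defaults below are assumptions based on the repository's define_D signature and may differ between versions:

```python
# Hypothetical construction call; arguments assume the signature of
# models/networks.py:define_D and the pix2pix defaults (input_nc=3, output_nc=3).
from models import networks

netD = networks.define_D(3 + 3,             # input_nc + output_nc: 6-channel pair
                         64,                # ndf: filters in the first conv layer
                         'basic',           # --netD default: PatchGAN discriminator
                         n_layers_D=3,
                         norm='batch',
                         init_type='normal',
                         init_gain=0.02,
                         gpu_ids=[])
```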