4.8 Modelling from a Bit-stream
As described in Section 4.5, the DCT can be used to build computational models of visual attention. Most existing saliency detection models are built in the image (uncompressed) domain. However, images held in storage or sent over the internet are typically in a compressed format such as JPEG. This section introduces a saliency detection model that works directly in the compressed domain. The intensity, colour and texture features of the image are extracted from the DCT coefficients in a JPEG bit-stream. Because JPEG compression applies the DCT at the 8 × 8-px block level, these features are extracted for each 8 × 8-px block. Although the minimum coded unit (MCU) can be as large as 16 × 16 px (for the 4:2:0 subsampling format), saliency detection in this model is performed for each 8 × 8 DCT block. The saliency value of each DCT block is obtained from a Hausdorff distance calculation and feature map fusion, and the saliency map for an image is calculated from weighted feature differences between DCT blocks.
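The idea of block-level saliency from weighted feature differences can be illustrated with a minimal sketch. It is not the model's actual implementation: it uses only the DC coefficient as an intensity feature, computes the DCT from pixels as a stand-in for entropy-decoding a real JPEG bit-stream, and assumes a Gaussian weight on the distance between blocks. The function names and the parameter sigma are hypothetical, and the Hausdorff-distance texture term and feature map fusion are omitted.

```python
import math

N = 8  # JPEG DCT block size

# Orthonormal DCT-II basis matrix, C[u][x].
C = [[(math.sqrt(1.0 / N) if u == 0 else math.sqrt(2.0 / N))
      * math.cos((2 * x + 1) * u * math.pi / (2 * N))
      for x in range(N)] for u in range(N)]

def dct2(block):
    """2-D DCT of an 8x8 block, computed as C * block * C^T."""
    tmp = [[sum(C[u][x] * block[x][y] for x in range(N)) for y in range(N)]
           for u in range(N)]
    return [[sum(tmp[u][y] * C[v][y] for y in range(N)) for v in range(N)]
            for u in range(N)]

def block_features(image):
    """Return a grid holding the DC coefficient of every 8x8 block.

    Stand-in for reading coefficients from a bit-stream: pixels are
    level-shifted by 128 (as in baseline JPEG) and transformed here.
    """
    rows, cols = len(image) // N, len(image[0]) // N
    feats = []
    for br in range(rows):
        row = []
        for bc in range(cols):
            block = [[image[br * N + x][bc * N + y] - 128 for y in range(N)]
                     for x in range(N)]
            row.append(dct2(block)[0][0])  # DC term as intensity feature
        feats.append(row)
    return feats

def saliency_map(feats, sigma=2.0):
    """Saliency of each block as a weighted sum of feature differences.

    Assumption: weights fall off as a Gaussian of the Euclidean distance
    (in block units) between block positions.
    """
    rows, cols = len(feats), len(feats[0])
    sal = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            for k in range(rows):
                for l in range(cols):
                    if (i, j) == (k, l):
                        continue
                    d2 = (i - k) ** 2 + (j - l) ** 2
                    w = math.exp(-d2 / (2 * sigma ** 2))
                    sal[i][j] += w * abs(feats[i][j] - feats[k][l])
    return sal

# Usage: a 24x24 grey image (3x3 blocks) with a brighter centre block.
image = [[100] * 24 for _ in range(24)]
for x in range(8, 16):
    for y in range(8, 16):
        image[x][y] = 200
feats = block_features(image)
sal = saliency_map(feats)
```

In this toy image the centre block differs from all of its neighbours, so it receives the largest saliency value, while a uniform image yields a saliency of zero everywhere.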
4.8.1 Feature Extraction from a JPEG Bit-stream
The baseline JPEG method, which is based on the DCT, is the most widely used image compression method. The JPEG bit-stream is entropy-decoded to obtain the quantized DCT coefficients. As Huffman coding ...