We propose a watermarking technique for digital images based on visual models developed in the context of image compression. The visual models give us a direct way to determine the maximum amount of watermark signal that each portion of an image can tolerate without affecting the visual quality of the image. This allows us to insert a watermark of maximum strength, which in turn is extremely robust to common image processing and editing operations such as JPEG compression, rescaling, and cropping. Our watermarking scheme is based on a DCT framework, which allows for the possibility of directly watermarking the JPEG bitstream. The scheme is shown to provide very good results in terms of both image transparency and robustness.
We focus our efforts on a scheme that is best suited for destination-based applications. To be effective, a destination-based scheme must be both transparent and robust. Robustness means being able to detect the watermark after common signal processing operations such as compression, resampling, requantization, cropping, halftoning and photocopying, as well as after illegal attempts to remove or alter the watermark. The most straightforward way to introduce a transparent watermark results in a watermark that is very vulnerable to attack. Many of the earlier techniques used such approaches and produced results that were visually pleasing but not robust.
Watermark schemes fall under two basic categories: spatial-domain and frequency-domain techniques. Frequency-domain techniques include [KOC95], [COX96], [SZT96]. The technique described in [SZT96] is similar to the work presented here in that the authors take advantage of visual properties in designing their watermarking scheme. A very interesting frequency-domain approach introduced in [COX96] is based on the idea of spread spectrum communications. Their technique yields very impressive results in terms of both image quality and robustness. Our work is motivated by the initial ideas introduced in that paper. We choose a frequency-domain approach because it offers a natural framework for incorporating perceptual models into the scheme. Visual models that were designed for image compression are directly extended to the watermarking application: they provide upper bounds on the watermark intensity in every part of the image, which guarantees perceptual image quality. We develop such a scheme in a DCT-based framework.
We describe a general framework for the watermark encoding scheme which consists of a frequency decomposition based on an 8x8 DCT framework followed by JND calculation and watermark insertion. The block-based approach provides local control which allows us to incorporate local visual masking effects. Such a scheme also allows for the direct watermarking of the JPEG bitstream.
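The frequency decomposition step can be illustrated with a minimal sketch, written here in plain Python with no external libraries. It assumes an orthonormal DCT-II, which for 8x8 blocks has the same scaling as the JPEG forward DCT, and it assumes the image is a list of rows of luminance values. The helper names (dct_matrix, dct2, idct2, blocks) are hypothetical and are not taken from the paper.

```python
import math

N = 8  # block size of the 8x8 DCT framework


def dct_matrix(n=N):
    """Return the n x n orthonormal DCT-II matrix as a list of rows."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(a):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


C = dct_matrix()
CT = transpose(C)


def dct2(block):
    """Forward 2-D DCT of one 8x8 block: C * block * C^T."""
    return matmul(matmul(C, block), CT)


def idct2(coeffs):
    """Inverse 2-D DCT of one 8x8 block: C^T * coeffs * C."""
    return matmul(matmul(CT, coeffs), C)


def blocks(image):
    """Yield (row, col, block) for every full 8x8 block of the image."""
    height, width = len(image), len(image[0])
    for r in range(0, height - N + 1, N):
        for c in range(0, width - N + 1, N):
            yield r, c, [row[c:c + N] for row in image[r:r + N]]
```

With this scaling, the DC coefficient of a block is the sum of its 64 pixel values divided by 8. Partial blocks at the right and bottom edges are simply skipped in this sketch; a real encoder would pad them.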
We use the visual model developed by Watson [WAT92] for JPEG-compliant image compression. The Watson model has an image-independent component, frequency sensitivity, which is determined by measurements under the specific viewing conditions described in [PAW92]: a minimum viewing distance of four picture heights and a D65 monitor white point. We refer to the frequency sensitivity portion of the model as $t_f(u,v)$; a frequency threshold value is derived for each DCT basis function, which in this case results in an 8x8 matrix of threshold values. Watson's model also contains luminance sensitivity and contrast masking components. Luminance sensitivity is estimated by the formula
$$t_L(u,v,b) = t_f(u,v) \left( \frac{X(0,0,b)}{X_d(0,0)} \right)^{a}$$
Eq. (1)
where $X(0,0,b)$ is the DC coefficient of the DCT for block $b$ in the original image, $X_d(0,0)$ is the DC coefficient corresponding to the mean luminance of the display, and $a$ is a parameter which controls the degree of luminance sensitivity. The authors in [PAW93] suggest setting $a$ to 0.649. Contrast masking refers to the detectability of one signal in the presence of another signal. Given a DCT coefficient $X(u,v,b)$ at location $(u,v)$ of block $b$ and a corresponding threshold value $t_L(u,v,b)$ derived from the viewing conditions and local luminance masking, a contrast masking threshold $t_C(u,v,b)$ is derived as
$$t_C(u,v,b) = \max\left[\, t_L(u,v,b),\; |X(u,v,b)|^{w_{u,v}} \; t_L(u,v,b)^{1 - w_{u,v}} \,\right]$$
Eq. (2)
where $w_{u,v}$ is a number between 0 and 1 that can take a different value for each DCT basis function. A typical empirically derived value for $w_{u,v}$ is 0.7. For more details, please refer to [WAT92]. The image-dependent masking thresholds are used to determine the location and maximum strength of the watermark, which consists of a sequence of real numbers drawn from a Gaussian distribution with zero mean and unit variance, as proposed in the spread spectrum technique of [COX96].
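Equations (1) and (2) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the frequency sensitivity matrix t_f must be supplied by the caller (its measured values come from [WAT92] and are not reproduced here), the block's DC coefficient is assumed to be positive, and a single masking exponent w = 0.7 is used for every basis function even though the model allows one value per basis function. The function names are hypothetical.

```python
def luminance_threshold(t_f, dc_block, dc_display, a=0.649):
    """Eq. (1): t_L(u,v,b) = t_f(u,v) * (X(0,0,b) / X_d(0,0)) ** a.

    t_f is an 8x8 list of rows; dc_block and dc_display must be positive.
    """
    gain = (dc_block / dc_display) ** a
    return [[t * gain for t in row] for row in t_f]


def contrast_threshold(coeffs, t_l, w=0.7):
    """Eq. (2): t_C = max(t_L, |X| ** w * t_L ** (1 - w)) per coefficient.

    Assumption: one exponent w shared by all basis functions.
    """
    return [[max(tl, abs(x) ** w * tl ** (1.0 - w))
             for x, tl in zip(x_row, tl_row)]
            for x_row, tl_row in zip(coeffs, t_l)]
```

Because of the outer maximum, the contrast masking threshold never falls below the luminance threshold; it rises above it only where the coefficient magnitude exceeds $t_L(u,v,b)$.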
The watermark insertion is described by

$$X^*(u,v,b) = \begin{cases} X(u,v,b) + J(u,v,b)\,w(u,v,b) & \text{if } X(u,v,b) > J(u,v,b) \\ X(u,v,b) & \text{otherwise} \end{cases}$$
Eq. (3)
where $X(u,v,b)$ refers to the DCT coefficient at location $(u,v)$ in block $b$, $X^*(u,v,b)$ refers to the watermarked DCT coefficient, $w(u,v,b)$ is the sequence of real-valued watermark values, and $J(u,v,b)$ is the just noticeable difference (JND) computed from the visual models. At times we do have a priori knowledge about some of the image transformations that will be applied to the watermarked image, and it is best to take advantage of this knowledge in the watermark insertion process. In this case, however, we do not assume any prior knowledge and, unlike [COX96], we do not limit watermark insertion to perceptually significant parts of the image. A slight modification of Equation (3), limiting watermark insertion to locations corresponding to values of $J(u,v,b)$ less than a predetermined threshold, yields a scheme that restricts the watermark to perceptually significant regions only. Watson's model is used directly to determine $J(u,v,b)$. Note that since the watermark is generated from a normal distribution, watermark insertion as given in Equation (3) will occasionally produce changes that exceed the JND. Informal studies show that occasionally exceeding the JND does not produce visibly objectionable results. This might indicate that there are other masking effects, not currently exploited, that could be incorporated into the visual models. However, we have not run formal tests and cannot draw definite conclusions. Currently, the watermark is inserted only into the luminance component of the image.
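The insertion rule of Equation (3) can be sketched as follows. For brevity the example works on flat lists holding one value per coefficient (the flattening order is an assumption, as are the function names and the use of a seeded generator to make the watermark reproducible). The comparison is made on the signed coefficient, exactly as the equation is written, and coefficients that fail the test are left unchanged.

```python
import random


def make_watermark(count, seed):
    """Draw `count` zero-mean, unit-variance Gaussian watermark values."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(count)]


def insert_watermark(coeffs, jnd, watermark):
    """Apply Eq. (3) to flat lists of coefficients, JNDs and watermark values.

    A coefficient with X > J becomes X + J * w; all others are unchanged.
    """
    return [x + j * w if x > j else x
            for x, j, w in zip(coeffs, jnd, watermark)]
```

Since each change is $J(u,v,b)\,w(u,v,b)$ and $w$ is unit-variance Gaussian, the change exceeds the JND whenever $|w(u,v,b)| > 1$, which is the occasional excess noted in the text.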
Watermark detection is based on classical detection theory. The original image is subtracted from the received image and the correlation between the signal difference and a specific watermark sequence is determined. The maximum correlation value is compared to a threshold to determine whether the received image contains the watermark in question. The correlation detection scheme can be expressed in vector space as
$$w_s^*(u,v,b) = X_R^*(u,v,b) - X(u,v,b)$$

$$w^*(u,v,b) = \frac{w_s^*(u,v,b)}{J(u,v,b)}$$

$$R_{ww^*} = \frac{w^* \cdot w}{\sqrt{w^* \cdot w^*}}$$
Here $X_R^*(u,v,b)$ denotes the DCT coefficients of the received image, $w_s^*(u,v,b)$ denotes the received, possibly distorted watermark scaled by the JND thresholds, $w^*(u,v,b)$ denotes the received watermark, and $R_{ww^*}$ is the normalized correlation coefficient between the two signals $w$ and $w^*$, where $w^* \cdot w$ denotes the dot product. Watermark detection is performed by comparing the correlation coefficient to a threshold value, which can be adjusted according to the tradeoff between the probability of detection, PD, and the probability of false detection, PF. The final step for watermark detection is
$$R_{ww^*} > T_R \quad\Rightarrow\quad \text{watermark } w \text{ detected}$$

$$R_{ww^*} \le T_R \quad\Rightarrow\quad \text{watermark } w \text{ not detected}$$
For the experiments here, we set the threshold $T_R = 5$.
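The detection steps can be sketched as below, again on flat lists with one entry per DCT coefficient and with hypothetical function names. The sketch assumes the difference is taken as received minus original, so that an undistorted watermark correlates positively with itself, and it returns a correlation of zero when nothing was extracted (a choice made here to avoid dividing by zero, not something stated in the text).

```python
import math


def extract_watermark(received, original, jnd):
    """Recover w* = (X*_R - X) / J for each coefficient."""
    return [(r - x) / j for r, x, j in zip(received, original, jnd)]


def correlation(w_star, w):
    """R = (w* . w) / sqrt(w* . w*); returns 0.0 if w* is all zeros."""
    energy = sum(v * v for v in w_star)
    if energy == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(w_star, w)) / math.sqrt(energy)


def is_detected(received, original, jnd, w, threshold=5.0):
    """Compare the correlation against the threshold T_R (5 by default)."""
    w_star = extract_watermark(received, original, jnd)
    return correlation(w_star, w) > threshold
```

If the received watermark is undistorted and every coefficient was marked, $w^* = w$ and the statistic reduces to $\sqrt{w \cdot w}$, which is close to $\sqrt{N}$ for $N$ unit-variance values. For a candidate watermark drawn independently of $w^*$, the statistic is approximately a standard normal variable, so a threshold of 5 sits about five standard deviations above the mean for a non-matching watermark.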