How to remove a watermark from a document image?

Solution 1

Since we know the watermark is pink, we can use a two-pass HSV color threshold. The first pass removes most of the watermark while keeping the letters intact; the second filters out the pink that remains. Here's a potential solution:

  1. 1st pass HSV color threshold. Load the image, convert it to HSV, then apply an HSV color threshold to obtain a binary mask.

  2. Dilate to repair contours. Thresholding tends to wash out the letters, so we dilate the mask to reconstruct some of the characters.

  3. 2nd pass HSV color threshold. We bitwise-and the original image with the dilated 1st pass mask to get an intermediate result, but pink artifacts remain around the characters. To remove them, we run a 2nd HSV threshold that generates a new mask for the leftover pink.

  4. Convert to grayscale, then remove pink contours. We convert the intermediate result to grayscale and switch the background from black to white. Finally, we set the pixels selected by the 2nd pass mask to white to get the final result.
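
To make the dilation step more concrete, here is a minimal NumPy-only sketch of what a 3x3 rectangular dilation does to a binary mask. The helper name dilate_3x3 and the tiny test mask are made up for illustration; the answer's code uses cv2.dilate, which should give the same result on a mask like this.

```python
import numpy as np

def dilate_3x3(mask):
    """Dilate a 2-D mask with a 3x3 rectangular kernel.

    Each output pixel is the maximum of its 3x3 neighborhood,
    with pixels outside the image treated as 0.
    """
    padded = np.pad(mask, 1, mode="constant", constant_values=0)
    out = np.zeros_like(mask)
    h, w = mask.shape
    for dy in range(3):
        for dx in range(3):
            out = np.maximum(out, padded[dy:dy + h, dx:dx + w])
    return out

# Hypothetical mask: a thin vertical stroke with a one-pixel gap,
# similar to what thresholding can leave behind in a character.
stroke = np.array([
    [0, 0,   0, 0, 0],
    [0, 0, 255, 0, 0],
    [0, 0,   0, 0, 0],  # gap in the stroke
    [0, 0, 255, 0, 0],
    [0, 0,   0, 0, 0],
], dtype=np.uint8)

repaired = dilate_3x3(stroke)
print(repaired)
```

The gap in the middle row is filled, and the stroke also grows by one pixel on each side. That growth is the trade-off of this step: it repairs characters but also pulls in neighboring pixels, which may include some watermark color.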


Input image -> 1st HSV mask + dilation -> bitwise-and



Notice how the background pink is gone but there are still pink artifacts around letters. So now we generate a 2nd mask for the remaining pink.
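
To see why a second mask can be needed, the sketch below checks a few sample pixels against the two HSV ranges used in this solution. It is only an approximation: it uses the standard library's colorsys module and rescales to OpenCV's 8-bit convention (H in 0-179, S and V in 0-255), so values may differ slightly from cv2.cvtColor. The sample colors and the helper names bgr_to_opencv_hsv and in_range are assumptions for illustration, not part of the original answer.

```python
import colorsys

def bgr_to_opencv_hsv(b, g, r):
    """Approximate OpenCV's 8-bit HSV scaling: H in 0-179, S and V in 0-255."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return (int(round(h * 180)) % 180, int(round(s * 255)), int(round(v * 255)))

def in_range(hsv, lower, upper):
    """Pure-Python stand-in for cv2.inRange applied to a single pixel."""
    return all(lo <= x <= hi for x, lo, hi in zip(hsv, lower, upper))

# The two ranges used in the solution's code (lower, upper).
FIRST_PASS = ((0, 0, 0), (179, 255, 163))
SECOND_PASS = ((96, 89, 161), (179, 255, 255))

# Hypothetical BGR sample pixels.
samples = {
    "dark text": (40, 40, 40),
    "light pink watermark": (203, 192, 255),
    "pink fringe near a letter": (160, 120, 230),
}

results = {}
for name, bgr in samples.items():
    hsv = bgr_to_opencv_hsv(*bgr)
    results[name] = (in_range(hsv, *FIRST_PASS), in_range(hsv, *SECOND_PASS))
    print(name, hsv, results[name])
```

With these sample values, only the dark text falls inside the first range, so the light watermark is dropped by the first pass. The fringe pixel is also outside the first range on its own, but dilation can pull such pixels back in when they sit next to a letter, and the second range is what flags them for removal.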

2nd mask -> convert to grayscale + invert -> applied 2nd mask to get result



Enlarged result

Code

import numpy as np
import cv2

# Load image, convert to HSV, then HSV color threshold
image = cv2.imread('1.jpg')
original = image.copy()
hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
lower = np.array([0, 0, 0])
upper = np.array([179, 255, 163])
mask = cv2.inRange(hsv, lower, upper)

# Dilate to repair
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3,3))
dilate = cv2.dilate(mask, kernel, iterations=1)

# Second pass of HSV to remove pink
colored = cv2.bitwise_and(original, original, mask=dilate)
colored_hsv = cv2.cvtColor(colored, cv2.COLOR_BGR2HSV)
lower_two = np.array([96, 89, 161])
upper_two = np.array([179, 255, 255])
mask_two = cv2.inRange(colored_hsv, lower_two, upper_two)

# Convert to grayscale then remove pink contours
result = cv2.cvtColor(colored, cv2.COLOR_BGR2GRAY)
result[result <= 10] = 255
cv2.imshow('result before removal', result)
result[mask_two==255] = 255

cv2.imshow('dilate', dilate)
cv2.imshow('colored', colored)
cv2.imshow('mask_two', mask_two)
cv2.imshow('result after removal', result)
cv2.waitKey()

Depending on the image, you may need to adjust the lower/upper HSV ranges. To find suitable ranges, you can use this HSV thresholder script with sliders so you don't need to guess and check. Just change the image path.

import cv2
import numpy as np

def nothing(x):
    pass

# Load image
image = cv2.imread('1.jpg')

# Create a window
cv2.namedWindow('image')

# Create trackbars for color change
# Hue is from 0-179 for Opencv
cv2.createTrackbar('HMin', 'image', 0, 179, nothing)
cv2.createTrackbar('SMin', 'image', 0, 255, nothing)
cv2.createTrackbar('VMin', 'image', 0, 255, nothing)
cv2.createTrackbar('HMax', 'image', 0, 179, nothing)
cv2.createTrackbar('SMax', 'image', 0, 255, nothing)
cv2.createTrackbar('VMax', 'image', 0, 255, nothing)

# Set default value for Max HSV trackbars
cv2.setTrackbarPos('HMax', 'image', 179)
cv2.setTrackbarPos('SMax', 'image', 255)
cv2.setTrackbarPos('VMax', 'image', 255)

# Initialize HSV min/max values
hMin = sMin = vMin = hMax = sMax = vMax = 0
phMin = psMin = pvMin = phMax = psMax = pvMax = 0

while True:
    # Get current positions of all trackbars
    hMin = cv2.getTrackbarPos('HMin', 'image')
    sMin = cv2.getTrackbarPos('SMin', 'image')
    vMin = cv2.getTrackbarPos('VMin', 'image')
    hMax = cv2.getTrackbarPos('HMax', 'image')
    sMax = cv2.getTrackbarPos('SMax', 'image')
    vMax = cv2.getTrackbarPos('VMax', 'image')

    # Set minimum and maximum HSV values to display
    lower = np.array([hMin, sMin, vMin])
    upper = np.array([hMax, sMax, vMax])

    # Convert to HSV format and color threshold
    hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, lower, upper)
    result = cv2.bitwise_and(image, image, mask=mask)

    # Print if there is a change in HSV value
    if (hMin, sMin, vMin, hMax, sMax, vMax) != (phMin, psMin, pvMin, phMax, psMax, pvMax):
        print("(hMin = %d , sMin = %d, vMin = %d), (hMax = %d , sMax = %d, vMax = %d)" % (hMin , sMin , vMin, hMax, sMax , vMax))
        phMin = hMin
        psMin = sMin
        pvMin = vMin
        phMax = hMax
        psMax = sMax
        pvMax = vMax

    # Display result image
    cv2.imshow('image', result)
    if cv2.waitKey(10) & 0xFF == ord('q'):
        break

cv2.destroyAllWindows()

Original author of this content: nathancy

Solution 2

The Concept

For this, I used two simple HSV masks: one to fade out the logo (using a simple formula), and one to finish the job by removing the logo completely.

Here is the original image, the pre-masked image, and the completely-masked image, in that order:



Here is what the two masks look like:


The Output

The Code

import cv2
import numpy as np

def HSV_mask(img_hsv, lower):
    lower = np.array(lower)
    upper = np.array([255, 255, 255])
    return cv2.inRange(img_hsv, lower, upper)
    
img = cv2.imread("image.jpg")
img_hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
img_gray[img_gray >= 235] = 255
mask1 = HSV_mask(img_hsv, [0, 0, 155])[..., None].astype(np.float32)
mask2 = HSV_mask(img_hsv, [0, 20, 0])
masked = np.uint8((img + mask1) / (1 + mask1 / 255))
gray = cv2.cvtColor(masked, cv2.COLOR_BGR2GRAY)
gray[gray >= 180] = 255
gray[mask2 == 0] = img_gray[mask2 == 0]

cv2.imshow("result", gray)
cv2.waitKey(0)

The Explanation

  1. Import the necessary libraries:
import cv2
import numpy as np
  2. Define a function, HSV_mask, that takes an image already converted to the HSV color space and the lower bound for the HSV mask (the upper bound is fixed at 255, 255, 255), and returns the mask:
def HSV_mask(img_hsv, lower):
    lower = np.array(lower)
    upper = np.array([255, 255, 255])
    return cv2.inRange(img_hsv, lower, upper)
  3. Read in the image, image.jpg, and define two more variables that hold the image converted to HSV and to grayscale. In the grayscale image, set every pixel greater than or equal to 235 to 255; this removes some noise from the white parts of the image:
img = cv2.imread("image.jpg")
img_hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
img_gray[img_gray >= 235] = 255
  4. Define two variables, mask1 and mask2, using the HSV_mask function defined before. mask1 selects the bright pixels (V >= 155), that is, everything but the dark text, and mask2 selects the saturated pixels (S >= 20), that is, the colored logo:
mask1 = HSV_mask(img_hsv, [0, 0, 155])[..., None].astype(np.float32)
mask2 = HSV_mask(img_hsv, [0, 20, 0])
  5. Combine the original image with mask1 using a formula that fades out (but does not remove) the logo. This is just a preprocessing step so that we can remove the logo cleanly later:
masked = np.uint8((img + mask1) / (1 + mask1 / 255))
  6. Convert the image with the faded logo to grayscale, set every pixel greater than or equal to 180 to white, and then use mask2 to restore the original grayscale values wherever mask2 is 0 (the pixels outside the logo):
gray = cv2.cvtColor(masked, cv2.COLOR_BGR2GRAY)
gray[gray >= 180] = 255
gray[mask2 == 0] = img_gray[mask2 == 0]
  7. Finally, show the result:
cv2.imshow("result", gray)
cv2.waitKey(0)
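
As a rough worked example of the fade formula, the sketch below applies it to two made-up BGR pixels, one pink logo pixel and one dark text pixel. The mask values are written by hand to match what mask1 would hold for such pixels (255 where V >= 155, otherwise 0), so only NumPy is needed; the pixel values themselves are assumptions.

```python
import numpy as np

# Two hypothetical BGR pixels: a pink logo pixel and a dark text pixel.
img = np.array([[[203, 192, 255], [30, 30, 30]]], dtype=np.uint8)

# Hand-written stand-in for mask1: 255 for the bright logo pixel,
# 0 for the dark text pixel, with a trailing axis so it broadcasts
# over the three color channels.
mask1 = np.array([[[255.0], [0.0]]], dtype=np.float32)

# Same formula as in the solution:
#   mask1 == 255 -> (pixel + 255) / 2, i.e. the average of the pixel and white
#   mask1 == 0   -> pixel / 1, i.e. unchanged
masked = np.uint8((img + mask1) / (1 + mask1 / 255))
print(masked)
```

Where mask1 is 255 the pixel is averaged with white, which lightens the logo without erasing it, and where mask1 is 0 the pixel passes through untouched. That is why the later gray >= 180 threshold can turn the faded logo white while leaving the text alone.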

Original author of this content: Ann Zen

Conclusion

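Both solutions rely on HSV color thresholding: Solution 1 uses two threshold passes with a dilation in between, while Solution 2 fades the logo with one mask and then removes it with a second. As noted in Solution 1, the HSV ranges may need tuning for your image.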
