Computer Vision 101: Working with Color Images in Python

Learn the basics of working with RGB and Lab images to boost your computer vision projects!

Every computer vision project, be it a cat/dog classifier or colorizing old images and movies, involves working with images. In the end, a model can only be as good as the underlying data: garbage in, garbage out. That is why this post focuses on the basics of working with color images in Python: how they are represented and how to convert them from one color representation to another.

Setup

In this section, we set up the Python environment. First, we import all the required libraries:

import numpy as np
from skimage.color import rgb2lab, rgb2gray, lab2rgb
from skimage.io import imread, imshow
import matplotlib.pyplot as plt

We use scikit-image, which, like scikit-learn, is one of the SciPy toolkits ("scikits") and focuses on working with images. There are many alternatives for loading and manipulating images, including matplotlib, numpy, OpenCV, and Pillow.

In the second step, we define a helper function for printing out a summary of information about the image — its shape and the range of values in each of the layers.

The logic of the function is pretty straightforward, and the slicing of dimensions will make sense as soon as we describe how the images are stored.
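The helper's listing is not reproduced in this text. The following is a minimal sketch that reproduces the output format shown later in the post; the split into format_image_summary and print_image_summary, and the handling of 2D images, are my assumptions rather than the original implementation.

```python
import numpy as np


def format_image_summary(image, labels):
    """Build a text summary: array shape plus min/max of each channel.

    Assumption: `labels` has one entry per channel, e.g. ['G'] for a
    grayscale image or ['R', 'G', 'B'] for an RGB image.
    """
    shape = image.shape
    # A grayscale image is 2D; add a trailing axis so that the same
    # [:, :, i] slicing works for single- and multi-channel images.
    if image.ndim == 2:
        image = image[:, :, np.newaxis]

    lines = [
        '--------------',
        'Image Details:',
        '--------------',
        f'Image dimensions: {shape}',
        'Channels:',
    ]
    for i, label in enumerate(labels):
        channel = image[:, :, i]
        low, high = float(channel.min()), float(channel.max())
        lines.append(f'{label} : min={low:.4f}, max={high:.4f}')
    return '\n'.join(lines)


def print_image_summary(image, labels):
    """Print the summary produced by format_image_summary."""
    print(format_image_summary(image, labels))
```

Returning the text from a separate function keeps the printing helper trivial and makes the formatting easy to check on small synthetic arrays.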

Grayscale

We start with the most basic case possible, a grayscale image. Such images are made exclusively of shades of gray. The extremes are black (lowest intensity) and white (highest intensity).

Under the hood, the images are stored as a matrix of integers, in which a pixel’s value corresponds to the given shade of gray. The scale of values for grayscale images ranges from 0 (black) to 255 (white). The illustration below provides an intuitive overview of the concept.
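To make the matrix representation concrete, here is a small sketch that builds a tiny grayscale "image" by hand. The array name and pixel values are made up for illustration and are not taken from the crayons picture.

```python
import numpy as np

# One row that ramps from black (0) to white (255) in four steps.
ramp = np.array([0, 85, 170, 255], dtype=np.uint8)

# Stack the row three times to get a 3x4 grayscale image.
tiny_gs = np.tile(ramp, (3, 1))

print(tiny_gs.shape)                 # (rows, columns)
print(tiny_gs.dtype)                 # 8-bit unsigned integers
print(tiny_gs.min(), tiny_gs.max())  # darkest and brightest pixel
```

Passing such an array to plt.imshow with cmap='gray', vmin=0 and vmax=255 would display it as four vertical bands going from black to white.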


In this article, we will be working with the image you already saw as the thumbnail, the circle of colorful crayons. It was not accidental that such a colorful picture was selected :)

We start by loading the grayscale image into Python and displaying it.

image_gs = imread('crayons.jpg', as_gray=True)

fig, ax = plt.subplots(figsize=(9, 16))
imshow(image_gs, ax=ax)
ax.set_title('Grayscale image')
ax.axis('off');

As the original image is in color, we used as_gray=True to load it as a grayscale image. Alternatively, we could have loaded the image using the default settings of imread (which loads an RGB image — covered in the next section) and converted it to grayscale using the rgb2gray function.
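For intuition about what that conversion does, below is a NumPy-only sketch of the weighted sum that the scikit-image documentation gives for rgb2gray (Y = 0.2125 R + 0.7154 G + 0.0721 B). The helper to_gray and the sample pixels are hypothetical; in practice you would simply call rgb2gray(image_rgb).

```python
import numpy as np

# Luminance weights as documented for skimage.color.rgb2gray.
WEIGHTS = np.array([0.2125, 0.7154, 0.0721])


def to_gray(image_rgb):
    """Hypothetical helper: uint8 RGB (H, W, 3) -> float grayscale (H, W) in [0, 1]."""
    scaled = image_rgb.astype(np.float64) / 255.0  # rescale to [0, 1] first
    return scaled @ WEIGHTS                        # weighted sum over the channel axis


# Four sample pixels: black, white, pure red, pure green.
pixels = np.array([[[0, 0, 0], [255, 255, 255]],
                   [[255, 0, 0], [0, 255, 0]]], dtype=np.uint8)
gray = to_gray(pixels)
print(gray)
```

Green gets by far the largest weight, so a pure green pixel ends up much brighter in grayscale than a pure red or pure blue one.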

Next, we run the helper function to print the summary of the image.

print_image_summary(image_gs, ['G'])

Running the code produces the following output:

--------------
Image Details:
--------------
Image dimensions: (1280, 1920)
Channels:
G : min=0.0123, max=1.0000

The image is stored as a 2D matrix, 1280 rows by 1920 columns (roughly Full HD resolution). The min and max values show that the pixel intensities lie in the [0, 1] range. That is because loading with as_gray=True converts the image to floating point, which amounts to dividing the original 0–255 values by 255, a common preprocessing step when working with images.

RGB

Now it is time to work with colors. We start with the RGB model. In short, it is an additive model, in which shades of red, green and blue (hence the name) are added together in various proportions to reproduce a broad spectrum of colors.
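As a quick illustration of the additive idea, the sketch below mixes the three primaries as plain NumPy vectors. The variable names are mine, and the values use the 0–255 scale.

```python
import numpy as np

# The three primaries at full intensity, as (R, G, B) triplets.
red = np.array([255, 0, 0], dtype=np.uint8)
green = np.array([0, 255, 0], dtype=np.uint8)
blue = np.array([0, 0, 255], dtype=np.uint8)

# Adding light: red + green gives yellow, all three give white.
# Note: uint8 arithmetic wraps around above 255; these sums stay in range
# because each channel is non-zero in only one of the operands.
yellow = red + green
white = red + green + blue

print(yellow)
print(white)
```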

In scikit-image, this is the default model for loading the images using imread:

image_rgb = imread('crayons.jpg')

Before printing the images, let’s inspect the summary to understand the way the image is stored in Python.

print_image_summary(image_rgb, ['R', 'G', 'B'])

Running the code generates the following summary:

--------------
Image Details:
--------------
Image dimensions: (1280, 1920, 3)
Channels:
R : min=0.0000, max=255.0000
G : min=0.0000, max=255.0000
B : min=0.0000, max=255.0000

In comparison to the grayscale image, this time the image is stored as a 3D np.ndarray. The third dimension indexes the 3 color channels. The intensity of each color is expressed on a 0–255 scale, which is frequently rescaled to the [0, 1] range. In either case, a pixel value of 0 in any of the layers indicates that there is no color in that particular channel for that pixel.
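The following sketch shows the channel slicing and the rescaling on a tiny stand-in array. The name tiny_rgb and its values are assumptions for illustration; the same indexing applies to image_rgb.

```python
import numpy as np

# A 2x2 RGB image: red, green, blue and mid-gray pixels.
tiny_rgb = np.array([[[255, 0, 0], [0, 255, 0]],
                     [[0, 0, 255], [128, 128, 128]]], dtype=np.uint8)

# Slicing the last axis yields one 2D matrix per channel.
red_channel = tiny_rgb[:, :, 0]

# Dividing by 255.0 converts to floats in the [0, 1] range.
tiny_rgb_01 = tiny_rgb / 255.0

print(tiny_rgb.shape, red_channel.shape)
print(tiny_rgb_01.dtype, tiny_rgb_01.min(), tiny_rgb_01.max())
```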

A helpful note: OpenCV's imread function loads the image in BGR order instead of RGB. To make it compatible with other libraries, we need to reverse the order of the channels.
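Reversing the channel order can be done with plain NumPy slicing, as in the sketch below. The helper name bgr_to_rgb is hypothetical. With OpenCV installed, cv2.cvtColor(image, cv2.COLOR_BGR2RGB) achieves the same result.

```python
import numpy as np


def bgr_to_rgb(image_bgr):
    """Hypothetical helper: reverse the channel axis of an (H, W, 3) array."""
    return image_bgr[:, :, ::-1]


# A single pure-blue pixel, written in BGR order.
bgr = np.array([[[255, 0, 0]]], dtype=np.uint8)
rgb = bgr_to_rgb(bgr)
print(rgb)  # the same pixel in RGB order
```

The slice returns a view with a reversed stride rather than a copy. If a downstream function requires a contiguous array, wrapping the result in np.ascontiguousarray should take care of that.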

It is time to display the image and the different color channels:

fig, ax = plt.subplots(1, 4, figsize=(18, 30))

# first panel: the original image, rescaled to [0, 1] for imshow
ax[0].imshow(image_rgb / 255.0)
ax[0].axis('off')
ax[0].set_title('original RGB')

for i, lab in enumerate(['R', 'G', 'B'], 1):
    # keep one channel and leave the other two at zero
    temp = np.zeros(image_rgb.shape)
    temp[:, :, i - 1] = image_rgb[:, :, i - 1]
    ax[i].imshow(temp / 255.0)
    ax[i].axis('off')
    ax[i].set_title(lab)

plt.show()

In the image below, we can see the original image and the 3 color channels separately. What I like about this image is that by focusing on individual crayons, we can see which colors from the RGB channels and in which proportions constitute the final color in the original image.
