2.2 The Data


How Computers Read Pictures

Unlike humans, who see using photoreceptor cells in the eye, all neural networks, including CNNs, take numbers as their input. That leaves us with the tricky task of turning a picture into numbers. Fortunately, a digital image is already stored as a grid of pixel values, so we can use those values as our ‘numbers’. This lets a computer use images as data without losing any information.
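To make this concrete, here is a minimal sketch that builds a tiny 3x3 grayscale "picture" by hand. It assumes NumPy, the array library that cv2 also uses for the images it reads. The pixel values are made up for illustration: each one is just an integer, with 0 for black and 255 for white.

```python
import numpy as np

# A hypothetical 3x3 grayscale image: each entry is one pixel's brightness,
# stored as an 8-bit unsigned integer (0 = black, 255 = white).
tiny_image = np.array(
    [[0, 128, 255],
     [64, 192, 32],
     [255, 0, 100]],
    dtype=np.uint8,
)

print(tiny_image.shape)                   # (3, 3): 3 rows (height) by 3 columns (width)
print(tiny_image[0, 2])                   # 255: the pixel in row 0, column 2 is white
print(tiny_image.min(), tiny_image.max())  # 0 255
```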

Step by Step Image Reading

In a grayscale (black and white) image, reading the image is a simple task. The computer goes through each pixel and converts it to a number between 0 and 255 (for a standard 8-bit image), where 0 is black and 255 is white. The process is the same for RGB pictures, but we have to change the number of channels. We use the word channels to describe how many values a computer extracts from a single pixel in the image. Because a grayscale pixel is just a shade somewhere between black and white, we only need one channel. In RGB images, we need 3 channels because the computer extracts 3 values from each pixel: red, green, and blue. For example, in the image below, the computer would extract a red value, a green value, and a blue value from each pixel.
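The sketch below contrasts the two cases. It again assumes NumPy, and all pixel values are made up. A grayscale image stores one number per pixel. An RGB image stores three, so the number of channels appears as an extra last dimension of the array.

```python
import numpy as np

height, width = 2, 2

# Grayscale: one value per pixel -> shape (height, width)
gray = np.array([[0, 255],
                 [128, 64]], dtype=np.uint8)

# RGB: three values (red, green, blue) per pixel -> shape (height, width, 3)
rgb = np.zeros((height, width, 3), dtype=np.uint8)
rgb[0, 0] = [255, 0, 0]      # a pure red pixel
rgb[0, 1] = [0, 255, 0]      # a pure green pixel
rgb[1, 0] = [0, 0, 255]      # a pure blue pixel
rgb[1, 1] = [255, 255, 255]  # a white pixel

red, green, blue = rgb[0, 0]  # the three channel values of the top-left pixel
print(gray.shape, rgb.shape)  # (2, 2) (2, 2, 3)
print(red, green, blue)       # 255 0 0

# A grayscale image can be given an explicit channel axis of size 1
gray_with_channel = gray[..., np.newaxis]
print(gray_with_channel.shape)  # (2, 2, 1)
```

Giving a grayscale image an explicit channel axis of size 1 is what makes its shape line up with the "1 channel for grayscale" convention used when reshaping data for the network.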

Implementation with CV2

If we take a basic picture, such as the one above, we can turn it into an array of pixel values using libraries like OpenCV (imported in Python as cv2) or TensorFlow. In the example below, we use cv2's imread function to read the pixel values from the image. After extracting the pixel values, we have to reshape the array so that our neural network can read it. The network expects its input in a specific shape: the number of images, the image height and width, and the number of channels. When we reshape our image data below, we pass -1 as the first dimension. This tells NumPy to work out the number of images on its own from the total number of pixel values. Finally, the number of channels should be 1 for grayscale images and 3 for RGB images. For more information, see the official OpenCV documentation for the imread function.


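import cv2

image_file = 'image_file_path'  # replace with the path to your image
image_size = 64                 # height/width the network expects (example value)
num_channels = 3                # 3 for color images, 1 for grayscale

# imread returns a NumPy array of shape (height, width, 3) with the channels in
# BGR (blue, green, red) order, or None if the file could not be read
image_pixel_values = cv2.imread(image_file)

# for a grayscale image, pass a flag instead (and set num_channels = 1):
# image_pixel_values = cv2.imread(image_file, cv2.IMREAD_GRAYSCALE)

# resize so the image matches the size the network expects
image_pixel_values = cv2.resize(image_pixel_values, (image_size, image_size))

# reshape returns a new array rather than changing it in place, so keep the result
image_pixel_values = image_pixel_values.reshape(-1, image_size, image_size, num_channels)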
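The role of -1 in reshape is easier to see with a batch of images. The sketch below assumes NumPy and a hypothetical dataset of ten 28x28 grayscale images, simulated with random pixel values and stored as one flat array. Because the image size and channel count are fixed, reshape can work out the number of images by itself.

```python
import numpy as np

num_images, image_size, num_channels = 10, 28, 1  # hypothetical dataset size

# Simulate a flat list of pixel values for all images, one after another
flat_pixels = np.random.randint(0, 256, size=num_images * image_size * image_size,
                                dtype=np.uint8)

# -1 asks NumPy to infer this dimension: total elements / (28 * 28 * 1) = 10
batch = flat_pixels.reshape(-1, image_size, image_size, num_channels)
print(batch.shape)  # (10, 28, 28, 1)

# The same call turns a single image into a batch of one
single_image = np.zeros((image_size, image_size), dtype=np.uint8)
single_batch = single_image.reshape(-1, image_size, image_size, num_channels)
print(single_batch.shape)  # (1, 28, 28, 1)
```

If the total number of pixel values is not a multiple of image_size * image_size * num_channels, reshape raises a ValueError. This is a quick way to catch images of the wrong size before they reach the network.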


Copyright © 2022 Code 4 Tomorrow. All rights reserved. The code in this course is licensed under the MIT License. If you would like to use content from any of our courses, you must obtain our explicit written permission and provide credit. Please contact classes@code4tomorrow.org for inquiries.