3.1 Search Algorithms

Linear Search

Linear search is an easily-understood algorithm to check if a value is in an array. The logic behind it is quite simple:

Start at the beginning of the array. If the value there is equal to the value that you’re trying to find, stop. If not, go to the next value and repeat until you’ve reached the end of the array.

For example, if we were trying to see if 3 was in the array [1, 6, 7, 2, 3, 56]:

index	stored value	equal to 3?
0	1	No
1	6	No
2	7	No
3	1	No
4	3	Yes

Pretty intuitive, right? Linear Search is very intuitive, and thus very easy to code, as we see below.


def linear_search(arr, val) -> int:
    """
    Search the provided array for the provided value
    and get the index, if found
    Arguments:
        arr - the array to search in
        val - the value to search for
    Returns:
        int - the index of the value if it was found, else -1
    """
    for i in range(len(arr)):
        if arr[i] == val:
            return i
    return -1

View code on GitHub.

Linear Search - Pros and Cons

Cons - Linear Search has O(n) time complexity. This is because, in cases where the item is not found, it will search all the items in the list.

Pros - Linear Search works on all lists and all datatypes. It doesn’t matter if the list is sorted or not since Linear Search will just check if it is equal and, if not, keep checking every other item

Binary Search

Let’s say you have an array containing many values, and you want to check whether a value is present in the array. If the value is present in the array, then return the index of the value, else return a value of -1. Binary Search is a fast way to quickly find objects in ordered arrays or confirm that the value isn’t present in the array. We’ll get to why the array has to be ordered later. A simple way to learn how Binary Search works is by playing a simple game.

I’m thinking of a random number between 1 and 100. Your job is to guess the number I’m thinking of in the least possible number of guesses, and every time you guess a number, I’ll tell you whether your guess is larger or smaller than the number I’m thinking of. So, what’s your first guess?

Many people would guess 50 presented the problem above. And it makes the most sense, because after I say whether your guess is too high or too low, you’ve trimmed the number of possible numbers I’m thinking of in half. Let’s continue playing our game: my number is larger than 50.

The next logical guess most people would pick is 75. Since we know from our last guess that the number I’m thinking of must be between 50 and 100, we want to cut our possibilities in half again. The number directly in the middle of 50 and 100 is 75, so once I say whether my guess is higher or lower, you’ve once again cut the number of possible numbers in half. Continuing our game: my number is smaller than 75, good guess.

This process would repeat again and again until you get the correct answer. Since we know that the number is between 50 and 75, we have an upper and lower bound (both are exclusive). Once again our strategy is to pick the number exactly in between the upper and lower bound. In this case, 62.5 is the number in the middle, but since my number is a whole number, we can round up our guess to 63. The game would continue as follows:

Guess #	Boundaries	Guess	Result
3	50 - 75	63	my number is larger than 63
4	63 - 75	69	my number is larger than 69
5	69 - 75	72	my number is larger than 72
6	72 - 75	73	my number is larger than 73
7	73 - 75	74	you got it!

In this way, we were able to guess the number in 7 guesses. This was actually the worst case scenario, because every time we made a guess between our upper and lower bound, we had a small chance of that being the correct answer. In our case, we had to narrow it down to exactly one possibility before we got the answer. Even so, 7 guesses is quite small considering we had 100 possible answers in the beginning of the game. Let’s see how this principle of trimming our possibilities in half after every guess can be applied to checking for a value in an array as fast as possible.

Say we have the following array:

my_array = [3, 5, 20, 42, 43, 51, 92, 132, 430];

We need to check whether the array contains the value 20. Without binary search, we might iterate through every element in the array and check if the value is equal to 132. But with binary search, and the knowledge that the array is ordered, we can do this much faster. Let’s start with the index in the middle of the array. In this case, the length of the array is 9, so the middle index is 4, which gives us the value 43. We can check if 43 is greater than, less than, or equal to our number, 20. Since 43 is greater than 20, and the list is ordered, we don’t need to check any of the indices higher than 4. Our next index would be 2, because that is the index in between 0 and 4. The value of index 2 is 20, and we would again check if 20 is greater than, less than, or equal to our number. Since 20 is equal to 20, we can return index 2.

Suppose we were looking for the value 16 in my_array. We would once again start with index 4, realize that 43 is larger than 16, and go to index 2. We would realize that 20, the value at index 2, is also greater than 16, and go to index 1. The value of index 1 is 5, which is less than 16. Since we know that all the indices at 2 and higher have a value greater than 16, and the indices 1 or less have values less than 16, so 16 can not be in the array.

This is the general idea as to how binary search works. We can see it in code below:


def binary_search(lst, item):
    """
    Arguments:
        lst - a list sorted in ascending order
        item - the item that we want to find
    Returns:
        the idx of the item or -1 if not found
    """
 
    low_bound = 0
    upper_bound = len(lst) - 1
 
    # take the average, but make sure it's an integer
    cur_idx = (low_bound + upper_bound) // 2
 
    while low_bound <= upper_bound:
        if lst[cur_idx] == item:
            return cur_idx
        if lst[cur_idx] < item:
            # it was an undershot, so set this as the new lower bound
            low_bound = cur_idx + 1
        else:  # lst[cur_idx] > item)
            # it was an overshot, so set this as the new upper bound
            upper_bound = cur_idx - 1
        # update cur_idx
        cur_idx = (low_bound + upper_bound) // 2
    return -1

View code on GitHub.

Binary Search - Pros and Cons

Pros - Binary search has a great runtime. Compared to linear search, which has a time complexity of O(n), binary search has a time complexity of O(log(n)).

Cons - It only works on sorted arrays since we can’t make any of the assumptions we made during the process if the array was not ordered.

Practice

Searching Algorithms

Let's see the difference between linear and binary searches! Some of the algorithm is already done for you, but you will have to fill in some areas. Then, run the code and you can see the results.

💻 Template Code

✅ Solution Code

Next Session

2️⃣

3.2 Sorting Algorithms

⚖️

Copyright © 2021 Code 4 Tomorrow. All rights reserved. The code in this course is licensed under the MIT License. If you would like to use content from any of our courses, you must obtain our explicit written permission and provide credit. Please contact classes@code4tomorrow.org for inquiries.