Three Interesting Algorithms You Should Know Before You Die

At university, about eight or nine years ago, I learned all these algorithms. At the time, I didn't know if I would ever use them. Today, I see just how important all of them are and... no, c'mon, seriously, I can't lie to you. The truth is that a lot of them will never be useful to the average developer. Ever! However, some of them are essential if you want to become a well-rounded developer.

Now, you might say: "The most important algorithms are already implemented in the libraries of most languages! Why should I learn them?" While it's true that the most important algorithms and data structures are already implemented, like quicksort for sorting or KMP for substring search, you'll often run into a problem that can't be tackled by the standard use of known algorithms or data structures. Sometimes you must modify these algorithms, or apply the techniques used to conceive them, to solve it. Below, I briefly explain three simple algorithms you should learn before you die.

Binary search

Although very simple, binary search is a must in any developer's Swiss army knife. Suppose you have an array of size n and you want to find out whether a specific object is in it. This situation comes up frequently in the life of every developer. Normally, you just sweep the array until you either find the object or reach the end. In the worst case, this takes n comparisons. Indeed, if you know nothing about the objects in the array, this is basically the way to do it. However, what if you know beforehand that the array is sorted? Is there a better way than sweeping the array looking for the object? Indeed there is.

Suppose you have an array A of numbers sorted in ascending order and you want to find a specific number k. The main idea of binary search is that, since the array is sorted, if k < A[i], then k < A[j] for all j ≥ i, and, therefore, in the next steps, you only have to search up to index i-1. Analogously, if k > A[i], then k > A[j] for all j ≤ i, and, therefore, in the next steps, you only have to search from index i+1. The algorithm that performs the search in the range begin:end of the array is the following:

binary_search(A,k,begin,end)
    while(begin <= end)
        mid := (begin+end)/2 # middle of the range (integer division); compare k and A[mid]
        if(A[mid] = k) # found it
            return mid
        if(k < A[mid]) # k is lesser, new range is begin:mid-1
            end := mid-1
        else # k is greater, new range is mid+1:end
            begin := mid+1
    return -1
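
If you want to play with it, here's a minimal runnable Python translation of the pseudocode above (in production code you'd likely reach for your language's standard library instead, e.g. Python's bisect module):

def binary_search(A, k):
    # Return the index of k in the sorted list A, or -1 if it's absent.
    begin, end = 0, len(A) - 1
    while begin <= end:
        mid = (begin + end) // 2  # integer division keeps mid a valid index
        if A[mid] == k:           # found it
            return mid
        if k < A[mid]:            # k is smaller, new range is begin:mid-1
            end = mid - 1
        else:                     # k is greater, new range is mid+1:end
            begin = mid + 1
    return -1

print(binary_search([2, 3, 5, 7, 11, 13], 7))  # prints 3
print(binary_search([2, 3, 5, 7, 11, 13], 4))  # prints -1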

In the worst case, binary search makes only about log₂ n comparisons to find the index of an object in a sorted array. To put things into perspective, to find an object in an array with one million elements, in the worst case, sweeping will make one million comparisons while binary search will make only about twenty. The only drawback of binary search is that the array must be sorted for it to work. So, if you're going to do a lot of searching, sort the array first and then use binary search. Otherwise, the good ol' sweep is the way to go.
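
As a quick sanity check of the "twenty comparisons" figure (a throwaway snippet using Python's standard math module):

import math
print(math.ceil(math.log2(1_000_000)))  # 20 -- versus ~1,000,000 for a sweep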

Mergesort

Mergesort is one of those algorithms that isn't used much because another one does the same job better: mergesort is a sorting algorithm outshined by quicksort. Although both make roughly n log n comparisons in the typical case (quicksort can degrade to n² comparisons in the worst case, while mergesort never does), in practice quicksort is faster and has a lower memory requirement than mergesort, which needs an auxiliary array as long as the original. Now, you must be asking: "If quicksort is better, why should I learn mergesort?" Mergesort is interesting not for what it does, but for the technique used to conceive it. Let us see how mergesort works.

First, it sorts each half of the array recursively. Then, it merges the two sorted halves to sort the entire array. The trick is in how it merges the two halves: since they are already sorted, the algorithm only has to sweep each half once. The algorithm that sorts the range begin:end of the array in ascending order is below.

mergesort(A,begin,end)
    if (begin >= end) # If the range has at most one element, do nothing
        return
    mid := (begin+end) / 2
    mergesort(A,begin,mid) # sort the first half
    mergesort(A,mid+1,end) # sort the second half
    merge(A,begin,mid,end) # merge the two halves

merge(A,begin,mid,end)
    i := begin, j := mid + 1, count := 0
    while((i <= mid) and (j <= end)) # merge time!
        count := count + 1
        if(A[i] <= A[j])
            aux_A[count] := A[i]
            i := i + 1
        else
            aux_A[count] := A[j]
            j := j + 1
    if(j > end) # if the second half ended first
        append the rest of the first half (A[i:mid]) to aux_A
    else # if the first half ended first
        append the rest of the second half (A[j:end]) to aux_A
    A[begin:end] := aux_A # copy aux_A into A, from begin to end
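
For the curious, here's a compact runnable Python sketch of the same idea. It uses slices and returns a new list for clarity, instead of the in-place index bookkeeping of the pseudocode above:

def mergesort(A):
    # Return a new list with the elements of A in ascending order.
    if len(A) <= 1:             # a range with at most one element is already sorted
        return A
    mid = len(A) // 2
    left = mergesort(A[:mid])   # sort the first half
    right = mergesort(A[mid:])  # sort the second half
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):  # merge time!
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])     # append the rest of the first half...
    merged.extend(right[j:])    # ...and of the second half (one of the two is empty)
    return merged

print(mergesort([5, 2, 4, 7, 1, 3, 2, 6]))  # [1, 2, 2, 3, 4, 5, 6, 7]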

The technique used to devise mergesort is called Divide and Conquer. Divide and Conquer is employed in many algorithms and has basically three phases: Divide, Conquer and Combine. First, you divide the problem into smaller subproblems (Divide), then you recursively solve these smaller subproblems (Conquer) and, finally, you combine their solutions to obtain the solution of the larger problem (Combine). In many divide and conquer algorithms, the Combine phase is the most complex, and mergesort is no different. This technique often produces efficient and clean algorithms.

Many problems can be solved by divide and conquer algorithms, such as the closest pair of points, fast matrix multiplication and the fast Fourier transform. In fact, binary search and quicksort can also be considered divide and conquer algorithms.

Kadane’s algorithm

The maximum subarray problem consists of, given an array A of numbers, determining the contiguous subarray of A with the greatest sum. It's not hard to imagine real-world applications where this problem might appear. For instance, given a forecast of the value of some stock for the next n days, determine on which days one should buy and sell the stock for maximum profit. In this case, if you construct an array A of the day-to-day changes in the stock's price, your task is really to determine the maximum subarray of A.

In an array of size n, the naive approach computes the sum of every one of the roughly n² subarrays from scratch, which gives you an algorithm that makes a number of operations in the order of n³ (or O(n³) operations, for those familiar with asymptotic notation). The almost naive approach first precomputes the sums of all prefixes of the array, which takes a number of operations in the order of n, and then proceeds in the same way as the naive approach, except that now the sum of any range i:j can be computed in constant time as the difference of two prefix sums. Thus, the almost naive algorithm makes O(n²) operations.
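
To make the almost naive idea concrete, here's a short Python sketch (the function and variable names are mine, purely for illustration). With the prefix sums P, where P[i] holds A[0] + ... + A[i-1], the sum of the range i:j is simply P[j+1] - P[i]:

def max_subarray_quadratic(A):
    # O(n^2) maximum subarray value using precomputed prefix sums.
    n = len(A)
    P = [0] * (n + 1)
    for i in range(n):       # O(n) precomputation of all prefix sums
        P[i + 1] = P[i] + A[i]
    best = A[0]
    for i in range(n):       # try every range i:j; its sum is P[j+1] - P[i]
        for j in range(i, n):
            best = max(best, P[j + 1] - P[i])
    return best

print(max_subarray_quadratic([1, -3, 4, -1, 2, -5, 3]))  # 5, from the range 4, -1, 2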

Excited by the simplicity and power of divide and conquer, someone could try to employ it to solve the maximum subarray problem. In fact, this person would be successful, and their algorithm would make O(n · log n) operations. However, there is an algorithm that solves the problem in O(n) operations: Kadane's algorithm. Its main idea is to keep track, at each index i, of the value of the maximum subarray that ends at index i. The following algorithm computes the value and indices of the maximum subarray.

kadane(A)
    max_value := 0, max_beg := 0, max_end := 0
    # At the end of each iteration i, curr_value will store...
    # ... the max subarray value that ends at index i
    curr_value := 0, curr_beg := 1
    i := 1, n := size of A
    while(i <= n)
        if (curr_value + A[i] < A[i])
            curr_value := A[i]
            curr_beg := i
        else
            curr_value := curr_value + A[i]
        if(max_value < curr_value)
            max_value := curr_value
            max_beg := curr_beg
            max_end := i
        i := i + 1
    return [max_beg,max_end,max_value]
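
Here's the same algorithm in runnable Python, applied to the stock example from earlier. Note that, since max_value starts at 0, the algorithm above allows the empty subarray (with sum 0) as an answer; the sketch keeps that convention, and the prices are made up for illustration:

def kadane(A):
    # Return (beg, end, value) of the maximum subarray of A, 1-indexed.
    max_value, max_beg, max_end = 0, 0, 0
    curr_value, curr_beg = 0, 1  # max subarray value ending at the current index
    for i, x in enumerate(A, start=1):
        if curr_value + x < x:   # better to start a fresh subarray at i
            curr_value, curr_beg = x, i
        else:
            curr_value += x
        if max_value < curr_value:
            max_value, max_beg, max_end = curr_value, curr_beg, i
    return max_beg, max_end, max_value

prices = [10, 11, 7, 10, 12, 9, 13, 11]                # hypothetical daily prices
changes = [b - a for a, b in zip(prices, prices[1:])]  # [1, -4, 3, 2, -3, 4, -2]
print(kadane(changes))  # (3, 6, 6): buy at 7 on day 3, sell at 13 on day 7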

Kadane's algorithm above is not only useful but, as with mergesort, the technique used to create it is also very powerful. The technique is called Dynamic Programming, which is a fancy way to say "storing stuff so you don't have to recompute it when you need it again later". The complexity of such algorithms resides in determining the right stuff to store. This technique is used in many algorithms, such as those for the 0-1 knapsack problem, DNA sequence alignment and matrix chain multiplication.

Conclusion

In this post, I explored three simple but relevant algorithms and presented the very important algorithmic techniques behind them. Nowadays, in order to be a well-rounded developer, in addition to having good coding skills (knowing languages and design patterns, writing clean code, etc.), you must also know how algorithms are constructed and how to analyze their efficiency. I hope this post helped you with that.