How complexity classes can help you avoid a bar fight!

Bar Fight

Imagine that you work as a programmer in a company that manages a very big bar that receives a lot of customers on weekends. Since you spend all your day watching videos and reading the DailyDrip blog, your job is hanging by a thread. One day, your boss arrives at your desk and says that they have been having problem with brawls at the bar. The problem is that every time two people who don't like each other get drunk in the bar, they fight!

Your boss gives you one week to write a program that, given the list of n people that will go to the bar and the list of pairs of people that don't like each other, determines who the bouncer should accept or reject at the door of the bar in order to avoid fights, but with no more than k rejections, to keep up the profits, or you're fired! The figure below shows an instance and a solution for k = 3, where an edge between two people means they don't like each other.

Pairs of people who don't like each other

So you immediately think: Piece of cake! All I have to do is try all possibilities, right?. So you devise and implement the following algorithm:

Bar_Fight_Prevention(People,Pairs,k)
    for each subset S of People where |S| <= k do:
        if {p1,p2} are not in Pairs for every p1,p2 in People - S then
            return S
    return "It's not possible!"

After implementing it, you begin the testing with just 50 people. You test it for k=3 and works all right. Then for k=6 and it works just fine. However, when you test it for k=12, you give up after waiting 1 hour for the program to finish. In fact, after analyzing the algorithm for a bit, you conclude that the problem is that, in the worst case, you try all subsets of size at most k, that is, the number of subsets verified has a growth rate of roughly number of peoplek which, for 50 people and k = 12, is over 9000 172 billions subsets. Obviously, any algorithm that takes hours to finish just won't do in this kind of situation. You need something faster to present to your boss!

It's not hard to see that any algorithm whose worst case's time (number of instructions) depends exponentially on the number of people or k will take too long to finish its computation when the number of people or k is a (not so) large number. So, as the week goes by, you try to devise an algorithm whose worst case's time depends only linearly (or, at least, polynomially with a low degree) on the number of people and k without success. You begin to wonder: does such an algorithm even exist?

Complexity Classes

Such question very frequently arises in the computational complexity field: Does problem X have an algorithm with properties Y or such algorithm does not exist? and, although we can't always have a definitive answer, there are some tools that can help us to reason about this type of question. One of those tools is the classification of computational problems in complexity classes.

A complexity class is nothing more than a set of computational problems with some property in common. Some of the most prominent of these classes are the classes P, NP and NP-complete. Perhaps you've heard of the famous one million dollar problem P vs NP.

The class P is the set of all decision problems that can be solved by a polynomial-time algorithm. Roughly speaking, a decision problem is a problem that is answered by a simple YES or NO, and a polynomial-time algorithm is an algorithm whose number of instructions that it executes in the worst case is a polynomial function on the size of the problem's input. One example of problem in P is the problem to determine whether a number is prime or not.

The class NP is the set of all decision problems that can be verified by a polynomial-time algorithm. A verifier of a decision problem is an algorithm that takes an input of the problem and a witness, which is supposed to be a proof that the right answer for the input is YES and must have polynomial size, and outputs YES if the witness is indeed such proof or, if the witness is no such proof, outputs NO. Thus, a problem can be verified in polynomial time if there is a polynomial-time verifier such that the answer of an input x is YES if and only if there is a witness w such that the verifier outputs yes with input (x,w). Clearly, we have that PNP since if a problem is solved in polynomial time it also can be verified in polynomial time.

There are many problems that arise in the everyday of a developer that are in NP. An example of problem in NP is the Hamiltonian Path. The Hamiltonian Path Problem consists in, given a graph, determining whether there is a path that goes through all vertices of the graph. In this case, the witness of the Hamiltonian Path is a sequence of the vertices of the graph and the verifier outputs YES if and only if this sequence is a permutation of the vertex set of the graph and if there is an edge between any two consecutive vertices in the permutation, both which clearly can be done in polynomial time.

Before we present the class NP-complete, we will define a Karp reduction. A Karp reduction from a decision problem A to a decision problem B is a polynomial-time algorithm that transforms inputs x of A into inputs y of B such that the answer of x is YES if and only if the answer of y is YES. Basically, if a decision problem B has a polynomial-time algorithm and there is a Karp reduction from A to B, then A also has a polynomial-time algorithm.

The class NP-complete is the subset of all problems A in NP such that there is a Karp reduction from each problem in NP to A. However, in order to prove that a problem is in NP-complete, it is enough to prove that it's in NP and that there is a Karp reduction from another NP-complete problem to it, because Karp reductions are transitive. Due to the definition of the NP-complete class, if someone can find a polynomial-time algorithm for a problem in NP-complete, then all problems in NP also have a polynomial-time algorithm, which would imply that P = NP.

An example of problem in NP-complete is the Independent Set Problem, which consists in, given a graph and an integer k, determine if there is a set with, at least, k vertices of the graph that are not neighbors between themselves.

Ok, now you can ask me How will all of this stuff help me find a fast algorithm for my problem! and my answer is: It won't. However, it will help to convince your boss not to fire you. How? Well, basically there is a very strong belief among scientists that if a problem is in NP-complete, then it's not in P. This belief comes from the fact that no one has ever succeeded in finding a polynomial-time algorithm for an NP-complete problem despite the huge amount of work in that direction and the large number of problems that we already know to be in NP-complete. So far, there are two possibilities for the P vs NP question, which are depicted below.

Pairs of people who don't like each other'

So, if you can prove that a problem is NP-complete, then you will have strong evidence that it's not in P and, then, you can say to your boss Ok, I can't really find a good algorithm for this problem, but neither can everyone else!.

Actually, the problem in the beginning of this post is the Vertex Cover Problem which consists in, given a graph and an integer k, determine whether there is a vertex cover of the graph with size at most k (which is a set S of vertices of the graph such that every edge of the graph has at least one end in S) and it's in NP-complete. To prove that, note that it's in NP, the witness is a set S of vertices of the graph and the verifier just has to check whether the size of S is at most k and whether all edges have an endpoint in S, and that there is a Karp reduction from the Independent Set Problem to it. In fact, a set of vertices is an independent set of a graph if and only if its complement is a vertex cover of the same graph. Thus, the reduction would simply transform the input of the Independent Set Problem (G,k) into the input of the Vertex Cover Problem (G,n-k), where n is the number of vertices of the graph G.

Conclusion

The fact that a problem is in P doesn't always mean that it can be solved fast since the degree of the polynomial of the function describing the time of the algorithm can be very high. Also, if a problem is in NP-complete it doesn't exclude the possibility of it having an approximation or fixed parameter tractable algorithm or even a good algorithm for some special cases of the input. For example, although there is (possibly) no polynomial-time algorithm for the Vertex Cover Problem if the number of clients is around 25 at most, the strategy of trying all subsets with size k at most will work.

Nevertheless, it's important for a developer to understand some of the tools used in theoretical computer science like complexity classes. It can save a lot of time and trouble for someone who's trying to find a really fast algorithm for his problem when there's probably none.