P vs. NP

An algorithm has polynomial running time if there is a constant c1c \geq 1 such that for every input of length nn, the algorithm takes O(nc)O(n^c) time.

An algorithm has exponential running time if there is a constant c1c \geq 1 such that for every input of length nn, the algorithm takes O(2nc)O(2^{n^c}) time.

Complexity Class P

A decision problem is a problem for which the answer is "yes" or "no". The complexity class P=P = set of all decision problems that can be solved in polynomial time. For example a sorting problem is not in PP, since it does not output "yes" or "no".

Examples of problems in PP:

  • is a given input graph connected
  • is a given input graph bipartite
  • is a given input sequence sorted
  • does a given input graph contain an Euler tour? An Euler tour is a path that traverses each edge exactly once

Other Problems

Hamiltonian Cycle

  • input: An undirected graph G=(V,E)G = (V, E) stored using adjacency lists
  • question: Does GG contain a Hamiltonian cycle? A Hamiltonian cycle is a cycle that traverses each vertex exactly once.

Right now on this planet, nobody knows how to solve this problem in polynomial time. Not known if this problem is in PP!

Travelling Salesman Problem

  • input:
    • A complete directed graph G=(V,E)G = (V, E), where each edge (u,v)E(u,v) \in E has a weigh wt(u,v)>0wt(u,v) > 0
    • An integer K
  • question: Does GG contain a Hamiltonian cycle of total weight at most KK?

Again we don't know if this problem is in PP!

Subset-Sum

  • input: A set SS of integers together with an integer tt
  • question: Is there a subset SS' of SS such that the sum of the subset is equal to tt?

Not known if this problem is in PP!

Clique

  • input: An undirected graph G=(V,E)G = (V,E) together with an integer KK
  • question: DOes GG contain a clique (complete subgraph) with KK vertices?

Not known if this problem is in PP!

Proving

For each of the 4 previous problems

  • Not know if it can be solved in polynomial time
  • If the answer to the question is YES, then
    • there is a "short" proof for this
      • "short" means the length of the proof is "polynomial in the length of the input"
    • if someones gives us such a short proof, then we can "easily" verify this proof
      • here "easily" means "in polynomial time"

How to check if proof is correct?

Hamiltonian cycle

proof/solution: sequence v1,...,vkv_1, ..., v_k of vertices

verification:

  • k=nk=n?
  • no duplicates in v1,...,vkv_1, ..., v_k
  • For each 1i<k1,vi,vi+11 \leq i < k-1, {v_i, v_{i+1}} is an edge?
  • ${v_k, v_1} is an edge?

TSP

proof/solution: sequence v1,...,vkv_1, ..., v_k of vertices

verification:

  • k=nk=n?
  • no duplicates in v1,...,vkv_1, ..., v_k
  • Do we have i=1k1wt(vi,vi+1)+wt(vk,v1)K\sum_{i=1}^{k-1} wt(v_i, v_{i+1}) + wt (v_k, v_1) \leq K?

Subset-sum

proof/solution: A set SS'

verification:

  • is SS' a subset of S?
  • Do we have sum of subset =t=t?

Clique

proof/solution: A set VV'

verification:

  • is VV' a subset of V?
  • Do we have V=K|V'| = K?
  • For each u,vVu,v \in V' such that uvu \neq v, is u,v{u,v} an edge in EE?

Complexity Class NP

A decision problem AA is in NPNP if

  • for a given input II, the answer to the question A(I)A(I) is YES, then there exists a proof/solution/certificate CC such that
    • CC is short (polynomial size in the length of II)
    • in polynomial time, we can verify that CC is a correct proof for the fact that A(I)A(I) = YES

NPNP stands for Nondeterministic Polynomial

Formal P vs. NP

Language of a Decision Problem

The language of a decision problem is the set of all inputs (encoded as finite strings) for which the answer is YES.

Examples:

  • HAM-CYCLE = {GG | GG is a graph that contains a Hamiltonian cycle}
  • TSP = {(G,KG, K) | GG is a complete directed graph G=(V,E)G = (V,E), where each edge (u,v)E(u,v) \in E has a weight wt(u,v)>0wt(u,v) > 0, KK is an integer and GG contains a Hamiltonian cycle with total weight at most KK.}
  • SUBSET-SUM = {(S,t)S(S, t) | S is a set of integers, tt is an integer and SS\exists S' \subseteq S such that xSx=t\sum_{x \in S'} x = t}

Complexity Class P

The language LL (of a decision problem) is in PP if the following is true. There exists an algorithm AA and a constant c1c \geq 1 such that for any input xx,

  • if xL    A(x)x \in L \iff A(x) returns YES.
  • the running time of A(x)A(x) is O(nc)O(n^c), where nn is the length of xx.

Complexity Class NP

The language LL (of a decision problem) is in NPNP if the following is true. There exists an algorithm VV and a constant c1c \geq 1 such that for any input xx,

  • xL    x \in L \iff there exists a certificate yy such that
    • y=O(xc)|y| = O(|x|^c)
    • V(x,y)V(x,y) returns YES
    • and the running time of V(x,y)V(x,y) is polynomial in the length of xx.

Observe that VV is a verification algorithm. It has 2 input parameters.

Example: Hamiltonian Cycle

{G:GG : G is a graph that has a Hamiltonian cycle} is in NP

Verification algorithm V takes as input

  • graph G=(V,E)G = (V,E), where n=vn = |v|
  • certificate v1,v2,...,vkv_1, v_2, ..., v_k

It does the following:

  1. check is k=nk = n
  2. check if v1,v2,...,vkv_1, v_2, ..., v_k are all different
  3. check if v1,v2,v2,v3,...,vn,v1{v_1, v_2}, {v_2, v_3}, ..., {v_n, v_1} are edges in GG
  4. if steps 1-3 were successful, return YES, otherwise NO.

Theorem

PNPP \subseteq NP

Proof: Let LL be an arbitrary language (of a decision problem) in PP. By definition, there is an algorithm AA such that for any input xx,

  • xL    A(x)x \in L \iff A(x) returns YES.
  • The running time of A(x)A(x) is polynomial in the length of x.

We have to show that LL is in NPNP. That is, we have to show that LL satisfies the definition of NPNP.

Therefore, we need a verification algorithm VV. We define it in the following way: VV takes as input.

  • the input xx for AA
  • and a certificate yy

V(x,y)V(x,y) does the following: it runs A(x)A(x) and that's it! (It ignores yy).

We then have the following:

  • xLx \in L
    •     A(x)\iff A(x) returns YES
    •     V(x,\iff V(x, empty string yy) returns YES
    •     \iff there exists a certificate yy such that
      • y|y| is polynomial in the length of xx,
      • V(x,y)V(x,y) returns YES
      • and the running time of V(x,y)V(x,y) is polynomial in the length of xx.

Therefore LL is in NP.

Big Question

Is P=NPP=NP or PNPP \neq NP? Most people believe the latter.

If we want to prove that PNPP \neq NP, then we have to show that there exists a language LL (of a decision problem) such that

  • LNPL \in NP
  • L∉PL \not\in P.

Such an LL must be "difficult". So we should look at the "most difficult" problems.

But what does this mean? How can we measure how difficult a problem is?

Reductions

Polynomial-Time Reduction

Let LL and LL' be two languages. We say that LL is a polynomial-time reducible to LL' if the following is true: there exists a function ff which satisfies the following famous 3 properties:

  • ff maps inputs for LL to inputs for LL'
  • for ever input xx for LL, xL    f(x)Lx \in L \iff f(x) \in L'
  • for every input xx for LL, f(x)f(x) can be computed in time that is polynomial in the length of xx.

Notation: LPLL \leq_P L'

Theorem

If LPLL \leq_P L' and LPL' \in P, then LPL \in P.

Intuition:

  • LPL' \in P means "LL' is easy"
  • LPLL \leq_P L' means "LL is easer than LL'".
  • So LL is easy, LPL \in P.

Consider the following algorithm AA: on input xx,

  • Compute f(x)f(x)
  • Run A(f(x))A'(f(x))

We have xLx \in L

  •     f(x)L\iff f(x) \in L' by definition of reduction
  •     A(f(x))\iff A'(f(x)) returns YES by definition of AA'
  •     A(x)\iff A(x) returns YES by definition of AA

The running time of AA is polynomial in the length of xx. So LPL \in P.

About 3SAT

Consider a Boolean formula ϕ\phi with variables x1,x2,...,xnx_1, x_2, ..., x_n of the form ϕ=C1C2...Cm\phi = C_1 \land C_2 \land ... \land C_m, where each CiC_i is of the form Ci=l1il2il3iC_i = l_1^i \lor l^i_2 \lor l^i_3. Each ljil^i_j is a variable or the negation of a variable. CiC_i is called a clause and ljil_j^i is called a literal.

phi=(x1¬x1¬x2)(x2x3x4)(x1x3¬x4)(¬x1x3¬x4)phi = (x_1 \lor \neg x_1 \lor \neg x_2) \land (x_2 \lor x_3 \lor x_4) \land (x_1 \lor x_3 \lor \neg x_4) \land (\neg x_1 \lor x_3 \lor \neg x_4)

We say that ϕ\phi is satisfiable if there exists a truth value for each x1,x2,...,xnx_1, x_2, ..., x_n such that ϕ\phi is true.

3SAT = ϕϕisoftheformaboveandϕissatisfiable{\phi | \phi is of the form above and \phi is satisfiable}

We want to show that 3SAT P\leq_P INDEP - SET

Let ϕ\phi be an input for 3SAT, ϕ=C1C2...Cm\phi = C_1 \land C_2 \land ... \land C_m, each clause CiC_i is the disjunction (V)(V) of 3 literals.

(G;K)(G;K) is obtained as follows:

  • K=mK=m(number of clauses)
  • GG has 3m3m vertices, one vertex for each literal.
    • For each clause, the literals in this clause form a triangle in GG.
    • Additionally, there is an edge between any pair of opposite literals.

Theorem

The relation P\leq_P is transitive: LPLL \leq_P L' and LPLLPLL' \leq_P L'' \Rightarrow L \leq_P L''

NP-Hard and NP-Complete

The language LL is NPNP-Hard if

  • for all LNPL' \in NP, LPLL' \leq_P L

The language LL of a decision problem is NPNP-Complete if

  • LNPL \in NP
  • and for all LNPL' \in NP, LPLL' \leq_P L

It is not clear if a NPNP-Complete problem exists.

Intuitively, this means that LL belongs to the most difficult problems in NPNP. This what we were looking for in Formal P vs. NP.

Theorem

Assume that LL is NPNP-Complete. Then LP    P=NPL \in P \iff P = NP.

Intuition:

  • If LP,LL \in P, L is easy
  • LL is NPNP-Complete, so LL belongs to the most difficult problems in NPNP.
  • If the most difficult problem in NPNP turns out to be easy, then all problems in NPNP are easy!

Theorem

LL is NPNP-Complete, LPLL \leq_P L', LNPLL' \in NP \Rightarrow L' is NPNP-Complete.

Intuition:

  • LL is NPNP-Complete, so LL belongs to the most difficult problems in NPNP.
  • LL' is at least as difficult as LL
  • Then LL' also belongs to the most difficult problems in NPNP

Here is how to use this theorem; to show that LL' is NPNP-Complete,

  • Show that LNPL' \in NP
  • Look for a problem LL that is "similar" to LL' and that is known to be NPNP-Complete
  • Show that LPLL \leq_P L'.

CIRCUIT-SAT

input: A Boolean circuit

  • Directed acyclic graph, where vertices are gates
  • AND-gates and OR-gates have indegree 2
  • NOT-gates have indegree 1
  • Known input gats have indegree 0 and are labeled TRUE or FALSE
  • Unknown gates have indegree 0 and are labeled "?"
  • There is one output gate (whose outdegree is 0)

question: is it possible to assign a truth-value to each unknown input gate, such that the output of the circuit is TRUE?

CIRCUIT-SAT = {BB | BB is a Boolean circuit for which there exist truth values for the unknown input gates such that the output of BB is TRUE}

To show that CIRCUIT-SAT is NPNP-Complete, we have to show two things:

  1. CIRCUIT-SAT is in NPNP
    • Certificate: sequence of truth values for the unknown input gates
    • Verification: evaluate the circuit (in topological order)
  2. Show that for all LL in NPNP, LPL \leq_P CIRCUIT-SAT
    • Let LNPL \in NP. We need a function ff such that
      1. ff takes any input xx for LL and produces an input B=f(x)B = f(x) for CIRCUIT-SAT
      2. xL    Bx \in L \iff B \in CIRCUIT-SAT
      3. The time to compute BB is polynomial in the length of xx

We know that LNPL \in NP. So there is a verification algorithm VV such that

  • the input to VV is (x,y)(x,y), where xx is an input for LL and yy is a certificate
  • For every input xx to LL,
    • xL    x \in L \iff there exists a certificate yy such that
      • yxc|y| \leq |x|^c
      • V(x,y)V(x,y) returns YES
      • and the running time of V(x,y)V(x,y) is at most xc|x|^c

We now define the function ff. Let xx be an input for LL. Define a new algorithm VxV_x:

  • input is a string yy of length at most xc|x|^c
  • Vx(y)V_x(y) runs V(x,y)V(x,y)
  • If V(x,y)V(x,y) terminates in at most most |x|^c' steps, then Vx(y)V_x(y) terminates and returns the output of V(x,y)V(x,y)
  • If V(x,y)V(x,y) has not terminated after |x|^c' steps, then Vx(y)V_x(y) terminates and returns NO.

Observe:

  • Running time of algorithm VxV_x is at most |x|^c'