An Introduction to Support Vector Machines (SVM): Convex Optimization and Lagrangian Duality Principle
In the last post we covered how to use the gradient descent (GD) algorithm to train an SVM. Although GD can solve the SVM optimization problem, it has some shortcomings:
- The gradient procedure is time-consuming and the solution may be suboptimal.
- GD cannot explicitly identify the support vectors (the points that determine the hyperplane).
To overcome these shortcomings, we can take advantage of Lagrangian duality. First we write the original SVM optimization problem as a primal (convex) optimization problem, then we derive the Lagrangian dual problem. Luckily, based on the KKT conditions, we can solve the dual problem with more efficient methods.
First of all, we briefly introduce Lagrangian duality and the Karush-Kuhn-Tucker (KKT) conditions.
Lagrangian Duality Principle
Primal Problem
A primal convex optimization problem has the following expression:
$$
\begin{aligned}
\min_{x} \quad & f(x) \\
\text{s.t.} \quad & g_i(x) \le 0, \quad i = 1, \dots, m \\
& h_j(x) = 0, \quad j = 1, \dots, n
\end{aligned}
$$
where
- The constraint that $f(x)$ and the $g_i(x)$ are convex defines a convex region.
- The constraint that the $h_j(x)$ are linear confines the region to the intersection of multiple hyperplanes (potentially reducing the dimensionality).
We can get the Lagrangian function:
$$
L(x, \lambda, \nu) = f(x) + \sum_{i=1}^{m} \lambda_i g_i(x) + \sum_{j=1}^{n} \nu_j h_j(x), \qquad \lambda_i \ge 0.
$$
Since $\lambda_i \ge 0$, $g_i(x) \le 0$, and $h_j(x) = 0$ for any feasible $x$, we have $L(x, \lambda, \nu) \le f(x)$. The Lagrangian dual function is defined as the infimum of $L$ over $x$:
$$
d(\lambda, \nu) = \inf_{x} L(x, \lambda, \nu),
$$
so $d(\lambda, \nu) \le p^*$, where $p^*$ is the optimal value of the primal problem.
The difference between minimum and infimum:
- $\min S$ means the smallest element in the set $S$; $\inf S$ means the largest value which is less than or equal to every element in $S$.
- In the case where the minimum value is reachable, infimum = minimum, e.g. $S = \{ x \in \mathbb{R} \mid x \ge 0 \}$, then $\inf S = \min S = 0$.
- In the case where the minimum is not reachable, the infimum may still exist, e.g. $S = \{ x \in \mathbb{R} \mid x > 0 \}$, then $\inf S = 0$ but $\min S$ does not exist.
Dual Problem
Based on the dual function, we can get the dual optimization problem:
$$
\begin{aligned}
\max_{\lambda, \nu} \quad & d(\lambda, \nu) \\
\text{s.t.} \quad & \lambda_i \ge 0, \quad i = 1, \dots, m
\end{aligned}
$$
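As a tiny illustration (the specific numbers here are my own, not part of the general theory above), take the one-dimensional problem $\min_x x^2$ subject to $2 - x \le 0$:
$$
L(x, \lambda) = x^2 + \lambda (2 - x), \qquad
d(\lambda) = \inf_x L(x, \lambda) = 2\lambda - \frac{\lambda^2}{4} \quad \left(\text{attained at } x = \tfrac{\lambda}{2}\right).
$$
The dual problem $\max_{\lambda \ge 0} \, 2\lambda - \lambda^2/4$ has optimum $\lambda^* = 4$ with $d^* = 4$, which matches the primal optimum $p^* = f(2) = 4$.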
Strong Duality and Slater’s Condition
Let $p^*$ denote the optimal value of the primal problem and $d^*$ the optimal value of the dual problem. Since $d(\lambda, \nu) \le p^*$ for every $\lambda \succeq 0$, we always have $d^* \le p^*$ (weak duality), and the difference $p^* - d^*$ is called the duality gap.
Under certain circumstances, the duality gap can be 0, which means that strong duality holds. One sufficient condition for this is called Slater's condition:
- Apart from the constraints in the primal problem (where the $h_j(x) = 0$ are already required to be linear, i.e. affine), Slater's condition requires that there exists an $x$ at which all the inequality constraints hold strictly: $g_i(x) < 0$ for all $i$ and $h_j(x) = 0$ for all $j$.
If Slater's condition is satisfied, strong duality holds, and furthermore the dual optimal value is attained and equals the primal optimal value: $d^* = p^*$.
Karush-Kuhn-Tucker (KKT) Conditions
The KKT conditions contain four conditions:
- Primal constraints: $g_i(x) \le 0$ for all $i$, and $h_j(x) = 0$ for all $j$.
- Dual constraints: $\lambda_i \ge 0$ for all $i$.
- Stationarity: $\nabla_x L(x, \lambda, \nu) = 0$, i.e. $x$ attains the infimum of $L(x, \lambda, \nu)$ w.r.t. $x$.
- Complementary slackness: $\lambda_i g_i(x) = 0$ for all $i$.
Therefore, if strong duality holds, we can first solve the dual problem to get the optimal $\lambda^*$ and $\nu^*$, and then use the stationarity and complementary slackness conditions to recover the primal optimum $x^*$.
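To see this workflow on the toy problem from above (again with my own numbers): Slater's condition is satisfied since, e.g., $x = 3$ gives $2 - x = -1 < 0$, so strong duality holds. Solving the dual gives $\lambda^* = 4$; stationarity then yields
$$
x^* = \frac{\lambda^*}{2} = 2,
$$
complementary slackness $\lambda^* (2 - x^*) = 0$ is satisfied, and indeed $f(x^*) = 4 = d^* = p^*$.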
Apply Lagrangian Duality to SVM
Now we are able to solve the SVM optimization problem using Lagrangian duality. As introduced in the first post An Introduction to Support Vector Machines (SVM): Basics, the SVM optimization problem is:
$$
\begin{aligned}
\min_{w, b} \quad & \frac{1}{2} \|w\|^2 \\
\text{s.t.} \quad & y_i (w^T x_i + b) \ge 1, \quad i = 1, \dots, N
\end{aligned}
$$
The Lagrangian function is
$$
L(w, b, \alpha) = \frac{1}{2} \|w\|^2 + \sum_{i=1}^{N} \alpha_i \left[ 1 - y_i (w^T x_i + b) \right], \qquad \alpha_i \ge 0,
$$
and the Lagrangian dual function is
$$
d(\alpha) = \inf_{w, b} L(w, b, \alpha).
$$
To compute the Lagrangian dual function, we can compute the partial derivatives of $L(w, b, \alpha)$ w.r.t. $w$ and $b$ and set them to zero:
$$
\frac{\partial L}{\partial w} = w - \sum_{i=1}^{N} \alpha_i y_i x_i = 0, \qquad
\frac{\partial L}{\partial b} = - \sum_{i=1}^{N} \alpha_i y_i = 0.
$$
Then we get
$$
w = \sum_{i=1}^{N} \alpha_i y_i x_i, \qquad \sum_{i=1}^{N} \alpha_i y_i = 0.
$$
Substitute these two constraint equations into $L(w, b, \alpha)$ and we obtain the dual function
$$
d(\alpha) = \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j x_i^T x_j.
$$
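For readers who want the intermediate algebra (added here for completeness), substituting $w = \sum_i \alpha_i y_i x_i$ and using $\sum_i \alpha_i y_i = 0$ gives:
$$
\begin{aligned}
L(w, b, \alpha)
&= \frac{1}{2} \Big\| \sum_i \alpha_i y_i x_i \Big\|^2 + \sum_i \alpha_i - \sum_i \alpha_i y_i \Big( \sum_j \alpha_j y_j x_j \Big)^T x_i - b \sum_i \alpha_i y_i \\
&= \frac{1}{2} \sum_{i, j} \alpha_i \alpha_j y_i y_j x_i^T x_j - \sum_{i, j} \alpha_i \alpha_j y_i y_j x_i^T x_j + \sum_i \alpha_i \\
&= \sum_i \alpha_i - \frac{1}{2} \sum_{i, j} \alpha_i \alpha_j y_i y_j x_i^T x_j,
\end{aligned}
$$
where the term $b \sum_i \alpha_i y_i$ vanishes because of the constraint $\sum_i \alpha_i y_i = 0$.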
Then the dual problem is:
$$
\begin{aligned}
\max_{\alpha} \quad & \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j x_i^T x_j \\
\text{s.t.} \quad & \alpha_i \ge 0, \quad i = 1, \dots, N \\
& \sum_{i=1}^{N} \alpha_i y_i = 0
\end{aligned}
$$
We can solve this dual problem using the gradient descent algorithm or Sequential Minimal Optimization (SMO); this will be discussed in the next post.
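Before that post, here is a minimal sketch of solving this dual with a generic constrained solver (SciPy's SLSQP) on small toy data. The data generation, the variable names, and the choice of solver are just illustrative assumptions on my part; this is not the SMO algorithm.

```python
import numpy as np
from scipy.optimize import minimize

# Toy, linearly separable data: class -1 around (-2, -2), class +1 around (2, 2).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 0.5, (20, 2)), rng.normal(2.0, 0.5, (20, 2))])
y = np.hstack([-np.ones(20), np.ones(20)])
N = len(y)

# Q_ij = y_i y_j x_i^T x_j, so the dual objective is sum(alpha) - 0.5 * alpha^T Q alpha.
Q = (y[:, None] * y[None, :]) * (X @ X.T)

def neg_dual(alpha):
    # We minimize the negative of the dual objective.
    return 0.5 * alpha @ Q @ alpha - alpha.sum()

def neg_dual_grad(alpha):
    return Q @ alpha - np.ones(N)

res = minimize(
    neg_dual,
    np.zeros(N),
    jac=neg_dual_grad,
    method="SLSQP",
    bounds=[(0.0, None)] * N,                               # alpha_i >= 0
    constraints=[{"type": "eq", "fun": lambda a: a @ y}],   # sum_i alpha_i y_i = 0
)
alpha = res.x
```

With well-separated data like this, only a handful of the $\alpha_i$ come out (numerically) nonzero, which is exactly the sparsity we exploit below.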
Once we get the dual optimum $\alpha^*$, we can recover $w^* = \sum_{i=1}^{N} \alpha_i^* y_i x_i$ from the stationarity condition. To recover $b^*$ and to see which points actually matter, we turn to the KKT conditions.
KKT conditions for SVM
Since the primal constraints are $y_i (w^T x_i + b) - 1 \ge 0$ and the dual constraints are $\alpha_i \ge 0$, complementary slackness requires
$$
\alpha_i^* \left[ y_i (w^{*T} x_i + b^*) - 1 \right] = 0, \quad i = 1, \dots, N.
$$
This looks interesting. From the dual constraints we know that $\alpha_i^* \ge 0$. If $\alpha_i^* > 0$, then complementary slackness forces $y_i (w^{*T} x_i + b^*) = 1$, so $x_i$ lies exactly on the margin; conversely, if $y_i (w^{*T} x_i + b^*) > 1$, then $\alpha_i^*$ must be $0$ and $x_i$ contributes nothing to $w^*$.
Therefore, we find a way to identify support vectors using Lagrangian duality:
- Compute the dual optimum $\alpha^*$; if $\alpha_i^* > 0$, then $x_i$ is a support vector.
Let $S = \{ i \mid \alpha_i^* > 0 \}$ denote the index set of the support vectors.
Since we know that for a support vector $x_s$ (with $s \in S$) we have $y_s (w^{*T} x_s + b^*) = 1$, we can compute $b^*$ from any single support vector:
$$
b^* = \frac{1}{y_s} - w^{*T} x_s
$$
or, equivalently (because $y_s \in \{-1, +1\}$ implies $1 / y_s = y_s$),
$$
b^* = y_s - w^{*T} x_s.
$$
In practice, in order to avoid the influence of noise, we may use a more stable way to compute $b^*$ by averaging over all support vectors:
$$
b^* = \frac{1}{|S|} \sum_{s \in S} \left( y_s - w^{*T} x_s \right).
$$
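As a small continuation of the SciPy sketch above (it reuses that sketch's `alpha`, `X`, and `y`; the threshold `1e-6` is an arbitrary choice to absorb solver noise), the recovery of $w^*$, the support vectors, and the averaged $b^*$ could look like this:

```python
# Support vectors: indices with alpha_i^* > 0 (up to a numerical tolerance).
sv = alpha > 1e-6

# w* = sum_i alpha_i^* y_i x_i  (stationarity condition).
w = (alpha[sv] * y[sv]) @ X[sv]

# b* averaged over all support vectors: mean of y_s - w*^T x_s.
b = np.mean(y[sv] - X[sv] @ w)
```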
Use SVM for Classification
Given a new point $x$, we can classify it by the sign of the decision function:
$$
f(x) = \operatorname{sign}\left( w^{*T} x + b^* \right) = \operatorname{sign}\left( \sum_{i \in S} \alpha_i^* y_i \, x_i^T x + b^* \right).
$$
This implies that we only need the support vectors to determine the separating hyperplane and to classify new points. Furthermore, we notice that both in the dual problem and in the classification function, the training points appear only through inner products $x_i^T x_j$ (or $x_i^T x$), which is exactly what makes the kernel trick possible.
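Continuing the same sketch (again reusing `alpha`, `sv`, `X`, `y`, and `b` from above; the test point is an arbitrary example), classification needs only the support vectors and the inner products $x_i^T x$:

```python
def predict(x_new):
    # sign( sum_{i in S} alpha_i^* y_i x_i^T x_new + b* ) -- no explicit w needed.
    return np.sign((alpha[sv] * y[sv]) @ (X[sv] @ x_new) + b)

print(predict(np.array([2.5, 1.5])))   # expected: +1 for the toy data above
```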
In the next post I will introduce how to solve the dual problem.