Why is the Weight Vector Orthogonal to the Separating Hyperplane?

February 22, 2024 / 4 min read

Last Updated: February 24, 2024

What is a Hyperplane?

In an $n$-dimensional space, a hyperplane is a flat affine subspace of dimension $n-1$. The keyword here is affine, for reasons we will discuss soon. For now, just take it to mean that we can move the origin around.

We are all familiar with a hyperplane in 2D, which is just a line. In mathematical lingo, we call it a flat one-dimensional subspace. In 3D, a hyperplane is a flat two-dimensional subspace, which is just a plane.

Mathematically, a hyperplane is defined by the equation:

$$w_1 x_1 + w_2 x_2 + \dots + w_n x_n + b = 0$$

where $w_1, \dots, w_n$ (together with the constant $b$) are the coefficients of the hyperplane and $x_1, \dots, x_n$ are the features. When we say "defined by", we mean that any point that satisfies this equation lies on the hyperplane. A simple 2D example: if we have a line $y = mx + c$, we can rewrite it as $mx - y + c = 0$. Any point $(x, y)$ that satisfies this equation lies on the line.
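To make the "satisfies the equation" test concrete, here is a minimal sketch in plain Python. The line $y = 2x + 1$, the sample points, and the helper name `hyperplane_value` are illustrative assumptions, not taken from the original post.

```python
def hyperplane_value(w, x, b):
    """Return w . x + b; the point x lies on the hyperplane when this is 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

# Hypothetical example line: y = 2x + 1, rewritten as 2x - y + 1 = 0,
# so the coefficients are w = (2, -1) and the constant term is b = 1.
w = (2.0, -1.0)
b = 1.0

on_line = (3.0, 7.0)   # 7 = 2*3 + 1, so this point is on the line
off_line = (3.0, 8.0)  # 8 != 2*3 + 1, so this point is off the line

value_on = hyperplane_value(w, on_line, b)
value_off = hyperplane_value(w, off_line, b)
print(value_on, value_off)
```

The same helper works in any dimension, since it only relies on `w` and `x` having the same length.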

Let's visualize a hyperplane in 3D space with random $w$s and $x$s. In the plot, the red line is the weight vector $w$ and the green line is the feature vector $x$.

Now, interestingly, the weight vector of the hyperplane is orthogonal to the hyperplane. I have seen numerous discussions on why this is the case, but I don't think they are as rigorous or as intuitive as they could be.

It's Affine, not Linear

So why is the weight vector orthogonal to the hyperplane? Note we can generally write the equation of a hyperplane as:

$$w^T x + b = 0$$

where $w$ is the weight vector and $x$ is the feature vector.

The interesting part here is $b$, the bias term. This seemingly innocuous term is what makes the hyperplane affine, and not linear.

To see why a rigorous definition of a hyperplane is needed here, let's first follow the answers on Stack Exchange. Suppose we have two points on the hyperplane, $x_1$ and $x_2$, which means:

$$w^T x_1 + b = 0$$
$$w^T x_2 + b = 0$$

Subtracting these two equations gives us $w^T (x_1 - x_2) = 0$. But note that $x_1 - x_2$ does not lie on the hyperplane as the answers suggested! To see why, let's visualize this on our cloud of points again, but this time with a new blue line for $x_2$, and also a purple line for $x_1 - x_2$.

Notice anything strange? $x_1 - x_2$ does not lie on the hyperplane! The reason is that, for vectors $x_1$ and $x_2$, the difference $x_1 - x_2$ is in the same vector space, but not in the same affine space.
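The point can be checked numerically. The sketch below uses a hypothetical plane with $w = (1, 2, 2)$ and $b = -3$ and two hand-picked points on it (all of these values are assumptions chosen for easy arithmetic). It shows that the difference of the two points is orthogonal to $w$, yet fails the hyperplane equation when treated as a point.

```python
def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))

# Hypothetical plane in 3D: x + 2y + 2z - 3 = 0.
w = [1.0, 2.0, 2.0]
b = -3.0

# Two points chosen by hand so that w . x + b = 0 holds for both.
x1 = [3.0, 0.0, 0.0]
x2 = [1.0, 1.0, 0.0]

# The difference vector, read as a position vector starting at the origin.
diff = [a - c for a, c in zip(x1, x2)]

x1_on_plane = dot(w, x1) + b    # 0.0: x1 satisfies the equation
x2_on_plane = dot(w, x2) + b    # 0.0: x2 satisfies the equation
diff_dot_w = dot(w, diff)       # 0.0: diff is orthogonal to w
diff_on_plane = dot(w, diff) + b  # equals b, so diff is NOT on the plane

print(x1_on_plane, x2_on_plane, diff_dot_w, diff_on_plane)
```

Whenever $b \neq 0$, the last value equals $b$ rather than zero, so the difference vector only lands on the hyperplane itself in the special case where the hyperplane passes through the origin.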

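Let's reconsider the problem with a more rigorous setup. We have vectors $w$ and $x$ in the vector space $\mathbb{R}^n$. $w^T x_1$ and $w^T x_2$ are values of the linear transformation $x \mapsto w^T x$ defined on $\mathbb{R}^n$. The moment we include $b$, we have the affine transformation $x \mapsto w^T x + b$, defined on the corresponding affine space. And in an affine space, we don't treat the origin as the starting point of $x_1 - x_2$: the difference is a displacement between two points on the hyperplane, so it runs along the hyperplane rather than out from the origin.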
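One way to see the difference between the linear and the affine map is to test additivity, a property every linear map has. The following is a small sketch with assumed values ($w = (1, 2, 2)$, $b = -3$, and two hand-picked points); the function names are hypothetical.

```python
def linear_map(w, x):
    """f(x) = w . x, which is linear in x."""
    return sum(wi * xi for wi, xi in zip(w, x))

def affine_map(w, x, b):
    """g(x) = w . x + b, which is affine (linear plus a constant shift)."""
    return linear_map(w, x) + b

# Assumed example values, chosen so the arithmetic is exact.
w = [1.0, 2.0, 2.0]
b = -3.0
x1 = [3.0, 0.0, 0.0]
x2 = [1.0, 1.0, 0.0]
x_sum = [a + c for a, c in zip(x1, x2)]

# Additivity holds for the linear map: f(x1 + x2) == f(x1) + f(x2).
linear_additive = linear_map(w, x_sum) == linear_map(w, x1) + linear_map(w, x2)

# It fails for the affine map when b != 0, because b is counted once on the
# left-hand side but twice on the right-hand side.
affine_additive = (
    affine_map(w, x_sum, b) == affine_map(w, x1, b) + affine_map(w, x2, b)
)

print(linear_additive, affine_additive)
```

The failure of additivity is exactly why sums and differences of points on the hyperplane need not stay on it.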

Final Proof

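Now, by the geometric definition of the dot product, with a new vector $v = x_1 - x_2$:

$$w \cdot v = \|w\| \, \|v\| \cos\theta$$

But notice that in defining our hyperplane we found $w^T (x_1 - x_2) = 0$, so $w \cdot v = 0$. If $w \neq 0$ and $v \neq 0$, we know that $\|w\|$ and $\|v\|$ are always positive, so the only way for the dot product to be zero is if $\cos\theta = 0$, which means $\theta = 90^\circ$. Note that if $b \neq 0$, it does not mean that $w \cdot v \neq 0$; $b$ is a separate term that shifts the hyperplane away from the origin. The orthogonality of the weight vector and the hyperplane is a result of the definition of the hyperplane, not the bias term.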
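As a numerical sanity check of this argument, the sketch below samples pairs of points on hyperplanes that share the same weight vector but use different bias terms, then measures the angle between the weight vector and the difference of each pair. The weight vector, bias values, random seed, and helper names are all assumptions made for illustration.

```python
import math
import random

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def project_onto_hyperplane(p, w, b):
    """Move p along w until it satisfies w . x + b = 0."""
    t = (dot(w, p) + b) / dot(w, w)
    return [pi - t * wi for pi, wi in zip(p, w)]

def angle_degrees(u, v):
    """Angle between u and v via cos(theta) = u . v / (|u| |v|)."""
    cos_theta = dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding
    return math.degrees(math.acos(cos_theta))

rng = random.Random(0)   # fixed seed so the run is reproducible
w = [1.0, 2.0, 2.0]      # assumed weight vector

angles = {}
for b in (0.0, -3.0, 5.0):  # same w, different bias terms
    x1 = project_onto_hyperplane([rng.uniform(-5, 5) for _ in w], w, b)
    x2 = project_onto_hyperplane([rng.uniform(-5, 5) for _ in w], w, b)
    v = [a - c for a, c in zip(x1, x2)]  # a direction inside the hyperplane
    angles[b] = angle_degrees(w, v)

print(angles)
```

Up to floating-point rounding, every angle comes out as a right angle regardless of the bias, which matches the claim that the bias only shifts the hyperplane.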

Have a wonderful day.

– Frank
