The Hessian matrix appears in the optimization literature, but the intuition for how the Hessian and its inverse transform vectors is opaque to me. Let's review second-order partial derivatives, and then try to build intuition for the Hessian matrix.

For the purpose of this intuition-building exercise, we'll work with functions $\Reals^2 \mapsto \Reals^1$. I'll also use the partial derivative notations $\frac{\partial}{\partial y} f(x, y)$, $\frac{\partial f}{\partial y}$, and $f_y$ interchangeably.

## 1. Partial Derivatives

Take the $\Reals^2 \mapsto \Reals^1$ function $f(x, y) = x^2 + 2y^2$.

A partial derivative is the rate of change of an "output" variable (in our case, $f$) with respect to an infinitesimal change in one "input" variable (in our case, $x$ or $y$), holding the other inputs fixed. For example, $\frac{\partial}{\partial y} f(x, y) = 4y$, which is to say that, at any point in the domain, moving infinitesimally in the $y$ direction changes $f$ at a rate of 4 times the $y$ coordinate of the starting point.
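One way to sanity-check this claim numerically is a central finite difference. The sketch below is only illustrative: the helper `partial_y`, the step size `h`, and the sample point are hypothetical choices made for this example.

```julia
# Same function as in the text.
f(x, y) = x^2 + 2y^2

# Hypothetical helper: central-difference estimate of ∂f/∂y at (x, y).
# The step size h is an arbitrary choice for illustration.
partial_y(f, x, y; h=1e-5) = (f(x, y + h) - f(x, y - h)) / (2 * h)

x0, y0 = 6.0, 4.0
approx = partial_y(f, x0, y0)   # numerical estimate
exact = 4 * y0                  # analytic value from f_y = 4y

println("approx = ", approx, ", exact = ", exact)
```

The two values should agree up to floating-point rounding, and changing `x0` should not change the estimate, since $f_y$ does not depend on $x$.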

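```julia
# The example function from above.
f(x, y) = x^2 + 2y^2

x = 6

# Plot ranges; the x range ends at the chosen value of x.
xlim = [-10, x]
ylim = [-10, 10]

# Evaluate f on a 101×101 grid over the plot ranges.
xs = LinRange(xlim..., 101)
ys = LinRange(ylim..., 101)
zs = [f(x, y) for x in xs, y in ys]

# A starting point and a step in the y direction.
y = 4
dy = 4

# Partial derivative of f with respect to y.
f_y(y) = 4y
```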

We can plot the function $f_y$ for every starting point:
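A sketch of how such a plot could be produced, assuming PlotlyJS.jl and mirroring the $f_x$ plotting code later in this section. The `scene` axis titles and the name `fy_zs` are placeholders chosen for this example.

```julia
using PlotlyJS

f(x, y) = x^2 + 2y^2
f_y(y) = 4y

# Same grid as in the setup code above.
xlim = [-10, 6]
ylim = [-10, 10]
xs = LinRange(xlim..., 101)
ys = LinRange(ylim..., 101)
zs = [f(x, y) for x in xs, y in ys]

# f_y evaluated at every grid point; it depends only on y.
fy_zs = [f_y(y) for x in xs, y in ys]

# Placeholder scene settings (an assumption for this sketch).
scene = attr(xaxis_title="x", yaxis_title="y", zaxis_title="z")

traces = GenericTrace[]
push!(traces, surface(x=xs, y=ys, z=zs, showscale=false, opacity=0.8))  # f
push!(traces, surface(x=xs, y=ys, z=fy_zs, showscale=false))            # f_y

plot(traces, Layout(scene=scene))
```

Because $f_y = 4y$ ignores $x$, the second surface is a plane tilted along the $y$ axis.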

We can do the exact same exercise with $f_x$:

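```julia
using PlotlyJS

# Partial derivative of f with respect to x.
f_x(x) = 2x

traces = GenericTrace[]

# The surface of f itself, slightly transparent.
push!(traces, PlotlyJS.surface(x=xs, y=ys, z=zs,
    showscale=false, opacity=0.8))
# A faint grey reference plane at z = 0, spanning ylim in both directions.
push!(traces, PlotlyJS.surface(x=ylim, y=ylim, z=[[0, 0] [0, 0]],
    showscale=false, colorscale="Greys", opacity=0.3))
# The surface of f_x, which depends only on x.
push!(traces, PlotlyJS.surface(x=xs, y=ys, z=[f_x(x) for x in xs, y in ys],
    showscale=false))

# `scene` (the 3D scene layout) is assumed to be defined earlier.
plot(traces, Layout(scene=scene))
```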


A second-order partial derivative is defined as a composition, e.g. $f_{xx} = \frac{\partial}{\partial x} \left( \frac{\partial}{\partial x}\left( f(x, y) \right) \right)$. As second derivatives do, it captures the [change in the [change in the [output variable]]] with respect to infinitesimal changes in the input variable. Its sign describes the curvature of the function along that input direction: a positive second derivative at a particular point indicates that the function is concave up at that point, and a negative second derivative indicates that it is concave down at that point.

In the case of the function we've chosen, $f_{xx} = 2$ and $f_{yy} = 4$. Both are positive at every point, so $f$ is concave up along the $x$ and $y$ directions everywhere in the domain, which matches the plot.
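These constant values can be checked numerically with central second differences. This is a minimal sketch: the helpers `second_xx` and `second_yy`, the step size `h`, and the sample points are hypothetical choices for illustration.

```julia
# Same function as in the text.
f(x, y) = x^2 + 2y^2

# Hypothetical helpers: central second-difference estimates of f_xx and f_yy.
second_xx(f, x, y; h=1e-3) = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h^2
second_yy(f, x, y; h=1e-3) = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h^2

# Evaluate at a few arbitrary points; the estimates should not depend on the point.
for (x0, y0) in [(6.0, 4.0), (-3.0, 0.5), (0.0, 0.0)]
    println((x0, y0), " => f_xx ≈ ", second_xx(f, x0, y0),
            ", f_yy ≈ ", second_yy(f, x0, y0))
end
```

Up to rounding error, every point should report values close to 2 and 4, which is what "the same curvature everywhere" looks like numerically.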

However, we've omitted the "mixed" partial derivatives here: $f_{xy}$ and $f_{yx}$. Both turn out to be zero for this particular function. What does that tell us?
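Before interpreting them, here is the calculation written out. I'm assuming the convention that $f_{xy}$ means "differentiate with respect to $x$ first, then $y$"; for this function the order makes no difference.

$$
f_{xy} = \frac{\partial}{\partial y}\left( \frac{\partial}{\partial x} \left( x^2 + 2y^2 \right) \right) = \frac{\partial}{\partial y} \left( 2x \right) = 0,
\qquad
f_{yx} = \frac{\partial}{\partial x}\left( \frac{\partial}{\partial y} \left( x^2 + 2y^2 \right) \right) = \frac{\partial}{\partial x} \left( 4y \right) = 0
$$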