# Implementing simple translation and rotation gizmos

Fed up with entering numbers manually via the UI for each game object transform in my home-made game engine, I wanted to integrate a gizmo tool. Gizmos, at least in the context of game engine editors, are little 3D tools that you can use to manipulate objects in 3D space. While there are a few easy-ish-to-integrate transform gizmo libraries on GitHub, I was not completely happy with them. The interface of tinygizmo is nice, but it intersects camera rays with the triangle meshes of the primitive. This means that the graphical representation of the gizmo is tied to the geometric representation which is used to test for mouse positions. ImGuizmo takes an easy-to-integrate approach by integrating via ImGui, but exposes the transform construct as a matrix.

In this post, I’ll detail the approach that I took in writing my own gizmo tool: a basic overview of how my tool works, and how to detect when the mouse is hovering over, or near, a gizmo. The code samples use my own IO and math abstractions, but they should be pretty self-explanatory.

## The approach

The C++ interface to the gizmo tool is very simple.

enum class GizmoMode
{
none,
translate,
rotate,
scale
};

// The gizmo tool is active if the tool is selected (the mouse is pressed and near the tool).
// Use this to prevent e.g. the camera from moving when dragging the gizmo.
bool gizmo_is_active();

void gizmo_manipulate(GizmoMode mode, Transform& transform, const Mat4f& camera_matrix);


You pass gizmo_manipulate the mode, the Transform struct, and your camera matrix (the combined view and projection matrix) every frame. My transform struct looks like this:

struct Transform
{
Vec3f position;
Quatf rotation;
Vec3f scale;
};

The gizmo is an immediate-mode tool, so if gizmo_manipulate doesn’t get called, then no manipulation happens and no tool is rendered.

Every frame, we first need to generate a camera ray from the current cursor position. The ray can be constructed using the camera matrix. The pixel coordinates are first converted to the $$[-1, 1]$$ range in both x and y, and then the ray is taken from normalized device coordinates to world-space using the inverse of the camera matrix:

// gives mouse pixel coordinates in the [-1, 1] range
Vec2f n = platform().mouse.normalized_coordinates();

Vec4f ray_start, ray_end;
Mat4f view_proj_inverse = inverse(camera_matrix);

ray_start = view_proj_inverse * Vec4f(n.x, n.y, 0.f, 1.f);
ray_start *= 1.f / ray_start.w;

ray_end = view_proj_inverse * Vec4f(n.x, n.y, 1.f, 1.f);
ray_end *= 1.f / ray_end.w;

context.camera_ray.origin = (Vec3f&)ray_start;
context.camera_ray.direction = (Vec3f&)normalize(ray_end - ray_start);

context.camera_ray.t = FLT_MAX;


The context struct is just a global containing stuff used across functions. Icky I know, but it gets the job done for a small collection of variables.

Now that the camera ray has been obtained, the distance from the mouse to the gizmo can be computed. The update function proceeds as follows.

• Compute the smallest distance between the ray and the gizmo, as well as the point on the gizmo which corresponds to the smallest distance. The current and previous such points are stored in the context variable.
• If the distance to the gizmo is below some threshold value, and the mouse button was pressed, then that gizmo becomes active.
• If a gizmo is active, use the gizmo’s nearest point to the camera ray to update the transform.
• If the mouse is released, then the gizmo is deactivated.

In this way, a gizmo becomes active even when the mouse is not directly over it, making the tool a bit easier to work with.
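The activation logic in the list above can be sketched as a tiny state machine. This is only an illustration under stated assumptions: `GizmoState`, `update_gizmo_state` and the parameter names are hypothetical, and `mouse_pressed` / `mouse_released` are assumed to be true only on the frame where the button changes.

```cpp
// Hypothetical per-gizmo state; names are for illustration only.
struct GizmoState
{
    bool active = false;
};

// One update step. `distance` is this frame's ray-to-gizmo distance and
// `threshold` is the pick radius. Returns true while the gizmo is active,
// i.e. while its nearest point should be used to update the transform.
bool update_gizmo_state(GizmoState& state, float distance, float threshold,
                        bool mouse_pressed, bool mouse_released)
{
    if (mouse_released)
    {
        // releasing the button always deactivates the gizmo
        state.active = false;
    }
    else if (!state.active && mouse_pressed && distance < threshold)
    {
        // pressed close enough to the gizmo: it becomes active
        state.active = true;
    }
    return state.active;
}
```

Note that once the gizmo is active, the distance is no longer checked, so the drag keeps working even if the cursor wanders away from the gizmo.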

Let’s take a look at how to compute the distance to a translation or rotation gizmo.

### Translation

The translation gizmo is visually a set of three coordinate axes. The following function is used to test how far away the camera ray is from each individual axis.

float closest_distance_between_lines(Rayf& l1, Rayf& l2)
{
const Vec3f dp = l2.origin - l1.origin;
const float v12 = dot(l1.direction, l1.direction);
const float v22 = dot(l2.direction, l2.direction);
const float v1v2 = dot(l1.direction, l2.direction);

// determinant of the 2x2 system: v1^2 v2^2 - (v1 . v2)^2
const float det = v12 * v22 - v1v2 * v1v2;

if (std::abs(det) > FLT_MIN)
{
const float inv_det = 1.f / det;

const float dpv1 = dot(dp, l1.direction);
const float dpv2 = dot(dp, l2.direction);

l1.t = inv_det * (v22 * dpv1 - v1v2 * dpv2);
l2.t = inv_det * (v1v2 * dpv1 - v12 * dpv2);

return norm(dp + l2.direction * l2.t - l1.direction * l1.t);
}
else
{
const Vec3f a = cross(dp, l1.direction);
return std::sqrt(dot(a, a) / v12);
}
}


If you want to know where the math came from, scroll further down to the Shortest distance between two lines section.

The function returns the distance, and computes the parameter t for both rays so that the points corresponding to the smallest distance can be computed.

So, given the translation gizmo, how is the position component of the user’s transform manipulated? Easy: the translation gizmo gives us this frame’s and last frame’s 3D point on the gizmo axis in world-space coordinates! All we have to do is add the difference between the two to the position:

Vec3f delta = context.current_intersect - context.previous_intersect;
user_transform.position += delta;


### Shortest distance between two lines

Given a line $$\mathcal{L}=\mathbf{p}+t\mathbf{v}$$, where $$\mathbf{p}$$ is a point on the line, $$\mathbf{v}$$ is the direction of the line, and t is the scaling factor, the squared distance function between line 1 and line 2 is given by

$$d^2=(\mathcal{L}_2-\mathcal{L}_1)^2=(\mathbf{p}_2+t_2\mathbf{v}_2-\mathbf{p}_1-t_1\mathbf{v}_1)^2$$

The function will have a minimum at the shortest distance; in other words, the derivatives of this function w.r.t. both $$t_1$$ and $$t_2$$ will be zero at the minimum. We can use this to solve for $$t_1$$ and $$t_2$$.

Computing the derivatives of the squared distance function w.r.t. both $$t_1$$ and $$t_2$$ gives us a system of equations.

$$\begin{cases} \frac{\partial d^2}{\partial t_1}=2\left(\mathbf{v}_1 \cdot (\mathbf{p}_1 - \mathbf{p}_2) - t_2 \mathbf{v}_1 \cdot \mathbf{v}_2 + t_1 v_1^2 \right) = 0 \\ \frac{\partial d^2}{\partial t_2}=2\left(\mathbf{v}_2 \cdot (\mathbf{p}_2 - \mathbf{p}_1) - t_1 \mathbf{v}_1 \cdot \mathbf{v}_2 + t_2 v_2^2\right) = 0 \end{cases}$$

The system of equations can be expressed in matrix form.

$$\begin{bmatrix} v_1^2 & - \mathbf{v}_1 \cdot \mathbf{v}_2 \\ - \mathbf{v}_1 \cdot \mathbf{v}_2 & v_2^2 \end{bmatrix} \begin{bmatrix} t_1 \\ t_2 \end{bmatrix} = \begin{bmatrix} \mathbf{v}_1 \cdot (\mathbf{p}_2 - \mathbf{p}_1) \\ \mathbf{v}_2 \cdot (\mathbf{p}_1 - \mathbf{p}_2) \end{bmatrix}$$

Solving the matrix equation is easy, because the inverse of a 2x2 matrix has a well-known closed form, which gives us the following equation.

$$\begin{bmatrix} t_1 \\ t_2 \end{bmatrix} = \frac{1}{v_1^2 v_2^2 - (\mathbf{v}_1 \cdot \mathbf{v}_2)^2} \begin{bmatrix} v_2^2 & \mathbf{v}_1 \cdot \mathbf{v}_2 \\ \mathbf{v}_1 \cdot \mathbf{v}_2 & v_1^2 \end{bmatrix} \begin{bmatrix} \mathbf{v}_1 \cdot (\mathbf{p}_2 - \mathbf{p}_1) \\ \mathbf{v}_2 \cdot (\mathbf{p}_1 - \mathbf{p}_2) \end{bmatrix}$$

This equation is the one used in the function closest_distance_between_lines.
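As a quick sanity check of the formula (the numbers are made up for illustration), take $$\mathbf{p}_1=(0,0,0)$$, $$\mathbf{v}_1=(1,0,0)$$, $$\mathbf{p}_2=(1,2,1)$$ and $$\mathbf{v}_2=(1,1,0)$$. Then $$v_1^2=1$$, $$v_2^2=2$$ and $$\mathbf{v}_1\cdot\mathbf{v}_2=1$$, so the determinant is $$1\cdot 2-1^2=1$$. The right-hand side entries are $$\mathbf{v}_1\cdot(\mathbf{p}_2-\mathbf{p}_1)=1$$ and $$\mathbf{v}_2\cdot(\mathbf{p}_1-\mathbf{p}_2)=-3$$, which gives

$$\begin{bmatrix} t_1 \\ t_2 \end{bmatrix} = \frac{1}{1} \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ -3 \end{bmatrix} = \begin{bmatrix} -1 \\ -2 \end{bmatrix}$$

The closest points are therefore $$\mathbf{p}_1+t_1\mathbf{v}_1=(-1,0,0)$$ and $$\mathbf{p}_2+t_2\mathbf{v}_2=(-1,0,1)$$, a distance of 1 apart. The vector between them, $$(0,0,1)$$, is perpendicular to both directions, as expected at the minimum.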

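When the two lines are parallel, the determinant $$v_1^2 v_2^2 - (\mathbf{v}_1 \cdot \mathbf{v}_2)^2$$ is zero and the system has no unique solution. In that case, we just compute the perpendicular distance from the point $$\mathbf{p}_2$$ to line 1, which is $$\lVert(\mathbf{p}_2-\mathbf{p}_1)\times\mathbf{v}_1\rVert / \lVert\mathbf{v}_1\rVert$$.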

### Rotation

The rotation gizmo is visually a set of three circles, each one centered around an axis of rotation. The following function can be used to get the point on the circle nearest to the camera ray.

float closest_distance_line_circle(const Rayf& line, const Circle& c, Vec3f& point)
{
plane f = make_plane(c.orientation, c.center);
Rayf r = line;

if (intersect_ray_plane(f, r))
{
// get the ray's intersection point on the plane which
// contains the circle
const Vec3f on_plane = r.origin + r.t * r.direction;
// project that point on to the circle's circumference
point = c.center + c.radius * normalize(on_plane - c.center);
return norm(on_plane - point);
}
else
{
// the required point on the circle is the one closest to the camera origin
point = c.center + c.radius * normalize(reject(context.camera_ray.origin - c.center, c.orientation));

return distance_between_point_ray(point, context.camera_ray);
}
}
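The helpers reject and distance_between_point_ray are not shown in this post. A minimal sketch of what they might compute, assuming reject returns the component of a vector perpendicular to a given axis and the ray is treated as an infinite line; the small `Vec3` and `Ray` types here are stand-ins for illustration, not the engine's own math types:

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
Vec3 operator*(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
float norm(Vec3 a) { return std::sqrt(dot(a, a)); }

struct Ray { Vec3 origin, direction; };

// Component of v perpendicular to n. n does not need to be unit length,
// but it must be non-zero.
Vec3 reject(Vec3 v, Vec3 n)
{
    return v - n * (dot(v, n) / dot(n, n));
}

// Distance from point p to the infinite line through the ray.
float distance_between_point_ray(Vec3 p, const Ray& r)
{
    const Vec3 to_point = p - r.origin;
    return norm(reject(to_point, r.direction));
}
```

Rejecting the camera position from the circle's normal flattens it into the circle's plane, which is why normalizing the result and scaling by the radius lands on the circumference.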


In order to compute the rotation during the frame, the current and previous points on the circle are again used. By subtracting the circle’s center from them, two direction vectors are obtained. The angle between the direction vectors represents the rotation made during the frame.

The angle between the two direction vectors is always positive when computed via the dot product. The sign needs to flip when the previous and current points swap places, otherwise the gizmo would rotate in the same direction no matter which way the user drags the mouse. The sign can be obtained by calculating

$$\mathrm{sign} = \frac{\left(\mathbf{p}_p \times \mathbf{p}_c\right) \cdot \hat{\mathbf{n}}}{\left|\left(\mathbf{p}_p \times \mathbf{p}_c\right) \cdot \hat{\mathbf{n}}\right|}\,,$$

using the vectors identified in the above diagram: $$\mathbf{p}_p$$ and $$\mathbf{p}_c$$ are the previous and current direction vectors, and $$\hat{\mathbf{n}}$$ is the unit normal of the circle’s plane.
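Putting the dot product and the sign together, a signed per-frame angle could be computed roughly as follows. This is a sketch under assumptions: `signed_angle` and the small `Vec3` type are hypothetical stand-ins, both direction vectors are assumed to be non-zero and to lie in the circle's plane, and `n` is assumed to be unit length.

```cpp
#include <algorithm>
#include <cmath>

struct Vec3 { float x, y, z; };

float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 cross(Vec3 a, Vec3 b)
{
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

// Signed angle in radians that rotates `prev` onto `curr` about the normal n.
float signed_angle(Vec3 prev, Vec3 curr, Vec3 n)
{
    const float len = std::sqrt(dot(prev, prev) * dot(curr, curr));
    // clamp to guard acos against rounding slightly outside [-1, 1]
    const float c = std::max(-1.f, std::min(1.f, dot(prev, curr) / len));
    const float angle = std::acos(c); // always in [0, pi]
    // the sign comes from the cross product projected onto the normal
    const float s = dot(cross(prev, curr), n);
    return s < 0.f ? -angle : angle;
}
```

An alternative that avoids the separate sign step is `std::atan2(dot(cross(prev, curr), n), dot(prev, curr))`, which yields the signed angle directly for in-plane vectors.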

### Rendering the gizmos

The gizmos are currently rendered by simply drawing lines via OpenGL. I used my engine’s debug draw API for that, since it has functions for drawing primitives such as lines and circles in immediate mode. The rendering code for e.g. the translation gizmo ends up being

void draw_translation_gizmo(const Transform& transform)
{
for (int i = axis_x; i < axis_count; i++)
{
Vec3f axis_end = Vec3f(0.f, 0.f, 0.f);
axis_end[i] = 1.f;

Vec3f axis_color = Vec3f(0.f, 0.f, 0.f);
axis_color[i] = 1.f;

if (i == context.selected_axis)
{
axis_color = Vec3f(1.f, 0.65f, 0.f);
}