Open Access Paper
17 October 2022
2D-3D motion registration of rigid objects within a soft tissue structure
Nargiza Djurabekova, Andrew Goldberg, David Hawkes, Guy Long, Felix Lucka, Marta M. Betcke
Proceedings Volume 12304, 7th International Conference on Image Formation in X-Ray Computed Tomography; 123042C (2022) https://doi.org/10.1117/12.2647250
Event: Seventh International Conference on Image Formation in X-Ray Computed Tomography (ICIFXCT 2022), 2022, Baltimore, United States
Abstract
The study of rigid body dynamics in soft tissue is an essential part of orthopaedic imaging. Our focus is the foot and ankle structure, which consists of 28 bones surrounded by a variety of soft tissue such as 30 muscles, numerous ligaments, bursae and nerves. The importance of understanding the involved dynamics is evident by the frequent need to find functional joint replacements to fight diseases such as hindfoot arthritis. We study a simplified problem by constructing a phantom with two "bones" submerged in silicone and a remotely controlled LEGO robot that moves one of the two bones. We perform motion registration by manipulating the bone segmented from an initial static scan and matching its digitally reconstructed radiographs to a sequence of scanned X-ray projections. The registration of the 3D volume to 2D projections requires the right combination of optimization and interpolation to achieve an optimal result. We test four different approaches with the help of the Flexible Algorithms for Image Registration (FAIR) toolbox.

1.

INTRODUCTION

This project covers the approach to dynamic imaging of the foot and ankle through the lens of image registration, in particular volume to projection registration. We explore the parametric, bone-by-bone approach as we assume that the movement of the individual bones can be modelled through rigid transformation. Each bone is segmented out of the initial static scan and the volume containing one of the bones is matched to a sequence of fluoroscopic projections. We use the term fluoroscopy here to mean an "X-ray movie", i.e. X-ray projections acquired in quick succession to track the movement of the scanned object while the source, detector and the sample stage remain stationary. The registration is performed with the help of the Flexible Algorithms for Image Registration (FAIR) package [1] for MATLAB, and to test this approach we use the phantom described in section 1.1.

1.1

Data acquisition

The data for this project was acquired at the CWI research institute in Amsterdam, using the state-of-the-art X-ray scanner called FleXray. During the acquisition, the FleXray is completely sealed and can only scan small (up to 10 cm tall) objects, so we constructed a remotely controlled base and a simple phantom consisting of two "bones" made from gypsum plaster submerged in silicone. Silicone was chosen to reproduce the effect of cartilage and other soft tissue between and surrounding the bones.

The data was collected at 78 kV with maximum power due to the high attenuation of gypsum and silicone. To remove lower energy photons, layers of thin copper plates were attached to the source as can be seen in fig. 1. Static frame-by-frame data of 40 individual positions was acquired as well as data from continuously shooting X-ray beams (i.e. fluoroscopic data) from a few selected angles.

Figure 1.

The FleXray scanner set-up with the CWI phantom.


1.2

Image registration

Image registration is an umbrella term for aligning two or more sets of images. It is often used in image processing for combining images to either extend the view, such as in panoramic images, or to enhance an existing view and improve the signal-to-noise ratio. The purpose of image registration is often to identify similar features in different images in order to acquire further information about the features. This has a very clear application in medical imaging, where scans are often performed multiple times during treatment and sometimes even using different devices (e.g. MR and X-ray systems). To find out the exact differences between the images, it is crucial to align the images correctly, i.e. perform registration.

To avoid the need for experts to identify correct features in order to perform landmark-based registration, we instead choose to rely on intensity based registration. Intensity based methods align images by comparing their intensity values and contrasts, and often require little to no preparation in terms of marker placement etc., although they can be quite time consuming during the registration process. An example of a fully intensity based method can be found in [2], where individual vertebrae in a 3D CT volume were robustly registered to the corresponding vertebrae in a 2D X-ray image. The registration was performed by optimising for one of the six degrees of freedom of the 3D bone at a time. We propose a method in which we use the FAIR toolbox to optimise for all six transformation parameters simultaneously.

1.3

Practical background

In its basic form, image registration operates on images defined over the same domain Ω ⊂ ℝd, where d is the common dimension, usually 2 or 3. So for some reference image uR: Ω → ℝ, we try to transform a template image uT: Ω → ℝ using the transformation ρ : ℝd → ℝd in a way that minimizes the following objective function:

minw dist(uT(ρ(w, x)), uR(x)) + 𝓡(w)   (1)

Here, dist is some distance measure function, 𝓡 is a regularizer, x ∈ Ω is the set of coordinates and w is the vector of parameters of the transformation ρ. In our case, the template is a 3D bone s segmented out of the whole image uT and the reference is a projection (or a set of projections) gR. Since the registration is to be performed between a 3D volume s and 2D measurement data gR, one must first generate digitally reconstructed radiographs (DRRs) from the 3D volume, and then perform the registration between the DRR(s) and the given X-ray measurement data. Let s : Ω → ℝ be the template image and gR : Σ → ℝ be a reference image, where the domain Σ ⊂ ℝd−1 × [0, 2π) is a set of one or more stacked X-ray projections from different angles. Let also Aθ be the X-ray cone-beam forward operator, where θ is the set of indices representing the present projection angles. Then for 2D-3D registration, eq. (1) becomes

minw dist(Aθ(s(ρ(w, x))), gR(x)) + 𝓡(w)   (2)

As we are working with rigid bodies, our parameter vector w consists of 6 elements corresponding to the 6 degrees of freedom: 3 for rotation (w1, w2, w3) and 3 for translation (w4, w5, w6).
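As an illustration of this parameterization, the following Python sketch (not the FAIR/MATLAB implementation; the rotation order Rz·Ry·Rx and the rotation centre c are assumptions made for the example) applies a rigid transformation with parameter vector w to a set of 3D points:

```python
import numpy as np

def rotation_matrix(alpha, beta, gamma):
    """Rotation about the x, y, z axes, composed as R = Rz @ Ry @ Rx
    (one possible convention; the paper does not fix the order)."""
    ca, sa = np.cos(alpha), np.sin(alpha)
    cb, sb = np.cos(beta), np.sin(beta)
    cg, sg = np.cos(gamma), np.sin(gamma)
    Rx = np.array([[1, 0, 0], [0, ca, -sa], [0, sa, ca]])
    Ry = np.array([[cb, 0, sb], [0, 1, 0], [-sb, 0, cb]])
    Rz = np.array([[cg, -sg, 0], [sg, cg, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def rigid_transform(w, x, c=np.zeros(3)):
    """Apply rho(w, x) = R (x - c) + c + tau to points x (n x 3 array).
    w = [alpha, beta, gamma, tau1, tau2, tau3]; c is the rotation centre."""
    R = rotation_matrix(*w[:3])
    return (x - c) @ R.T + c + w[3:]
```

A 90° rotation about the z axis, for example, maps the point (1, 0, 0) to (0, 1, 0) with zero translation.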

2.

NUMERICAL REALIZATION AND RESULTS

Recall that the objective function we want to minimize is

minw dist(Aθ(s(ρ(w, x))), gR(x)) + 𝓡(w)

We set the distance function to be the squared ℓ2 norm, also known as the sum of squared differences, which we refer to as 𝜓. So if the residual is r = Aθ(s(ρ(w, x))) − gR(x), then the dist function is

dist = 𝜓(r) = ½ rᵀr = ½ ‖r‖2²

In the discrete setting, where the image is defined on a grid of cells, 𝜓(r) is multiplied by a scaling factor h, defined as the cell width. This is particularly useful for multiscale optimization, where different levels/scales correspond to different grid densities.
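A minimal sketch of this scaled SSD measure (in Python rather than the toolbox's MATLAB; the single factor h follows the description above):

```python
import numpy as np

def ssd(projected, measured, h=1.0):
    """Sum-of-squared-differences distance psi, scaled by the cell width h
    so that values remain comparable across multiscale grid levels."""
    r = projected - measured          # residual r = A_theta(s(rho(w, x))) - g_R(x)
    return 0.5 * h * np.dot(r.ravel(), r.ravel())
```

Halving the grid density (doubling h) simply rescales the distance value, which keeps the objective consistent between levels of the coarse-to-fine scheme.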

As a constraint on transformation parameters w, we use Tikhonov’s regularization, i.e. given a reference set of parameters wref,

𝓡(w) = ½ (w − wref)ᵀ M (w − wref)

where M is a diagonal matrix of weights for the transformation parameters.
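The penalty can be sketched as follows (Python, purely illustrative; the diagonal of M is passed as a weight vector):

```python
import numpy as np

def tikhonov(w, w_ref, weights):
    """Tikhonov penalty 0.5 * (w - w_ref)^T M (w - w_ref),
    with M a diagonal matrix given by `weights`."""
    d = np.asarray(w) - np.asarray(w_ref)
    return 0.5 * np.dot(d, np.asarray(weights) * d)
```

The weights let rotations and translations be penalized on different scales, since the two parameter groups have different units.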

2.1

Pre-processing the data

For the two-dimensional reference images, one or two projection images are used for each of the 41 static scans (one initial position + 40 poses along a programmed motion path). The template image is obtained by reconstructing the static scan in the initial position and segmenting it with K-means [3] into 3 separate masks - air, silicone (as well as LEGO) and gypsum, corresponding to air, soft tissue and bone. Noise from the reconstruction uT is smoothed by convolution with a Gaussian filter. The bone mask is split into two masks corresponding to the two shapes, denoted mtibia (top) and mtalus (bottom). These bones are then segmented out of the reconstruction

sbones = uT ⊚ (mtibia + mtalus),   ubg = uT − sbones

by means of element-wise multiplication (⊚) to obtain a background volume which can be projected with the forward operator Aθ into the data domain

gbg = Aθ(ubg)

These background projections can then be removed from all following reference images to give the best chance to our intensity based registration algorithm,

g̃R = gR − gbg

The masks, segmented bones and an example of a projected background for the static initial position can be found in fig. 2.

Figure 2.

Demonstration of data pre-processing. Left: A slice through the bone masks volume, top: mtibia, bottom: mtalus; Middle: A slice through the segmented bones volume sbones acquired by applying the bone masks to the full reconstruction of the initial pose; Right: A simulated projection of the "background", gbg.


As the bones move further away from the initial position, the background extracted from it becomes a worse approximation to the true background, because the region occluded in the initial position overlaps less and less with the region occluded in the present frame. However, as this only affects a small part of the background volume, there is still a benefit to subtracting the background.
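The pre-processing chain of this section can be summarized in a toy Python sketch; the parallel-beam sum used for `forward` is a hypothetical stand-in for the cone-beam operator Aθ, and the mask is assumed given (e.g. from K-means):

```python
import numpy as np

def remove_background(u_T, bone_mask, g_R, forward=lambda v: v.sum(axis=0)):
    """Background-subtracted reference projections (illustrative sketch).

    u_T:       reconstructed static volume
    bone_mask: boolean mask of the bones to be registered
    g_R:       measured reference projection(s)
    forward:   stand-in for the cone-beam operator A_theta
               (here a simple sum along axis 0).
    """
    u_bg = u_T * (~bone_mask)        # element-wise masking removes the bones
    g_bg = forward(u_bg)             # project the remaining "background"
    return g_R - g_bg                # cleaned reference for registration
```

After subtraction, only the moving bone's contribution remains in the reference projections, which is what gives the intensity based registration its best chance.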

2.2

Results

We run the registration problem on all 40 frames + the first initial position using 4 different strategies. The registration itself is performed using the Gauss-Newton (GN) method and the template image s1 is always the bone (in our case the tibia, or stibia) in the initial position, while the reference parameter vector wref is set to w0 = [0, 0, 0, 0, 0, 0]ᵀ.

Setup 1.

For each consecutive frame to be registered (iteration k + 1), the GN method is warm-started by setting wref to the parameters wk obtained from the registration of the previous frame k,

wref = wk

Meanwhile the template image s for a consecutive frame registration is updated based on the previous iteration’s output parameters, using rigid transformation Trigid and spline interpolation TSI, so

sk+1 = TSI(Trigid(wk)) sk

To simplify notation, we combine the rigid transformation and the interpolation into one transformation operator T. The updated sk+1 is used as a template image for the following iteration. Results for setup 1 are visualized in fig. 4(a).
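Schematically, setup 1 amounts to the following loop; `register` and `transform` are hypothetical stand-ins for the FAIR Gauss-Newton solve and the combined operator T:

```python
def setup1(s1, frames, register, transform, w0):
    """Setup 1 (warm start): each frame's GN solve starts at the previous
    frame's parameters, and the template is re-interpolated every frame
    (which is what causes the accumulating blur)."""
    s, w_ref = s1, w0
    params = []
    for g_ref in frames:                 # reference projections, frames 1..40
        w = register(s, g_ref, w_ref)    # GN solve started at w_ref
        s = transform(w, s)              # move template: recursive interpolation
        w_ref = w                        # warm start for the next frame
        params.append(w)
    return params
```

Because `transform` is applied to an already-interpolated template at every step, interpolation errors compound, matching the blur visible in fig. 4(a).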

Setup 2.

In this setup, we obtain the template image sk (for k > 1) directly from s1 via rigid transformation with parameters wk, while the reference parameters wref for the optimization stay the same for all iterations. Thus every iteration yields global transformation parameters (relative to the initial position), as opposed to setup 1, where the parameters are computed locally between consecutive frames. For every iteration k = 1, 2,…, 40,

wk = argminw { dist(Aθ(T(w) s1), gRk) + 𝓡(w) },   sk = T(wk) s1,   wref = w0

Note that wref and s1 remain unchanged through all iterations. Results for setup 2 are visualized in fig. 4(b).

Setup 3.

As in setup 2, except that the registration is now performed using a coarse-to-fine multiscale approach, with the goal of registering over larger distances. Results for setup 3 are visualized in fig. 4(c).
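A minimal coarse-to-fine loop of this kind might look as follows; `register_at_scale` is a hypothetical stand-in for one Gauss-Newton registration at a fixed grid density, and block averaging stands in for FAIR's multilevel grids:

```python
import numpy as np

def downsample(img, factor):
    """Block-average a 2D image by an integer factor (simple pyramid level)."""
    h, w = img.shape
    return img[:h - h % factor, :w - w % factor] \
        .reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def coarse_to_fine(template, reference, register_at_scale, w0, levels=(4, 2, 1)):
    """Run the registration from the coarsest grid to the finest, passing
    each level's estimate down as the starting guess for the next."""
    w = w0
    for f in levels:
        w = register_at_scale(downsample(template, f),
                              downsample(reference, f), w)
    return w
```

The coarse levels smooth away fine detail, which in principle enlarges the basin of attraction for large displacements; as discussed below, background structures in the reference can still derail this.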

Setup 4.

In this setup we propose a scheme to aggregate the optimized parameters and use them together with s1. As can be seen in fig. 3, this scheme aggregates the incremental transformations w1,…,wk. In practice, this is performed stepwise using the nested_parameters function described in appendix A. This way, interpolation is applied only once to s1 to move it into the position of frame k. The variables for this setup are updated as follows:

Figure 3.

Diagram of parameter aggregation via the nested_parameters function.


For k = 0,

w0 = [0, 0, 0, 0, 0, 0]ᵀ,   wref = w0,   wnest0 = w0,   s0 = s1

For k = 1, 2, 3,…, 40

wref = wk−1,   wk = argminw { dist(Aθ(T(w) sk−1), gRk) + 𝓡(w) },   wnestk = nested_parameters(wnestk−1, wk),   sk = T(wnestk) s1

The updated template s is used as initialization for the following iterations. Results for setup 4 are visualized in fig. 4(d).

Figure 4.

Demonstration of the four different setups. Registration performed with 1 projection at angle 45° (Left) and with 2 projections at angles 0° and 90° (Right). Rows (a) to (d) correspond to registration results for frame 20 for setups 1 to 4 respectively.


3.

DISCUSSION

In this work we consider the 2D-3D parametric image registration between fluoroscopic X-ray data and a segmented volume. The registration is performed with spline interpolation and Gauss-Newton optimization as implemented in the FAIR toolbox [1], adjusted for our 2D-3D problem. We construct an objective function, eq. (2), which we optimize for the rigid transformation parameters that align the template bone with the reference projections. The main difficulty arises with the initialization, so we test four different setups.

The first setup involves initializing the optimization of a consecutive frame with the previous frame's transformed template and optimal parameters. This leads to very blurry results due to what amounts to recursive interpolation (see fig. 4(a)). To avoid this, the second setup initializes the rigid transformation with all parameters w set to 0 and directly computes the transformation from the initial position to the currently processed timestep. However, as the bone moves further away from its initial position, the results become consistently worse; then, as the bone moves back closer to the initial position, the algorithm is able to recover and realign the bones (see fig. 4(b)). To compensate for the second setup's inability to account for large displacements, we repeat its initialization in setup 3, but now the registration is performed with a coarse-to-fine grid approach. As can be seen from fig. 4(c), though, this multiscale approach does not lead to better results. This is likely due to the presence of background elements in the reference image(s). In particular, the presence of the other bone causes the template to be aligned with it, as can be seen most clearly in fig. 4(c) with 2 projections. Finally, for setup 4, we propose initializing the Gauss-Newton optimization with the previously optimised parameters and updating the template image from the initial position with aggregated parameters. These aggregated parameters come from nesting the previous transformations together into one single transformation (using the nested_parameters function, appendix A). In fig. 4(d), we can see that this approach does not manage to perform correct registration with just one projection: if the transformation is misestimated in one frame, the mistake only gets amplified with each subsequent frame and the algorithm can no longer recover. The moving bone can even end up completely outside the visible region of interest in the final few frames.
However, two orthogonally directed projections (at 0° and 90°) constrain the problem enough to eliminate the ambiguity between motion and scaling, leading to promising results depicted in fig. 4(d) on the right, with the transformed template bone tracking the true motion quite accurately.

ACKNOWLEDGEMENTS

The authors would like to thank NVIDIA Corporation for the GeForce Titan Xp GPU used to create the reconstructions presented here.

Appendices

APPENDIX A.

NESTED_PARAMETERS FUNCTION

The nested_parameters function aggregates previous registration parameters to move a rigid object from the initial position at t = 0 to all the positions in the following timeframes. The nesting or aggregation of parameters is derived as follows. Recall that the rigid transformation ρ of x can be expressed as

ρ(w, x) = R(x − c) + c + τ = Rx + τext   (3)

where τext is the extended translation vector that contains the shift to the centre of the domain c, the rotation (via Rc) and the shift back, as well as the actual translation vector τ = [τ1, τ2, τ3]ᵀ. This extended translation vector τext = (I − R)c + τ corresponds to the last three elements of the rigid transformation parameter vector w. The first three parameters of w are the angles of rotation α, β and γ (with respect to the axes x, y and z), which are used to calculate the corresponding rotation matrix R. We want to be able to perform the transformation (3) recursively to compute the effective rotation Rnest and translation τnest of two timeframes together. Let R1 and τext refer to the rotation and extended translation performed on the first timeframe and R2 and τ′ext to those of the second one. Then the effective transformation ρ can be computed with the nested rotation matrix Rnest and nested translation vector τnest as follows

ρnest(x) = R2(R1x + τext) + τ′ext = Rnestx + τnest,   Rnest = R2R1

Computing the translation parameters of τnest is straightforward

τnest = R2τext + τ′ext

while finding the rotation angles a, b and c of Rnest is a little more involved. We know that as a 3D rotation matrix, Rnest has the form

Rnest = [ cos b cos c,  sin a sin b cos c − cos a sin c,  cos a sin b cos c + sin a sin c ;
          cos b sin c,  sin a sin b sin c + cos a cos c,  cos a sin b sin c − sin a cos c ;
          −sin b,       sin a cos b,                      cos a cos b ]

(writing the rotation as the composition Rz(c)Ry(b)Rx(a)).

To solve for the rotation parameters (a, b, and c), we consider the following equation

Rnest(a, b, c) = R(α′, β′, γ′) R(α, β, γ)

To make the calculation simpler to follow, we shall replace cos α, cos β and cos γ by c(1), c(2) and c(3). For sin α, sin β, sin γ we have s(1), s(2), s(3), and analogously for the α′, β′, γ′ parameters, their sine and cosine functions turn into s′(1), s′(2), s′(3), and c′(1), c′(2), c′(3). This gives us

Rnest = (qij),  i, j = 1, 2, 3,

where each entry qij = Σk r′ik rkj (k = 1, 2, 3) is the corresponding product of the entries r′ik of R(α′, β′, γ′) and rkj of R(α, β, γ), written out in the s(·), c(·) and s′(·), c′(·) shorthand.

Then the rotation parameters a, b, c are:

a = atan2(q32, q33),   b = arcsin(−q31),   c = atan2(q21, q11)

So the whole nested transformation vector wnest is then defined as

wnest = [a, b, c, (τnest)1, (τnest)2, (τnest)3]ᵀ

and serves as the output of the nested_parameters(w, w′) function, where w is the vector of transformation parameters of the first timeframe and w′ of the second. The process can be repeated to compute the effective rotation and translation for any number of steps, assuming wnest is a good estimate of all the previous timeframes' transformations.
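The composition rule can be sketched numerically; the Euler convention R = Rz·Ry·Rx used below is an assumption (the paper's own convention determines the exact angle-extraction formulas), but the nesting logic Rnest = R2R1, τnest = R2τext + τ′ext is the same:

```python
import numpy as np

def rot(alpha, beta, gamma):
    """R = Rz(gamma) @ Ry(beta) @ Rx(alpha) -- one possible convention."""
    ca, sa, cb, sb, cg, sg = (np.cos(alpha), np.sin(alpha), np.cos(beta),
                              np.sin(beta), np.cos(gamma), np.sin(gamma))
    return np.array([
        [cg * cb, cg * sb * sa - sg * ca, cg * sb * ca + sg * sa],
        [sg * cb, sg * sb * sa + cg * ca, sg * sb * ca - cg * sa],
        [-sb,     cb * sa,                cb * ca]])

def nested_parameters(w, w2):
    """Compose rho2 o rho1 for rho_i(x) = R_i x + tau_i:
    R_nest = R2 @ R1, tau_nest = R2 @ tau1 + tau2, then recover the angles
    (valid away from the gimbal-lock case |b| = pi/2)."""
    R1, t1 = rot(*w[:3]), np.asarray(w[3:], float)
    R2, t2 = rot(*w2[:3]), np.asarray(w2[3:], float)
    Q = R2 @ R1
    t = R2 @ t1 + t2
    a = np.arctan2(Q[2, 1], Q[2, 2])          # alpha
    b = np.arcsin(np.clip(-Q[2, 0], -1, 1))   # beta
    c = np.arctan2(Q[1, 0], Q[0, 0])          # gamma
    return np.concatenate([[a, b, c], t])
```

A quick sanity check is that applying the nested transformation to a point gives the same result as applying the two transformations one after the other.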

REFERENCES

[1] 

Modersitzki, J., FAIR: Flexible Algorithms for Image Registration, SIAM, Philadelphia (2009). https://doi.org/10.1137/1.9780898718843

[2] 

Penney, G., Batchelor, P., Hill, D., Hawkes, D., and Weese, J., "Validation of a two- to three-dimensional registration algorithm for aligning preoperative CT images and intraoperative fluoroscopy images," Medical Physics 28(6), 1024–1032 (2001). https://doi.org/10.1118/1.1373400

[3] 

Jin, X. and Han, J., "K-Means Clustering," 563–564, Springer US, Boston, MA (2010).