1. Introduction
The global navigation satellite system (GNSS) is widely applied in autonomous driving and vehicle navigation. However, the performance of GNSS degrades in challenging environments such as dense urban areas, tunnels, or underground parking lots due to GNSS signal blockage and reflection (Groves and Jiang, Reference Groves and Jiang2013). Dead reckoning (DR) systems such as inertial navigation system (INS) or stereo visual odometry (VO) are widely used as systems complementary to GNSS in GNSS-denied environments (Liu, Reference Liu2018; F. Liu et al., Reference Liu, Balazadegan and Gao2018a, Reference Liu, Balazadegan Sarvrood and Gao2018b). However, DR systems accumulate errors fast without external information. To reduce the accumulation of errors in DR systems, different sensors are integrated to reduce the rapidity of drift and increase system reliability.
The inertial measurement unit (IMU) is widely used in multi-sensor integration applications as the core sensor since it works in most circumstances (Hartley and Zisserman, Reference Hartley and Zisserman2001; Balazadegan Sarvrood, Reference Balazadegan Sarvrood2016; F. Liu et al., Reference Liu, Balazadegan and Gao2018a, Reference Liu, Balazadegan Sarvrood and Gao2018b). However, the main drawback of IMU is quick error accumulation without external information update. The results obtained from a low-cost micro-electrotechnical system (MEMS) IMU manifest drift in a short time (Jekeli, Reference Jekeli2012; Noureldin et al., Reference Noureldin, Karamat and Georgy2012). In theory, any self-dependent system can be deployed as the core component in an integrated navigation system. Stereo VO is a self-dependent DR system that is increasingly being utilised as the core sensor in multi-sensor fusion. For VO, the monocular camera is an orientation device only since the translation is estimated up to a scale ambiguity (Zhang et al., Reference Zhang, Kaess and Singh2014). Adding another camera means that depth information is achievable by the triangulation of stereo cameras (Scaramuzza and Fraundorfer, Reference Scaramuzza and Fraundorfer2011). With VO, drift is inevitable because of error accumulation. To reduce the VO drift, feature matching or tracking is of significance in pose estimation. Random sample consensus (RANSAC) is applied in some studies to filter out incorrect features in estimation (Scaramuzza, Reference Scaramuzza2011; Geiger et al., Reference Geiger, Lenz, Stiller and Urtasun2013; Liu et al., Reference Liu, Gu, Li and Zhang2017). Key-frame-based bundle adjustment optimisation (Strasdat et al., Reference Strasdat, Montiel and Davison2012; Campos et al., Reference Campos, Elvira, Rodríguez, Montiel and Tardós2020) and loop detection and relocalisation (Gálvez-López and Tardos, Reference Gálvez-López and Tardos2012; Campos et al., Reference Campos, Elvira, Rodríguez, Montiel and Tardós2020) are also applied to improve the VO performance. By application of effective feature detection and matching, key-frame based optimisation, loop detection, etc., the accuracy and robustness of VO has been greatly improved (Cvišić and Petrović, Reference Cvišić and Petrović2015; Forster et al., Reference Forster, Zhang, Gassner, Werlberger and Scaramuzza2016; Bloesch et al., Reference Bloesch, Burri, Omari, Hutter and Siegwart2017; Wang et al., Reference Wang, Schworer and Cremers2017). In addition, an integrated system with other types of sensors (e.g., IMU) is another common way to provide external information to limit the drift (Bloesch et al., Reference Bloesch, Omari, Hutter and Siegwart2015; Leutenegger et al., Reference Leutenegger2015; Qin et al., Reference Qin, Li and Shen2018; Von Stumberg et al., Reference Von Stumberg, Usenko and Cremers2018; Qin et al., Reference Qin, Pan, Cao and Shen2019). In the past, little work has been devoted to stereo VO as the core sensor in integrated navigation. This paper presents the mechanisation equations of stereo VO and the corresponding linearised error equations which can be applied as the core component in integration with other types of measurements. For land vehicle applications, velocity constraint is an effective way of reducing the rapidity of drift based on the fact that the land vehicle only has forward velocity in most cases. To mitigate the error accumulation in land vehicle applications, velocity constraints in the right and down directions can be applied as additional measurements to reduce the stereo VO drift (Shin, Reference Shin2001; Godha and Cannon, Reference Godha and Cannon2007; Noureldin et al., Reference Noureldin, Karamat and Georgy2012; Y. Liu et al., Reference Liu, Liu, Gao and Zhao2018). Since the estimated state vector elements are correlated, the velocity constraints can correct the velocity estimation together with other unknowns. In this way, without adding any external sensors, the performance can be further improved for land vehicle applications. If external measurements are available, the proposed method can also be applied to integrate stereo VO with other systems.
The rest of this paper is organised as follows. In Section 2, the fundamentals of stereo VO are briefly reviewed. The method with stereo VO as the core component together withthe NHC application in the extended Kalman filter (EKF) is illustrated in Section 3. Section 4 presents the test results of the proposed method using the KITTI datasets and the discussion and conclusions are in the final section.
2. Fundamentals of stereo VO
The principle of VO is to estimate the six degrees of freedom camera pose from consecutive images by solving perspective-n-point (PnP) (Fischler and Bolles, Reference Fischler and Bolles1981). To estimate the unknown translation and rotation between the camera frames in two consecutive epochs, continuously tracked or matched image features together with their three-dimensional (3D) coordinates are applied in the resectioning (Nistér et al., Reference Nistér, Naroditsky and Bergen2004; Scaramuzza and Fraundorfer, Reference Scaramuzza and Fraundorfer2011). Tracking or matching methods such as Kanade–Lucas–Tomasi (KLT) tracking, SIFT, FAST, SURF, etc. are used to find common features in consecutive images (Tomasi and Kanade, Reference Tomasi and Kanade1992; Shi and Tomasi, Reference Shi and Tomasi1993; Bay et al., Reference Bay, Tuytelaars and Van Gool2006; Rosten and Drummond, Reference Rosten, Drummond, Leonardis, Bischof and Pinz2006). Several PnP methods such as P3P, EPnP and direct least squares are widely used for VO resectioning (Gao et al., Reference Gao, Hou, Tang and Cheng2003; Lepetit et al., Reference Lepetit, Moreno-Noguer and Fua2009; Hesch and Roumeliotis, Reference Hesch and Roumeliotis2011).
In this paper, 3D feature coordinates are determined by the triangulation of stereo cameras. A KLT tracker is used to track the common features continuously in consecutive frames (Shi and Tomasi, Reference Shi and Tomasi1993). The camera pose estimation is based on the least squares method proposed by Menze and Geiger (Reference Menze and Geiger2015). The general procedures can be summarised as follows. With the feature tracker, the common features on stereo images in two consecutive epochs can be found. The translation and rotation unknowns between two consecutive epochs can be calculated by minimising the difference between the reprojected and tracked image coordinates of the features on the current stereo images in the least squares estimation.
The coordinate systems used in this paper are defined as follows. The origin of the camera frame is defined at the perspective centre of the left camera with the X-axis pointing to the right, the Y-axis pointing downward and the Z-axis completing the right-handed coordinate system. The origin of the image coordinate system is the left-top corner with X pointing to the right and Y pointing to the down direction. The coordinate conversion from the previous camera frame to the current camera frame can be given as (Menze and Geiger, Reference Menze and Geiger2015)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20210826010555926-0258:S037346332100028X:S037346332100028X_eqn1.png?pub-status=live)
where Cc and Cp are the feature coordinates in the current and previous camera frames; $R^c_p$ is the rotation matrix from the previous camera frame (p) to the current camera frame (c); γ, β, α are the Euler angles rotating with the Z-axis in the previous camera frame Zp, Y-axis after the first rotation Y’ and the X-axis after the second rotation X’’ respectively; $T^c_{cp}$
= [tx, ty, tz]T is the translation expressed in the current camera frame.
To reproject the feature coordinate Cc (xc, yc, zc) in the current camera frame on the image (u, v), the following equation is applied (Scaramuzza and Fraundorfer, Reference Scaramuzza and Fraundorfer2011)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20210826010555926-0258:S037346332100028X:S037346332100028X_eqn2.png?pub-status=live)
where (u, v) is the feature image coordinate; (xc, yc, zc) is the 3D feature coordinate in the current camera frame; (u0, v0) is the principal point image coordinate; the focal length of image X- and Y-axes are represented as fx and fy.
To estimate the unknowns [α, β, γ, tx, ty, tz]T, initial values are set and the corrections to the initial values are estimated using the least squares equations. For six unknowns, at least three features are required to form six equations (Lepetit et al., Reference Lepetit, Moreno-Noguer and Fua2009). The RANSAC algorithm is applied to remove the outliers (Nistér, Reference Nistér2005). Specifically, three features are randomly selected to estimate the rotation and translation 50 times. In each estimation, the number of inliers is recorded with a certain threshold. The maximum number of inliers is applied again to finally achieve the estimation, given as (Menze and Geiger, Reference Menze and Geiger2015)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20210826010555926-0258:S037346332100028X:S037346332100028X_eqn3.png?pub-status=live)
with estimated covariance matrix as
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20210826010555926-0258:S037346332100028X:S037346332100028X_eqn4.png?pub-status=live)
In Equation (3), dX is the estimated corrections to the initial values of the unknowns [α, β, γ, tx, ty, tz]T; P is the weight matrix; L is the misclosure of the observed feature pixel coordinates minus the calculated ones achieved by Equations (1) and (2); H is the design matrix. In Equation (4), R is the covariance matrix of dX in Equation (3); P and H have the same meaning as in Equation (3). In normal cases, all the elements in R are expected to be non-zero, which means that the unknowns (translation and rotation) are correlated with each other.
3. Stereo VO as a core component
So far, the procedures for solving the translation and rotation of stereo VO between consecutive epochs have been introduced. By accumulating the rotation and translation, the camera pose at any epoch can be achieved. To integrate stereo VO with other types of measurements as a core component, the stereo VO error dynamic equations have to be derived to be applied in the Kalman filter. Therefore, another way to apply the stereo VO is to form the stereo VO mechanisation using the results from the previous section. In this section, the stereo VO mechanisation, error linearisation, Kalman filter integration and ground vehicle velocity constraint are illustrated.
3.1 Stereo VO mechanisation
To apply the stereo VO as a core component in integrated navigation, the mechanisation of stereo VO can be given as Equation (5), based on the results achieved by Equations (3) and (4). In this paper, the mechanisation of stereo VO is conducted in ECEF (Earth-centred, Earth-fixed), given as
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20210826010555926-0258:S037346332100028X:S037346332100028X_eqn5.png?pub-status=live)
where the left side of the equation is the time derivatives of the re and $R^e_c$; re is the position in ECEF; $R^e_c$
is the rotation matrix from camera frame to ECEF; $\Omega^c_{ec}$
is the skew-symmetric matrix of the angular rate of camera frame with respect to ECEF expressed in the camera frame $\omega^c_{ec}$
; vc is the velocity in the camera frame.
Note that the initial position, velocity and attitude are assumed to be known. Therefore, to apply Equation (5) for stereo VO navigation solutions, vc and $\Omega^c_{ec}$ need to be calculated, which can be given as follows. From Equations (1) to (3), the translation in the camera frame can be achieved. The velocity is the translation divided by time interval, given as
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20210826010555926-0258:S037346332100028X:S037346332100028X_eqn6.png?pub-status=live)
where $T^c_{cp}$ = [tx, ty, tz]T is the translation obtained from Equation (3); Δt is the time interval between two consecutive epochs.
For the angular rate $\omega^c_{ec}$, it can be calculated based on the Euler angles and their corresponding axis. As mentioned in Equation (1), the rotation can be decomposed as Euler angles −γ, −β, and −α rotating Zp, Y′ and X″ axes. Therefore, $\omega^c_{ec}$
can be given as
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20210826010555926-0258:S037346332100028X:S037346332100028X_eqn7.png?pub-status=live)
Based on the rotation orders, the unit vectors of $\overrightarrow {Y^{\prime}}$., $\overrightarrow {X^{\prime\prime}}$
can be derived as
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20210826010555926-0258:S037346332100028X:S037346332100028X_eqn8.png?pub-status=live)
where γ, β, α are the Euler angles explained in Equation (1), namely Euler angles rotating Zp, Y′ and X″ respectively. With $\overrightarrow {Y^{\prime}}$, $\overrightarrow {X^{\prime\prime}}$
obtained from Equation (8), Equation (7) can be rewritten as
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20210826010555926-0258:S037346332100028X:S037346332100028X_eqn9.png?pub-status=live)
With Equations (5) to (9), the stereo VO can be updated based on velocity and angular rate.
3.2 Stereo VO error linearisation
Given the mechanisation process presented in Section 3.1, the errors of stereo VO mechanisation can be modelled by linearisation of Equation (5). The purpose of modelling the errors is to integrate different types of data. In this paper, the errors of position, velocity, attitude and angular velocity are estimated in the Kalman filter. The state vector is given as
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20210826010555926-0258:S037346332100028X:S037346332100028X_eqn10.png?pub-status=live)
where δre is the position error in ECEF; δre is the velocity error in the camera frame; ε is the attitude error; $\delta\omega^c_{ec}$ is the angular rate error of the camera frame with respect to ECEF expressed in the camera frame.
The EKF procedure after linearisation can be summarised as (Welch and Bishop, Reference Welch and Bishop1995)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20210826010555926-0258:S037346332100028X:S037346332100028X_eqn11.png?pub-status=live)
In Equation (11), $X^-_{k+1}$ is the predicted state estimate with covariance matrix $P^-_{k+1}$
. $X^-_{k+1}$
is achieved based on the transition matrix Φk+1,k and the previously estimated state vector Xk with process noise wk. The predicted covariance matrix $P^-_{k+1}$
is achieved by uncertainty propagation of the previously estimated covariance matrix Pk and the process noise covariance matrix Qk. The measurement update is represented as Zk+1 with covariance Rk+1 and the corresponding design matrix is represented as Hk+1. Based on the optimal gain matrix Kk+1, the state vector and its covariance matrix can be updated. Note that in Equation (11) P has a different meaning to that in Equation (4).
Since the errors are estimated in the Kalman filter, Equation (5) needs to be linearised to achieve the dynamic information of the system errors, given as
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20210826010555926-0258:S037346332100028X:S037346332100028X_eqn12.png?pub-status=live)
where δ represents the error of each item; the dot on the left side of the equation represents the time derivative of each item; ε is the attitude error; Ψ is the skew-symmetric matrix of the attitude error ε; $R_{c}^{e}v^{c}$ achieves the velocity in ECEF ve; Ve is the skew-symmetric matrix of ve; the velocity error δvc, and angular velocity error $\delta\omega^c_{ec}$
are modelled as random walk, wv and wω are the driven noise of δvc and $\delta\omega^c_{ec}$
respectively.
Based on Equation (12), the dynamic matrix of stereo VO can be formed as
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20210826010555926-0258:S037346332100028X:S037346332100028X_eqn13.png?pub-status=live)
The transition matrix Φk+1,k in Equation (11) can be simplified as
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20210826010555926-0258:S037346332100028X:S037346332100028X_eqn14.png?pub-status=live)
Equation (13) shows the velocity error is not coupled with the angular rate error in the mechanisation. However, this is not the case for stereo VO. The velocity and angular rate should be correlated since the translation and rotation are correlated, based on Equations (3), (4), (6) and (9). To take full advantage of the information from PnP, the velocity and angular rate derived from Equations (6) and (9) are used to form the measurement update in the Kalman filter, which are the same as those used in mechanisation in Equation (5). Therefore, the measurement update would be Zk+1 = 0, which means that the measurement update does not have an impact on the Kalman filter estimation. However, the velocity and angular rate become coupled with each other due to the contribution of the measurement covariance. Specifically, the covariance matrix of the measurement update in Equation (11) can be written as
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20210826010555926-0258:S037346332100028X:S037346332100028X_eqn15.png?pub-status=live)
where RPnP is the PnP covariance matrix achieved from Equation (4); A is the design matrix based on Equation (6) and linearisation of Equation (9); B is the linearisation of Equation (9).
For the process noise Q in Equation (11), the velocity and angular rate related process noise are set as large values to fully use the information from PnP. The design matrix in Equation (11) can be given as
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20210826010555926-0258:S037346332100028X:S037346332100028X_eqn16.png?pub-status=live)
3.3 Ground vehicle velocity constraint
With the equations in Section 3.2, the stereo VO can be integrated with other types of measurements. When the stereo VO is applied on a ground vehicle, the velocity constraint can be applied to reduce the drift if no additional sensors are available. In most cases, there is only the forward velocity for a ground vehicle when it is not manoeuvring. Therefore, the rightward and downward velocity can be applied as constraints in land vehicle applications (Shin, Reference Shin2001). The velocity constraints can correct the velocity, position and angular rate since they are coupled with each other. Furthermore, the attitude can be corrected as well since it is correlated with the angular rate. In this way, the performance of stereo VO can be improved.
The camera frame has to be aligned with the land vehicle frame with the X-axis of the camera frame pointing to the right of the ground vehicle and the Y-axis of the camera frame pointing downward of the ground vehicle. When the vehicle is not manoeuvring, the velocity constraint as measurement update can be given as (Shin, Reference Shin2001; Godha and Cannon, Reference Godha and Cannon2007; Noureldin et al., Reference Noureldin, Karamat and Georgy2012; Y. Liu et al., Reference Liu, Liu, Gao and Zhao2018)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20210826010555926-0258:S037346332100028X:S037346332100028X_eqn17.png?pub-status=live)
where ZNHC is the velocity constraint, $v^c_{x}$ and $v^c_{y}$
are the x and y components of camera velocity; X is the state vector in Equation (10).
For ground vehicle navigation, horizontal positioning accuracy is much more important than height accuracy. The velocity in the X-axis and angular rate in Y-axis in the camera frame are coupled with each other and they have a greater impact on the horizontal positioning accuracy, compared with the velocity in the Y-axis. Therefore, Equation (17) can be further simplified as
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20210826010555926-0258:S037346332100028X:S037346332100028X_eqn18.png?pub-status=live)
In this work, the angular rate of the Y-axis is applied to determine when to apply the constraint. If the magnitude of the angular rate is larger than the threshold, the constraint is not applied due to possible manoeuvres.
4. Field tests and results
Karlsruhe Institute of Technology and Toyota Technological Institute (KITTI) datasets (Geiger et al., Reference Geiger, Lenz, Stiller and Urtasun2013) are used to verify the proposed stereo VO with NHC for ground vehicle applications. Shown in Figure 1 is the KITTI test setup.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20210826010555926-0258:S037346332100028X:S037346332100028X_fig1.png?pub-status=live)
Figure 1. KITTI data collection system setup (Geiger et al., Reference Geiger, Lenz, Stiller and Urtasun2013)
In Figure 1, there are two grey (Cam 0 and Cam 1) and two colour cameras (Cam 2 and Cam 3) installed on the land vehicle roof. In this paper, only the grey stereo images collected by the grey cameras (Point Grey Flea 2 FL2-14S3M-C) are used. FL2-14S3M-C is a CCD (charge-coupled device) camera with a global shutter supporting 15 FPS (frames per second). The sensor format, pixel size and resolution are 1/2″, 4⋅65 μm, and 1384 by 1032, respectively. The FPS used during the test is 10. The stereo images applied are calibrated, rectified and synchronised, which are all done by KITTI. The laser scanner shown in Figure 1 is not used in this work. The GNSS/IMU system OXTS RT 3003 is applied as the reference with an accuracy of 0⋅4 m CEP (circular error probable).
Three relatively long KITTI datasets collected in residential areas are applied in this work. To verify the improvement of the proposed method, the stereo VO with NHC results are compared with stand-alone stereo VO results. The reference used is the DGPS (differential GPS)/IMU output of RT 3003.
The first dataset contains 2,762 stereo images with a duration of about four and a half minutes. The total horizontal distance is approximately 2,205 m. The trajectories obtained by stereo VO and the proposed method together with the reference are plotted in Figure 2. The green, blue and red lines represent the stereo VO, stereo VO with NHC, and the reference (DGPS/IMU), respectively. It can be seen that although it is inevitable for both stereo VO and stereo VO with NHC to suffer drift, the accuracy of the proposed method with NHC is much improved. The trajectory of the proposed method is closer to the reference most time of the test.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20210826010555926-0258:S037346332100028X:S037346332100028X_fig2.png?pub-status=live)
Figure 2. First dataset trajectories
To see the performance of the two different methods more clearly, the horizontal errors are plotted in Figure 3 and the root mean square (RMS) values of horizontal positioning error obtained by each method are summarised in Table 1. In Figure 3, the stereo VO and stereo VO with NHC errors are plotted in green and blue lines respectively. The horizontal errors are represented by solid lines while the east and north errors are plotted in two kinds of dash lines. Each epoch is 0⋅1 s since the FPS setting is 10. Generally, the errors in each direction for both methods increase with time. For stereo VO, the errors in east and north fluctuate during the test, which is caused by the manoeuvring. The maximum horizontal error of stereo VO happens at the end of the trajectory. In comparison, the errors of stereo VO with NHC in each direction grow much more slowly with time and all the errors are under 10 m. The accuracy of the proposed method is improved in both east and north directions, compared with stereo VO. The horizontal RMS value of the proposed method is less than half of stereo VO. The maximum horizontal error is also reduced from 21⋅89 m to 8⋅16 m. From Figures 2 and 3, the stereo VO horizontal error is smaller than stereo VO with NHC in a short period only at around the 2500th epoch. After several manoeuvres, when the land vehicle just turns around, the stereo VO's position is not accurate due to the drift. However, it might move closer to the true path due to inaccurate heading. This is the reason that the stereo VO errors fluctuate and sometimes are slightly better than stereo VO with NHC.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20210826010555926-0258:S037346332100028X:S037346332100028X_fig3.png?pub-status=live)
Figure 3. First dataset horizontal errors
Table 1. RMS of stereo VO and stereo VO with NHC for the first dataset
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20210826010555926-0258:S037346332100028X:S037346332100028X_tab1.png?pub-status=live)
The second dataset contains 5,177 stereo images with more manoeuvres during the test. The duration of this dataset is over eight and a half minutes and the horizontal distance is about 4,129 m. Similarly, in comparison with stereo VO, the trajectory of the proposed method with the application of the velocity constraint is much closer to the reference in Figure 4.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20210826010555926-0258:S037346332100028X:S037346332100028X_fig4.png?pub-status=live)
Figure 4. Second dataset trajectories
The horizontal errors of stereo VO and stereo VO with NHC are plotted in Figure 5 and summarised in Table 2. Stereo VO with NHC outperforms stereo VO, as shown in Figure 5. The abnormal peaks around the 900th epoch may be caused by the inaccurate positioning results of the reference DGPS/IMU system since the reference trajectory is not on the path during this short period. The maximum horizontal error of stereo VO with NHC happens during this period. Other than this period, the errors of stereo VO with NHC in each direction grow slowly with time. Errors of stereo VO with NHC are under 10 m most of the time, while the error growth of stereo VO is much more evident, especially after the 3,500th epoch. Similar to the previous dataset, stereo VO errors are less than stereo VO with NHC during very short periods around the 2,700th epoch and 4,500th epoch. From Table 2, the RMS values of the proposed method are less than stereo VO in both east and north directions. The horizontal RMS value of the proposed method is only one-third of the stereo VO. The maximum horizontal error drops from 57 m to 28⋅54 m.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20210826010555926-0258:S037346332100028X:S037346332100028X_fig5.png?pub-status=live)
Figure 5. Second dataset horizontal errors
Table 2. RMS of stereo VO and stereo VO with NHC for the second dataset
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20210826010555926-0258:S037346332100028X:S037346332100028X_tab2.png?pub-status=live)
The third dataset contains 4,663 stereo images with the most manoeuvres. The horizontal distance of this data is about 5,060 m and the duration is more than seven and a half minutes. The trajectories are shown in Figure 6. Similarly, the drift of the proposed method is much reduced compared with stereo VO. To compare the accuracy of the two methods, the positioning errors with DGPS/IMU as the reference are plotted in Figure 7 and RMS values are listed in Table 3. Although the performance of the proposed method for this dataset is worse than the previous two datasets, the accuracy is still much improved with horizontal RMS almost one-third of stereo VO. Because there are more manoeuvres in this test, the horizontal errors of stereo VO fluctuate more than the previous datasets. The general error trends of stereo VO with NHC are similar to the stereo VO, but the magnitudes are much less. The velocity constraint effectively limits the error growth with a horizontal error smaller than 20 m most of the time while the stereo VO errors increase significantly, especially during the second half of the test. With the application of velocity constraint, the horizontal error RMS is reduced from 43⋅17 m to 13⋅24 m while the maximum horizontal error drops from 83⋅48 m to 23⋅88 m.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20210826010555926-0258:S037346332100028X:S037346332100028X_fig6.png?pub-status=live)
Figure 6. Third dataset trajectories
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20210826010555926-0258:S037346332100028X:S037346332100028X_fig7.png?pub-status=live)
Figure 7. Third dataset horizontal errors
Table 3. RMS of stereo VO and stereo VO with NHC for the third dataset
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20210826010555926-0258:S037346332100028X:S037346332100028X_tab3.png?pub-status=live)
To summarise the results in this section, the relative horizontal position errors (the ratio of horizontal error RMS to the horizontal distance) of both methods are listed in Table 4. The relative horizontal position errors of the proposed method are much reduced compared with stereo VO. The relative horizontal position errors of stereo VO with NHC are all less than half of the stereo VO. With the velocity constraint applied, the relative horizontal position errors are all reduced to less than 0⋅3%.
Table 4. Relative horizontal position errors for the datasets
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20210826010555926-0258:S037346332100028X:S037346332100028X_tab4.png?pub-status=live)
5. Conclusion
This paper proposes a velocity constrained stereo VO method to improve the horizontal positioning accuracy for ground vehicle navigation. The method to apply the stereo VO as the core component in multi-sensor integration is presented. The drift of stand-alone stereo VO is effectively reduced by introducing the velocity constraint of the ground vehicle. The test results with the KITTI datasets show that the horizontal errors of the proposed method are much less than the stereo VO. By comparison with stereo VO, both the overall horizontal error RMS and maximum horizontal errors are reduced to half of the stand-alone stereo VO. The relative horizontal position errors of the proposed method are reduced to less than 0⋅3% for all the datasets.