
New Environmental Line Feature-based Vision Navigation: Design and Analysis

Published online by Cambridge University Press:  09 May 2017

Zeyu Li*, Jinling Wang, Kai Chen and Yu Sun
(School of Civil and Environmental Engineering, UNSW Australia, Sydney, Australia)

Abstract

Vision navigation using environmental features has been widely applied when satellite signals are not available. However, the matching performance of traditional environmental features such as keypoints degrades significantly in weakly textured areas, deteriorating navigation performance; moreover, the user needs a way to evaluate and assure feature matching quality. In this paper, a new feature, named the Line Segment Intersection Feature (LSIF), is proposed to solve the availability problem in weakly textured regions. A combined descriptor involving global structure and local gradient is then designed for similarity comparison. To achieve reliable point-to-point matching, a coarse-to-fine matching algorithm is developed, which improves the performance of the point set matching algorithm. Finally, a framework for matching quality evaluation is proposed to assure matching performance. Comparative experiments demonstrate that the proposed feature has superior overall performance, especially in the number of correctly matched keypoints and matching correctness. Using real image sets with weak texture, it is also shown that the proposed LSIF achieves navigation solutions with improved continuity and accuracy.

Type
Research Article
Copyright
Copyright © The Royal Institute of Navigation 2017 

1. INTRODUCTION

The next generation of navigation systems aims to obtain the position and orientation of a moving platform anywhere at any time. Among many emerging navigation techniques, vision-based navigation has become a promising and popular approach to achieve ubiquitous navigation as it is accurate, passive and low-cost (Jin et al., Reference Jin, Wang, Moran, Pan and Zhao2016; Liu et al., Reference Liu, Zhou, Sun, Di and Liu2012; Xian et al., Reference Xian, Hu and Lian2015). In this domain, natural environmental features including edges, corners and keypoints (points of interest, or features) play an important role as they can be automatically detected, described and matched, which provides the capability of storing and transferring geospatial information. As a type of natural environmental feature, keypoints attract interest from researchers due to their detector repeatability and descriptor distinctiveness. Repeatability is defined as the robustness of keypoint location with regard to environment changes such as translation, scaling, rotation and viewpoint change. Distinctiveness means the generated descriptor should be distinctive from others in the same image. The former characteristic enables keypoints to be tracked and the latter makes it possible for keypoints to be distinguished and matched.

Keypoints are closely related to the performance of vision-based navigation, which is illustrated in Figure 1. The presence or absence of keypoints directly affects the continuity since geo-spatial information contained in the keypoints cannot be obtained if matched keypoints do not exist. The number and distribution of keypoints affect the accuracy of navigation solutions through error propagation (Li et al., Reference Li, Wang, Alqurashi, Chen and Zheng2016). If there are mismatched keypoints, the correctness of the navigation solution may be violated, degrading the integrity of the navigation system.

Figure 1. The relationship between keypoints and navigation performance.

Traditional keypoints such as Scale-Invariant Feature Transform (SIFT) (Lowe, Reference Lowe1999) are generally good natural features in rich texture areas as illustrated in Figure 2(a). However, keypoint matching in ubiquitous weakly textured areas such as homogenous walls and man-made objects is challenging. Most traditional keypoint detection and description methods are based on intensity or colour variation (e.g. gradients). If the intensity or colour is stable, it will be difficult to detect keypoints. Moreover, although a number of keypoints may be detected, the descriptor of such a keypoint will tend to lose its distinctiveness as the descriptor comes from the neighbouring area of the detected keypoint, eventually affecting matching performance. In this scenario, the continuity of the vision-based navigation system will be seriously affected. Although some keypoints may be detected and matched correctly, they tend to be located in certain small rich texture areas. Besides, the number is significantly reduced, hence the geometry of vision-based navigation will deteriorate, finally affecting the accuracy of navigation solutions (Li et al., Reference Li, Wang, Alqurashi, Chen and Zheng2016). Also, the ratio of mismatches will be enlarged, threatening the integrity of the navigation system.

Figure 2. Illustrative examples for keypoint matching, line segment matching and point set matching.

There have been studies aiming to improve navigation performance in texture-less areas. One aspect is to directly match the detected line segments to achieve vision-based navigation. As Figure 2(b) shows, line segments with the same number are correctly matched. The geo-spatial information contained in the line segments can be transferred through line segment matching. Then the position of the camera can be determined by space resection. Zhou et al. (Reference Zhou, Zou, Pei, Ying, Liu and Yu2015) put forward Structural Simultaneous Localisation And Mapping (StructSLAM) that made use of the structured lines to reduce the error of navigation, and simultaneously generated maps made up of detected line segments. However, for line segment matching, it is difficult to assure its correctness in weakly textured areas as the distinctiveness of description depends on the neighbouring areas. Another aspect is to develop new keypoints. As illustrated in Figure 2(b), the intersection points with the same shape are correctly matched. Similarly, Kim and Lee (Reference Kim and Lee2012) employed the line segment pair to generate a Line Intersection Context Feature (LICF) that was invariant under perspective projection in weakly textured areas. The matched LICFs provided the clue for line segment matching, point-to-point matching and epipolar geometry reconstruction in weakly textured areas. However, original LICF employs the Normalised Cross-Correlation (NCC) method for matching tentative points. There is still room to improve its matching performance as NCC is not invariant to scale, rotation and shearing differences (Lewis, Reference Lewis1995).

Point set registration is related to keypoint matching. As shown in Figure 2(c), the main objective of point set registration is to find the correspondence between two point sets only using the positions of the point sets as the input, which is beneficial for keypoint matching in weakly textured areas. The transformation of the two point sets often includes not only rigid transformation such as rotation and scaling, but also non-rigid deformation, noise and outliers. However, one difference between the traditional point set and detected keypoints is that the mismatch ratio can be very high (e.g. larger than 50%) in the texture-less environment, which poses a challenge for the robustness of the traditional point set matching algorithm. Therefore the robustness of the point set matching algorithm needs to be enhanced through the appropriate design.

In this paper, we address the limitation of natural features in weakly textured areas in the indoor environment through the detection of a new feature, the Line Segment Intersection Feature (LSIF). Considering the difficulty of matching in texture-less areas, its description is given and a matching algorithm is designed. Quality control is conducted to provide a measure of trust in matching correctness and to further ensure it. The rest of this paper is structured as follows. Sections 2, 3, 4 and 5 present the detailed design of the LSIF detection, description, matching and validation algorithms respectively. In Section 6, real images of weakly textured areas in an indoor environment are employed to compare and analyse the matching and navigation performance of LSIF. Section 7 presents concluding remarks.

2. LINE SEGMENT INTERSECTION FEATURE DETECTION

LSIFs originate from the intersections of line segment pairs. Here, a Line Segment Detector (LSD) (Von Gioi et al., Reference Von Gioi, Jakubowicz, Morel and Randall2008) with sub-pixel accuracy and controlled false detection rate is employed. It creates a level-line field, and then pixels with similar orientations are segmented to form line support regions. In each region, the principal inertial axis is used as the main rectangle direction. Pixels whose level-line angles are within a certain tolerance are identified as aligned points. After the validation process based on a contrario approach using the number of aligned points, the detected line segments are extracted with two end points.

However, simply intersecting all the detected line segments will not contribute to the subsequent matching performance, since several chaotic intersection points arise from unnecessary line segment pairs. Normally these line segments have small lengths. The main line segments tend to be perpendicular to others in a man-made indoor environment, which is a beneficial characteristic for eliminating unnecessary line segments. Therefore, a filter based on the line segments' lengths and intersection angles with other segments is applied. That is, only the line segments whose lengths are larger than a predefined threshold (e.g. ten pixels) and whose intersection angles with others lie in a range such as $[80^{\circ}, 100^{\circ}]$ are preserved. The retained line segments can be represented by $L_1, L_2, \ldots, L_k$.
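The filtering step above can be sketched as follows. This is a minimal Python illustration, not the authors' implementation: the segment representation, function name and default thresholds are assumptions, and the folded line-line angle lies in [0°, 90°], so the $[80^{\circ}, 100^{\circ}]$ window becomes an angle of at least 80°.

```python
import numpy as np

def filter_segments(segments, min_len=10.0, min_angle_deg=80.0):
    """Keep only segments longer than min_len whose intersection angle
    with at least one other long segment is near 90 degrees.
    A segment is ((x1, y1), (x2, y2))."""
    dirs, long_enough = [], []
    for p1, p2 in segments:
        v = np.subtract(p2, p1).astype(float)
        n = np.linalg.norm(v)
        dirs.append(v / n if n > 0 else v)
        long_enough.append(n >= min_len)
    kept = []
    for i, seg in enumerate(segments):
        if not long_enough[i]:
            continue
        for j in range(len(segments)):
            if i == j or not long_enough[j]:
                continue
            # folded angle between the two supporting lines, in [0, 90]
            cos_a = np.clip(abs(np.dot(dirs[i], dirs[j])), 0.0, 1.0)
            if np.degrees(np.arccos(cos_a)) >= min_angle_deg:
                kept.append(seg)
                break
    return kept
```

A short, nearly perpendicular pair survives while short or diagonal segments are discarded.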

As shown in Figure 3, a neighbouring area alongside $L_1$ is named the critical region $\Pi$ within the image coordinate system. It can be defined as a rectangle $Q_1 Q_2 Q_3 Q_4$ by extending two squares with side length $D$ from the two end points respectively; the squares always lie on different sides of $L_1$. The rule for detecting LSIFs from $L_1$ is that if intersections between $L_1$ (or its extensions) and other line segments (or their extensions) lie in $\Pi$, they are detected as LSIFs. For example, $K_1$, $K_2$ and $K_3$ are detected LSIFs as their corresponding line segments or extensions intersect with $L_1$ in $\Pi$. The intersection between $L_5$'s extension and $L_1$'s extension does not lie in $\Pi$, so it is discarded.

Figure 3. The critical region $\Pi$ of $L_1$.

The coordinates of the detected LSIFs can be represented by:

(1) $$C_{\rm LSIF}=\left\{ \mathrm{INEPT}(L_{i}, L_{j}) \mid \mathrm{INEPT}(L_{i}, L_{j})\in \Pi \right\}$$

where INEPT represents the intersection coordinates of two detected line segments or their corresponding extension lines. The main differences between the proposed LSIF and the LICF proposed by Kim and Lee (Reference Kim and Lee2012) lie in two aspects: firstly, the critical region of LSIF is rectangular for simplified calculation, while the boundary determining LICF involves the calculation of two circles; secondly, the filter based on length and intersection angle excludes unnecessary line segments which contribute little to structure description.
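The intersection test can be sketched as below. The helper names are ours, and the critical region is approximated here as a band of half-width D around the segment, extended by D beyond each end point, which simplifies the rectangle $Q_1 Q_2 Q_3 Q_4$ described above.

```python
import numpy as np

def line_intersection(s1, s2, eps=1e-12):
    """Intersection of the infinite lines supporting segments s1 and s2
    (each ((x1, y1), (x2, y2))); returns None for (near-)parallel lines."""
    (x1, y1), (x2, y2) = s1
    (x3, y3), (x4, y4) = s2
    denom = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if abs(denom) < eps:
        return None
    a = x1 * y2 - y1 * x2
    b = x3 * y4 - y3 * x4
    return np.array([(a * (x3 - x4) - (x1 - x2) * b) / denom,
                     (a * (y3 - y4) - (y1 - y2) * b) / denom])

def in_critical_region(p, seg, D):
    """True if point p lies within distance D of seg, allowing an
    overshoot of D beyond either end point (simplified region)."""
    p1, p2 = np.asarray(seg[0], float), np.asarray(seg[1], float)
    L = np.linalg.norm(p2 - p1)
    u = (p2 - p1) / L
    d = np.asarray(p, float) - p1
    t = d @ u                              # position along the line
    perp = abs(u[0] * d[1] - u[1] * d[0])  # distance to the line
    return (-D <= t <= L + D) and perp <= D
```

An intersection is accepted as an LSIF candidate only if `in_critical_region` holds for the reference segment.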

3. LINE SEGMENT INTERSECTION FEATURE DESCRIPTION

The aim of LSIF description is to generate descriptors according to gradients and structure information (e.g. geometric relationship), and provide tentative correspondences by similarity comparison.

Although the number and distribution of detected LSIFs vary for each image, the global structure of the detected LSIFs, which can be understood as the general 2D geometric relationship between one keypoint and the others, is comparatively stable. For example, if a detected LSIF$_a$ in image I is correctly matched with LSIF$_b$ in image II, significant similarities will exist between LSIF$_a$'s geometric relationship (e.g. distance and relative position) with other LSIFs in image I and LSIF$_b$'s geometric relationship with other LSIFs in image II, and vice versa. The benefit is that this stable global structure description purely uses the LSIFs' image coordinates, without extracting gradient or colour information from texture-less areas. Therefore, global structure should be the main clue for LSIF matching.

Compared with pure point set matching, the image itself also brings benefits for LSIF description. Although the uniqueness of neighbouring gradient information (e.g. variation of intensity) for the detected LSIFs from a texture-less area is reduced, it can provide the secondary clues for LSIF matching if the weighting factor is appropriately controlled (Section 4.1). Besides, a few detected LSIFs may exist in the small areas with rich texture. Therefore, the description of local gradient contributes to reducing the matching ambiguity.

Considering these two aspects, this paper has designed a combined descriptor that utilises local gradient and global structure of the detected LSIF and is illustrated as follows:

(2) $$D_{C}=(D_{S},D_{I})$$

where $D_S$ and $D_I$ are the global structure descriptor and local gradient descriptor respectively.

4. LINE SEGMENT INTERSECTION FEATURE MATCHING

Matching correctness is important as it affects the integrity of the vision-based navigation system. This section designs a coarse-to-fine matching strategy to assure correctness.

4.1. Coarse Matching using Combined Descriptors

Compared with point set registration, where the coordinates of all the keypoints are directly used as the description, coarse matching preserves the keypoints with a high possibility of being correctly matched. The coarse matching procedure for the two sub-descriptors should consider their own characteristics. In this paper, Shape Context (Belongie et al., Reference Belongie, Malik and Puzicha2002) and the SIFT descriptor (Lowe, Reference Lowe1999) are chosen as the global structure descriptor and local gradient descriptor respectively. Therefore, the similarity between the sub-descriptors is measured by a $\chi^2$ test statistic and the $L_2$ norm respectively. Assume there are m and n keypoints in the two images. For each sub-descriptor, an m × n cost matrix can be generated. Lower cost means that the two corresponding descriptors have a higher possibility of being matched.

The cost matrix can be generated as follows. The similarity measure for Shape Context is already constrained between 0 and 1 by the $\chi^2$ test statistic. To be combined with Shape Context, the similarity measure of the SIFT descriptor likewise needs to be normalised between 0 and 1. Assume the cost matrix generated by $D_S$ is $C_S$ and the one generated by $D_I$ is $C_I$. Since the distinctiveness of local gradient descriptors is weakened in texture-less areas, more weight should be assigned to the global structure descriptor; hence $w_S$ should be larger than $w_I$ (e.g. $w_S$ = 0·6 and $w_I$ = 0·4). Therefore, the final cost matrix $C_C$ of the combined descriptor $D_C$ can be denoted as:

(3) $$C_{C}=w_{S}C_{S}+w_{I}C_{I}\quad (w_{S}>w_{I})$$

Point-to-point correspondences are established by the Hungarian method (Kuhn, Reference Kuhn1955); other algorithms for finding the optimal path of the cost matrix could also be applied. After that, epipolar geometry-based Random Sample Consensus (RANSAC) (Hartley and Zisserman, Reference Hartley and Zisserman2003) is employed to further reduce mismatches. The initial point-to-point matching is then generated.
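The weighted combination of Equation (3) followed by the Hungarian assignment can be sketched as follows. This is a minimal illustration using SciPy's `linear_sum_assignment`; the function name and the assumption that both cost matrices are already normalised to [0, 1] are ours.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def coarse_match(C_S, C_I, w_S=0.6, w_I=0.4):
    """Combine the structure and gradient cost matrices (Equation (3))
    and solve the minimum-cost assignment with the Hungarian method."""
    assert w_S > w_I, "the structure descriptor should dominate"
    C_C = w_S * C_S + w_I * C_I
    rows, cols = linear_sum_assignment(C_C)  # optimal path of the cost matrix
    return list(zip(rows.tolist(), cols.tolist()))
```

The returned index pairs would then be screened by epipolar RANSAC, as described above.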

As expected, the initial matching result purely from the coarse matching stage is not completely reliable. The ratio of mismatches depends on the distinctiveness of keypoints' global structure and local gradient.

4.2. Original Affine Coherent Point Drift Algorithm

For completeness, affine Coherent Point Drift (CPD) matching (Myronenko and Song, Reference Myronenko and Song2010) is briefly described in this section. The affine CPD algorithm models one point set as the centroids of a Gaussian Mixture Model (GMM), from which the other set is assumed to be generated. If a data point is correctly matched, its GMM posterior probability is maximised. In the matching process, the topological structure of the point set is preserved as the centroids move coherently as a group.

Assume $X_{N\times 2}$ and $Y_{N\times 2}$ are the coordinates of the tentative matches in the two images after coarse matching. $X_{N\times 2}$ is the data point set and $Y_{N\times 2}$ is the set of GMM centroids. An additional uniform distribution represents the noise and mismatches. The mixture model is constructed as:

(4) $$p(x)=w\frac{1}{N}+(1-w)\sum_{n=1}^{N}\frac{1}{N}\,p(x\mid n)$$

where $p(x\mid n)=\frac{1}{2\pi \sigma^{2}}e^{-\frac{\Vert x-y_{n}\Vert^{2}}{2\sigma^{2}}}$ , and $w\;(0<w<1)$ is the weight of the uniform distribution. The locations of the GMM centroids are determined by the affine parameters B and t, obtained by minimising the negative log-likelihood function in Equation (5).

(5) $$E(B,t,\sigma^{2})=-\sum_{n=1}^{N}\log \sum_{m=1}^{N+1}p(m)\,p(x_{n}\mid m)$$

An Expectation Maximisation (EM) algorithm is employed to find B and t. In the Expectation step, the objective function shown in Equation (6) is an upper bound of the negative log-likelihood function in Equation (5).

(6) $$Q(B,t,\sigma^{2})=\frac{1}{2\sigma^{2}}\sum_{m=1}^{N}\sum_{n=1}^{N}P^{old}(m\mid x_{n})\,\Vert x_{n}-By_{m}-t\Vert^{2}+N_{p}\log \sigma^{2}$$

where $N_{p}=\sum_{m=1}^{N}\sum_{n=1}^{N}P^{old}(m\mid x_{n})$ . According to Bayes' theorem, the components of the posterior probability matrix $P^{old}$ equal:

(7) $$P^{old}(m\mid x_{n})=P(i,j)=\frac{\exp \left(-\frac{1}{2\sigma^{2}}\Vert x_{i}-(By_{j}+t)\Vert^{2}\right)}{\sum_{k=1}^{N}\exp \left(-\frac{1}{2\sigma^{2}}\Vert x_{i}-(By_{k}+t)\Vert^{2}\right)+2\pi \sigma^{2}\frac{w}{1-w}}$$

where $\sigma^{2}$ is initialised as $\frac{1}{2N^{2}}\sum_{k=1}^{N}\sum_{l=1}^{N}\Vert x_{k}-y_{l}\Vert^{2}$ . As the numbers of tentatively matched LSIFs in the two sets are the same, an N × N posterior probability matrix P can be constructed as the indicator for matching correctness.

The affine parameters B and t can be deduced directly by setting the partial derivatives of Equation (6) to zero in the Maximisation step of the EM process. The EM process iterates the Expectation and Maximisation steps until convergence. More details can be found in the work by Myronenko and Song (Reference Myronenko and Song2010).
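The E and M steps above can be sketched as follows. This is a compact NumPy illustration of affine CPD under the simplifying assumptions of a fixed iteration count and the closed-form update from Myronenko and Song (2010); it is not the authors' code.

```python
import numpy as np

def affine_cpd(X, Y, w=0.1, n_iter=50):
    """Minimal affine CPD sketch: estimate the affine map B, t aligning
    GMM centroids Y (M x D) to data points X (N x D)."""
    N, D = X.shape
    M = Y.shape[0]
    B, t = np.eye(D), np.zeros(D)
    sigma2 = np.sum((X[None, :, :] - Y[:, None, :]) ** 2) / (D * N * M)
    for _ in range(n_iter):
        # E-step: P[m, n] = posterior that centroid m generated point n
        T = Y @ B.T + t
        d2 = np.sum((X[None, :, :] - T[:, None, :]) ** 2, axis=2)
        num = np.exp(-d2 / (2.0 * sigma2))
        c = (2.0 * np.pi * sigma2) ** (D / 2.0) * (w / (1.0 - w)) * M / N
        P = num / (num.sum(axis=0, keepdims=True) + c)
        # M-step: closed-form affine update from weighted centred moments
        Np = P.sum()
        Pt1, P1 = P.sum(axis=0), P.sum(axis=1)
        mu_x, mu_y = (Pt1 @ X) / Np, (P1 @ Y) / Np
        Xh, Yh = X - mu_x, Y - mu_y
        A1 = Xh.T @ P.T @ Yh
        A2 = Yh.T @ (P1[:, None] * Yh)
        B = A1 @ np.linalg.inv(A2)
        t = mu_x - B @ mu_y
        sigma2 = max((np.sum(Pt1 * np.sum(Xh ** 2, axis=1))
                      - np.trace(A1 @ B.T)) / (Np * D), 1e-10)
    return B, t
```

On clean, correctly ordered correspondences the recovered map reproduces the data almost exactly; the weight w absorbs noise and mismatches.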

4.3. Modified Coherent Point Drift Algorithm

As mentioned above, the original affine CPD uses the current estimates of the affine parameters to evaluate the probability of matching correctness (E step). However, constraints from global structure similarity can be beneficial in assuring matching correctness during the EM process.

Shape Context is applied as the descriptor for the global structure similarity comparison. The cost matrix of Shape Context for each initial keypoint pair can be converted into a prior matching probability matrix, which imposes constraints on the original matching probability matrix in Equation (7).

The prior probability matrix $P_{Prior}$ is constructed as follows. The weighting factor for a 'correct' correspondence from the coarse matching stage is set as $W_{Prior}$ (e.g. 0·8). In the first iteration, the prior probability matrix is an N × N diagonal matrix whose diagonal elements equal $W_{Prior}$ and whose remaining elements equal $\frac{1-W_{Prior}}{N-1}$ . For the following iterations, the elements on the optimal path of the cost matrix from the Hungarian method are set as $W_{Prior}$ , and similarly, the other elements are set as $\frac{1-W_{Prior}}{N-1}$ . The new matching probability is then the element-wise product of $P(n, n)$ and $P_{Prior}(n, n)$ , as shown in Equation (8).

(8) $$P_{New}(n,n)=P(n,n)P_{Prior}(n,n)$$

By using Equation (8), the correct correspondences are preserved while the weights of mismatches are suppressed. The overall description and matching process is summarised in Figure 4.
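Equation (8) amounts to an element-wise reweighting of the posterior, sketched below. The helper name is ours, and $W_{Prior}$ = 0·8 follows the example value given above.

```python
import numpy as np

def apply_prior(P, matches, w_prior=0.8):
    """Build the prior matrix of Section 4.3 and reweight the CPD
    posterior P element-wise (Equation (8)). `matches` holds the (i, j)
    pairs on the current optimal path of the cost matrix."""
    N = P.shape[0]
    P_prior = np.full((N, N), (1.0 - w_prior) / (N - 1))
    for i, j in matches:
        P_prior[i, j] = w_prior
    return P * P_prior
```

Entries on the optimal path keep most of their probability mass, while off-path entries are uniformly down-weighted.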

Figure 4. LSIF description and matching process.

5. MATCHING VALIDATION

After the above three steps, a few mismatches may still exist among the pairs. Moreover, if the matching quality does not meet the requirement for correctness, the algorithm should be able to warn users. Therefore, matching validation and quality evaluation are necessary both to detect and exclude remaining mismatches and to assess the matching quality. Assume $X_{N\times 2}$ and $Y_{N\times 2}$ are tentative matching pairs. With the obtained affine transformation B and t, each keypoint's coordinates in $X_{N\times 2}$ can be transferred to $\hat{Y}_{N\times 2}$ . If the keypoints are correctly matched, the Euclidean distances between the estimated coordinates $\hat{Y}_{N\times 2}$ and $Y_{N\times 2}$ will be small; if mismatches exist, their corresponding distances will be far from the majority. Mismatch elimination is therefore transformed into a one-dimensional outlier detection problem. Based on this, the Minimum Covariance Determinant (MCD) estimator is applied in this paper to detect and exclude mismatches. Details of MCD can be found in Humenberger et al. (Reference Humenberger, Engelke and Kubinger2010). Other outlier detection algorithms could also be applied.
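The one-dimensional outlier test can be sketched as below. The paper uses the MCD estimator; as a simple stand-in, this sketch flags outliers with a median/MAD rule on the same transfer distances (the threshold of 3 and the function name are assumptions).

```python
import numpy as np

def reject_mismatches(X, Y, B, t, thresh=3.0):
    """Transfer X through the affine map, measure Euclidean distances to Y,
    and flag distances far from the majority as mismatches."""
    d = np.linalg.norm(X @ B.T + t - Y, axis=1)
    med = np.median(d)
    mad = 1.4826 * np.median(np.abs(d - med)) + 1e-12  # robust scale
    return np.abs(d - med) / mad <= thresh             # True = inlier
```

A single planted mismatch stands out clearly against the near-zero distances of correct pairs.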

After mismatch detection and exclusion based on MCD, all the obtained matching pairs $Pair_{i}=\{(x_{i}, y_{i})\mid i=1, 2, \ldots, N\}$ can be modelled in a linear form based on the affine transformation as the functional model, with its corresponding stochastic model, as illustrated below:

(9) $$Ax=l+v$$
(10) $$\Sigma=\sigma_{o}^{2}Q=\sigma_{o}^{2}P^{-1}$$

where x represents the unknowns in the affine transformation matrix, l is the $2N\times 1$ observation vector composed of the $y_i$, v is the residual vector and A is the $2N\times 6$ design matrix built from the $x_i$. For the stochastic model, $\sigma_{o}^{2}$ is the a priori variance factor, Q is the $2N\times 2N$ cofactor matrix and P is the $2N\times 2N$ weight matrix.
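For the functional model of Equation (9), each matched pair contributes two rows to A. A sketch follows, with the unknowns ordered as (b11, b12, b21, b22, t1, t2), an ordering we assume for illustration:

```python
import numpy as np

def build_affine_model(x_pts, y_pts):
    """Assemble the 2N x 6 design matrix A and 2N x 1 observation vector l
    of Equation (9) from matched points x_i -> y_i."""
    N = len(x_pts)
    A = np.zeros((2 * N, 6))
    l = np.zeros(2 * N)
    for i, ((xi, yi), (ui, vi)) in enumerate(zip(x_pts, y_pts)):
        A[2 * i] = [xi, yi, 0.0, 0.0, 1.0, 0.0]      # u = b11*x + b12*y + t1
        A[2 * i + 1] = [0.0, 0.0, xi, yi, 0.0, 1.0]  # v = b21*x + b22*y + t2
        l[2 * i], l[2 * i + 1] = ui, vi
    return A, l
```

A least-squares solve of this system recovers the six affine parameters exactly on noise-free data.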

To evaluate the quality of correspondences, a two-step outlier detection procedure including global test and data snooping is conducted as shown in Figure 5.

Figure 5. The validation processes.

The global test verifies the global consistency between the observations and the model, including the functional and stochastic models. The test statistic can be formulated as (Knight et al., Reference Knight, Wang and Rizos2010):

(11) $$\displaystyle{{f\hat{\sigma}_{0}^{2}}\over{\sigma_{0}^{2}}}=\displaystyle{{v^{T}Pv}\over{\sigma_{0}^{2}}} \sim \chi_{1-\alpha,f}^{2}$$

where f is the redundancy, which equals 2N − 6, $\hat{\sigma}_{0}^{2}$ is the a posteriori variance factor, $\sigma_{0}^{2}$ is the a priori variance factor, v is the residual vector and P is the weight matrix. The local test (data snooping) shown in Equation (12) further identifies faulty observations according to the outlier statistics:

(12) $$w_{i}=\displaystyle{{e_{i}^{T}Pv}\over{\sigma_{0}\sqrt {e_{i}^{T}PQ_{v}Pe_{i}} }}\sim N(0,1)$$

where $e_i$ is a $2N\times 1$ vector of zeros with a one in the position of the assumed outlier. If the test statistic is larger than the critical value determined by the confidence level, the observation may be faulty.
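Under the common simplification P = I (unit weights), the two tests can be sketched as follows, using SciPy for the critical values; the function name and default significance levels are ours.

```python
import numpy as np
from scipy.stats import chi2, norm

def validate_fit(A, l, sigma0_sq=1.0, alpha=0.005, alpha_w=0.001):
    """Global chi-square test (Equation (11)) and per-observation w-tests
    (Equation (12)) for the affine least-squares fit, with P = I."""
    n, u = A.shape
    x_hat, *_ = np.linalg.lstsq(A, l, rcond=None)
    v = A @ x_hat - l                      # residuals of Ax = l + v
    f = n - u                              # redundancy, 2N - 6
    T = (v @ v) / sigma0_sq
    global_ok = T <= chi2.ppf(1.0 - alpha, f)
    # Residual cofactor matrix Qv = I - A (A^T A)^-1 A^T when P = I
    Qv = np.eye(n) - A @ np.linalg.inv(A.T @ A) @ A.T
    w = v / (np.sqrt(sigma0_sq) * np.sqrt(np.clip(np.diag(Qv), 1e-12, None)))
    w_ok = np.abs(w) <= norm.ppf(1.0 - alpha_w / 2.0)
    return global_ok, w_ok
```

With small residual noise relative to the a priori variance factor, both tests pass, mirroring the accepted case described in Section 6.1.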

6. EXPERIMENTS AND ANALYSIS

A pre-calibrated Digital Single Lens Reflex (DSLR) camera (Canon 450D) was used to capture the images. Three data sets, named Office (94 images), Corridor (66 images) and Kitchen (27 images), were collected in large weakly textured regions inside the UNSW Civil Engineering Building, as shown in Figure 6.

Figure 6. Samples of collected datasets

Each image contained a set of targets (e.g. Ground Control Points - GCPs) whose image coordinates and world coordinates were precisely known, which could be applied as the input to generate ground truth such as affine parameters and navigation solutions. It should be noted that in the mapping stage, the GCPs are used only to transfer the geo-information to the matched LSIFs by bundle adjustment. In the navigation stage (Sections 6.3 and 6.4), the LSIFs detected on these coded targets are removed. Therefore LSIFs are employed as natural features for navigation.

6.1. Illustrative example for LSIF Keypoint Matching Algorithm

This section aims to provide an insight into the proposed LSIF detection, description, matching and validation algorithm with an example.

As shown in Figure 7(a), in total 230 line segments were detected. Owing to the LSD's performance, some vertical line segments lying in the left half of the images were not detected, potentially reducing the number of detected LSIFs; however, the detected line segments still capture the main structure. Small and chaotic line segments that did not contribute much to capturing the main structure still existed. Therefore, line segments whose lengths were less than 0·02H, or whose intersection angles were outside the range of 80 to 100 degrees, were excluded, where H was the height of the image in pixels. For example, the line segments detected in sub-figure 1 of Figure 7(a) were removed as their lengths were less than 0·02H. Sub-figure 2 was also excluded because its intersection angle was less than 80°. Only the line segments that met the aforementioned conditions in Section 2 were preserved; sub-figure 3 is an example. Sub-figures 4 and 5 of Figure 7(b) show typical detected LSIFs. Finally, 102 LSIFs were detected from 98 preserved line segments.

Figure 7. Line segment detection, filtering line segments and detected LSIFs

With the designed detector, 102 and 120 LSIFs were detected in the two images respectively, as shown in Figure 8. In Figure 8(a), with the designed descriptor and coarse matching algorithm, 56 of the 66 tentatively matched LSIFs were correctly matched, but ten mismatches remained. The coordinates of the tentatively matched keypoints were then employed as the input for the modified CPD-based matching approach. Using Shape Context and LSIF validation, the mismatches could be completely eliminated, as illustrated in Figure 8(b). The degree of freedom after fine matching was 96 according to Equation (11). With the confidence level set as 99·5%, the critical value was 135·433. Since the calculated global test statistic (42·275) was less than the critical value, the global test showed that the correspondences were consistent with the affine transformation model. W tests were also conducted for each matched keypoint pair, and the histogram of the W test statistics is illustrated in Figure 9. The critical value was set as 3·29 for a confidence level of 99·9%. Since the W test statistic for each correspondence was less than 3·29, it was validated that there were no mismatches in the generated correspondences.

Figure 8. Matching solution after coarse matching and fine matching.

Figure 9. The histogram of W test statistics in the validation stage.

6.2. Matching Performance Comparison and Analysis

To compare the performance of LSIF with LICF and SIFT, 30 pairs of images from the aforementioned datasets were employed as test data. The algorithms shown in Table 1 fall into three groups: LICF-based approaches (Algorithms 1–4), LSIF-based approaches (Algorithms 5–8) and SIFT-based approaches (Algorithms 9–12), which all apply different description and matching strategies. All tentatively matched keypoints were refined by the RANSAC algorithm based on epipolar constraints, except in Algorithms 4, 8 and 12. Algorithms 3, 7 and 11 directly used the detected keypoints' coordinates as the input for the original affine CPD matching approach. In the following paragraphs, the algorithms are denoted Detector/Descriptor for better readability; for example, Algorithm 8 is denoted LSIF/PCD.

Table 1. Matching performance comparison for LICF based approaches, LSIF based approaches and SIFT based approaches.

The matching performances are compared in five aspects. NoF represents the number of matched keypoints after RANSAC or corresponding validation approaches. NoM represents the number of mismatches. Matching accuracy is quantified by Average Transformation Error (ATE) illustrated in Equation (13):

(13) $$E_{Trans}=\displaystyle{{1}\over{2N}}\sum_{i=1}^N (d(x'_{i},Fx_{i})+d(x_{i},F^{T}x'_{i}))$$

where N is the number of correctly matched keypoint pairs and $x_i$ and $x'_i$ are the image coordinates of the correctly matched keypoints in the two images. F is the fundamental matrix estimated by RANSAC using the correctly matched keypoints, and $d(x'_{i}, Fx_{i})$ represents the epipolar distance from the keypoint $x'_i$ to the epipolar line $Fx_i$. Continuity is defined as the ratio between the number of image pairs in which at least four keypoints are finally matched and the number of all image pairs. The threshold for continuity is set at four because if the number of matched keypoints is three or fewer, the solution for the camera's position cannot be unique (Thompson, Reference Thompson1966). The processing time is also compared.
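The ATE of Equation (13) can be computed as below, with points in homogeneous coordinates; the helper names are ours.

```python
import numpy as np

def epipolar_distance(pt, line):
    """Distance from a homogeneous point (x, y, s) to the line (a, b, c)."""
    a, b, c = line
    x, y, s = pt
    return abs(a * x / s + b * y / s + c) / np.hypot(a, b)

def average_transfer_error(x1, x2, F):
    """Mean symmetric epipolar distance over N matches (Equation (13)):
    d(x'_i, F x_i) and d(x_i, F^T x'_i), averaged over all 2N terms."""
    errs = []
    for p1, p2 in zip(x1, x2):
        errs.append(epipolar_distance(p2, F @ p1))
        errs.append(epipolar_distance(p1, F.T @ p2))
    return float(np.mean(errs))
```

For a pure horizontal-translation geometry, matched points sharing the same row give zero error.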

Among the LICF-based approaches, LICF/PCD shows superior performance, with the largest NoF and the smallest NoM. However, merely using a global structure descriptor such as Shape Context (LICF/SC) does not help improve continuity or eliminate mismatches. Because LICF detection involves the calculation of two curves as the boundary, which takes more time, the processing times for LICF-based approaches are larger than those for LSIF-based approaches.

The NoFs of LSIF-based approaches are larger than those of LICF-based approaches. In terms of NoM, the LSIF-based approaches show performances similar to the LICF-based approaches; however, LSIF/PCD does not contain any mismatches. The matching accuracies of the two groups are similar, all around one pixel, with LICF/PCD and LSIF/PCD achieving the highest matching accuracy. LICF/SC, LICF/CO and LSIF/CO contained image pairs with fewer than three matched keypoints, which is caused by the limited ability of the description and matching.

The NoFs of SIFT-based approaches are similar to those of LICF-based approaches. However, the NoMs of SIFT-based approaches are the largest among the three groups, which is caused by the deteriorated distinctiveness of the descriptor, and their continuity performance is the worst among the three groups. Using SIFT/SIFT, the NoFs for most image pairs are fewer than four, so it has the minimum continuity. SIFT/PCD has the largest NoF among the four SIFT-based algorithms, but its NoM is larger than those of LICF/PCD and LSIF/PCD.

The NoM for LSIF/PCD is the smallest among all the algorithms, showing the superiority of the proposed description and matching approach. Both the NoF and ATE of LSIF/PCD are the second best among all the algorithms. The processing time for LSIF/PCD is moderate and acceptable, and could be further optimised. The comparison demonstrates that LSIF has the most balanced performance over the five studied aspects.

The time complexity analysis for LSIF/PCD is as follows. Assume the left and right images each contain $N_p$ pixels. In LSIF detection, the time complexity of line segment detection is $O(N_p)$ (Von Gioi et al., 2008), and the computation time for filtering and intersection is proportional to the number of detected line segments. Assume there are $m_L$ and $m_R$ detected LSIFs respectively ($m_L > m_R$). In the description stage, the computation time for SIFT and Shape Context is proportional to $m_L$ and $m_R$ respectively. In the coarse matching stage, the complexity of the Hungarian optimisation is $O(m_L^3)$. In the fine matching stage, assume there are $n_f$ tentative correspondences to be further matched; the computation time of the modified CPD algorithm is $k_{EM} O(n_f^2)$ if the EM process converges after $k_{EM}$ iterations. In the validation stage, assume the global test is passed after $k_G$ iterations; the time complexity is $k_G O(n_f)$. Therefore, for LSIF/PCD, the processing time mainly depends on the number of detected keypoints, since this directly determines the cost of the Hungarian optimisation.
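The Hungarian optimisation that dominates this cost solves a standard linear assignment problem (Kuhn, 1955). As a rough illustration, the sketch below uses SciPy's assignment solver on a hypothetical random descriptor-distance cost matrix, not the paper's combined descriptor:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical cost matrix between m_L = 6 left-image LSIFs and
# m_R = 4 right-image LSIFs (m_L > m_R, as in the analysis above).
rng = np.random.default_rng(0)
cost = rng.random((6, 4))

# Hungarian optimisation: a globally optimal one-to-one assignment,
# O(m_L^3) in the worst case -- the dominant term for large m_L.
rows, cols = linear_sum_assignment(cost)
print(list(zip(rows.tolist(), cols.tolist())))
```

Only `min(m_L, m_R)` tentative correspondences are returned, which then feed the fine matching stage.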

6.3. Navigation Performance Comparison

In the framework of navigation using reality-based 3D maps (Li et al., 2011), this section compares the navigation performance of LSIF/PCD with three other competitive algorithms (LICF/PCD, SIFT/SIFT and SIFT/PCD).

Using LSIF/PCD, the geometric relationship between the camera, the geo-referenced keypoints (Pseudo GCPs, or PGCPs) and the GCPs of the three data sets is shown in Figure 10, where the crosses show the mapping cameras' moving path, the asterisks are the manually configured targets acting as GCPs, and the triangles are geo-referenced LSIFs. The approximate reference is set as the plane determined by the GCPs, and the coarse mapping accuracy indicator is defined as the distance from each geo-referenced LSIF to this reference plane, since the ground-truth position of each geo-referenced LSIF is difficult to obtain.
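This coarse indicator can be sketched as a least-squares plane fit through the GCP coordinates followed by signed point-to-plane distances; the function names and array shapes here are illustrative:

```python
import numpy as np

def plane_from_points(gcps):
    """Least-squares plane through GCP coordinates (N x 3) via SVD:
    returns a unit normal n and centroid c, so the plane is n.(x - c) = 0."""
    c = gcps.mean(axis=0)
    _, _, vt = np.linalg.svd(gcps - c)
    return vt[-1], c          # singular vector of the smallest value = normal

def signed_plane_distances(points, normal, centroid):
    """Signed distance of each geo-referenced LSIF to the GCP plane."""
    return (points - centroid) @ normal
```

Histogramming these distances over all geo-referenced LSIFs gives the style of accuracy histogram described for Figure 10(b).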

Figure 10. Geometric relationship and corresponding accuracy histogram for Office.

With the designed detection, description, matching and validation procedure, a total of 4968 LSIFs are geo-referenced from 59 images of the Office dataset. The LSIFs' distribution and density are shown in Figure 10(a). The LSIFs cover most of one wall of the Office data set, while there are fewer geo-referenced LSIFs on the other two walls; the reason is that these walls physically contain fewer line segments, reducing the number of matched LSIFs. The position errors of most geo-referenced LSIFs lie between −0·05 m and 0·05 m, as illustrated in Figure 10(b), indicating that the mapping accuracy is promising. Some geo-referenced LSIFs have position errors larger than 0·1 m, which can be explained by the fact that their physical positions are not strictly on the wall.

Similarly, Figure 11(a) shows the coordinates of 2972 geo-referenced LSIFs from 40 images of the Corridor dataset, and Figure 11(b) shows their geo-referencing accuracy, demonstrating that the majority of the LSIFs' position errors also lie in [−0·05, 0·05] m. Figure 12(a) illustrates the 1186 geo-referenced LSIFs from 14 images of the Kitchen dataset. Since most of these geo-referenced LSIFs physically lie on the wall, the coarse accuracy indicator shown in Figure 12(b) is higher than for the other two data sets.

Figure 11. Geometric relationship and corresponding accuracy histogram for Corridor.

Figure 12. Geometric relationship and corresponding accuracy histogram for Kitchen.

It can be concluded that LSIFs can be correctly matched and geo-referenced. The generated reality-based 3D map acts as the resource providing geo-spatial information for navigation, and the navigation solution is obtained epoch by epoch.

A total of 35, 26 and 13 querying images from the three datasets are matched with the reference images of the generated maps to obtain the geo-referenced keypoints' world coordinates. All the querying images obtain navigation solutions. As each querying image contains at least four GCPs with known image and world coordinates, the ground truth of the navigation solution can be obtained uniquely by space resection and used for accuracy evaluation. Four statistics of the navigation error, namely the average, standard deviation, maximum and minimum, are calculated for both position and orientation.
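Recovering a camera position from control points is the core of space resection (Thompson, 1966). As a rough illustration only, the sketch below estimates the projection matrix by plain DLT and reads off the camera centre as its null vector; note that DLT needs at least six non-coplanar points, whereas the paper's four-GCP configuration requires a dedicated space-resection solver:

```python
import numpy as np

def camera_centre_dlt(world, image):
    """Estimate the camera centre from 2D-3D correspondences by DLT.
    world: N x 3 control-point coordinates; image: N x 2 pixels; N >= 6,
    points not coplanar. Illustrative sketch, not the paper's solver."""
    rows = []
    for (X, Y, Z), (u, v) in zip(world, image):
        Xh = np.array([X, Y, Z, 1.0])
        rows.append(np.concatenate([Xh, np.zeros(4), -u * Xh]))
        rows.append(np.concatenate([np.zeros(4), Xh, -v * Xh]))
    # Least-squares solution of A p = 0 gives the projection matrix P.
    _, _, vt = np.linalg.svd(np.asarray(rows))
    P = vt[-1].reshape(3, 4)
    # The camera centre is the right null vector of P (P C = 0).
    _, _, vtp = np.linalg.svd(P)
    C = vtp[-1]
    return C[:3] / C[3]
```

With exact correspondences the recovered centre matches the true camera position; with noisy pixels it would serve only as an initial value for a refined resection.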

These statistics for Office, Corridor and Kitchen are presented in Tables 2, 3 and 4 respectively. Owing to the larger number of matched keypoints with favourable distribution, most position errors are less than 5 cm and the largest is less than 10 cm, while the orientation errors are generally less than 3°.

Table 2. Navigation solution accuracy of Office using LSIF.

Table 3. Navigation solution accuracy of Corridor using LSIF.

Table 4. Navigation solution accuracy of Kitchen using LSIF.

LSIF's navigation performance is compared with LICF/PCD, SIFT/SIFT and SIFT/PCD in five aspects, namely continuity, number of PGCPs (NoPGCP), availability, position error and orientation error. The definition of continuity is the same as in Section 6.2, and NoPGCP reflects the geometric strength of the navigation. Based on the typical accuracy required in position and orientation, it is reasonable to consider an epoch available for indoor navigation when its position and orientation errors are less than 0·1 m and 5° respectively. Availability is therefore defined as the ratio between the number of epochs whose navigation solutions meet these criteria and the number of epochs that obtain navigation solutions. Continuity indicates whether an epoch can obtain a navigation solution at all, while availability, built on continuity, also takes accuracy into account. As it is meaningless to include incorrect navigation solutions in a discussion of accuracy, the accuracy is calculated and compared using only the available epochs.

The continuity of the four algorithms on the three datasets is presented in Table 5. Classical SIFT/SIFT has the lowest continuity and NoPGCP due to the weak texture, while SIFT/PCD and LICF/PCD have slightly higher continuity and NoPGCP. LSIF/PCD obtains a navigation solution for every epoch, and its NoPGCP is the highest among the four algorithms.
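The availability statistic defined above is a simple thresholded ratio over the solved epochs; a sketch with illustrative, made-up per-epoch errors:

```python
import numpy as np

# Hypothetical per-epoch position (m) and orientation (deg) errors;
# not the paper's data.
pos_err = np.array([0.03, 0.12, 0.05, 0.08, 0.02])
ori_err = np.array([1.5, 2.0, 6.0, 4.0, 0.8])

# An epoch is "available" when both thresholds are met.
available = (pos_err < 0.1) & (ori_err < 5.0)
availability = available.mean()        # ratio over epochs with solutions
print(f"availability = {availability:.0%}")
```

Accuracy statistics would then be computed over `pos_err[available]` and `ori_err[available]` only, matching the convention above.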

Table 5. Continuity and average number of PGCPs for the four algorithms.

As illustrated in Table 6, SIFT/SIFT completely loses availability, so its position and orientation errors are not applicable. The availability of SIFT/PCD and LICF/PCD varies with the number of correctly matched keypoints, while the availability of LSIF/PCD is 100% for all the datasets. The accuracies of the available epochs for SIFT/PCD, LICF/PCD and LSIF/PCD are similar.

Table 6. Availability, position error and orientation error for the four algorithms.

Continuity is closely related to the number of correctly matched keypoints. The accuracy of the navigation solutions is influenced by a number of factors, such as the geometry between the geo-referenced keypoints and the camera, the accuracy of the geo-referenced keypoints' world coordinates from the mapping stage, and the matching accuracy of the keypoint matching algorithms. It can be concluded that, using LSIFs as the environmental feature, most navigation solutions achieve centimetre-level positioning accuracy, and the orientation error can be controlled to a few degrees.

6.4. Navigation Performance Evaluation Based on the Trajectory

This section evaluates the performance of LSIF/PCD in a comparatively larger indoor environment. A total of 152 mapping images and 277 querying images were collected from a weakly textured lobby in the Civil Engineering Building on the University of New South Wales (UNSW) campus.

The 2D trajectory of the navigation solutions and the simplified surrounding environment are illustrated in Figure 13; the 2D trajectory is consistent with the actual motion. All the epochs obtain navigation solutions, so the continuity for LSIF is 100%, and on average 91 LSIFs are matched on each querying image. Following the definition of availability in Section 6.3, the availability is 97·47%, as the navigation solutions of eight epochs are not available. However, their position errors are only slightly larger, and only in a single direction (e.g. the E direction), so these navigation solutions are still reasonable. One possible reason is the weak geometric distribution of the PGCPs in the mapping stage.

Figure 13. Aerial view of the camera's trajectory.

Similarly, the accuracy of all the available navigation solutions can be evaluated using the GCPs, as presented in Table 7. The accuracy is similar to that in Section 6.3, with position and orientation errors limited to a few centimetres and a few degrees respectively.

Table 7. Navigation accuracy of position and orientation for the large indoor environment.

7. CONCLUDING REMARKS

This paper has proposed a new environmental line feature, named LSIF, together with its corresponding detection, description, matching and validation algorithms, to overcome the continuity and correctness challenges in weakly textured areas of indoor environments. The proposed LSIF outperforms other keypoints in terms of the number of matched keypoints and matching correctness. Under the framework of navigation using reality-based 3D maps, high accuracy, continuity and availability have been achieved, showing its feasibility and potential as a new feature for vision-based navigation.

LSIF originates from the detection of line segments, so its detection mainly depends on the line segment detection algorithm, whose invariance to scaling and illumination has not yet been fully explored. Further investigation is underway to improve its repeatability.

REFERENCES

Belongie, S., Malik, J. and Puzicha, J. (2002). Shape matching and object recognition using shape contexts. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24, 509–522.
Hartley, R. and Zisserman, A. (2003). Multiple View Geometry in Computer Vision. Cambridge University Press.
Humenberger, M., Engelke, T. and Kubinger, W. (2010). A census-based stereo vision algorithm using modified Semi-Global Matching and plane fitting to improve matching quality. IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 77–84.
Jin, Z., Wang, X., Moran, B., Pan, Q. and Zhao, C. (2016). Multi-region scene matching based localisation for autonomous vision navigation of UAVs. Journal of Navigation, 69, 1215–1233.
Kim, H. and Lee, S. (2012). Simultaneous line matching and epipolar geometry estimation based on the intersection context of coplanar line pairs. Pattern Recognition Letters, 33, 1349–1363.
Knight, N.L., Wang, J. and Rizos, C. (2010). Generalised measures of reliability for multiple outliers. Journal of Geodesy, 84, 625–635.
Kuhn, H.W. (1955). The Hungarian method for the assignment problem. Naval Research Logistics Quarterly, 2, 83–97.
Lewis, J. (1995). Fast normalized cross-correlation. Vision Interface, 120–123.
Li, X., Wang, J., Knight, N. and Ding, W. (2011). Vision-based positioning with a single camera and 3D maps: accuracy and reliability analysis. Journal of Global Positioning Systems, 10, 19–29.
Li, Z., Wang, J., Alqurashi, M., Chen, K. and Zheng, S. (2016). Geometric analysis of reality-based indoor 3D mapping. Journal of Global Positioning Systems, 14, 1.
Liu, C., Zhou, F., Sun, Y., Di, K. and Liu, Z. (2012). Stereo-image matching using a speeded up robust feature algorithm in an integrated vision navigation system. Journal of Navigation, 65, 671–692.
Lowe, D.G. (1999). Object recognition from local scale-invariant features. Proceedings of the 7th IEEE International Conference on Computer Vision, 1150–1157.
Myronenko, A. and Song, X. (2010). Point set registration: coherent point drift. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32, 2262–2275.
Thompson, E. (1966). Space resection: failure cases. The Photogrammetric Record, 5, 201–207.
Von Gioi, R.G., Jakubowicz, J., Morel, J.-M. and Randall, G. (2008). LSD: a fast line segment detector with a false detection control. IEEE Transactions on Pattern Analysis and Machine Intelligence, 722–732.
Xian, Z., Hu, X. and Lian, J. (2015). Fusing stereo camera and low-cost inertial measurement unit for autonomous navigation in a tightly-coupled approach. Journal of Navigation, 68, 434–452.
Zhou, H., Zou, D., Pei, L., Ying, R., Liu, P. and Yu, W. (2015). StructSLAM: visual SLAM with building structure lines. IEEE Transactions on Vehicular Technology, 64, 1364–1375.