1. Introduction
The maritime transportation industry continues to see an increased focus on traffic safety, green shipping, marine environmental protection and overall sustainable development (Chen et al., Reference Chen, Zheng, Garg, Xu, Li and Fei2019a, Reference Chen, Zhang, Wan, Li, Huang and Fei2019b; McWhinnie et al., Reference McWhinnie, O'Hara, Hilliard, Le Baron, Smallshaw, Pelot and Canessa2021). Ship trajectory data, often collected by an automatic identification system (AIS), can reflect the locations and behaviour of ships. Based on ship trajectory data, the information can be processed to help with maritime activities (Fournier et al., Reference Fournier, Hilliard, Rezaee and Pelot2018) including location modelling for maritime search and rescue vessels (Pelot et al., Reference Pelot, Akbari and Li2015; Akbari et al., Reference Akbari, Pelot and Eiselt2018), estimation of the ship domain under different conditions to avoid collision (Hörteborn et al., Reference Hörteborn, Ringsberg, Svanberg and Holm2019; Qi et al., Reference Qi, Zheng and Li2011), traffic flow analysis (Qi et al., Reference Qi, Zheng and Gang2017, Reference Qi, Ji, Balling and Xu2021), ship trajectory prediction for traffic organisation optimisation (Qi and Zheng, Reference Qi and Zheng2016a; Alizadeh et al., Reference Alizadeh, Alesheikh and Sharif2021), green development of ports (Hua et al., Reference Hua, Chen, Wan, Xu, Bai, Zheng and Fei2020; Liu et al., Reference Liu, Liu and Qi2021), monitoring of fishing activities (Hu et al., Reference Hu, Jiang, de Souza, Pelot and Matwin2016; Rezaee et al., Reference Rezaee, Brooks and Pelot2017), and others.
However, the frequency of AIS messages is relatively high as an ordinary shipborne AIS device sends a message every two seconds to three minutes (Harati-Mokhtari et al., Reference Harati-Mokhtari, Wall, Brooks and Wang2007; Tichavska et al., Reference Tichavska, Cabrera, Tovar and Araña2015; Alizadeh et al., Reference Alizadeh, Alesheikh and Sharif2021). Accordingly, this generates a large amount of data and contains redundant information due to relatively slow ship movements thereby taking up large amounts of disk space and restricting the convenience and flexibility of the use of AIS data (Mazzarella et al., Reference Mazzarella, Arguedas and Vespe2015; Fiorini et al., Reference Fiorini, Capata and Bloisi2016; Qi and Zheng, Reference Qi and Zheng2016b). Therefore, ship trajectory data often need to be compressed and simplified to reduce redundant data and save storage space (Zhang et al., Reference Zhang, Liu, Cai, Wu and Shi2016). Ship trajectory data compression can alleviate the data storage pressure caused by long-term data accumulation which can help data migration, backup or use. Data compression can also simplify the visualisation and mapping of ship trajectory data, thereby reducing the load on computer memory and the graphics card.
Research on trajectory data compression mainly falls into three types. First is general vector data compression research, which considers the distance, angle and/or directional relationship of the points in the trajectory. This type of research is usually generalised and adaptable to the ship trajectories in different waters. Researchers have proposed a variety of approaches including the choosing interval points algorithm, limiting vertical distance algorithm, limiting angle algorithm, offset angle algorithm, the Douglas–Peucker (DP) algorithm, the grating algorithm and others (Gudmundsson et al., Reference Gudmundsson, Katajainen, Merrick, Ong and Wolle2009; Ji and Wang, Reference Ji and Wang2010; Popa et al., Reference Popa, Zeitouni, Oria and Kharrat2015). Among them, the compression errors of the DP and grating algorithms are smaller than the others (Veregin, Reference Veregin2000; Liu et al., Reference Liu, Li, Yang, Wu, Liu and Liu2019). The DP algorithm has reasonably good compression performance and is widely used. This approach requires all the points of the trajectory to be loaded to compute their Euclidean distance. The computational complexity is $\textrm {O}(n \log n)$ or even $\textrm {O}(n^2$
), and large memory and storage space are needed to execute the algorithm (Muckell et al., Reference Muckell, Hwang, Patil, Lawson, Ping and Ravi2011, Reference Muckell, Olsen, Hwang, Lawson and Ravi2014; Singh et al., Reference Singh, Aggarwal, Saxena and Prakash2017). The grating algorithm processes three points at one time, so the computational complexity is O($n$
) resulting in an algorithm that requires low memory and has more flexibility and application potential (Ji et al., Reference Ji, Xu and Deng2019; Qi and Ji, Reference Qi and Ji2020).
The second type of research on ship trajectory simplification focuses attention on ship motion characteristics in specific waters or circumstances, such as slow in ports. Studies of this type are usually more detailed in the local water area and are used for specific purposes. In order to retain constant information about speed, heading, position and path of the vessel in ports or other traffic-dense areas, a data reduction algorithm was developed (Ifrim et al., Reference Ifrim, Iuga, Pop, Wallace and Poulopoulos2017). To preserve vessel course alteration, a compression algorithm based on feature point recognition was proposed (Popa et al., Reference Popa, Zeitouni, Oria and Kharrat2015). Multiple dimensions (position, direction, speed) were used to optimise the directed-acyclic-graph-based online trajectory simplification algorithm (Zhang et al., Reference Zhang, Shi, Li and Zhang2020). A trajectory simplification method that better encodes complex behaviours and possibly retains the interesting part in full resolution was proposed (Sánchez-Heres, Reference Sánchez-Heres2019).
The third type of research facilitates the application or usage of trajectory compression, such as compression threshold determination, similarity measure and/or trajectory clustering. An AIS-based method for determining compression algorithm thresholds for the minimum ship domain was proposed (Zhang et al., Reference Zhang, Liu, Cai, Wu and Shi2016). An automatic thresholding was developed based on the channel and trajectory characteristics (Liu et al., Reference Liu, Li, Yang, Wu, Liu and Liu2019). A threshold determining method was studied based on the uncertainty and confidence value associated with each candidate stop in the trajectory (Milaghardan et al., Reference Milaghardan, Abbaspour and Claramunt2018). A trajectory similarity measurement algorithm based on feature point detection and synthetic distance was proposed to measure the similarity between trajectories (Qi and Zheng, Reference Qi and Zheng2016c). A similarity method considering the spatio-temporal behaviour of trajectories was used to find trajectories with acceptable similarity (Abbaspour et al., Reference Abbaspour, Shaeri and Chehreghan2017; Moayedi et al., Reference Moayedi, Abbaspour and Chehreghan2019). A similarity measure considering the course change and the meaning at the route level was developed (Zhao and Shi, Reference Zhao and Shi2019a). A clustering method based on compression and density was proposed to achieve improved marine traffic pattern recognition on ship trajectories (Zhao and Shi, Reference Zhao and Shi2019b).
Our research belongs to the aforementioned first and third types. In the first type of research, on the basis of our previous research on the comparison of different compression algorithms (Ji et al., Reference Ji, Xu and Deng2019; Qi and Ji, Reference Qi and Ji2020), we found that the grating compression algorithm has the performance advantages of low algorithm complexity, high compression performance and less requirement for computer memory. We pioneered the use of the grating algorithm for ship trajectory compression. In the third type of research, we improved the grating method to simplify the usage of the algorithm and optimise the compression performance. In this paper, we propose a dynamic adaptive grating algorithm that can automatically generate an optimal threshold for each trajectory in the dataset based on the users’ expectancy or requirement for the overall compression performance.
This paper is structured such that Section 2 states the methodology of the algorithm, Section 3 shows the test and the results of performances of the proposed algorithm, and Section 4 concludes the contribution of this paper and prospects the future work.
2. Methodology
2.1 Traditional grating compression algorithm
The general approach of the grating algorithm (Ji et al., Reference Ji, Xu and Deng2019) is to process three points (first, middle and last points) through a sector pattern. The connection between the first point and the middle point is the central axis of the sector to judge whether the last point is within the sector opening angle (the opening angle is obtained by the input threshold). If the last point is within the sector opening angle, it is considered that the trajectory has little change, there is information redundancy and therefore data compression is performed by deleting the middle point. If the last point is out of the sector opening angle, it is considered that the trajectory has a non-negligible change and therefore the middle point is retained. That middle point is then taken as the new first point to form a new sector to judge the relative position of the next point, and so on, until the execution reaches the last point of the trajectory. The schematic diagram of the grating algorithm compression principle is shown in Figure 1.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220316021422219-0112:S0373463321000692:S0373463321000692_fig1.png?pub-status=live)
Figure 1. Schematic diagram of the grating algorithm principle: (a) the trajectory; (b) compress $P_1$, $P_2$
, $P_3$
; (c) compress $P_1$
, $P_3$
, $P_4$
; (d) compress $P_1$
, $P_4$
, $P_5$
; (e) compress $P_4$
, $P_5$
, $P_6$
; (f) compression result
Figure 1(a) is a schematic diagram of a trajectory, with trajectory points $P_1,P_2, \ldots ,P_{15}$ distributed in a non-linear fashion. In Figure 1(b), the geospatial relationship of points $P_1$
, $P_2$
, $P_3$
is judged by the sector, whose vertex is $P_1$
, central axis is $P_1P_2$
and opening angle is the threshold. Because $P_3$
is in the sector, $P_2$
is considered compressible. As Figure 1(c) shows, $P_2$
is deleted, and $P_1$
is still kept as the first point, $P_3$
is set as the new middle point, so $P_1$
and $P_3$
form a sector (the blue area), and it intersects with the previous sector (the grey area) to form the new sector (yellow area). This new sector is used to judge the geospatial relationship with point $P_4$
. Because $P_4$
is in the sector, the middle point $P_3$
is deleted. In Figure 1(d), $P_3$
is compressed, so $P_4$
is set as the new middle point. With the central axis $P_1$
$P_4$
, the sector is formed as the blue area, and the intersect of the blue sector with the previous grey sector is shown as the yellow area, which is taken as the new sector for this process. Because $P_5$
is outside of the sector, $P_4$
needs to be retained. In Figure 1(e), since $P_4$
is kept, it is taken as a new start point, and then $P_5$
is the new middle point. They form a new sector to judge whether $P_6$
is in the sector or not, which is same to the situation shown in Figure 1(b). The final result is shown in Figure 1(f) in which the white hollow points are the points that have been compressed and deleted, and the black solid points are those that have been retained. It can be seen that the grating method retains the turning characteristics of the trajectory very well.
The algorithm has the characteristics of memorability, as well as a sensitivity to the angle changes. When the trajectory is relatively smooth and there are discarded compression points, the new sector will be the intersection of the current sector and the previous sector(s), and therefore, the new sector retains the features of the previous sector(s). The characteristic of the sensitivity to the angle changes is reflected in the dynamical adjustment of the sector pattern. As shown in Figure 1(c), the central axis of the current sector and the previous sector are $P_1P_3$ and $P_1P_2$
, respectively, but their directions are slightly different. The algorithm reflects the changes by narrowing down the opening angle dynamically.
In complex navigation waterways, there are many shapes of trajectories, which may be straight, curved or turning. If these trajectories with large differences in shape use the same threshold, the error distribution may be great for some of them because of the improper threshold. And since the threshold is the input, it is unrealistic to set the thresholds one by one to best fit the performance of each trajectory, and it would be time-consuming and labour-intensive to try to determine the best threshold repeatedly. These facts lead to a reduction in the performance or the applicability when the grating algorithm is used to compress a large dataset with different trajectories.
However, due to its excellent compression performance and fast computation demonstrated in practical use, the grating algorithm is considered as a promising algorithm worthy for further promotion and application. In this paper, we commit the effort on adaptive thresholding to dynamically assign the optimal threshold to each trajectory, to improve the algorithm in ease of use, better generalisation to the complex dataset with different trajectories and improvement in the compression performance.
2.2 Dynamic adaptive grating (DAG) compression algorithm
2.2.1 Definition of threshold
To determine the threshold, we must first clarify the specific definition of the threshold for calculation. As the above context states, the opening angle is the threshold. To simplify the threshold to a more understandable unit, we convert the threshold from the opening angle to the length of its opposite side, as shown in Figure 2.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220316021422219-0112:S0373463321000692:S0373463321000692_fig2.png?pub-status=live)
Figure 2. Schematic diagram of the sector pattern
Figure 2 shows that the starting point $P_1$ and the middle point $P_2$
are connected as a central axis, with the starting point as a vertex and angle $\theta$
as an opening angle, so a sector is formed. We take the length of the side opposite the angle $\theta$
as the threshold $T$
, as shown in Figure 2. The threshold $T$
can be obtained by Equation (1):
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220316021422219-0112:S0373463321000692:S0373463321000692_eqn1.png?pub-status=live)
where $D_{p_1p_2}$ is the length between point $P_1$
and $P_2$
.
Because using a threshold for data processing is often to achieve a certain goal or performance (such as error or compression ratio) (Goh et al., Reference Goh, Basah, Yazid, Safar and Saad2018), we associate the compression performance with the selection of the threshold. In order to automatically obtain the threshold that adapts to compression requirements, an approximation method was developed to achieve this goal; the approximation method can use the compression performance as feedback to adaptively approximate the desired threshold.
2.2.2 Calculation of the candidate threshold
In order to derive the desired threshold, the compression performance can be taken as a guide. For instance, we take the compression error as the requirement of compression and require that the practical compression error not be greater than a setting value $E$. It was found that the compression error is nearly positively correlated with the threshold (refer to Figure 8). We use the linear relationship as the function to simplify the algorithm complexity, as shown in Equation (2), then the candidate threshold $T_i$
is obtained by Equation (3):
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220316021422219-0112:S0373463321000692:S0373463321000692_eqn2.png?pub-status=live)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220316021422219-0112:S0373463321000692:S0373463321000692_eqn3.png?pub-status=live)
where $T_i$ is the candidate threshold, $i$
($i = 0, 1, 2,\ldots$
) is the label representing the sequence of the candidate threshold; $T_{\textrm {min}}$
is the lower boundary of the threshold, usually set to 0; $E_{\textrm {min}}$
is the lower boundary of the error equal to 0 when $T_{\textrm {min}}$
is 0; $T_{\textrm {max}}$
is the upper boundary of the threshold, because it corresponds to the opening angle of the sector, it can be obtained by setting the angle no greater than $180^\circ$
in Equation (1); and $E_{\textrm {max}}$
is the upper boundary of the error, which is calculated by inputting $T_{\textrm {max}}$
to the compression algorithm.
2.2.3 Evaluation of the compression performance
The candidate threshold $T_i$, which is calculated from Equation (3), is then added into the compression algorithm to obtain the compressed trajectory, as shown in Equation (4):
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220316021422219-0112:S0373463321000692:S0373463321000692_eqn4.png?pub-status=live)
where $(x_0, y_0)$ is the coordinate pair of the original trajectory, $(x_c, y_c)$
is the coordinate pair of the compressed trajectory and $f_{AG}$
is the grating compression algorithm.
Then the performance corresponding to the threshold is evaluated. When we take the compression error as an example, the area error is used as the performance index. Different from the length error, the area error refers to the area of all closed areas formed by the intersection of the compressed and uncompressed trajectories, which can effectively avoid the possibility of reflecting the difference when the starting point and the ending point of the trajectories are very close to each other. The equation of area error is shown as Equation (5):
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220316021422219-0112:S0373463321000692:S0373463321000692_eqn5.png?pub-status=live)
where $\varepsilon _i$ is the actual error corresponding to the candidate threshold $T_i$
and $f_P$
is the error function.
Another commonly used performance index is the data compression ratio and the equation is as follows:
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220316021422219-0112:S0373463321000692:S0373463321000692_eqn6.png?pub-status=live)
where $r$ is the data compression ratio, $N(x_0, y_0)$
is the number of points in the uncompressed trajectories and $N(x_c, y_c)$
is the number of points in the compressed trajectories.
2.2.4 Performance comparison and approximation
If $\varepsilon _i< E$, that is, the compression performance index meets the requirements, then the corresponding candidate threshold is the adaptive threshold. However, if $\varepsilon _i>E$
, $T_i$
and $\varepsilon _i$
will be taken as the new $T_{\textrm {max}}$
and $E_{\textrm {max}}$
, forming a new linear relationship with $T_{\textrm {min}}$
and $E_{\textrm {min}}$
, to continue to approach the desired threshold until the candidate meets the requirements. The diagram of threshold approximation is shown in Figure 3(a).
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220316021422219-0112:S0373463321000692:S0373463321000692_fig3.png?pub-status=live)
Figure 3. Schematic diagram of threshold approximation
However, there is an issue. If the relationship between the threshold and the compression error is the concave function shown as Figure 3(a), it would be difficult to approximate the error which is less than $E$. This issue may lead to numerous or even endless iterations of approximation. In order to solve this problem, we propose to use $E'$
instead of $E$
as input, let:
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220316021422219-0112:S0373463321000692:S0373463321000692_eqn7.png?pub-status=live)
where $\forall k\in [ 0,1)$. The larger the value of $k$
, the slower the approximation speed, but the smaller the deviation between the outcome error and expected error $E$
; and vice versa. Here, $E$
is still used as the basis for judging, as shown in Figure 3(b). Figure 3(a) shows that, after 3 times of approximation, the errors are $E_0$
, $E_1$
and $E_2$
, respectively, where $E_0>E_1>E_2$
. Yet the value of $E_2$
is still greater than $E$
, the approximate termination condition is not met, so the operation continues. In Figure 3(b), after 2 approximations, the errors are $E_0$
and $E_1$
, where $E_0>E_1$
. The value of $E_1$
is less than $E$
, which means that the approximate termination condition is satisfied and the operation can be stopped. Therefore, comparing the schematic diagrams of Figures 3(a) and 3(b), it can be seen that using $E'$
instead of $E$
can improve the efficiency of the approximation iteration and prevent endless iteration.
2.2.5 Proposed DAG compression algorithm
Our proposed DAG compression algorithm takes the user's expectation for compression performance as a guide. This guide is then used to adaptively derive an appropriate threshold, and then uses the threshold for ship trajectory compression. The pseudocode of the algorithm is as follows:
DAG compression algorithm
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220316021422219-0112:S0373463321000692:S0373463321000692_tabU1.png?pub-status=live)
According to the steps of the algorithm, the computational complexity of the algorithm can be estimated; the big-O notation is generally used to estimate the computational complexity (Vaz et al., Reference Vaz, Shah, Sawhney and Deolekar2017). Because complexity of grating algorithm is O$( n)$, the complexity of the DAG compression algorithm is estimated as Equation (8):
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220316021422219-0112:S0373463321000692:S0373463321000692_eqn8.png?pub-status=live)
where $T(n)$ is the complexity of the DAG compression algorithm, $n$
represents the points in the trajectory dataset, $m$
represents the times of iteration for approximation and O represents the order of magnitude.
3. Test and compression results
3.1 Ship trajectories dataset for testing
To test the effectiveness of our proposed DAG compression algorithm and compare the performance with the traditional grating algorithm, a historical AIS messages record of ship trajectories in the Qiongzhou Strait is used as the test dataset. There are 79,934 points in the dataset, as shown in Figure 4.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220316021422219-0112:S0373463321000692:S0373463321000692_fig4.png?pub-status=live)
Figure 4. The uncompressed ships’ trajectories points in the Qiongzhou Strait
These points are tagged with a Maritime Mobile Service Identity (MMSI) number, which is a unique 9-digit number that is assigned to an AIS unit. Because AIS equipment is installed on each ship, the MMSI number is unique to each ship. Then the points are classified by MMSI number and divided into 200 trajectories. The points of each trajectory are connected in time order, the result is shown in Figure 5.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220316021422219-0112:S0373463321000692:S0373463321000692_fig5.png?pub-status=live)
Figure 5. The uncompressed trajectories of ships in the Qiongzhou Strait
In the dataset, different shapes of trajectories are contained to increase the diversity of the sample. One-hundred trajectories with 41,624 points are relatively straight trajectories, running nearly east and west, shown as the light blue lines in Figure 5. Another 100 trajectories with 38,310 points are curve trajectories, turning or tortuous, shown as the blue lines in Figure 5. All of these trajectories are used for the test.
3.2 Compression results and comparison
To obtain the compression results for our proposed DAG algorithm and the traditional grating algorithm, we set the threshold as 0·0138 n miles for the traditional grating compression algorithm, 3,950 compressed points are left out of the 79,934 uncompressed points, so the overall compression ratio of the dataset is approximately 20. The compression result is shown as Figure 6(a), in which the yellow lines represent the relatively straight trajectories and the red lines correspond to the curve trajectories. Accordingly, Figure 6(b) is the compression result of the DAG compression algorithm when the compression ratio is 20, in which the light green lines represent the relatively straight trajectories and the green lines represent the curve trajectories.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220316021422219-0112:S0373463321000692:S0373463321000692_fig6.png?pub-status=live)
Figure 6. The compression results: (a) traditional grating compression; (b) DAG compression
Figure 6 shows that the overall compression results of these two compression algorithms are similar, but a closer look reveals some differences in the compression points. In fact, they are not only different in the distribution of compression points, but also in the threshold(s) as well as in the compression errors. The average compression error of the traditional grating compression algorithm is 0·035 sq. n miles, while the error is 0·029 sq. n miles for the DAG compression algorithm. Additionally, the maximum compression error of the traditional grating compression algorithm is 0·056 sq. n miles, while the error is 0·033 sq. n miles for the DAG compression algorithm. Therefore, when they are of the same compression ratio, the DAG compression algorithm has a lower error than the traditional grating algorithm both in the average error and the maximum compression error.
Because the compression trajectories in Figure 6 are a bit too dense to distinguish the overlaps, 2 curves and 2 straight trajectories are taken out from the dataset to better illustrate the difference of these two algorithms. The compression results are shown in Figure 7.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220316021422219-0112:S0373463321000692:S0373463321000692_fig7.png?pub-status=live)
Figure 7. The compression results: (a) curve trajectories compressed by the traditional compression algorithm; (b) straight trajectories compressed by the traditional compression algorithm; (c) curve trajectories compressed by the DAG algorithm; (d) straight trajectories compressed by DAG algorithm
Figure 7 shows that in the results of the traditional algorithm, the points of the curve trajectories are significantly denser than the points of the straight trajectories. This is mainly because the curve trajectories and the straight trajectories are of the same compression threshold, regardless of the tortuous nature of the trajectories. For the traditional compression algorithm, although the overall compression ratio is 20 and the average compression error 0·035 sq. n miles, the compression ratio of the 2 curve trajectories is 6 with an average error of 0·019 sq. n miles; and the compression ratio of the 2 straight trajectories is 37 with an average error of 0·044 sq. n miles. Hence, the compression statistics of straight trajectories and curves vary considerably.
In contrast, the proposed DAG algorithm considers the compression requirement or expectation (here the compression error is close to 0·03 sq. n miles), and assigns an adaptive threshold to each trajectory. For the curve trajectories, their thresholds are 0·018 and 0·017, respectively; for the straight trajectories, their thresholds are 0·009 and 0·01, respectively. Therefore, the compression ratio of 2 curve trajectories is 19 with an average compression error of 0·027 sq. n miles, and the compression ratio of 2 straight trajectories is 31 with an average compression error of 0·029 sq. n miles. Hence, the difference of straight trajectories and curves is relatively smaller than the traditional compression algorithm.
Figures 6 and 7 are the results of a compression ratio equal to 20. To test the performances in a wider range of compression ratios, more tests were conducted. The results are shown in Figure 8, where the compression ratios are from approximately 1·5 to 50. Every point in the curve of Figure 8 represents a statistic of the compression of an overall 200 trajectories. Because each trajectory has its compression error, to extract the statistical value of error, the average error and maximum error are used respectively for the comparison. Figure 8(a) is the ‘compression ratio–average error’ curve of the proposed DAG compression algorithm and the traditional grating compression algorithm, and Figure 8(b) is ‘compression ratio–maximum error’ curve.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220316021422219-0112:S0373463321000692:S0373463321000692_fig8.png?pub-status=live)
Figure 8. Compression ratio–error curves: (a) average error; (b) maximum error
Figure 8 shows that when the compression ratio is less than 10, the performances of these two algorithms are similar, but when the compression ratio is greater than 10, the proposed DAG algorithm has lower errors than the traditional grating compression algorithm both in the average compression error and the maximum compression error. Additionally, the difference in maximum compression error is much more obvious, which is approximately 0·02 sq. n miles.
The main reason for the improvement in compression error is the adaptive thresholding. Instead of using the same threshold for all the trajectories, our proposed DAG compression algorithm assigns different thresholds to different trajectories. To better illustrate the specific performance of the DAG compression algorithm, 3 compression ratios (10, 20, 40) are selected from Figure 8 to demonstrate the performance of the algorithms. Figure 9 shows the distribution of adaptive thresholds of the trajectories under the 3 compression ratios.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220316021422219-0112:S0373463321000692:S0373463321000692_fig9.png?pub-status=live)
Figure 9. The distribution of adaptive thresholds: (a) overall compression ratio = 10; (b) overall compression ratio = 20; (c) overall compression ratio = 40
Figure 9(a) shows the distribution of adaptive thresholds of the DAG compression algorithm under the compression ratio of 10, where the corresponding fixed threshold for the traditional grating compression algorithm is 0·0072 n miles. From Figure 9(a), it can be seen that the thresholds of relatively straight trajectories are generally less than the thresholds of curve trajectories. For the straight trajectories, the threshold of the peak bar is between 0·005 and 0·0055 n miles, the span of the thresholds is from 0·0035 to 0·0095 n miles; for the curve trajectories, the threshold of the peak bar is between 0·007 and 0·0075 n miles, the span of the thresholds is from 0·004 to 0·0145 n miles; for all trajectories, the threshold of the peak bar is between 0·005 and 0·0055 n miles, and overall span is from 0·0035 to 0·0145 n miles. Therefore, the peak threshold of curve trajectories is greater than the peak threshold of straight trajectories and the thresholds of curve trajectories have a wider span than those of the straight trajectories. Additionally, the fixed threshold of the traditional grating compression algorithm is close to the peak threshold for curve trajectories of the DAG compression algorithm.
Figure 9(b) shows the distribution of adaptive thresholds of the DAG compression algorithm under the compression ratio of 20, where the corresponding fixed threshold for the traditional grating compression algorithm is 0·0138 n miles. For the straight trajectories, the threshold of the peak value is between 0·0096 and 0·0104 n miles, the span of the thresholds is from 0·0064 to 0·0192 n miles; for the curve trajectories, the threshold of the peak value is between 0·012 and 0·0128 n miles, and the span is from 0·0072 to 0·0232 n miles. For all trajectories, the threshold of peak is between 0·014 and 0·0112 n miles and the span is from 0·0064 to ·0232 n miles. Therefore, the peak threshold of curve trajectories is greater and the span of the thresholds of curve trajectories is slightly longer than those of the straight trajectories. The fixed threshold of the traditional grating algorithm is somewhat greater than the peak value of the curve thresholds.
Figure 9(c) shows the distribution of adaptive thresholds of the DAG compression algorithm under the compression ratio of 40, where the corresponding fixed threshold for the traditional grating compression algorithm is 0·0273 n miles. For the straight trajectories, the threshold of the peak value is between 0·021 and 0·0225 n miles and the span is from 0·0015 to 0·0495 n miles; for the curve trajectories, the threshold of the peak value is between 0·0285 and 0·03 n miles, and span is from 0·018 to 0·048 n miles. For all the trajectories, the threshold of peak value is between 0·024 and 0·0255 n miles, and the span is from 0·015 to 0·0495. Therefore, the peak threshold of curve trajectories is greater than the corresponding value of straight trajectories, but the span of thresholds of straight trajectories is longer than that of curve trajectories and equal to the threshold span of all trajectories. The fixed threshold of the traditional grating algorithm is close to the peak value of curve thresholds.
From Figure 9, it can be concluded that the peak value of straight trajectories thresholds is less than the peak value of curve trajectories threshold. The span of straight trajectories thresholds is narrower than the span of curve trajectories when compression is 10, but it becomes longer with the increase of compression ratio. Additionally, the fixed threshold in the traditional grating compression algorithm is close to the peak value of curve trajectories in the DAG compression algorithm when they are of the same overall compression ratio. This means in the traditional grating compression algorithm, the straight trajectories use a greater threshold than the corresponding trajectories of the DAG compression algorithm, which will cause a great error for straight trajectories. However, for some threshold of curve trajectories, the DAG compression algorithm assigns greater thresholds than the traditional grating compression algorithm, which will also cause some greater errors but within the requirement.
The compression errors graphs of every trajectory in the compression are listed in Figure 10.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220316021422219-0112:S0373463321000692:S0373463321000692_fig10.png?pub-status=live)
Figure 10. Compression errors graphs: (a) overall compression ratio = 10; (b) overall compression ratio = 20; (c) overall compression ratio = 40
In each graph of Figure 10, the first 100 labels of trajectories (No. 1~100) correspond to straight trajectories, the second 100 labels (No. 101~200) represent curve trajectories. From all the graphs in Figure 10, it can be seen that the errors of traditional grating compression algorithms are generally greater than the DAG compression algorithms, especially for straight trajectories. This is mainly because the straight trajectories thresholds in the DAG compression algorithms are smaller than those in the traditional algorithms. For curve parts, some errors of the traditional grating compression results are greater than the DAG compression results, and some errors are smaller. This is mainly due to some adaptive thresholds of curve trajectories are less than the fixed threshold while some are greater. For the errors of the traditional grating algorithm, the errors of straight trajectories are greater than the errors of curve trajectories, mainly because both straight trajectories and curve trajectories use the same threshold for compression, and the threshold is somewhat large for the straight trajectories but small for the curve trajectories. However, for the errors of the DAG compression algorithm, errors of straight trajectories are nearly equivalent to the errors of curve trajectories, especially for the maximum values, and they are within the desired error.
To analyse the relationship between compression ratio and compression error of each trajectory in the dataset, the ‘compression ratio–error’ scatter diagrams are shown in Figure 11. Figures 11(a), 11(b), 11(c) correspond to the overall compression ratio equal to 10, 20, 40, respectively.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220316021422219-0112:S0373463321000692:S0373463321000692_fig11.png?pub-status=live)
Figure 11. ‘Compression ratio–error’ scatter diagrams: (a) overall compression ratio = 10; (b) overall compression ratio = 20; (c) overall compression ratio = 40
In Figure 11, each point in the scatter diagram represents a trajectory in the dataset, where each colour of points specifies a specific category of trajectories compressed by an algorithm. It can be seen that the distribution of the green points is more concentrated than the red points, which means that the compression errors of curve trajectories compressed by the DAG algorithm are smaller than the errors of curve trajectories compressed by the traditional grating algorithm. Besides, in Figure 11(a), most of the green points are also more concentrated in abscissa, which means the compression ratios of curve trajectories in the DAG algorithm are close to a specific compression ratio, while the compression ratios of curve trajectories in traditional grating algorithms disperse in a wider range. The concentricity of green points becomes dispersive with the increase of compression ratio, as shown in Figures 11(b) and 11(c), but are still more concentrated than the red points especially in ordinate. Therefore, for the curve trajectories, the compression errors of the DAG algorithm are generally smaller than those of the traditional grating algorithm, and the compression ratio–error relationship is much more clustered in the DAG compression than in the traditional grating compression, which means the compression results of each trajectory in the dataset is more similar in the DAG compression.
The blue points and yellow points in the scatter diagrams of Figure 11 represent the compression results of straight trajectories by the DAG compression algorithm and the traditional compression algorithm, respectively. It can be learned from Figure 11 that the blue points are lower than the yellow points, which means the compression error of the DAG compression algorithm for each trajectory is generally smaller than that of the traditional grating compression algorithm. Most of the blue points in Figure 11(a) are located in a smaller range in abscissa than the yellow points, while this characteristic becomes less obvious as the compression rate increases, as shown in Figures 11(b) and 11(c). This is because in the DAG compression, the straight trajectories use smaller thresholds than the fixed threshold of the traditional grating compression, and then most of the compression ratios of the DAG compression are not as large as those of the traditional grating compression. As the overall compression ratio increases, especially to 40, the dispersions in abscissa of these two algorithms have less difference, but the DAG compression still has smaller error both in average error and maximum error.
4. Conclusion
A DAG compression algorithm was proposed in this paper, with consideration of the expectation or requirement for the compression performance and with an adaptive thresholding module to dynamically obtain the suitable threshold for each trajectory. The DAG algorithm appears to have a better compression than the traditional grating compression algorithm.
This algorithm has a good universality for ship trajectories. It can deal well with the situation where the dataset contains trajectories with different shapes. Through the test on a complex dataset including both relatively straight trajectories and curves of various shapes and directions, our DAG compression dynamically assigns a relatively lower threshold to straight trajectories than the curve trajectories, which makes the compression more balanced and performance more concentrated, rather than the straight trajectory having a greater error while the curve having a much smaller error or the errors of trajectories being randomly dispersed over a wide range as observed with the traditional grating compression algorithm.
The automation feature of the algorithm simplifies the process of operation, saving time and effort for the operator having to repeatedly test to select the threshold. By considering an optimisation strategy in determining the threshold, the complexity of the DAG compression algorithm is maintained as O$( n)$, which does not lead to an obvious increase in the computation complexity. In the future, the application of the algorithm can be further promoted in practice, for example, a practical toolbox can be developed so as to facilitate data visualisation, mapping and data analysis.
Acknowledgment
The authors greatly appreciate the work of data processing and language help done by researchers from Arizona State University and Wuhan University of Technology.
Funding statement
This research was funded by Natural Science Foundation of Hubei Province (grant number 2019CFB339) and China Scholarship Council (CSC No. 202006955031).