
Subspace-based estimation of time of arrival and Doppler shift for a signal of known waveform

Published online by Cambridge University Press:  15 May 2009

V.V. Latyshev*
Affiliation:
Moscow Aviation Institute (State Technical University), Moscow, Russia
*
Corresponding author: V.V. Latyshev Email: lvv@mai.ru

Abstract

The subspace-based technique is used for the estimation of the time of arrival and Doppler shift of a signal of known waveform. The tool for finding the required subspaces is a special orthogonal decomposition of the received data. It allows one to concentrate the Fisher information on the desired parameter in just a few of the first terms of the decomposition. This approach yields a low-dimensional vector of sufficient statistics and leads to computationally efficient Bayesian estimation. It also extends the signal-to-noise ratio (SNR) range over which maximum likelihood (ML) estimation is effective. Finally, independent estimates of the time of arrival and the Doppler shift can be obtained on the basis of generalized eigenvectors.

Type
Original Article
Copyright
Copyright © Cambridge University Press and the European Microwave Association 2009

I. INTRODUCTION

Dimension reduction techniques are often applied as a data pre-processing step or as a part of the data analysis to simplify a data model. This typically involves identification of a suitable low-dimensional representation for the original high-dimensional data set or a subspace of an observation space. By working with this reduced representation, tasks such as classification or estimation can often yield more accurate and readily interpretable results, while computational costs may be significantly reduced.

In mathematical terms, the problem we consider can be stated as follows: given the N-dimensional random variable ${\bf x} = (x_1, x_2, \ldots, x_N)^T$, find a lower-dimensional representation of it, ${\bf y} = (y_1, y_2, \ldots, y_m)^T$ with m < N, that captures the content of the original data according to some criterion.

Among the linear dimension reduction techniques, principal component analysis (PCA) is the best in the mean-square error sense [1]. In various fields, it is also known as the singular value decomposition (SVD) and the Karhunen–Loève transform. In essence, PCA seeks to reduce the dimension of the data by finding a few orthogonal linear combinations (the PCs) of the original variables. The first PC $y_1$ is the linear combination with the largest variance. The second PC $y_2$ is the linear combination with the second largest variance and orthogonal to the first PC, and so on. There are as many PCs as the number of original variables. For many data sets, the first several PCs explain most of the variance, so that the rest can be disregarded with minimal loss of information.
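For reference, the short sketch below (not part of the original paper) computes PCs by eigendecomposition of a sample covariance matrix; the synthetic data, the dimension N = 8, and the choice m = 3 are purely illustrative assumptions.

```python
# A minimal PCA sketch on synthetic data; all sizes are illustrative.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 8)) @ rng.standard_normal((8, 8))  # toy data, N = 8

C = np.cov(X, rowvar=False)                  # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)         # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]            # reorder: largest variance first
m = 3
W = eigvecs[:, order[:m]]                    # first m principal directions
Y = (X - X.mean(axis=0)) @ W                 # m-dimensional representation y
print(eigvals[order][:m] / eigvals.sum())    # fraction of variance per retained PC
```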

Here we would like to use dimension reduction techniques in a signal parameter estimation problem. We hope to obtain a similar PCA technique with a different criterion.

II. DATA DIMENSION REDUCTION

The estimation problem we consider here can be phrased as follows: given x(t) = s(t, θ) + w(t), where s(t, θ), 0 ≤ t ≤ T, is a signal of known waveform observed in the presence of an additive Gaussian noise process w(t) with zero mean. In the random-parameter estimation problem we have an a priori probability density p(θ), which we assume throughout to be known. We need to estimate the parameter θ of the signal. In general, θ enters the signal in a nonlinear manner.

We assume that the observation space corresponds to the set of N observations $x_1, x_2, \ldots, x_N$. Thus, each set can be thought of as a point in an N-dimensional space and can be denoted by a column vector ${\bf x} = {\bf s}(\theta) + {\bf w}$, where ${\bf s}(\theta) \in R^N$ and ${\bf w} \in R^N$ are the N-dimensional vectors of the signal and the noise, respectively. The vector w has a nonsingular covariance matrix ${\bf R}_w$. Then the probability density of x is

(1)
p({\bf x}\mid\theta) = \left((2\pi)^N \lvert{\bf R}_w\rvert\right)^{-1/2} \exp\left[-\frac{1}{2}({\bf x} - {\bf s}(\theta))^T {\bf R}_w^{-1}({\bf x} - {\bf s}(\theta))\right].

To obtain the m-dimensional vector y with m < N, we use the linear transformation y = Cx with transformation matrix C. We need a matrix C that guarantees minimal loss of accuracy in estimating the parameter θ from the vector y. In addition to this requirement, we try to represent x in a new coordinate system in which the components are statistically independent random variables: ${\bf C}{\bf R}_w{\bf C}^T = {\bf I}$, where I is the identity matrix. It is convenient to write the transformation matrix as ${\bf C} = {\bf A}{\bf R}_w^{-1/2}$. Here ${\bf R}_w^{-1/2}$ is the symmetric square root of ${\bf R}_w^{-1}$ (${\bf R}_w^{-1/2}{\bf R}_w^{-1/2} = {\bf R}_w^{-1}$), ${\bf A}^T = ({\bf a}_1, \ldots, {\bf a}_m)$, and ${\bf A}{\bf A}^T = {\bf I}$. So we have

(2)
{\bf y} = {\bf A}{\bf R}_w^{-1/2}{\bf x}, \quad {\bf y}\in R^m.

First of all, recall that the variance of any unbiased estimate of an arbitrary parameter θ is determined from the Cramér–Rao inequality [2]:

(3)
\sigma_\theta^2 \ge \left(E\left[\left(\frac{\partial\ln p({\bf x}\mid\theta)}{\partial\theta}\right)^2\right]\right)^{-1} = \left(-E\left[\frac{\partial^2\ln p({\bf x}\mid\theta)}{\partial\theta^2}\right]\right)^{-1}.

The inequality holds if the derivatives involved exist and are absolutely integrable. Recall that the quantity inverted on the right-hand side of the Cramér–Rao inequality is usually referred to as the Fisher information on the parameter θ:

(4)
I(\theta) = E\left[\left(\frac{\partial\ln p({\bf x}\mid\theta)}{\partial\theta}\right)^2\right] = -E\left[\frac{\partial^2\ln p({\bf x}\mid\theta)}{\partial\theta^2}\right].

Substituting (1) into (4), we have

(5)
I_N(\theta) = ({\bf s}'(\theta))^T{\bf R}_w^{-1}{\bf s}'(\theta),

where the subscript N distinguishes the initial dimension of the observation from the reduced dimension m, and ${\bf s}'(\theta) = (\partial s_1(\theta)/\partial\theta, \ldots, \partial s_N(\theta)/\partial\theta)^T$ is the column vector of derivatives.
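For concreteness, a small numerical sketch of (5) is given below for white noise (${\bf R}_w = \sigma_w^2{\bf I}$); the pulse waveform, the sampling grid, and the noise level are assumptions made only for illustration.

```python
# Sketch: Fisher information (5) for a sampled waveform in white Gaussian noise.
import numpy as np

N, T, sigma_w = 512, 1.0, 0.5
t = np.linspace(0.0, T, N)

def s(theta):
    # illustrative waveform delayed by theta; any differentiable model would do
    return 1.0 - np.cos(2.0 * np.pi * np.clip(t - theta, 0.0, T) / T)

def fisher_information(theta, d=1e-6):
    s_prime = (s(theta + d) - s(theta - d)) / (2.0 * d)   # vector of derivatives s'(theta)
    return s_prime @ s_prime / sigma_w**2                  # (s')^T R_w^{-1} s' with R_w = sigma_w^2 I

print(fisher_information(0.1))
```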

The Fisher information in the vector y [3] is

I_m(\theta) = \sum_{k=1}^m\left[{\bf a}_k^T{\bf R}_w^{-1/2}{\bf s}'(\theta)\right]^2.

The loss of the Fisher information is

\Delta I(\theta) = I_N(\theta) - I_m(\theta) = ({\bf s}'(\theta))^T{\bf R}_w^{-1}{\bf s}'(\theta) - \sum_{k=1}^m\left[{\bf a}_k^T{\bf R}_w^{-1/2}{\bf s}'(\theta)\right]^2.

The mean of the loss of the Fisher information is

(6)
\Delta I = E_\theta\left\{({\bf s}'(\theta))^T{\bf R}_w^{-1}{\bf s}'(\theta) - \sum_{k=1}^m\left[{\bf a}_k^T{\bf R}_w^{-1/2}{\bf s}'(\theta)\right]^2\right\},

where $E_\theta$ denotes an expectation over the random variable θ. Thus, we need the transformation matrix in (2) that provides the minimal value of ΔI.

Theorem: The linear transformation with the matrix ${\bf A}{\bf R}_w^{-1/2}$ provides the minimal mean loss of Fisher information ΔI if the column vectors ${\bf a}_1, \ldots, {\bf a}_m$ of ${\bf A}^T$ are the orthonormal eigenvectors of

(7)
{\bf B} = {\bf R}_w^{-1/2}E_\theta\left\{{\bf s}'(\theta)({\bf s}'(\theta))^T\right\}{\bf R}_w^{-1/2},

corresponding to m largest eigenvalues. At the same time

(8)
\Delta I_{\min} = \lambda_{m+1} + \cdots + \lambda_N,

where $\lambda_1\ge\lambda_2\ge\cdots\ge\lambda_N$ are the eigenvalues of B.

Proof: Rewrite (6) in the following form:

\Delta I = E_\theta\left\{({\bf s}'(\theta))^T{\bf R}_w^{-1}{\bf s}'(\theta)\right\} - E_\theta\left\{\sum_{k=1}^m\left[{\bf a}_k^T{\bf R}_w^{-1/2}{\bf s}'(\theta)\right]^2\right\}.

The first term does not depend on ${\bf a}_k$. Therefore, ΔI is minimal when the subtrahend is maximal. Denote the subtrahend by $H({\bf a}_1, \ldots, {\bf a}_m)$. Interchanging averaging and summation, and taking into account the equality

\left[{\bf a}_k^T{\bf R}_w^{-1/2}{\bf s}'(\theta)\right]^2 = {\bf a}_k^T{\bf R}_w^{-1/2}{\bf s}'(\theta)({\bf s}'(\theta))^T{\bf R}_w^{-1/2}{\bf a}_k,

we have

H({\bf a}_1, \ldots, {\bf a}_m) = \sum_{k=1}^m{\bf a}_k^T\left[{\bf R}_w^{-1/2}E_\theta\left\{{\bf s}'(\theta)({\bf s}'(\theta))^T\right\}{\bf R}_w^{-1/2}\right]{\bf a}_k.

The expression in brackets is a symmetric matrix:

{\bf B} = {\bf R}_w^{-1/2}E_\theta\left\{{\bf s}'(\theta)({\bf s}'(\theta))^T\right\}{\bf R}_w^{-1/2}.

In accordance with the theorem on eigenvalues and eigenvectors [4], the maximal value of $H({\bf a}_1, \ldots, {\bf a}_m)$ is attained when ${\bf a}_1, \ldots, {\bf a}_m$ are the orthonormal eigenvectors of the matrix B corresponding to its m largest eigenvalues $\lambda_1\ge\lambda_2\ge\cdots\ge\lambda_m$, and

\max H({\bf a}_1, \ldots, {\bf a}_m) = \sum_{k=1}^m \lambda_k.

The equality ${\bf R}_w^{-1/2}{\bf R}_w^{-1/2} = {\bf R}_w^{-1}$ implies

\operatorname{tr}{\bf B} = E_\theta\left\{({\bf s}'(\theta))^T{\bf R}_w^{-1}{\bf s}'(\theta)\right\}.

On the other hand, $tr\, {\bf B} = \sum_{k=1}^N\lambda_k$. It implies $\Delta I_{\rm min} = \sum_{k = m +1}^N \lambda_k$.

Note that the subspace spanned by the column vectors ${\bf a}_1, \ldots, {\bf a}_m$ is the m-dimensional subspace of the observation space with the maximal Fisher information content concerning the parameter θ among all m-dimensional subspaces.
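A compact sketch of the construction in the theorem is given below for white noise; the waveform, the prior on θ, and the Monte Carlo approximation of $E_\theta\{{\bf s}'(\theta)({\bf s}'(\theta))^T\}$ are illustrative assumptions rather than the paper's exact setup.

```python
# Sketch: build B of (7), keep the eigenvectors of the m largest eigenvalues,
# and form the reduced statistic y = A R_w^{-1/2} x (white noise assumed).
import numpy as np

N, T, sigma_w, m = 512, 1.0, 0.5, 5
t = np.linspace(0.0, T, N)
pulse = lambda th: 1.0 - np.cos(2.0 * np.pi * np.clip(t - th, 0.0, T) / T)  # assumed s(t, theta)

def s_prime(theta, d=1e-6):
    return (pulse(theta + d) - pulse(theta - d)) / (2.0 * d)

rng = np.random.default_rng(1)
thetas = rng.uniform(-T / 4, T / 4, 2000)                   # samples from the prior p(theta)
B = np.mean([np.outer(s_prime(u), s_prime(u)) for u in thetas], axis=0) / sigma_w**2

lam, V = np.linalg.eigh(B)                                  # ascending eigenvalues
order = np.argsort(lam)[::-1]
A = V[:, order[:m]].T                                       # rows a_1^T, ..., a_m^T
info_loss = lam[order[m:]].sum()                            # Delta I_min of (8)

reduce_obs = lambda x: A @ x / sigma_w                      # y = A R_w^{-1/2} x for R_w = sigma_w^2 I
```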

III. NUMERICAL EXAMPLE

To illustrate the advantages of the dimension reduction technique in signal parameter estimation, let us consider the Doppler shift problem. We model the following signal:

(9)
x(t) = s(t, f_D) + w(t).

Here s(t, f_D) is a signal of known form observed in the presence of an additive white Gaussian noise process w(t) with zero mean and variance $\sigma_w^2$. The time delay is known and we wish to estimate the Doppler frequency shift f_D only. Note that for the radar case, the standard narrowband assumption is employed here, i.e. the frequency offset f_D is a narrowband approximation to the stretching or shrinking of the frequency axis due to the Doppler effect induced by the relative motion of the reflecting target.

The quantity f_D is a random parameter with a uniform a priori density p_0(f_D), f_D ∈ [−F, F], so we may use the Bayesian estimation procedure. However, it is computationally demanding to put into practice. In this example, our goal is to simplify the Bayesian estimate for the case where the cost function is the squared error. For the problem at hand, we have to capture the Fisher information concerning the Doppler shift after data dimension reduction.

To illustrate our results we use the FM signal with T · Δf = 32:

(10)
s(t, f_D) = E\cos\left((\omega_0 - 2\pi f_D)t + \frac{\pi\Delta f}{T}t^2\right), \quad -\frac{T}{2}\le t\le\frac{T}{2}.

A set of N observations, $x_1, x_2, \ldots, x_N$, consists of statistically independent values $x_k$ with variance $\sigma_w^2$. So we have the column vector ${\bf x} = {\bf s}(f_D) + {\bf w} \in R^N$ with the nonsingular covariance matrix ${\bf R}_w = \sigma_w^2{\bf I}$.
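The following sketch generates one noisy realization of the observation (9) with the chirp (10); the values of E, $\omega_0$, and $\sigma_w$ are arbitrary illustrative choices, not values taken from the paper.

```python
# Sketch: one realization of x = s(f_D) + w for the chirp (10), with T * delta_f = 32.
import numpy as np

N, T = 512, 1.0
delta_f = 32.0 / T
E, omega_0, sigma_w = 1.0, 2.0 * np.pi * 64.0, 0.5          # illustrative constants
t = np.linspace(-T / 2, T / 2, N)

def s(f_D):
    return E * np.cos((omega_0 - 2.0 * np.pi * f_D) * t + np.pi * delta_f / T * t**2)

rng = np.random.default_rng(2)
f_D_true = rng.uniform(-0.4, 0.4)                           # draw from the uniform prior
x = s(f_D_true) + sigma_w * rng.standard_normal(N)          # observed vector
```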

The Bayesian estimate uses the a posteriori density:

(11)
p(f_D\mid{\bf x}) = \frac{p({\bf x}\mid f_D)\,p_0(f_D)}{\int_{-F}^F p({\bf x}\mid f_D)\,p_0(f_D)\,df_D}.

The estimate is the mean of the a posteriori density (or the conditional mean):

(12)
\hat f_D({\bf x}) = \int_{-F}^F f_D\,p(f_D\mid{\bf x})\,df_D.

The simplification of the Bayesian estimate may be achieved using a linear transformation with the matrix A that provides the minimal mean loss of Fisher information about the Doppler shift. By the foregoing theorem, the column vectors ${\bf a}_1, \ldots, {\bf a}_m$ of ${\bf A}^T$ are the orthonormal eigenvectors of the matrix

{\bf B} = \frac{1}{\sigma_w^2}E_{f_D}\left\{{\bf s}'(f_D)({\bf s}'(f_D))^T\right\},

corresponding to the m largest eigenvalues. Here ${\bf s}'(f_D)$ is the column vector of derivatives with respect to $f_D$.

If we use the vector y = Ax, y ∈ R m, the Bayesian estimate is the same as in (11) and (12), where x is replaced by y.
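A minimal sketch of (11) and (12), evaluated by numerical integration over a grid of f_D values, is shown below; the grid resolution and the noise level are assumptions, and the same routine applies to the reduced vector y once s(f_D) is replaced by A s(f_D).

```python
# Sketch: posterior mean (12) under a uniform prior on [-F, F], white Gaussian noise.
import numpy as np

N, T, F, sigma_w = 512, 1.0, 0.4, 0.5
t = np.linspace(-T / 2, T / 2, N)
s = lambda f_D: np.cos((2.0 * np.pi * 64.0 - 2.0 * np.pi * f_D) * t
                       + np.pi * 32.0 * t**2)               # chirp (10) with illustrative omega_0

def bayes_estimate(x, grid=np.linspace(-0.4, 0.4, 801)):
    loglik = np.array([-0.5 * np.sum((x - s(f)) ** 2) / sigma_w**2 for f in grid])
    post = np.exp(loglik - loglik.max())                    # unnormalized posterior (uniform prior)
    post /= np.trapz(post, grid)                            # normalization as in (11)
    return np.trapz(grid * post, grid)                      # conditional mean (12)

rng = np.random.default_rng(3)
x = s(0.1) + sigma_w * rng.standard_normal(N)
print(bayes_estimate(x))
```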

For the simulations T = 1, N = 512, and $f_D$ is a random parameter with a uniform a priori density on [−0.4, 0.4]. The mean squared error (MSE) of the Doppler shift estimate was calculated for each algorithm from 10 000 Monte Carlo trials at various signal-to-noise ratio (SNR) values. The results are plotted in Fig. 1 together with the corresponding Cramér–Rao bound (CRB). The curves correspond to the Bayesian estimate using the initial 512-dimensional vector x and to estimates based on 5-dimensional and 10-dimensional vectors y.

Fig. 1. MSE of the Bayesian estimate using the initial 512-dimensional vector (lower thin line) and the reduced 5- and 10-dimensional vectors.

As can be seen from Fig. 1, the 512-dimensional and the 10-dimensional data sets are equivalent with respect to estimation accuracy. At the same time, the computational complexity of the Bayesian algorithm is two orders of magnitude lower in this example. This is because we capture the Fisher information concerning the Doppler shift. Further reduction of the dimension from 10 to 5 degrades the accuracy noticeably.

IV. SUBSPACE-BASED MAXIMUM-LIKELIHOOD (ML) ESTIMATION

The theorem proved above allows us to reveal an m-dimensional subspace of the observation space containing the Fisher information on an arbitrary parameter θ. If the informational contents of the m-dimensional vector y and the N-dimensional vector x are equal, we may regard y as a vector of sufficient statistics in the sense that an estimate of θ based on y is characterized by the same CRB as the estimate based on the initial observation. Otherwise y may be considered an approximate vector of sufficient statistics. The degree of approximation is determined by the sum (8).

Next, we consider an approach to the synthesis of subspace-based ML estimation algorithms that provides higher accuracy of the results. Note that the estimate obtained according to the ML principle in nonlinear estimation problems is asymptotically optimal for large SNR.

As an illustration, let us consider estimation of the time of arrival. Instead of (9) now we have

x(t) = s(t - \tau) + w(t)

with

(13)
s(t) = 1 - \cos\left(\frac{2\pi}{T}t\right), \quad 0\le t\le T.

The unknown parameter is the signal delay τ with respect to the point on the time axis chosen as the origin. It is necessary to estimate τ from the observation x(t).

We take into account that an estimation problem is usually solved either after a signal has been detected or simultaneously with signal detection. Therefore, below we assume that the expected signal has been detected and that its approximate position on the time axis is known. It is necessary to refine the signal's position in the presence of distortions. This assumption fits the real operating conditions of tracking equipment containing a discriminator that, in each measurement cycle, is tuned to a given reference value of the parameter to be measured.

Let us compare the information content of the vector y with different dimensions for this problem. We can use the results of the calculation of Fisher information from [5, 6] for signal (13) observed in the presence of an additive white Gaussian noise process w(t). Normalized graphs are shown in Figs. 2–4 for different dimensions of the vector y and T = 1.

Fig. 2. Fisher information in observation (1) and in the output signal of an ML filter (2).

Fig. 3. Fisher information in observation (1) and in 2D vector (2).

Fig. 4. Fisher information in observation (1) and in 3D vector (2).

It is noteworthy that the first eigenvector ${\bf a}_1$ coincides, to within a constant, with the impulse response of the ML filter that follows from the ML equation [5]. Therefore, the solution of the ML equation corresponds to the first coordinate of the vector y. Its informational content is shown in Fig. 2. The graph in Fig. 2 can be interpreted as an informational explanation of the asymptotic optimality of the ML estimate. The informational content of the observation x and that of the one-dimensional subspace are close to each other for small τ only. The estimate is close to the true value of τ only when the SNR is high and the approximate position on the time axis is known within a narrow a priori range (τ_min, τ_max). In general, if the SNR is not high, the 1D subspace is associated with large losses of Fisher information concerning the time delay, so the first coordinate of the vector y cannot be considered a sufficient statistic for time delay estimation. In Fig. 2 the losses of Fisher information are shown by gray shading.

The remaining coordinates of the vector y provide additional information channels that can be used to enhance the accuracy of estimation. Figures 3 and 4 show the informational contents of the 2D and 3D subspaces, respectively. The losses are noticeably smaller. Therefore, we can consider 2D or 3D vectors y as approximate vectors of sufficient statistics over a larger a priori range (τ_min, τ_max).

As long as we have an approximate vector of sufficient statistics, each of its components can be used for an independent estimation of the signal delay. In practice, this estimation procedure requires a multi-channel device that forms the corresponding vector components in its channels. Each channel contains a linear filter whose impulse response is the corresponding eigenvector ${\bf a}_k$ of (7).

This approach actually deals with an m-channel device, so that parameter τ is estimated in each channel using the ML estimator (MLE) (see Fig. 5).

Fig. 5. A multi-channel estimator of the time of arrival.

Therefore, the results in the channels can be combined in a weighted adder (WA). Since the estimation procedure in each channel is nonlinear, it is difficult to calculate its statistical characteristics exactly. However, potential characteristics can be employed. As shown above, each channel is characterized by the mean Fisher information λ_k. Hence, the estimates in the channels are characterized by different CRBs for the error variance. The optimal combination in the WA is the weighted average of the channel estimates with weights proportional to $\sqrt{\lambda_k}$.
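A sketch of this multi-channel scheme for signal (13) is given below; the Monte Carlo approximation of B, the grid search used for the per-channel ML estimates, and the noise level are illustrative assumptions, not the exact procedure of the paper.

```python
# Sketch of Fig. 5: m channels, each matched to one eigenvector a_k, produce
# independent ML estimates of tau, combined with weights proportional to sqrt(lambda_k).
import numpy as np

N, T, sigma_w, m = 1024, 1.0, 0.5, 3
t = np.linspace(0.0, T, N)
pulse = lambda tau: 1.0 - np.cos(2.0 * np.pi * np.clip(t - tau, 0.0, T) / T)  # delayed signal (13)

def s_prime(tau, d=1e-6):
    return (pulse(tau + d) - pulse(tau - d)) / (2.0 * d)

rng = np.random.default_rng(4)
priors = rng.uniform(-T / 4, T / 4, 500)
B = np.mean([np.outer(s_prime(u), s_prime(u)) for u in priors], axis=0) / sigma_w**2
lam, V = np.linalg.eigh(B)
order = np.argsort(lam)[::-1][:m]
lam, A = lam[order], V[:, order].T                          # m largest eigenpairs

tau_grid = np.linspace(-T / 4, T / 4, 501)
mu = np.array([A @ pulse(u) for u in tau_grid])             # channel means mu_k(tau)

x = pulse(0.05) + sigma_w * rng.standard_normal(N)          # observation with true tau = 0.05
y = A @ x                                                    # channel outputs
tau_hat = tau_grid[np.argmin((mu - y) ** 2, axis=0)]        # per-channel ML estimates
weights = np.sqrt(lam) / np.sqrt(lam).sum()
print(np.sum(weights * tau_hat))                            # weighted adder (WA) output
```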

Simulation of this approach is illustrated in Fig. 6. For comparison, Fig. 6 shows the error variances obtained on the basis of 1D, 2D, and 3D approximate vectors of sufficient statistics. Recall that the estimate for the case m = 1 coincides with the classic ML estimate. Each point of the plots has been obtained from 10 000 estimation cycles with different white-noise realizations. The initial dimension of the observation is N = 1024. Here it is assumed that the approximate value of the delay is known within the a priori range (−T/4, T/4), T = 1. Thus, the domain of uncertainty is half the duration of the signal.

Fig. 6. MSE of the estimates based on approximate vectors of sufficient statistics.

The simulation results show that the pulse's time of arrival may be estimated more accurately than in the case when the classical ML approach is applied (m = 1). In the example considered, this effect is the result of averaging independent estimates of the time of arrival obtained in independent channels. As is known from estimation theory, averaging effectively enhances the accuracy of the determination of various signal parameters. The method proposed in this paper allows application of this averaging effect.

V. INDEPENDENT TIME ARRIVAL AND DOPPLER SHIFT ESTIMATIONS

Consider the widespread Gaussian observation model. Let the N-dimensional data column vector be an additive mixture of the deterministic component and the distortion vector

(14)
{\bf x} = {\bf s}(\tau, f_D) + {\bf w},

where the signal s(τ, f_D) depends on the two a priori unknown parameters. Assume that both parameters are statistically independent. The parameter vector v = (τ, f_D) has a bounded domain of variation, on which the 2D probability density p_0(τ, f_D) is specified. The distortion vector w consists of independent sample values described by a Gaussian probability distribution with zero mean and a nonsingular covariance matrix ${\bf R}_w = \sigma_w^2{\bf I}$.

Assume that it is necessary to estimate τ from the observed vector x. In this case we can consider f_D a nuisance parameter. To find an independent estimate of τ, it is desirable to obtain statistics in the form of linear functions of the vector x that are not affected by the nuisance parameter. We consider these statistics invariant to variations of the nuisance parameter. It is also necessary to minimize the possible deterioration of the estimation accuracy if τ is estimated using these statistics.

The problem formulated above may be solved on the basis of the following ideas. If the data, or the statistics obtained from these data, do not contain Fisher information on a certain parameter, it is impossible to estimate this parameter from these statistics. This situation may be interpreted as invariance to changes of this parameter. On the other hand, when the observations are used to form a statistic or data set concentrating the complete Fisher information on a desired parameter, this parameter can be estimated with the same accuracy as that provided by all of the observations. Hence, it is necessary to have a tool that controls the Fisher information content in the statistics obtained via linear transformations of the observations. The theorem proved above allows one to reveal the subspaces of the observation space containing Fisher information on the specific scalar parameters τ or f_D. We use the following two matrices, similar to (7):

(15)
{\bf B}_\tau = E_{\rm v}\left\{{\bf s}'_\tau(\tau, f_D)({\bf s}'_\tau(\tau, f_D))^T\right\}/\sigma_w^2,
(16)
{\bf B}_f = E_{\rm v}\left\{{\bf s}'_f(\tau, f_D)({\bf s}'_f(\tau, f_D))^T\right\}/\sigma_w^2.

Here, differentiation is performed with respect to the parameter indicated by the subscript τ or f_D, and $E_{\rm v}$ denotes an expectation over the vector random variable v = (τ, f_D).

In order to eliminate the Fisher information on the nuisance parameter f_D from the observation, one can decompose the observed signal over the eigenvectors of ${\bf B}_f$, remove the terms with nonzero eigenvalues, and sum the remaining terms.

A drawback of this approach is that the Fisher information on the desired parameter τ is not controlled by the transformation. The eliminated statistics should carry minimal Fisher information concerning τ. Therefore, retaining the idea of the method, which finds approximate invariant statistics by eliminating some terms of the orthogonal decomposition, we modify the approach. Note that the transformation of the observed vector x into the vector of decomposition coefficients actually converts ${\bf B}_\tau$ into a diagonal matrix. Denoting the matrix consisting of the eigenvectors of (15) by A, we have ${\bf A}{\bf B}_\tau{\bf A}^T = {\bf D}$, where D is the diagonal matrix of the eigenvalues of ${\bf B}_\tau$. It is known that the two matrices ${\bf B}_\tau$ and ${\bf B}_f$ can be diagonalized simultaneously. According to [4], this can be done with a single matrix A so that

(17)
{\bf A}{\bf B}_f{\bf A}^T = {\bf I}, \quad {\bf A}{\bf B}_\tau{\bf A}^T = {\bf D},

where D and ${\bf A}^T$ are chosen to be the matrices of the eigenvalues and the eigenvectors of the matrix ${\bf B}_f^{-1}{\bf B}_\tau$, respectively. Simultaneously with this transformation, it is necessary to separate the Fisher information on the parameter τ from that on the parameter f_D.

Following [4], let us introduce the auxiliary matrix

(18)
{\bf Q} = {\bf B}_f + {\bf B}_\tau.

Using matrix A, we can choose a linear transformation so that Q and Bτ are diagonalized simultaneously and the following conditions are fulfilled:

(19)
{\bf A}{\bf Q}{\bf A}^T = {\bf I}, \quad {\bf A}{\bf B}_\tau{\bf A}^T = {\bf D}^{(1)},

where ${\bf D}^{(1)}$ is the diagonal matrix with ordered entries $\lambda_1^{(1)}\ge\lambda_2^{(1)}\ge\cdots\ge\lambda_N^{(1)}$, which are the eigenvalues of the matrix ${\bf B}_\tau$ under this transformation. Substituting (18) into (19) yields

{\bf I} = {\bf A}({\bf B}_\tau + {\bf B}_f){\bf A}^T = {\bf D}^{(1)} + {\bf A}{\bf B}_f{\bf A}^T

or

(20)
{\bf A}{\bf B}_f{\bf A}^T = {\bf I} - {\bf D}^{(1)}.

Hence, it follows that the matrices ${\bf B}_\tau$ and ${\bf B}_f$ are diagonalized simultaneously under the above conditions. Moreover, the eigenvalues of the two matrices satisfy the relationship $\lambda_k^{(2)} = 1 - \lambda_k^{(1)}$, $k = 1, \ldots, N$. Thus, when ${\bf B}_\tau$ and ${\bf B}_f$ are diagonalized simultaneously, the greatest eigenvalues of the first matrix correspond to the least eigenvalues of the second matrix and vice versa. An advantage of this approach is that the orthogonal decomposition based on the matrix A yields an orthogonal series whose terms are ranked according to the decrease of Fisher information on both parameters, but in reciprocal order: Fisher information on the desired parameter is concentrated in the first decomposition terms, while Fisher information on the nuisance parameter is concentrated in the last terms. Naturally, in this situation, elimination of the terms with the greatest content of information on the nuisance parameter guarantees both minimal loss of Fisher information on the parameter to be estimated and minimal loss in the accuracy of estimation.
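The relations (17)–(20) can be checked numerically with a generalized symmetric eigensolver; in the sketch below ${\bf B}_\tau$ and ${\bf B}_f$ are small random positive definite surrogates rather than the matrices of an actual signal model.

```python
# Sketch: simultaneous diagonalization of B_tau and B_f via Q = B_tau + B_f.
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(5)
n = 12
M1, M2 = rng.standard_normal((n, n)), rng.standard_normal((n, n))
B_tau, B_f = M1 @ M1.T, M2 @ M2.T          # symmetric positive definite surrogates
Q = B_tau + B_f                            # auxiliary matrix (18)

lam, V = eigh(B_tau, Q)                    # generalized problem B_tau v = lambda Q v
order = np.argsort(lam)[::-1]              # rank by decreasing information on tau
lam, A = lam[order], V[:, order].T         # rows of A are generalized eigenvectors

print(np.allclose(A @ Q @ A.T, np.eye(n)))                   # A Q A^T = I, as in (19)
print(np.allclose(A @ B_f @ A.T, np.eye(n) - np.diag(lam)))  # relation (20)
```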

In this way, on the basis of the generalized eigenvectors of the matrix pair (${\bf B}_\tau$, ${\bf B}_f$), we can divide the observation space into two mutually orthogonal subspaces containing Fisher information about τ and f_D, respectively. For example, to obtain an independent estimate of the time of arrival we have to remove the Fisher information about the Doppler shift; for this purpose, we use the subspace with Fisher information about τ only. Figures 7 and 8 illustrate this possibility for a 7-bit M-sequence. Both figures show the relief of |l(τ, f_D)|, where l(τ, f_D) is the logarithm of the likelihood function. Figure 7 corresponds to observation data without noise; we see here the local gap near the true values of τ and f_D. Figure 8 corresponds to |l(τ, f_D)| computed using the orthogonal projection of the observation data onto the subspace with Fisher information concerning τ only. Here we see that the narrow canyon is parallel to the f_D axis, which implies that the true value of τ may be obtained independently of f_D. Once τ is estimated, the next step is to estimate f_D using the subspace with Fisher information concerning f_D.

Fig. 7. Logarithm of the likelihood function for a 7-bit M-sequence.

Fig. 8. Logarithm of the likelihood function in the subspace invariant to f_D.

VI. CONCLUSIONS

Based on the Fisher information approach, a linear dimension reduction technique has been proposed. It is similar to PCA, but it uses the minimal loss of Fisher information as the criterion. As a result, we obtain subspaces of the observation space that concentrate the Fisher information concerning the parameters of interest. The projections of an observation onto these subspaces are vectors of (approximate) sufficient statistics. Simulations performed with test signals have shown that the accuracy of estimation based on the ML method can be improved if these vectors are used. The generalized eigenvectors of a matrix pair allow one to obtain mutually orthogonal subspaces for the independent estimation of two different parameters.

Viacheslav Latyshev received the diploma in electrical engineering from the Moscow Aviation Institute in 1971. He received the Ph.D. degree in electrical engineering in 1979 and the D.Sc. degree in 1999. From 1971 to 1979, he held research positions at the Department of Electrical Engineering, Moscow Aviation Institute. In 1985, he was appointed Professor of Signal Processing, a position he currently holds. His research interests include stochastic signal processing, fast methods for signal parameter estimation, and higher-order statistics.

REFERENCES

[1] Jolliffe, I.T.: Principal Component Analysis, Springer-Verlag, New York, 1986.
[2] Van Trees, H.L.: Detection, Estimation, and Modulation Theory, Part 1, Wiley, New York, 2001, ch. 2.
[3] Latyshev, V.: The optimum decrease of the dimensionality of data for estimating signal parameters. Sov. J. Commun. Technol. Electron., 33 (1988), 172–175.
[4] Fukunaga, K.: Introduction to Statistical Pattern Recognition, Academic Press, San Diego, 1990, ch. 2.
[5] Latyshev, V.: Informational analysis of statistics in time delay estimation problem, in Proc. IRS 2007, Cologne, Germany, pp. 169–173.
[6] Latyshev, V.: Subspace-based estimation of time of arrival and Doppler shift for a signal of known waveform, in Proc. ESAV'08, Capri, Italy, pp. 94–99.