2.845

2023影响因子

(CJCR)

  • 中文核心
  • EI
  • 中国科技核心
  • Scopus
  • CSCD
  • 英国科学文摘

留言板

尊敬的读者、作者、审稿人, 关于本刊的投稿、审稿、编辑和出版的任何问题, 您可以本页添加留言。我们将尽快给您答复。谢谢您的支持!

姓名
邮箱
手机号码
标题
留言内容
验证码

Parameter Estimation of RBF-AR Model Based on the EM-EKF Algorithm

Yanhui Xi Hui Peng Hong Mo

Yanhui Xi, Hui Peng, Hong Mo. Parameter Estimation of RBF-AR Model Based on the EM-EKF Algorithm. ACTA AUTOMATICA SINICA, 2017, 43(9): 1636-1643. doi: 10.16383/j.aas.2017.e160216
Citation: Yanhui Xi, Hui Peng, Hong Mo. Parameter Estimation of RBF-AR Model Based on the EM-EKF Algorithm. ACTA AUTOMATICA SINICA, 2017, 43(9): 1636-1643. doi: 10.16383/j.aas.2017.e160216
席燕辉, 彭辉, 莫红. 基于EM-EKF算法的RBF-AR模型参数估计. 自动化学报, 2017, 43(9): 1636-1643. doi: 10.16383/j.aas.2017.e160216
引用本文: 席燕辉, 彭辉, 莫红. 基于EM-EKF算法的RBF-AR模型参数估计. 自动化学报, 2017, 43(9): 1636-1643. doi: 10.16383/j.aas.2017.e160216

Parameter Estimation of RBF-AR Model Based on the EM-EKF Algorithm

Funds: 

the National Natural Science Foundation of China 70921001

the National Natural Science Foundation of China 51425701

Hunan Province Science and Technology Program 2015NK3035

the National Natural Science Foundation of China 71271215

the National Natural Science Foundation of China 61233008

the National Natural Science Foundation of China 51577014

the National Natural Science Foundation of China 51507015

the Natural Science Foundation of Hunan Province 2015JJ3008

the National Natural Science Foundation of China 61773402

the Key Laboratory of Renewable Energy Electric-Technology of Hunan Province 2014ZNDL002

the National Natural Science Foundation of China 61540037

More Information
    Author Bio:

    Hui Peng received the Ph.D. degree at the School of Mathematical and Physical Science, the Graduate University for Advanced Studies, Japan, in 2003. He is a Professor at the School of Information Science and Engineering, Central South University. His research interests include predictive control, adaptive control, system modeling and identification, modeling and dynamic asset allocation in financial markets. E-mail: huipeng@mail.csu.edu.cn

    Hong Mo received the Ph.D. degree from Graduate University of Chinese Academy of Sciences in 2004. She is a Professor at the School of Electrical and Information Engineering, Changsha University of Sciences and Technology. Her research interests include linguistic dynamic systems, time-varying universe, and type-2 fuzzy sets. E-mail: mohong198@163.com

    Corresponding author: Yanhui Xi received the Ph.D. degree in control science and engineering from Central South University, China, in 2013. She is an Associate Professor at Changsha University of Science and Technology. Her research interests include system modeling and identification, nonlinear optimization, and traveling identification. Corresponding author of this paper.E-mail: xiyanhui@126.com

基于EM-EKF算法的RBF-AR模型参数估计

doi: 10.16383/j.aas.2017.e160216
基金项目: 

the National Natural Science Foundation of China 70921001

the National Natural Science Foundation of China 51425701

Hunan Province Science and Technology Program 2015NK3035

the National Natural Science Foundation of China 71271215

the National Natural Science Foundation of China 61233008

the National Natural Science Foundation of China 51577014

the National Natural Science Foundation of China 51507015

the Natural Science Foundation of Hunan Province 2015JJ3008

the National Natural Science Foundation of China 61773402

the Key Laboratory of Renewable Energy Electric-Technology of Hunan Province 2014ZNDL002

the National Natural Science Foundation of China 61540037

  • 摘要: 为了利用EKF(extended Kalman filter)算法对RBF-AR(radial basis function network-based autoregressive)模型进行参数估计,重构了RBF-AR模型的网络结构,将其变换成一种新型的广义径向基函数(radial basis function,RBF)神经网络.与典型三层RBF网络结构相比,该广义RBF网络增加了线性输出加权层.为了克服基于EKF神经网络学习算法由于噪声统计特性未知导致滤波发散或者滤波精度不高的问题,利用EM(expectation maximization)算法对RBF-AR模型噪声协方差矩阵进行估计.同时,通过EKF滤波实时估计RBF-AR模型参数(系统状态),EKF平滑过程得到了更加准确的期望估计.仿真结果显示,该方法用在此变形的RBF-AR模型结构中是有效的,特别在信噪比低的情况下,估计效果比SNPOM(structured nonlinear parameter optimization method)方法好,而且还能估计出噪声方差.F检验显示了两方法估计得到的标准偏差有显著性差异.
  • The radial basis function network-based state-dependent autoregressive (RBF-AR) model, which offers a very flexible structure for nonlinear time-series modeling, has been extensively studied. For example, Shi et al. [1] developed the RBF-AR model to reconstruct the dynamics of given nonlinear time series. Peng et al. [2] extended the RBF-AR model to the case where there are several exogenous variables (RBF-ARX model) for the system. Following this method, Gan et al. [3], [4] successively developed the locally linear radial basis function network-based autoregressive (LLRBF-AR) model and a gradient radial basis function based varying-coefficient autoregressive (GRBF-AR) model. The major feature of the RBF-AR (X) model which is superior to the black-box models based on general function approximations is that: the RBF-AR (X) model may provide some insights into the system dynamics due to its quasi-linear structure, whereas the general function approximations cannot.

    The identification of the RBF networks includes the choice of topology (e.g., the number of neurons) and estimation of all the parameters. For selecting the number of neurons, several methods have been proposed, e.g., the Akaike information criterion (AIC), Bayesian information criterion (BIC), and cross validation. Therefore we must first have a good model parameter estimation method, and then we can repeat the method for models with different number of neurons, before finally selecting the best model. In this paper, the majority of our works focuses on determining the parameters of RBF networks.

    The estimation of the RBF-AR (X) model is a difficult task but is a key procedure for successfully applying the RBF-AR (X) model. Peng et al. [2] presented a structured nonlinear parameter optimization method (SNPOM) for the RBF-AR (X) model estimation. Researches [2], [5], [6] shown the SNPOM can optimize all of the model parameters and can also accelerate the computational convergence of the optimization search process. However, the primary shortcoming of the SNPOM is that it can be easily trapped in a local optimum because of an initial guess value. To overcome the problem of local minima, Gan et al. [7] proposed the hybrid evolutionary algorithm (EA)-SNPOM and simulation results indicated that the hybrid algorithm provides better results than either method alone (EA and SNPOM), but the complexity and running time increased substantially. Also, Gan et al. [8] proposed a variable projection (VP) approach to efficiently estimate the parameters of the RBF-ARX model. Although these identification methods for RBF-AR (X) have good estimation results, their performance may get deteriorated in case of measurement with noise interference.

    As Peng et al. [2] and Gan et al. [7] mentioned that RBF-AR (X) can be regarded as a more general nonlinear model than the RBF neural network and as a generalization of the RBF network. Furthermore, any kind of RBF and the RBF-ARX model parameter estimation procedure must include the selection of appropriate centers and scaling factors, and estimation of all the linear weights of the RBF networks in the model.

    Various derivative-based methods have been used to train RBF neural network, including gradient descent [9] and the well-known back propagation [10]. Although gradient descent training for RBF networks has proven to be much more effective than lots of conventional methods, it can be computationally expensive. An alternative to sequentially estimate RBF network is the extended Kalman filter (EKF)-based neural network sequential learning method [11]-[13]. The sequential EKF algorithm is a fast training algorithm, which takes all the network parameters (weights or weights and centers) as a state vector in state-space and adjusts them to adapt to measurements [14]. Because it makes use of second order statistics (covariance), the EKF is an improvement over conventional neural network estimation techniques, such as back-propagation and gradient descent training algorithm. However, a well known limitation of the EKF is the assumption of known a priori statistics to describe the measurement and process noise. Setting these noise levels appropriately often makes the difference between success and failure in the use of the EKF. Especially in environments where the noise statistics change with time, the EKF can lead to large estimation errors and even to divergence. Thus, the selection of the initial states, the measurement noise covariance matrix and the process noise covariance matrix are very important in the control of the convergence of the EKF learning algorithm. To overcome the difficulties above, a number of improved learning methods were proposed, such as EKFQ (the EKF algorithm with evidence maximization and sequentially updated priors) [15], [16], APNCPF (adaptive process noise covariance particle filter) [17], a hybrid EKF and switching PSO (particle swarm optimization) algorithm [18], and a particle filtering algorithm in combination with the kernel smoothing method [19]. The EKFQ and APNCPF algorithms employ adaptive noise parameters, but the problem of choosing the right window length, which is used to update noise covariance matrices, results in a regularization/tracking dilemma because of unknown prior knowledge. Moreover, the above algorithms do not consider the optimal initial values of parameters, which will lead to some imprecise information about the spread or the shape of the posterior [20].

    The expectation-maximization (EM) algorithm [21] was developed to learn parameters of statistical models in the presence of incomplete data or hidden variables. Lázaro et al. [22] and Constantinopoulos et al. [23] extended the EM learning strategy for estimation of the neural network weights, such as the multi-layer perception (MLP) and the Gaussian radial basis function (RBF) networks. Research results indicate that the EM method is effective and simple to obtain maximum likelihood (ML) estimates of the parameter and states.

    In this paper, the proposed expectation-maximization extended Kalman filter (EM-EKF) method, which combines the expectation-maximization and the extended Kalman filtering, is developed to identify the RBF-AR model. The method is more accurate as it involves extended Kalman smoothing, which provides a minimum variance Gaussian approximation to the posterior probability density function. To use the EM-EKF algorithm for state space learning, RBF-AR model is reconstructed as a general RBF network, which has additional linear output weight layer compared with the traditional three-layer RBF network. Thus, the general RBF network is represented by state-space model, and the centers and the weights of the general RBF network are treated as hidden state variables. The learning algorithm for RBF-AR model possesses advantages of both the expectation-maximization and of the extended Kalman filtering and smoothing. Specifically, the EM algorithm is utilized to estimate parameters of the state-space model, such as the measurement and process noise variance and the initial states, while the extended Kalman filtering and smoothing are used to estimate the approximate state distribution. The proposed algorithm simplifies the optimizing estimation of the maximum likelihood by making the expectation maximal, and EKF and smoothing process realize the more exact estimation of the expectation. This learning technique can improve the performance of the EKF-based neural network sequential learning method by estimating the noise variance and the initial states. The performance and effectiveness of our proposed method are evaluated by the Mackey-Glass time series through three cases.

    The contributions of this paper comprise two parts. First, RBF-AR model is reconstructed as a new type of general radial basis function (RBF) neural network, which makes it able to estimate the parameters of RBF-AR model using the EKF. Second, by combining the expectation maximization and extended Kalman filtering and smoothing process, the EM-EKF method is developed to estimate RBF-AR model, which can give joint state and parameter estimates.

    The structure of the remaining of this paper is as follows. Section 2 presents the state-space representation of the RBF-AR model. The EM-EKF method is developed to identify the RBF-AR model in Section 3. The cases studies are shown in Section 4. Finally, the paper is concluded in Section 5.

    We are interested in the nonlinear time series that can be described by the following state-dependent AR (SD-AR) model

    $$ \begin{align} \begin{cases} y_{t}=\phi_{0}\, (\boldsymbol{X}_{t-1})+\sum\limits_{i=1}^p\phi_{i}\, (\boldsymbol{X}_{t-1})y_{t-i}+e_{t}\\[1mm] \boldsymbol{X}_{t-1}=[y_{t-1}, y_{t-2}, \ldots, y_{t-d}]^{{T}} \end{cases} \end{align} $$ (1)

    where $y_{t}$ $(t=1, \ldots, T)$ is the output, $\boldsymbol{X}_{t-1}$ is regarded as the state vector at time $t$ , which contains only the output series in this case (in other cases it may contain the input series or another). $\phi_{i}(\boldsymbol{X}_{t-1})$ $(i=0, 1, \ldots, p)$ are the state-dependent function coefficients of the model, $p$ is the model order, and $d$ is the dimension of the state vector. $e_{t}$ denotes Gaussian white noise.

    Although the SD-AR model provides a useful framework for general nonlinear time series modeling, the problem is how to specify the functional form of its state-dependent coefficients. An efficient approach to solve the problem is to use Gaussian RBF neural networks approximations of coefficients of model (1) [2], and then the model derived is called the RBF-AR model, which is given by

    $$ \begin{align} \begin{cases} y_{t}=\phi_{0}\, (\boldsymbol{X}_{t-1})+\sum\limits_{i=1}^p\phi_{i}\, (\boldsymbol{X}_{t-1})y_{t-i}+e_{t}\\[1mm] \phi_{0}\, (\boldsymbol{X}_{t-1})=\omega_{0, 0}+\sum\limits_{k=1}^m\omega_{0, k}\exp\left\{-\lambda_{k}\|\boldsymbol{X}_{t-1}-\boldsymbol{Z}_{k}\|_{2}^{2}\right\}\\[1mm] \phi_{i}\, (\boldsymbol{X}_{t-1})=\omega_{i, 0}+\sum\limits_{k=1}^m\omega_{i, k}\exp\left\{-\lambda_{k}\|\boldsymbol{X}_{t-1}-\boldsymbol{Z}_{k}\|_{2}^{2}\right\}\\[1mm] \boldsymbol{Z}_{k}=[z_{k, 1}, \ldots, z_{k, d}]^{{T}} \end{cases} \end{align} $$ (2)

    where $\boldsymbol{Z}_{k}$ $(k=1, 2, \ldots, m)$ are the centers of the local linear RBF networks, $\lambda_{k}$ $(k=1, 2, \ldots, m)$ are the scaling parameters, $\omega_{i, k}$ $(i=0, 1, 2, \ldots, p; k=0, 1, 2, \ldots, m)$ are the linear weights, $m$ and $d$ are the number of hidden neurons and the dimension of the centers (the dimension of the centers is the same as the dimension of the state vector), respectively, and $\|\cdot\|_{2}$ denotes the vector 2-norm.

    In general case, the RBF networks in model (2) may have different centers for different regression variables. However, in some applications, all the RBF networks may be allowed to share the same centers, because model (2) possesses the autoregressive structure, thus, although the RBF centers are the same in that case, the regressive polynomials' coefficients are different. Thus, the RBF-AR model can be seen as a general RBF network, which has two hidden layers with $m$ and $p+1$ neurons, respectively, and the output layer with one output. In this structure, the identification of RBF-AR model is to estimate the centers, the scaling parameters and the weights of the general RBF network. Fig. 1 shows the schematic of the RBF-AR model as a general RBF network.

    图 1  The schematic of the RBF-AR model.
    Fig. 1  The schematic of the RBF-AR model.

    To simplify the nonlinear optimization steps, the scaling parameters may be selected as [2]

    $$ \begin{align} %\begin{cases} \lambda_{k}=\dfrac{-\log\epsilon_{k}}{\max\limits_{t-1}\{\|\boldsymbol{X}_{t-1}-\boldsymbol{Z}_{k}\|^{2}\}}, \quad \epsilon_{k}\in[0.0001, 0.1]. %\end{cases} \end{align} $$ (3)

    Using this heuristic way, the weights will become bounded and stable when the signal $\boldsymbol{X}_{t-1}$ is far away from the centers $\boldsymbol{Z}_{k}$ .

    To apply a filtering algorithm to the RBF network training, the state-space model is established, which is given by

    $$ \begin{align} \begin{cases} \boldsymbol {\theta}_{t}=\boldsymbol{\theta}_{t-1}+\boldsymbol{v}_{t}\\[1mm] y_{t}=g(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1})+\xi_{t} \\[1mm] \ \ \ \, =\phi_{0}\, (\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1})+\sum\limits_{i=1}^{p}\phi_{i}\, (\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1})y_{t-i}+\xi_{t} \end{cases} \end{align} $$ (4)

    where $\boldsymbol{\theta}=[\boldsymbol{\omega}_{0}^{{T}}~ \boldsymbol{\omega}_{1}^{{T}}~ \ldots~ \boldsymbol{\omega}_{p}^{{T}}~ \boldsymbol{Z}_{1}^{{T}}~ \ldots~ \boldsymbol{Z}_{m}^{{T}}]^{{T}}$ ( $\boldsymbol{\omega}_j^{{T}}=[\omega_{0, j}$ $\omega_{1, j}$ $\ldots~\omega_{m, j}]$ , $0 \leq j \leq p$ ; $\boldsymbol{Z}_k=[z_{k, 1}~z_{k, 2}~\ldots~z_{k, d}]^{{T}}$ , $1$ $\leq$ $k$ $\leq$ $m$ ) represents the system state vector, $y_{t}$ is the observation variable, $\boldsymbol{v}_{t}$ and $\xi_{t}$ denote the process noise and observation noise, which are assumed to be zero-mean Gaussian processes with the covariance $\boldsymbol{Q}$ and $R$ , namely $\boldsymbol{v}_{t}$ $\sim$ ${\rm N}(0, \boldsymbol{Q}) $ , $\xi_{t}\sim {\rm N}(0, R) $ . The initial state (parameters) $\boldsymbol{\theta}_{0}$ is normally-distributed with mean $\boldsymbol{\mu}_{0}$ and covariance $\boldsymbol{\Xi}_{0}$ . Obviously, the crux of the matter is that both the system hidden state $\boldsymbol{\theta}_{t}$ and the parameters $\boldsymbol{\varphi}=(\boldsymbol{Q}, R, \boldsymbol{\mu}_{0}, \boldsymbol{\Xi}_{0})$ are unknown.

    To simultaneously estimate parameters and hidden states, the expectation maximization (EM) is incorporated with extended Kalman filtering and smoothing, which aims to integrate over the uncertain estimates of the unknown hidden states and optimize the resulting marginal likelihood of the parameters given the observed data. The algorithm realizes the more exact estimation of the posterior distribution by use of the extended Kalman filtering and smoothing.

    The EM algorithm is an iterative method for finding a mode of the likelihood function. To derive the EM algorithm for nonlinear state space models, we need to develop an expression for the likelihood of the completed data. We assume that the likelihood of the data given the states, the initial conditions and the evolution of the states can be represented by Gaussian distributions. Thus, if the initial mean and covariance of the states is given by $\boldsymbol{\mu}_{0}$ and $\boldsymbol{\Xi}_{0}$ , then

    $$ p(\boldsymbol{\theta}_{0}|\boldsymbol{\varphi})= (2\pi)^{-\frac{l}{2}}|\boldsymbol{\Xi}_{0}|^{-\frac{1}{2}} \nonumber\\ \quad\ \times\exp\left[-\dfrac{1}{2}\, (\boldsymbol{\theta}_{0}- \boldsymbol{\mu}_{0})^{T}\boldsymbol{\Xi}_{0}^{-1}\, (\boldsymbol{\theta}_{0}-\boldsymbol{\mu}_{0})\right] $$ (5)
    $$ p(\boldsymbol{\theta}_{t}|\boldsymbol{\theta}_{t-1}, \boldsymbol{\varphi}) =(2\pi)^{-\frac{l}{2}}|\boldsymbol{Q}|^{-\frac{1}{2}}\nonumber\\ \quad\ \times\exp\left[-\dfrac{1}{2}\, (\boldsymbol{\theta}_{t}-\boldsymbol{\theta}_{t-1})^{T}\boldsymbol{Q}^{-1}\, (\boldsymbol{\theta}_{t}-\boldsymbol{\theta}_{t-1})\right] $$ (6)
    $$ p(y_{t}|\boldsymbol{\theta}_{t}, \boldsymbol{\varphi})=(2\pi)^{-\frac{n}{2}}|R|^{-\frac{1}{2}}\nonumber\\ \quad\ \times\exp\left[-\dfrac{1}{2}\, (y_{t}-g(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1}))^{T}R^{-1}\, (y_{t}-g(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1}))\right] $$ (7)

    where $l=(1+m)(1+p)+md$ is the dimension of the state vector, $n=1$ is the dimension of the observation vector.

    Then, the log-likelihood of the complete data is given by:

    $$ \begin{align} \ln p(&\boldsymbol{\theta}_{0:T}, y_{1:T}|\boldsymbol{\varphi}) =-\dfrac{1}{2}[\boldsymbol{\theta}_{0}-\boldsymbol{\mu}_{0}]^{T} \boldsymbol{\Xi}_{0}^{-1}[\boldsymbol{\theta}_{0}-\boldsymbol{\mu}_{0}] \nonumber\\ &-\dfrac{1}{2}\ln |\boldsymbol{\Xi}_{0}|-\dfrac{T(l+n)+l}{2}\ln 2\pi-\dfrac{T}{2}\ln |\boldsymbol{Q}|\nonumber\\ &-\dfrac{T}{2}\ln |R|-\sum\limits_{t=1}^{T}\dfrac{1}{2} [\boldsymbol{\theta}_{t}-\boldsymbol{\theta}_{t-1}]^{T} \boldsymbol{Q}^{-1}[\boldsymbol{\theta}_{t}-\boldsymbol{\theta}_{t-1}] \nonumber\\ &-\sum\limits_{t=1}^{T}\dfrac{1}{2}[y_{t}-g(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1})]^{T}R^{-1}[y_{t}-g(\boldsymbol {\theta}_{t}, \boldsymbol{X}_{t-1})]. \end{align} $$ (8)

    If we take the expectation of the log-likelihood for the complete data, we get the following expression:

    $$ \begin{align} & E\ln\left[p(\boldsymbol{\theta}_{0:T}, y_{1:T}|\boldsymbol{\varphi})\right]=-\dfrac{1}{2}\ln |\boldsymbol{\Xi}_{0}|-\dfrac{T}{2}\ln |\boldsymbol{Q}|-\dfrac{T}{2}\ln |R|\nonumber\\ &\ \ \, -\dfrac{1}{2}E\left[\boldsymbol{\theta}_{0}^{{T}}\boldsymbol{\Xi}_{0}^{-1}\boldsymbol{\theta}_{0}-\boldsymbol{\theta}_{0}^{{T}}\boldsymbol{\Xi}_{0}^{-1}\boldsymbol{\mu}_{0}-\boldsymbol{\mu}_{0}^{{T}}\boldsymbol{\Xi}_{0}^{-1}\boldsymbol{\theta}_{0}+\boldsymbol{\mu}_{0}^{{T}}\boldsymbol{\Xi}_{0}^{-1}\boldsymbol{\mu}_{0}\right]\nonumber\\ &\ \ \, -\dfrac{1}{2}\sum\limits_{t=1}^{T}E\left[\boldsymbol{\theta}_{t}^{{T}}\boldsymbol{Q}^{-1}\boldsymbol{\theta}_{t}-\boldsymbol{\theta}_{t}^{{T}}\boldsymbol{Q}^{-1}\boldsymbol{\theta}_{t-1}-\boldsymbol{\theta}_{t-1}^{{T}}\boldsymbol{Q}^{-1}\boldsymbol{\theta}_{t}\right.\nonumber\\ &\ \ \, +\left.\boldsymbol{\theta}_{t-1}^{{T}}\boldsymbol{Q}^{-1}\boldsymbol{\theta}_{t-1}\right]-\dfrac{T(l+n)+l}{2}\ln 2\pi\nonumber\\ &\ \ \, -\dfrac{1}{2}\sum\limits_{t=1}^{T}E\left[(y_{t}-g(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1}))^{{T}}R^{-1}\, (y_{t}-g(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1}))\right]. \end{align} $$ (9)

    To compute the expectation of the measurement mapping $g(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1})$ , the Taylor expansion of this mapping around the smoothing estimate $\boldsymbol{\theta}_{t|T}$ is given by

    $$ \begin{align} &g(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1})=g(\boldsymbol{\theta}_{t|T}, \boldsymbol{X}_{t-1})\nonumber\\ &\qquad +\dfrac{\partial g(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1})}{\partial \boldsymbol{\theta}_{t}}\bigg|_{(\boldsymbol{\theta}_{t}=\boldsymbol{\theta}_{t|T})}\, (\boldsymbol{\theta}_{t}-\boldsymbol{\theta}_{t|T})+\cdots \end{align} $$ (10)

    where the smoothing estimate $\boldsymbol{\theta}_{t|T}$ denotes the conditional mean of $\boldsymbol{\theta}_{t}$ given $y_{1:T}=\{y_1, \ldots, y_{T}\}$ .

    The EKF utilizes only the first-order Taylor expansion, so we can obtain

    $$ \begin{align} &g(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1})\approx g(\boldsymbol{\theta}_{t|T}, \boldsymbol{X}_{t-1})\nonumber\\ &\qquad\left. +\dfrac{\partial g(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1})}{\partial \boldsymbol{\theta}_{t}}\right|_{(\boldsymbol{\theta}_{t}=\boldsymbol{\theta}_{t|T})}\, (\boldsymbol{\theta}_{t}-\boldsymbol{\theta}_{t|T}). \end{align} $$ (11)

    Under the basic assumptions of the EKF (no model and parameter uncertainty, zero-mean white-noise sequence, known process and measurement models, etc.), the smoothing estimates are unbiased, and the smoothing estimates satisfy

    $$ \begin{align} &E(\boldsymbol{\theta}_{t|T})=\boldsymbol{\theta}_{t}. \end{align} $$ (12)

    So we can get

    $$ \begin{align} &E(g(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1}))\approx g(\boldsymbol{\theta}_{t|T}, \boldsymbol{X}_{t-1}). \end{align} $$ (13)

    Subsequently, we compute the covariance of $g(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1})$

    $$ \begin{align} &E\Big[\big(g(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1})-g(\boldsymbol{\theta}_{t|T}, \boldsymbol{X}_{t-1})\big)\nonumber\\ &\qquad\qquad\qquad\quad\quad\quad\big(g(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1})-g(\boldsymbol{\theta}_{t|T}, \boldsymbol{X}_{t-1})\big)^{{T}}\Big]\nonumber\\ &\quad\quad\, \, \approx E\Bigg[\bigg[\dfrac{\partial g(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1})}{\boldsymbol{\theta}_{t}}\bigg|_{(\boldsymbol{\theta}_{t}=\boldsymbol{\theta}_{t|T})}\, (\boldsymbol{\theta}_{t}-\boldsymbol{\theta}_{t|T})\bigg]\nonumber\\ &\qquad\quad\ \, \times\bigg[\dfrac{\partial g(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1})}{\boldsymbol{\theta}_{t}}\bigg|_{(\boldsymbol{\theta}_{t}=\boldsymbol{\theta}_{t|T})}\, (\boldsymbol{\theta}_{t}-\boldsymbol{\theta}_{t|T})\bigg]^{{T}}\Bigg]\nonumber\\ &\quad\quad\, \, =\boldsymbol{G}_{t}\boldsymbol{P}_{t|T}\boldsymbol{G}_{t}^{{T}}. \end{align} $$ (14)

    Hence, it follows that:

    $$ \begin{align} &E\left[g(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1})\big(g(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1})\big)^{{T}}\right]\nonumber\\ &\qquad \approx \boldsymbol{G}_{t}\boldsymbol{P}_{t|T}\boldsymbol{G}_{t}^{{T}}+g(\boldsymbol{\theta}_{t|T}, \boldsymbol{X}_{t-1})\big(g(\boldsymbol{\theta}_{t|T}, \boldsymbol{X}_{t-1})\big)^{{T}} \end{align} $$ (15)

    where $\boldsymbol{G}_{t}$ corresponds to the Jacobian matrix of the measurement function:

    $$ \begin{align} &\boldsymbol{G}_{t}=\dfrac{\partial g(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1})}{\partial \boldsymbol{\theta}_{t}}\bigg|_{(\boldsymbol{\theta}_{t}=\boldsymbol{\theta}_{t|t-1})}\nonumber\\ &\quad\, =\bigg[\dfrac{\partial g(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1})}{\partial \theta_{1, t}}~\dfrac{\partial g(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1})}{\partial \theta_{2, t}}~ \cdots ~\dfrac{\partial g(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1})}{\partial \theta_{l, t}}\bigg] \end{align} $$ (16)

    and $\boldsymbol{P}_{t|T}$ denotes the conditional covariance of $\boldsymbol{\theta}_{t}$ given $y_{1:T}$ $=$ $\{y_{1}, \ldots, y_{T}\}$ :

    $$ \begin{align} &\boldsymbol{P}_{t|T}=E\left[(\boldsymbol{\theta}_{t}-\boldsymbol{\theta}_{t|T})(\boldsymbol{\theta}_{t}-\boldsymbol{\theta}_{t|T})^{{T}}\right]. \end{align} $$ (17)

    Equations (13)-(15) are substituted in (9), and then we get

    $$ \begin{align} &E[\ln p(\boldsymbol{\theta}_{0:T}, y_{1:T}|\boldsymbol{\varphi})] =-\dfrac{1}{2}\ln |\boldsymbol{\Xi}_{0}|-\dfrac{T}{2}\ln |\boldsymbol{Q}| \nonumber\\ &\qquad -\dfrac{T}{2}\ln |R|-\dfrac{1}{2}{\rm tr}\bigg\{\boldsymbol{\Xi}_{0}^{-1}[\boldsymbol{\theta}_{0|T} \boldsymbol{\theta}_{0|T}^{{T}}\nonumber\\ &\qquad-2\boldsymbol{\theta}_{0|T} \boldsymbol{\mu}_{0}^{{T}}+\boldsymbol{\mu}_{0} \boldsymbol{\mu}_{0}^{{T}}+\boldsymbol{P}_{0|T}]\bigg\}\nonumber\\ &-\dfrac{1}{2}\sum\limits_{t=1}^{T}{\rm tr}\bigg\{\boldsymbol{Q}^{-1} \big[\boldsymbol{\theta}_{t|T}\boldsymbol{\theta}_{t|T}^{{T}} +\boldsymbol{P}_{t|T}\nonumber\\ &-2(\boldsymbol{\theta}_{t|T} \boldsymbol{\theta}_{t-1|T}^{{T}}+\boldsymbol{P}_{t, t-1|T})^{{T}}\nonumber\\ &+\boldsymbol{\theta}_{t-1|T}\boldsymbol{\theta}_{t-1|T}^{{T}} +\boldsymbol{P}_{t-1|T}\big]\bigg\}\nonumber\\ &-\dfrac{(l+n)T+l}{2}\ln 2\pi\nonumber\\ &-\dfrac{1}{2}\sum\limits_{t=1}^{T}{\rm tr}\bigg\{R^{-1}\big[(y_{t}-g(\boldsymbol {\theta}_{t|T}, \boldsymbol{X}_{t-1}))\nonumber\\ &\times(y_{t}-g(\boldsymbol{\theta}_{t|T}, \boldsymbol{X}_{t-1}))^{{T}}+\boldsymbol{G}_{t}\boldsymbol{P}_{t|T}\boldsymbol{G}_{t}^{{T}}\big]\bigg\}. \end{align} $$ (18)

    Smoothing often includes the forward and backward filtering over a segment of data so as to obtain improved estimates. The forward filtering stage involves computing the estimates $\boldsymbol{\theta}_{t|t}$ and $\boldsymbol{P}_{t|t}$ over a segment of $T$ samples, where $\boldsymbol{\theta}_{t|t}$ and $\boldsymbol{P}_{t|t}$ denote the conditional mean and conditional covariance of $\boldsymbol{\theta}_{t}$ given $y_{1:t}=\{y_{1}, \ldots, y_{t}\}$ . Then, the EKF-based forward filtering for estimating model (4) can be derived as follows. The predicted state vector:

    $$ \begin{align} &\boldsymbol{\theta}_{t|t-1}=E[\boldsymbol{\theta}_{t}|y_{1:t-1}]=\boldsymbol{\theta}_{t-1|t-1}. \end{align} $$ (19)

    The predicted conditional covariance of $\boldsymbol{\theta}_{t}$ :

    $$ \begin{align} \boldsymbol{P}_{t|t-1}&=E\left[(\boldsymbol{\theta}_{t}-\boldsymbol{\theta}_{t|t-1})(\boldsymbol{\theta}_{t}-\boldsymbol{\theta}_{t|t-1})^{{T}}\right]\notag\\ &=\boldsymbol{P}_{t-1|t-1}+\boldsymbol{Q}. \end{align} $$ (20)

    The Kalman gain:

    $$ \begin{align} &\boldsymbol{K}_{t}=\boldsymbol{P}_{t|t-1}\boldsymbol{G}^{{T}}\, (\boldsymbol{G}_{t}\boldsymbol{P}_{t|t-1}\boldsymbol{G}_{t}^{{T}}+R)^{-1}. \end{align} $$ (21)

    Updated system state estimate:

    $$ \begin{align} \boldsymbol{\theta}_{t|t}&=E[\boldsymbol{\theta}_{t}|y_{1:t}]\notag\\ &=\boldsymbol{\theta}_{t|t-1}+\boldsymbol{K}_{t}[y_{t}-g(\boldsymbol{\theta}_{t|t-1}, \boldsymbol{X}_{t-1})]. \end{align} $$ (22)

    Updated estimate covariance:

    $$ \begin{align} \boldsymbol{P}_{t|t}&=E\left[(\boldsymbol{\theta}_{t}-\boldsymbol{\theta}_{t|t})(\boldsymbol{\theta}_{t}-\boldsymbol{\theta}_{t|t})^{{T}}\right]\notag\\ &=\boldsymbol{P}_{t|t-1}-\boldsymbol{K}_{t}\boldsymbol{G}_{t}\boldsymbol{P}_{t|t-1}. \end{align} $$ (23)

    To obtain the smoothing estimates $\boldsymbol{\theta}_{t|T}$ and $\boldsymbol{P}_{t|T}$ , we employ the Rauch-Tung-Strieber smoother to do the following backward recursions

    $$ \begin{align} \begin{cases} \boldsymbol{J}_{t}=\boldsymbol{P}_{t|t}\boldsymbol{P}_{t+1|t}^{-1}\\[1mm] \boldsymbol{\theta}_{t|T}=\boldsymbol{\theta}_{t|t}+\boldsymbol{J}_{t}\, (\boldsymbol{\theta}_{t+1|T}-\boldsymbol{\theta}_{t+1|t})\\[1mm] \boldsymbol{P}_{t|T}=\boldsymbol{P}_{t|t}+\boldsymbol{J}_{t}\, (\boldsymbol{P}_{t+1|T}-\boldsymbol{P}_{t+1|t})\boldsymbol{J}_{t}^{{T}}. \end{cases} \end{align} $$ (24)

    We also require the cross-covariance $\boldsymbol{P}_{t, t-1|T}$ , which is defined as follows

    $$ \begin{align} &\boldsymbol{P}_{t, t-1|T}=E\left[(\boldsymbol{\theta}_{t}-\boldsymbol{\theta}_{t|T})(\boldsymbol{\theta}_{t}-\boldsymbol{\theta}_{t|T})^{{T}}\right]. \end{align} $$ (25)

    And it can be obtained through the backward recursions

    $$ \begin{align} &\boldsymbol{P}_{t, t-1|T}=\boldsymbol{P}_{t|t}\boldsymbol{J}_{t-1}^{{T}}+\boldsymbol{J}_{t}\, (\boldsymbol{P}_{t+1, t|T}-\boldsymbol{P}_{t|t})\boldsymbol{J}_{t-1}^{{T}}. \end{align} $$ (26)

    The backward recursions as above are initialized as following

    $$ \begin{align} \begin{cases} \boldsymbol{\theta}_{T|T}=\boldsymbol{\theta}_{T}\\[1mm] \boldsymbol{P}_{T|T}=\boldsymbol{P}_{T}\\[1mm] \boldsymbol{P}_{T, T-1|T}=(\boldsymbol{I}-\boldsymbol{K}_{T}\boldsymbol{G}_{T})\boldsymbol{P}_{T-1|T-1}. \end{cases} \end{align} $$ (27)

    Compared with the filtering algorithm that uses the observations up to time $t$ for estimation of the state $\boldsymbol{\theta}_{t}$ , the smoothing yields a more accurate estimate $\boldsymbol{\theta}_{t|T}$ by using all available data up to time $T$ .

    To find the optimal parameters $\boldsymbol{\varphi}$ , we need to maximize the expected value of the log-likelihood with respect to the parameters, and then compute the derivatives with respect to each parameter individually. That is

    $$ \begin{align} \begin{cases} \dfrac{\partial }{\partial \boldsymbol{\mu}_{0}}E[\ln p(\boldsymbol{\theta}_{0:T}, y_{1:T}|\boldsymbol{\varphi})]=0\\[3mm] -\dfrac{1}{2}\dfrac{\partial}{\partial \boldsymbol{\mu}_{0}}{\rm tr} \left\{\boldsymbol{\Xi}_{0}^{-1}[(\boldsymbol{\theta}_{0|T}-\boldsymbol{\mu}_{0})(\boldsymbol{\theta}_{0|T}-\boldsymbol{\mu}_{0})^{{T}}+\boldsymbol{P}_{0|T}]\right\}=0\\[3mm] \boldsymbol{\Xi}_{0}^{-1}\, (\boldsymbol{\theta}_{0|T}-\boldsymbol{\mu}_{0})=0\\ \boldsymbol{\mu}_{0}=\boldsymbol{\theta}_{0|T} \end{cases} \end{align} $$ (28)
    $$ \begin{align} \begin{cases} \dfrac{\partial }{\partial \boldsymbol{\Xi}_{0}^{-1}}E[\ln p(\boldsymbol{\theta}_{0:T}, y_{1:T}|\boldsymbol{\varphi})]=0\\[3mm] \dfrac{1}{2}\dfrac{\partial}{\partial \boldsymbol{\Xi}_{0}^{-1}}\Big\{\ln |\boldsymbol{\Xi}_{0}^{-1}|-{\rm tr}\big[\boldsymbol{\Xi}_{0}^{-1}\, ((\boldsymbol{\theta}_{0|T}-\boldsymbol{\mu}_{0})(\boldsymbol{\theta}_{0|T}-\boldsymbol{\mu}_{0})^{{T}}\\[1mm] \qquad\ \ +~\boldsymbol{P}_{0|T})\big]\Big\}=0\\[2mm] \dfrac{\boldsymbol{\Xi}_{0}}{2}-\dfrac{1}{2}\left((\boldsymbol{\theta}_{0|T}-\boldsymbol{\mu}_{0})(\boldsymbol{\theta}_{0|T}-\boldsymbol{\mu}_{0})^{{T}}+\boldsymbol{P}_{0|T}\right)=0\\[2mm] \boldsymbol{\Xi}_{0}=\boldsymbol{P}_{0|T} \end{cases} \end{align} $$ (29)
    $$ \begin{align} \begin{cases} \dfrac{\partial }{\partial \boldsymbol{Q}^{-1}}E[\ln p(\boldsymbol{\theta}_{0:T}, y_{1:T}|\boldsymbol{\varphi})]=0\\[3mm] \dfrac{\partial}{\partial \boldsymbol{Q}^{-1}}\left\{\dfrac{T}{2}\ln |\boldsymbol{Q}^{-1}|-\dfrac{1}{2}\big[{\rm tr}(\boldsymbol{Q}^{-1}\, (\boldsymbol{\Gamma}-2\boldsymbol{\Upsilon}^{{T}}+\boldsymbol{\Lambda}))\big]\right\}=0\\[3mm] \dfrac{T}{2}\boldsymbol{Q}-\dfrac{1}{2}\left(\boldsymbol{\Gamma}-2\boldsymbol{\Upsilon}^{{T}}+\boldsymbol{\Lambda}\right)=0\\[1mm] \boldsymbol{Q}=\dfrac{1}{T}\left(\boldsymbol{\Gamma}-2\boldsymbol{\Upsilon}^{{T}}+\boldsymbol{\Lambda}\right) \end{cases} \end{align} $$ (30)
    $$ \begin{align} \begin{cases} \dfrac{\partial }{\partial R^{-1}}E[\ln p(\boldsymbol{\theta}_{0:T}, y_{1:T}|\boldsymbol{\varphi})]=0\\[2mm] \dfrac{\partial}{\partial R^{-1}}\bigg\{\dfrac{T}{2}\ln |R^{-1}|-\dfrac{1}{2}\sum\limits_{t=1}^{T}{\rm tr}\left[R^{-1}\big((y_{t}-g(\boldsymbol{\theta}_{t|T}, \boldsymbol{X}_{t-1}))\right.\\[1mm] \qquad\times\left.(y_{t}-g(\boldsymbol{\theta}_{t|T}, \boldsymbol{X}_{t-1}))^{{T}}+\boldsymbol{G}_{t}\boldsymbol{P}_{t|T}\boldsymbol{G}_{t}^{{T}}\big)\right]\bigg\}=0\\[2mm] \dfrac{T}{2}R-\dfrac{1}{2}\sum\limits_{t=1}^{T}\bigg[\big(y_{t}-g(\boldsymbol{\theta}_{t|T}, \boldsymbol{X}_{t-1})\big)\big(y_{t}-g(\boldsymbol{\theta}_{t|T}, \boldsymbol{X}_{t-1})\big)^{{T}}\\[1mm] \qquad +~\boldsymbol{G}_{t}\boldsymbol{P}_{t|T}\boldsymbol{G}_{t}^{{T}}\bigg]=0\\[2mm] R=\dfrac{1}{T}\sum\limits_{t=1}^{T}\bigg[\big(y_{t}-g(\boldsymbol{\theta}_{t|T}, \boldsymbol{X}_{t-1})\big)\big(y_{t}-g(\boldsymbol{\theta}_{t|T}, \boldsymbol{X}_{t-1})\big)^{{T}}\\[1mm] \qquad +~\boldsymbol{G}_{t}\boldsymbol{P}_{t|T}\boldsymbol{G}_{t}^{{T}}\bigg] \end{cases} \end{align} $$ (31)

    where

    $$ \begin{align} \begin{cases} \boldsymbol{\Gamma}=\sum\limits_{t=1}^{T}\left(\boldsymbol{\theta}_{t|T}\boldsymbol{\theta}_{t|T}^{{T}}+\boldsymbol{P}_{t|T}\right)\\[2mm] \boldsymbol{\Lambda}=\sum\limits_{t=1}^{T}\left(\boldsymbol{\theta}_{t-1|T}\boldsymbol{\theta}_{t-1|T}^{{T}}+\boldsymbol{P}_{t-1|T}\right)\\[2mm] \boldsymbol{\Upsilon}=\sum\limits_{t=1}^{T}\left(\boldsymbol{\theta}_{t|T}\boldsymbol{\theta}_{t-1|T}^{{T}}+\boldsymbol{P}_{t, t-1|T}\right). \end{cases} \end{align} $$ (32)

    It is significant to mention that the EM algorithm is applied to obtain maximum likelihood (ML) estimates of the above parameters and states, which can reduce the computational complexity and guarantee the convergence to a stationary point while continuously increasing the ML function. But EM algorithm is computationally expensive when the state dimension is high.

    To evaluate the performance of the presented approach, we predict the well-known Mackey-Glass time series through three cases. In the first case, the data is generated from the following Mackey-Glass equation:

    $$ \begin{align} \dfrac{dy_{t}}{dt}=\dfrac{ay_{t-\tau_{0}}}{1+y_{t-\tau_{0}}^{c}}-by_{t} \end{align} $$ (33)

    where the parameters are chosen to be $a=0.2$ , $b=0.1$ , $c$ $=$ $10$ and $\tau_{0}=20$ . A thousand values are sampled and the series is shown in Fig. 2. This chaotic benchmark time series was studied in [1], [2]. We use the RBF-AR $(p, m, d)$ model to predict the nonlinear time series, where the parameters $p$ , $m$ and $d$ are defined as shown in model (2). The first 500 data points are used to train the model, and the last 500 data are used to test the model.

    图 2  The original Mackey-Glass time series.
    Fig. 2  The original Mackey-Glass time series.

    To make the comparisons between the SNPOM, the EKF and the EM-EKF, we predict the value $y_{t}$ from the fixed input vector $[y_{t-5}, y_{t-4}, y_{t-3}, y_{t-2}, y_{t-1}]$ . Thus, an RBF-AR(5, 3, 2) model is used to predict the one-step ahead output of this complex nonlinear time series as follows:

    $$ \begin{align} \begin{cases} y_{t}=\phi_{0}\, (\boldsymbol{X}_{t-1})+\sum\limits_{i=1}^{5}\phi_{i}\, (\boldsymbol{X}_{t-1})y_{t-i}+e_{t}\\[3mm] \phi_{i}\, (\boldsymbol{X}_{t-1})=\omega_{i, 0}+\sum\limits_{k=1}^{3}\omega_{i, k}\exp\left\{-\lambda_{k}\|\boldsymbol{X}_{t-1}-\boldsymbol{Z}_{k}\|_{2}^{2}\right\}\\[3mm] \boldsymbol{X}_{t-1}=[y_{t-1}, y_{t-2}]^{{T}}. \end{cases} \end{align} $$ (34)

    Table Ⅰ reports the comparison results of the proposed EM-EKF, the EKF and the SNPOM. The estimated results are the mean squared errors (MSEs), observation noise variance and the standard deviations (given in the parentheses). It is worth mentioning that Table Ⅰ gives two different estimation results using EKF in the cases of given different measurement noise variances. From Table Ⅰ we can see that the MSEs of the training data using EM-EKF are lesser than those using SNPOM and EKF, while the testing data is only slightly lesser. We attribute this to the fact that the SNPOM has good estimation results for clean data. Besides, the SNPOM obtains slightly better results than does the EKF in these conditions of given different measurement noise variances ( $R=0.002$ or $R=0.0002$ ). Furthermore, by comparison between the two conditions in the EKF, the closer the given measurement noise variance is to the true value ( $R=0$ ), the better the results are. This illustrates that setting the measurement noise appropriately leads to success or failure when using EKF. The estimate of observation noise variance using EM-EKF is close to the true value (zero), while the estimate of that using SNPOM is unknown, and the observation noise variance in the EKF must be given in advance. In Fig. 3, the top plot shows the increase of the log-likelihood at each step, the middle plot shows the convergence of the measurements noise variance $R$ , and the bottom plot shows the trace of the process noise variance $\boldsymbol{Q}$ . The initial conditions for the state $\boldsymbol{\theta}_{0}$ and the measurements noise variance $R$ are randomly chosen with a uniform distribution from the range [0,1] and [0, 0.01], respectively, and matrices $\boldsymbol{Q}$ and $\boldsymbol{P}$ are initialized as identity matrix and 100 times identity matrix, respectively.

    表 Ⅰ  COMPARISON RESULTS FOR MACKEY-GLASS TIME SERIES
    Table Ⅰ  COMPARISON RESULTS FOR MACKEY-GLASS TIME SERIES
    Method MSE (Training) MSE (Testing) $R$
    SNPOM 1.0800E-7 1.2600E-7 unknown
    EKF 1.9560E-7 2.1786E-7 0.002 (given)
    EKF 1.2559E-7 1.9793E-7 0.0002 (given)
    EM-EKF 7.1765E-8 1.2008E-7 1.8146E-7
    下载: 导出CSV 
    | 显示表格
    图 3  The convergence process of the Log-likelihood, the measurement and process noise variance.
    Fig. 3  The convergence process of the Log-likelihood, the measurement and process noise variance.

    To examine the noise effect in the time series, the EM-EKF, the EKF and the SNPOM are used to estimate the Mackey-Glass chaotic time series corrupted by additive white noise with variance 0.25 and 1 in the following two cases (see Fig. 4). Table Ⅱ gives the performance of the EM-EKF compared to those of the SNPOM and the EKF on estimating the Mackey-Glass chaotic time series corrupted by additive white noise. From Table Ⅱ, one can see that the overall performance obtained by the EM-EKF is better than that of the SNPOM and the EKF. In detail, the EM-EKF has smaller values of MSEs than those of the SNPOM and the EKF for the training data and testing data, although the improvement is not significant. As for the EKF, when the given values of the measurement noise variance ( $R=0.1$ and $R=0.6$ ) deviate relatively far from the true values ( $R=0.25$ and $R=1$ ), the results become worse, even worse than those obtained by the SNPOM. On the contrary, when the given values ( $R=0.2$ and $R=0.8$ ) are relatively close to the true values, the EKF obtains slightly better results than does the SNPOM, but still obtains worse results than does the EM-EKF. This illustrates that setting of the measurement noise plays an important part in the use of the EKF again. In both cases, the values of measurements noise variance $R$ using EM-EKF are 0.25950 and 1.0589, respectively, which are very close to the true values. In contrast, the SNPOM can not estimate the values of measurement noise variance, and the values of measurements noise variance should be given beforehand in the use of the EKF. The initial conditions are the same as case one except that $R$ is randomly chosen with a uniform distribution from the range $[0,1]$ . Figs. 5 and 6 show the convergent process of the log-likelihood and noise variance.

    图 4  The Mackey-Glass chaotic time series corrupted by additive white noise.
    Fig. 4  The Mackey-Glass chaotic time series corrupted by additive white noise.
    表 Ⅱ  COMPARISON RESULTS FOR MACKEY-GLASS TIME SERIES CORRUPTES BY ADDITIVE WHITE NOISE
    Table Ⅱ  COMPARISON RESULTS FOR MACKEY-GLASS TIME SERIES CORRUPTES BY ADDITIVE WHITE NOISE
    Noise variance Method MSE (Training) MSE (Testing) R
    0.25 SNPOM 0.26606 0.28120 unknown
    EKF 0.26577 0.28082 0.2 (given)
    EKF 0.27343 0.28825 0.1 (given)
    EM-EKF 0.25750 0.27825 0.25950
    1 SNPOM 0.97452 1.1644 unknown
    EKF 0.97215 1.1512 0.8 (given)
    EKF 0.98262 1.1785 0.6 (given)
    EM-EKF 0.96590 1.13907 1.0589
    下载: 导出CSV 
    | 显示表格
    图 5  The convergence process of the Log-likelihood, the measurement and process noise variance.
    Fig. 5  The convergence process of the Log-likelihood, the measurement and process noise variance.
    图 6  The convergence process of the Log-likelihood, the measurement and process noise variance.
    Fig. 6  The convergence process of the Log-likelihood, the measurement and process noise variance.

    Statistical $F$ test is used to judge whether the error variance estimated by the EM-EKF is equal to the variance estimated by the SNPOM. As shown in Table Ⅲ, the values of the computing $F$ test statistic ( $F$ ) are greater than the table value ( $F_{\alpha=0.05}$ ) except for the testing data without additional white noise, so the null hypothesis that two variances are equal is rejected at level $\alpha=0.05$ . This indicates that there are significant differences between the above two methods and the performance of the EM-EKF is superior to the SNPOM.

    表 Ⅲ  THE RESULTS OF STATISTICAL F TEST AT LEVEL 0.05
    Table Ⅲ  THE RESULTS OF STATISTICAL F TEST AT LEVEL 0.05
    Case $F_{\alpha=0.05}$ $F$ Results
    Case 1 (Training) 1.16 1.4642 Reject; Difference
    Case 1 (Testing) 1.16 1.0105 No reject; No difference
    Case 2 (Training) 1.16 14.1212 Reject; Significant difference
    Case 2 (Testing) 1.16 9.2984 Reject; Significant difference
    Case 3 (Training) 1.16 37.8297 Reject; Significant difference
    Case 3 (Testing) 1.16 15.5751 Reject; Significant difference
    下载: 导出CSV 
    | 显示表格

    To compare the computational complexity, Table Ⅳ lists the computational time for the three methods. The simulations are implemented on a computer (Inter (R) Core (TM)2 Duo CPU E7200 @2.53 GHz, 8 G-RAM). The average running time (100 iterations) for EM-EKF is approximately 30.766 seconds, for SNPOM (100 iterations) 10.690 seconds, and for EKF 0.40572 seconds. Obviously, the EM-EKF is more time-consuming than the other two methods since EM algorithm is computationally expensive for high-dimension states. Also, for a large RBF network, the computational expense of the EKF could be burdensome and it increases linearly with the number of training samples. Instead, the SNPOM is a hybrid method, depending partly on the Levenberg-Marquardt method (LMM) for nonlinear parameter optimization and partly on the least-squares method (LSM) for linear parameter estimation. The SNPOM can greatly accelerate the computational convergence of the parameter search process, especially for the RBF-type models with larger number of linear weights and smaller number of nonlinear parameters. However, the EM-EKF, as an alternative way, is superior to the SNPOM and the EKF and can accurately estimate the noise variance.

    表 Ⅳ  THE COMPUTATION TIME OF DIFFERENT METHODS (S)
    Table Ⅳ  THE COMPUTATION TIME OF DIFFERENT METHODS (S)
    Method Time
    SNPOM 10.690
    EKF 0.40572
    EM-EKF 30.766
    下载: 导出CSV 
    | 显示表格

    As a whole, the EM-EKF method is capable of estimating the parameters of the RBF-AR model, and the initial conditions and the noise variances are identified by use of the EM algorithm, which can further improve the modeling precision. Comparison results indicate that the RBF-AR model estimated by the EM-EKF makes more accurate predictions than do the SNPOM and the EKF, although the values of the MSEs using EM-EKF are only slightly smaller than those of the SNPOM and EKF in some cases. However, $F$ test shows there is significant difference between results obtained by the SNPOM and the EM-EKF. Furthermore, the estimation of observation noise variance using EM-EKF is close to the true value, while the estimate of that using SNPOM is unknown, and the observation noise variance in the EKF must be given in advance. Therefore, we can conclude that the EM-EKF method is an advisable choice for estimating RBF-AR model and is especially appropriate for signals disturbed by noise.

    In this paper, the EM-EKF method is developed to estimate the parameter of RBF-AR model. Firstly, the model is reconstructed as a new type of general radial basis function neural networks. Secondly, to circumvent the EKF's limitation of unknown prior knowledge, the EM is proposed to calculate the initial states and the measurement and process noise variance. By combining the EM and extended Kalman filtering and smoothing process, the EM-EKF method is proposed to estimate the parameters of the RBF-AR model, the initial conditions and the noise variances jointly, and can further improve the modeling precision. Comparisons of the performance of the EM-EKF with the SNPOM and the EKF are performed, and the results indicate that the RBF-AR model estimated by the EM-EKF makes more accurate predictions than do the SNPOM and the EKF, although the EM-EKF is more time-consuming. Moreover, the estimate of observation variance converges to the true value. Finally, F test indicates there is significant difference between results obtained by the SNPOM and the EM-EKF. Our future work would develop the EM-EKF algorithm for RBF-ARX (Peng et al. [2]) estimation and apply the RBF-AR model based on the EM-EKF algorithm to other types of time series and complex systems analysis.

  • Fig.  1  The schematic of the RBF-AR model.

    Fig.  2  The original Mackey-Glass time series.

    Fig.  3  The convergence process of the Log-likelihood, the measurement and process noise variance.

    Fig.  4  The Mackey-Glass chaotic time series corrupted by additive white noise.

    Fig.  5  The convergence process of the Log-likelihood, the measurement and process noise variance.

    Fig.  6  The convergence process of the Log-likelihood, the measurement and process noise variance.

    Table  Ⅰ  COMPARISON RESULTS FOR MACKEY-GLASS TIME SERIES

    Method MSE (Training) MSE (Testing) $R$
    SNPOM 1.0800E-7 1.2600E-7 unknown
    EKF 1.9560E-7 2.1786E-7 0.002 (given)
    EKF 1.2559E-7 1.9793E-7 0.0002 (given)
    EM-EKF 7.1765E-8 1.2008E-7 1.8146E-7
    下载: 导出CSV

    Table  Ⅱ  COMPARISON RESULTS FOR MACKEY-GLASS TIME SERIES CORRUPTES BY ADDITIVE WHITE NOISE

    Noise variance Method MSE (Training) MSE (Testing) R
    0.25 SNPOM 0.26606 0.28120 unknown
    EKF 0.26577 0.28082 0.2 (given)
    EKF 0.27343 0.28825 0.1 (given)
    EM-EKF 0.25750 0.27825 0.25950
    1 SNPOM 0.97452 1.1644 unknown
    EKF 0.97215 1.1512 0.8 (given)
    EKF 0.98262 1.1785 0.6 (given)
    EM-EKF 0.96590 1.13907 1.0589
    下载: 导出CSV

    Table  Ⅲ  THE RESULTS OF STATISTICAL F TEST AT LEVEL 0.05

    Case $F_{\alpha=0.05}$ $F$ Results
    Case 1 (Training) 1.16 1.4642 Reject; Difference
    Case 1 (Testing) 1.16 1.0105 No reject; No difference
    Case 2 (Training) 1.16 14.1212 Reject; Significant difference
    Case 2 (Testing) 1.16 9.2984 Reject; Significant difference
    Case 3 (Training) 1.16 37.8297 Reject; Significant difference
    Case 3 (Testing) 1.16 15.5751 Reject; Significant difference
    下载: 导出CSV

    Table  Ⅳ  THE COMPUTATION TIME OF DIFFERENT METHODS (S)

    Method Time
    SNPOM 10.690
    EKF 0.40572
    EM-EKF 30.766
    下载: 导出CSV
  • [1] Z. Y. Shi, Y. Tamura, and T. Ozaki, "Nonlinear time series modelling with the radial basis function-based state-dependent autoregressive model, "Int. J. Syst. Sci. , vol. 30, no. 7, pp. 717-727, Jul. 1999. doi: 10.1080/002077299292038
    [2] H. Peng, T. Ozaki, V. Haggan-Ozaki, and Y. Toyoda, "A parameter optimization method for radial basis function type models, "IEEE Trans. Neural Netw. , vol. 14, no. 2, pp. 432-438, Mar. 2003. http://europepmc.org/abstract/MED/18238025
    [3] M. Gan, H. Peng, X. Y. Peng, X. H. Chen, and G. Inoussa, "A locally linear RBF network-based state-dependent AR model for nonlinear time series modeling, "Inform. Sci. , vol. 180, no. 22, pp. 4370-4380, Nov. 2010. http://www.sciencedirect.com/science/article/pii/S0020025510003300
    [4] M. Gan, C. L. P. Chen, H. X. Li, and L. Chen, "Gradient radial basis function based varying-coefficient Autoregressive model for nonlinear and nonstationary time series, "IEEE Signal Proc. Lett. , vol. 22, no. 7, pp. 809-812, Jul. 2015. http://ieeexplore.ieee.org/document/6953131/
    [5] H. Peng, J. Wu, G. Inoussa, Q. L. Deng, and K. Nakano, "Nonlinear system modeling and predictive control using the RBF nets-based quasi-linear ARX model, "Control Eng. Pract. , vol. 17, no. 1, pp. 59-66, Jan. 2009. http://www.sciencedirect.com/science/article/pii/S0967066108000993
    [6] H. Peng, G. Kitagawa, J. Wu, and K. Ohtsu, "Multivariable RBF-ARX model-based robust MPC approach and application to thermal power plant, "Appl. Math. Model. , vol. 35, no. 7, pp. 3541-3551, Jul. 2011. http://www.sciencedirect.com/science/article/pii/S0307904X11000151
    [7] M. Gan, H. Peng, and L. Y. Chen, "A global-local optimization approach to parameter estimation of RBF-type models, "Inform. Sci. , vol. 197, pp. 144-160, Aug. 2012. http://www.sciencedirect.com/science/article/pii/S0020025512000679
    [8] M. Gan, H. X. Li, and H. Peng, "A variable projection approach for efficient estimation of RBF-ARX model, "IEEE Trans. Cybern. , vol. 45, no. 3, pp. 462-471, Mar. 2015. http://europepmc.org/abstract/med/24988599
    [9] H. K. Wei and S. I. Amari, "Dynamics of learning near singularities in radial basis function networks, "Neural Netw. , vol. 21, no. 7, pp. 989-1005, Sep. 2008. http://dl.acm.org/citation.cfm?id=1411874
    [10] J. S. Kim and S. Jung, "Implementation of the RBF neural chip with the back-propagation algorithm for on-line learning, "Appl. Soft Comput. , vol. 29, pp. 233-244, Apr. 2015. http://www.sciencedirect.com/science/article/pii/S1568494614006620
    [11] F. P. Härter and H. F. de Campos Velho, "New approach to applying neural network in nonlinear dynamic model, "Appl. Math. Model. , vol. 32, no. 12, pp. 2621-2633, Dec. 2008. http://www.sciencedirect.com/science/article/pii/S0307904X07002296
    [12] H. Z. Yang, J. Li, and F. Ding, "A neural network learning algorithm of chemical process modeling based on the extended Kalman filter, "Neurocomputing, vol. 70, no. 4-6, pp. 625-632, Jan. 2007. http://dl.acm.org/citation.cfm?id=1224059&CFID=448466066&CFTOKEN=67010250
    [13] K. Salahshoor, S. Zakeri, and M. H. Sefat, "Stabilization of gas-lift oil wells by a nonlinear model predictive control scheme based on adaptive neural network models, "Eng. Appl. Artif. Intel. , vol. 26, no. 8, pp. 1902-1910, Sep. 2013. http://dl.acm.org/citation.cfm?id=2508322
    [14] N. Y. Zeng, Z. D. Wang, Y. R. Li, M. Du, and X. H. Liu, "Inference of nonlinear state-space models for sandwich-type lateral flow immunoassay using extended Kalman filtering, "IEEE Trans. Biomed. Eng. , vol. 58, no. 7, pp. 1959-1966, Jul. 2011. http://www.ncbi.nlm.nih.gov/pubmed/21245000
    [15] J. F. G. de Freitas, M. Niranjan, and A. H. Gee, "Hierarchical Bayesian models for regularization in sequential learning, "Neural Comput. , vol. 12, no. 4, pp. 933-953, Apr. 2000. http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6789428
    [16] Y. S. T. Hong, "Dynamic nonlinear state-space model with a neural network via improved sequential learning algorithm for an online real-time hydrological modeling, "J. Hydrol. , vol. 468-469, pp. 11-21, Oct. 2012. http://www.sciencedirect.com/science/article/pii/S0022169412006658
    [17] Y. H. Xi, H. Peng, and X. H. Chen, "A sequential learning algorithm based on adaptive particle filtering for RBF networks, "Neural Comput. Appl. , vol. 25, no. 3-4, pp. 807-814, Sep. 2014. http://dl.acm.org/citation.cfm?id=2662643
    [18] N. Y. Zeng, Z. D. Wang, Y. R. Li, M. Du, and X. H. Liu, "A hybrid EKF and switching PSO algorithm for joint state and parameter estimation of lateral flow immunoassay models, "IEEE/ACM Trans. Comput. Biol. Bioinform. , vol. 9, no. 2, pp. 321-329, Mar. -Apr. 2012.
    [19] N. Y. Zeng, Z. D. Wang, Y. R. Li, M. Du, and X. H. Liu, "Identification of nonlinear lateral flow immunoassay state-space models via particle filter approach, "IEEE Trans. Nanotechnol. , vol. 11, no. 2, pp. 321-327, Mar. 2012. http://ieeexplore.ieee.org/abstract/document/6081979/
    [20] M. Hürzeler and H. R. Künsch, "Approximating and maximising the likelihood for a general state-space model, "in Sequential Monte Carlo Methods in Practice, A. Doucet, N. de Freitas, and N. Gordon, Eds. New York: Springer, 2001, pp. 159-175. http://www.ams.org/mathscinet-getitem?mr=1847791
    [21] N. Y. Zeng, Z. D. Wang, Y. R. Li, M. Du, J. Cao, and X. H. Liu, "Time series modeling of nano-gold immunochromatographic assay via expectation maximization algorithm, "IEEE Trans. Biomed. Eng. , vol. 60, no. 12, pp. 3418-3424, Dec. 2013. http://www.ncbi.nlm.nih.gov/pubmed/23629840
    [22] M. Lázaro, I. Santamaría, and C. Pantaleón, "A new EM-based training algorithm for RBF networks, "Neural Netw. , vol. 16, no. 1, pp. 69-77, Jan. 2003. http://dl.acm.org/citation.cfm?id=776140
    [23] C. Constantinopoulos and A. Likas, "Semi-supervised and active learning with the probabilistic RBF classifier, "Neurocomputing, vol. 71, no. 13-15, pp. 2489-2498, Aug. 2008. http://www.sciencedirect.com/science/article/pii/S0925231208002117
  • 加载中
图(6) / 表(4)
计量
  • 文章访问数:  1635
  • HTML全文浏览量:  389
  • PDF下载量:  2142
  • 被引次数: 0
出版历程
  • 收稿日期:  2016-11-14
  • 录用日期:  2016-12-20
  • 刊出日期:  2017-09-20

目录

/

返回文章
返回