Robust H∞ Fuzzy Output-feedback Control With Both General Multiple Probabilistic Delays and Multiple Missing Measurements and Random Missing Control
-
Abstract: In this paper, the robust H∞-control problem is reported for a class of uncertain discrete-time fuzzy systems with both multiple probabilistic delays and multiple missing measurements and random missing control from the fuzzy controllers to the actuator. A sequence of random variables including accounting for the probabilistic communication delays and the random missing control are thought as mutually independent and obey the Bernoulli distribution. The measurement-missing phenomenon can be assumed to occur stochastically. Assumption that the missing probability for each sensor satisfies a certain probabilistic distribution in the interval [0 1] is given. Much attention is focused on design of H∞ the fuzzy output feedback controllers to ensure that the resulting close-loop Takagi-Sugeno (T-S) system is exponentially stable in the mean square. The developed method makes disturbance rejection attenuation satisfy a given level by means of the H∞-performance index. Intensive analysis is employed to reach the sufficient conditions about the existence of admissible output feedback controllers which satisfies the exponential stability as well as the prescribed H∞ performance. In addition, the cone-complementarity linearization procedure is utilized to transform the controller-design problem into a sequential minimization one which can be solved by the semi-definite program method. Simulation results conform the feasibility as weil as the effectiveness of the proposed design method.
-
Key words:
- Discrete-time fuzzy systems /
- fuzzy control /
- multiple missing control /
- multiple missing measurements /
- multiple probabilistic time delays /
- networked-control systems (NCSs) /
- robust H∞ control /
- stochastic systems
摘要: 这篇文章研究了一类不确定离散时间模糊系统的鲁棒H∞控制问题,这类系统是既含有多个随机延迟、多测量丢失又含有从模糊控制器到发生器的随机控制信号丢失.描述随机通信延迟和随机控制信号丢失的随机变量认为是相互独立服从贝努利分布.测量丢失现象认为是随机发生的.假定对每个传感器发生丢失的概率位于区间[0,1]上是给定的.大量的注意力集中在设计H∞模糊输出反馈控制器以确保所得到的闭环Tagagi-Segeno(T-S)系统按均方意义是指数稳定的.文章所用的方法能使扰动拒绝达到说给定的指标.通过大量的分析得出了满足指数稳定性以及预先给定的H∞性能指标的容许输出反馈控制器的存在性的充分条件.另外,锥型补线性化过程用于将控制设计问题转化成能用半正定规划方法求解的序列极小化问题.最后,模拟结果证实了文中所提出的设计方法的可行性和有效性. -
1. Introduction
The radial basis function network-based state-dependent autoregressive (RBF-AR) model, which offers a very flexible structure for nonlinear time-series modeling, has been extensively studied. For example, Shi et al. [1] developed the RBF-AR model to reconstruct the dynamics of given nonlinear time series. Peng et al. [2] extended the RBF-AR model to the case where there are several exogenous variables (RBF-ARX model) for the system. Following this method, Gan et al. [3], [4] successively developed the locally linear radial basis function network-based autoregressive (LLRBF-AR) model and a gradient radial basis function based varying-coefficient autoregressive (GRBF-AR) model. The major feature of the RBF-AR (X) model which is superior to the black-box models based on general function approximations is that: the RBF-AR (X) model may provide some insights into the system dynamics due to its quasi-linear structure, whereas the general function approximations cannot.
The identification of the RBF networks includes the choice of topology (e.g., the number of neurons) and estimation of all the parameters. For selecting the number of neurons, several methods have been proposed, e.g., the Akaike information criterion (AIC), Bayesian information criterion (BIC), and cross validation. Therefore we must first have a good model parameter estimation method, and then we can repeat the method for models with different number of neurons, before finally selecting the best model. In this paper, the majority of our works focuses on determining the parameters of RBF networks.
The estimation of the RBF-AR (X) model is a difficult task but is a key procedure for successfully applying the RBF-AR (X) model. Peng et al. [2] presented a structured nonlinear parameter optimization method (SNPOM) for the RBF-AR (X) model estimation. Researches [2], [5], [6] shown the SNPOM can optimize all of the model parameters and can also accelerate the computational convergence of the optimization search process. However, the primary shortcoming of the SNPOM is that it can be easily trapped in a local optimum because of an initial guess value. To overcome the problem of local minima, Gan et al. [7] proposed the hybrid evolutionary algorithm (EA)-SNPOM and simulation results indicated that the hybrid algorithm provides better results than either method alone (EA and SNPOM), but the complexity and running time increased substantially. Also, Gan et al. [8] proposed a variable projection (VP) approach to efficiently estimate the parameters of the RBF-ARX model. Although these identification methods for RBF-AR (X) have good estimation results, their performance may get deteriorated in case of measurement with noise interference.
As Peng et al. [2] and Gan et al. [7] mentioned that RBF-AR (X) can be regarded as a more general nonlinear model than the RBF neural network and as a generalization of the RBF network. Furthermore, any kind of RBF and the RBF-ARX model parameter estimation procedure must include the selection of appropriate centers and scaling factors, and estimation of all the linear weights of the RBF networks in the model.
Various derivative-based methods have been used to train RBF neural network, including gradient descent [9] and the well-known back propagation [10]. Although gradient descent training for RBF networks has proven to be much more effective than lots of conventional methods, it can be computationally expensive. An alternative to sequentially estimate RBF network is the extended Kalman filter (EKF)-based neural network sequential learning method [11]-[13]. The sequential EKF algorithm is a fast training algorithm, which takes all the network parameters (weights or weights and centers) as a state vector in state-space and adjusts them to adapt to measurements [14]. Because it makes use of second order statistics (covariance), the EKF is an improvement over conventional neural network estimation techniques, such as back-propagation and gradient descent training algorithm. However, a well known limitation of the EKF is the assumption of known a priori statistics to describe the measurement and process noise. Setting these noise levels appropriately often makes the difference between success and failure in the use of the EKF. Especially in environments where the noise statistics change with time, the EKF can lead to large estimation errors and even to divergence. Thus, the selection of the initial states, the measurement noise covariance matrix and the process noise covariance matrix are very important in the control of the convergence of the EKF learning algorithm. To overcome the difficulties above, a number of improved learning methods were proposed, such as EKFQ (the EKF algorithm with evidence maximization and sequentially updated priors) [15], [16], APNCPF (adaptive process noise covariance particle filter) [17], a hybrid EKF and switching PSO (particle swarm optimization) algorithm [18], and a particle filtering algorithm in combination with the kernel smoothing method [19]. The EKFQ and APNCPF algorithms employ adaptive noise parameters, but the problem of choosing the right window length, which is used to update noise covariance matrices, results in a regularization/tracking dilemma because of unknown prior knowledge. Moreover, the above algorithms do not consider the optimal initial values of parameters, which will lead to some imprecise information about the spread or the shape of the posterior [20].
The expectation-maximization (EM) algorithm [21] was developed to learn parameters of statistical models in the presence of incomplete data or hidden variables. Lázaro et al. [22] and Constantinopoulos et al. [23] extended the EM learning strategy for estimation of the neural network weights, such as the multi-layer perception (MLP) and the Gaussian radial basis function (RBF) networks. Research results indicate that the EM method is effective and simple to obtain maximum likelihood (ML) estimates of the parameter and states.
In this paper, the proposed expectation-maximization extended Kalman filter (EM-EKF) method, which combines the expectation-maximization and the extended Kalman filtering, is developed to identify the RBF-AR model. The method is more accurate as it involves extended Kalman smoothing, which provides a minimum variance Gaussian approximation to the posterior probability density function. To use the EM-EKF algorithm for state space learning, RBF-AR model is reconstructed as a general RBF network, which has additional linear output weight layer compared with the traditional three-layer RBF network. Thus, the general RBF network is represented by state-space model, and the centers and the weights of the general RBF network are treated as hidden state variables. The learning algorithm for RBF-AR model possesses advantages of both the expectation-maximization and of the extended Kalman filtering and smoothing. Specifically, the EM algorithm is utilized to estimate parameters of the state-space model, such as the measurement and process noise variance and the initial states, while the extended Kalman filtering and smoothing are used to estimate the approximate state distribution. The proposed algorithm simplifies the optimizing estimation of the maximum likelihood by making the expectation maximal, and EKF and smoothing process realize the more exact estimation of the expectation. This learning technique can improve the performance of the EKF-based neural network sequential learning method by estimating the noise variance and the initial states. The performance and effectiveness of our proposed method are evaluated by the Mackey-Glass time series through three cases.
The contributions of this paper comprise two parts. First, RBF-AR model is reconstructed as a new type of general radial basis function (RBF) neural network, which makes it able to estimate the parameters of RBF-AR model using the EKF. Second, by combining the expectation maximization and extended Kalman filtering and smoothing process, the EM-EKF method is developed to estimate RBF-AR model, which can give joint state and parameter estimates.
The structure of the remaining of this paper is as follows. Section 2 presents the state-space representation of the RBF-AR model. The EM-EKF method is developed to identify the RBF-AR model in Section 3. The cases studies are shown in Section 4. Finally, the paper is concluded in Section 5.
2. The State-space Representation of the RBF-AR Model
We are interested in the nonlinear time series that can be described by the following state-dependent AR (SD-AR) model
$$ \begin{align} \begin{cases} y_{t}=\phi_{0}\, (\boldsymbol{X}_{t-1})+\sum\limits_{i=1}^p\phi_{i}\, (\boldsymbol{X}_{t-1})y_{t-i}+e_{t}\\[1mm] \boldsymbol{X}_{t-1}=[y_{t-1}, y_{t-2}, \ldots, y_{t-d}]^{{T}} \end{cases} \end{align} $$ (1) where $y_{t}$ $(t=1, \ldots, T)$ is the output, $\boldsymbol{X}_{t-1}$ is regarded as the state vector at time $t$ , which contains only the output series in this case (in other cases it may contain the input series or another). $\phi_{i}(\boldsymbol{X}_{t-1})$ $(i=0, 1, \ldots, p)$ are the state-dependent function coefficients of the model, $p$ is the model order, and $d$ is the dimension of the state vector. $e_{t}$ denotes Gaussian white noise.
Although the SD-AR model provides a useful framework for general nonlinear time series modeling, the problem is how to specify the functional form of its state-dependent coefficients. An efficient approach to solve the problem is to use Gaussian RBF neural networks approximations of coefficients of model (1) [2], and then the model derived is called the RBF-AR model, which is given by
$$ \begin{align} \begin{cases} y_{t}=\phi_{0}\, (\boldsymbol{X}_{t-1})+\sum\limits_{i=1}^p\phi_{i}\, (\boldsymbol{X}_{t-1})y_{t-i}+e_{t}\\[1mm] \phi_{0}\, (\boldsymbol{X}_{t-1})=\omega_{0, 0}+\sum\limits_{k=1}^m\omega_{0, k}\exp\left\{-\lambda_{k}\|\boldsymbol{X}_{t-1}-\boldsymbol{Z}_{k}\|_{2}^{2}\right\}\\[1mm] \phi_{i}\, (\boldsymbol{X}_{t-1})=\omega_{i, 0}+\sum\limits_{k=1}^m\omega_{i, k}\exp\left\{-\lambda_{k}\|\boldsymbol{X}_{t-1}-\boldsymbol{Z}_{k}\|_{2}^{2}\right\}\\[1mm] \boldsymbol{Z}_{k}=[z_{k, 1}, \ldots, z_{k, d}]^{{T}} \end{cases} \end{align} $$ (2) where $\boldsymbol{Z}_{k}$ $(k=1, 2, \ldots, m)$ are the centers of the local linear RBF networks, $\lambda_{k}$ $(k=1, 2, \ldots, m)$ are the scaling parameters, $\omega_{i, k}$ $(i=0, 1, 2, \ldots, p; k=0, 1, 2, \ldots, m)$ are the linear weights, $m$ and $d$ are the number of hidden neurons and the dimension of the centers (the dimension of the centers is the same as the dimension of the state vector), respectively, and $\|\cdot\|_{2}$ denotes the vector 2-norm.
In general case, the RBF networks in model (2) may have different centers for different regression variables. However, in some applications, all the RBF networks may be allowed to share the same centers, because model (2) possesses the autoregressive structure, thus, although the RBF centers are the same in that case, the regressive polynomials' coefficients are different. Thus, the RBF-AR model can be seen as a general RBF network, which has two hidden layers with $m$ and $p+1$ neurons, respectively, and the output layer with one output. In this structure, the identification of RBF-AR model is to estimate the centers, the scaling parameters and the weights of the general RBF network. Fig. 1 shows the schematic of the RBF-AR model as a general RBF network.
To simplify the nonlinear optimization steps, the scaling parameters may be selected as [2]
$$ \begin{align} %\begin{cases} \lambda_{k}=\dfrac{-\log\epsilon_{k}}{\max\limits_{t-1}\{\|\boldsymbol{X}_{t-1}-\boldsymbol{Z}_{k}\|^{2}\}}, \quad \epsilon_{k}\in[0.0001, 0.1]. %\end{cases} \end{align} $$ (3) Using this heuristic way, the weights will become bounded and stable when the signal $\boldsymbol{X}_{t-1}$ is far away from the centers $\boldsymbol{Z}_{k}$ .
To apply a filtering algorithm to the RBF network training, the state-space model is established, which is given by
$$ \begin{align} \begin{cases} \boldsymbol {\theta}_{t}=\boldsymbol{\theta}_{t-1}+\boldsymbol{v}_{t}\\[1mm] y_{t}=g(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1})+\xi_{t} \\[1mm] \ \ \ \, =\phi_{0}\, (\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1})+\sum\limits_{i=1}^{p}\phi_{i}\, (\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1})y_{t-i}+\xi_{t} \end{cases} \end{align} $$ (4) where $\boldsymbol{\theta}=[\boldsymbol{\omega}_{0}^{{T}}~ \boldsymbol{\omega}_{1}^{{T}}~ \ldots~ \boldsymbol{\omega}_{p}^{{T}}~ \boldsymbol{Z}_{1}^{{T}}~ \ldots~ \boldsymbol{Z}_{m}^{{T}}]^{{T}}$ ( $\boldsymbol{\omega}_j^{{T}}=[\omega_{0, j}$ $\omega_{1, j}$ $\ldots~\omega_{m, j}]$ , $0 \leq j \leq p$ ; $\boldsymbol{Z}_k=[z_{k, 1}~z_{k, 2}~\ldots~z_{k, d}]^{{T}}$ , $1$ $\leq$ $k$ $\leq$ $m$ ) represents the system state vector, $y_{t}$ is the observation variable, $\boldsymbol{v}_{t}$ and $\xi_{t}$ denote the process noise and observation noise, which are assumed to be zero-mean Gaussian processes with the covariance $\boldsymbol{Q}$ and $R$ , namely $\boldsymbol{v}_{t}$ $\sim$ ${\rm N}(0, \boldsymbol{Q}) $ , $\xi_{t}\sim {\rm N}(0, R) $ . The initial state (parameters) $\boldsymbol{\theta}_{0}$ is normally-distributed with mean $\boldsymbol{\mu}_{0}$ and covariance $\boldsymbol{\Xi}_{0}$ . Obviously, the crux of the matter is that both the system hidden state $\boldsymbol{\theta}_{t}$ and the parameters $\boldsymbol{\varphi}=(\boldsymbol{Q}, R, \boldsymbol{\mu}_{0}, \boldsymbol{\Xi}_{0})$ are unknown.
3. The EM-EKF Method
To simultaneously estimate parameters and hidden states, the expectation maximization (EM) is incorporated with extended Kalman filtering and smoothing, which aims to integrate over the uncertain estimates of the unknown hidden states and optimize the resulting marginal likelihood of the parameters given the observed data. The algorithm realizes the more exact estimation of the posterior distribution by use of the extended Kalman filtering and smoothing.
The EM algorithm is an iterative method for finding a mode of the likelihood function. To derive the EM algorithm for nonlinear state space models, we need to develop an expression for the likelihood of the completed data. We assume that the likelihood of the data given the states, the initial conditions and the evolution of the states can be represented by Gaussian distributions. Thus, if the initial mean and covariance of the states is given by $\boldsymbol{\mu}_{0}$ and $\boldsymbol{\Xi}_{0}$ , then
$$ p(\boldsymbol{\theta}_{0}|\boldsymbol{\varphi})= (2\pi)^{-\frac{l}{2}}|\boldsymbol{\Xi}_{0}|^{-\frac{1}{2}} \nonumber\\ \quad\ \times\exp\left[-\dfrac{1}{2}\, (\boldsymbol{\theta}_{0}- \boldsymbol{\mu}_{0})^{T}\boldsymbol{\Xi}_{0}^{-1}\, (\boldsymbol{\theta}_{0}-\boldsymbol{\mu}_{0})\right] $$ (5) $$ p(\boldsymbol{\theta}_{t}|\boldsymbol{\theta}_{t-1}, \boldsymbol{\varphi}) =(2\pi)^{-\frac{l}{2}}|\boldsymbol{Q}|^{-\frac{1}{2}}\nonumber\\ \quad\ \times\exp\left[-\dfrac{1}{2}\, (\boldsymbol{\theta}_{t}-\boldsymbol{\theta}_{t-1})^{T}\boldsymbol{Q}^{-1}\, (\boldsymbol{\theta}_{t}-\boldsymbol{\theta}_{t-1})\right] $$ (6) $$ p(y_{t}|\boldsymbol{\theta}_{t}, \boldsymbol{\varphi})=(2\pi)^{-\frac{n}{2}}|R|^{-\frac{1}{2}}\nonumber\\ \quad\ \times\exp\left[-\dfrac{1}{2}\, (y_{t}-g(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1}))^{T}R^{-1}\, (y_{t}-g(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1}))\right] $$ (7) where $l=(1+m)(1+p)+md$ is the dimension of the state vector, $n=1$ is the dimension of the observation vector.
Then, the log-likelihood of the complete data is given by:
$$ \begin{align} \ln p(&\boldsymbol{\theta}_{0:T}, y_{1:T}|\boldsymbol{\varphi}) =-\dfrac{1}{2}[\boldsymbol{\theta}_{0}-\boldsymbol{\mu}_{0}]^{T} \boldsymbol{\Xi}_{0}^{-1}[\boldsymbol{\theta}_{0}-\boldsymbol{\mu}_{0}] \nonumber\\ &-\dfrac{1}{2}\ln |\boldsymbol{\Xi}_{0}|-\dfrac{T(l+n)+l}{2}\ln 2\pi-\dfrac{T}{2}\ln |\boldsymbol{Q}|\nonumber\\ &-\dfrac{T}{2}\ln |R|-\sum\limits_{t=1}^{T}\dfrac{1}{2} [\boldsymbol{\theta}_{t}-\boldsymbol{\theta}_{t-1}]^{T} \boldsymbol{Q}^{-1}[\boldsymbol{\theta}_{t}-\boldsymbol{\theta}_{t-1}] \nonumber\\ &-\sum\limits_{t=1}^{T}\dfrac{1}{2}[y_{t}-g(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1})]^{T}R^{-1}[y_{t}-g(\boldsymbol {\theta}_{t}, \boldsymbol{X}_{t-1})]. \end{align} $$ (8) If we take the expectation of the log-likelihood for the complete data, we get the following expression:
$$ \begin{align} & E\ln\left[p(\boldsymbol{\theta}_{0:T}, y_{1:T}|\boldsymbol{\varphi})\right]=-\dfrac{1}{2}\ln |\boldsymbol{\Xi}_{0}|-\dfrac{T}{2}\ln |\boldsymbol{Q}|-\dfrac{T}{2}\ln |R|\nonumber\\ &\ \ \, -\dfrac{1}{2}E\left[\boldsymbol{\theta}_{0}^{{T}}\boldsymbol{\Xi}_{0}^{-1}\boldsymbol{\theta}_{0}-\boldsymbol{\theta}_{0}^{{T}}\boldsymbol{\Xi}_{0}^{-1}\boldsymbol{\mu}_{0}-\boldsymbol{\mu}_{0}^{{T}}\boldsymbol{\Xi}_{0}^{-1}\boldsymbol{\theta}_{0}+\boldsymbol{\mu}_{0}^{{T}}\boldsymbol{\Xi}_{0}^{-1}\boldsymbol{\mu}_{0}\right]\nonumber\\ &\ \ \, -\dfrac{1}{2}\sum\limits_{t=1}^{T}E\left[\boldsymbol{\theta}_{t}^{{T}}\boldsymbol{Q}^{-1}\boldsymbol{\theta}_{t}-\boldsymbol{\theta}_{t}^{{T}}\boldsymbol{Q}^{-1}\boldsymbol{\theta}_{t-1}-\boldsymbol{\theta}_{t-1}^{{T}}\boldsymbol{Q}^{-1}\boldsymbol{\theta}_{t}\right.\nonumber\\ &\ \ \, +\left.\boldsymbol{\theta}_{t-1}^{{T}}\boldsymbol{Q}^{-1}\boldsymbol{\theta}_{t-1}\right]-\dfrac{T(l+n)+l}{2}\ln 2\pi\nonumber\\ &\ \ \, -\dfrac{1}{2}\sum\limits_{t=1}^{T}E\left[(y_{t}-g(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1}))^{{T}}R^{-1}\, (y_{t}-g(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1}))\right]. \end{align} $$ (9) To compute the expectation of the measurement mapping $g(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1})$ , the Taylor expansion of this mapping around the smoothing estimate $\boldsymbol{\theta}_{t|T}$ is given by
$$ \begin{align} &g(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1})=g(\boldsymbol{\theta}_{t|T}, \boldsymbol{X}_{t-1})\nonumber\\ &\qquad +\dfrac{\partial g(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1})}{\partial \boldsymbol{\theta}_{t}}\bigg|_{(\boldsymbol{\theta}_{t}=\boldsymbol{\theta}_{t|T})}\, (\boldsymbol{\theta}_{t}-\boldsymbol{\theta}_{t|T})+\cdots \end{align} $$ (10) where the smoothing estimate $\boldsymbol{\theta}_{t|T}$ denotes the conditional mean of $\boldsymbol{\theta}_{t}$ given $y_{1:T}=\{y_1, \ldots, y_{T}\}$ .
The EKF utilizes only the first-order Taylor expansion, so we can obtain
$$ \begin{align} &g(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1})\approx g(\boldsymbol{\theta}_{t|T}, \boldsymbol{X}_{t-1})\nonumber\\ &\qquad\left. +\dfrac{\partial g(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1})}{\partial \boldsymbol{\theta}_{t}}\right|_{(\boldsymbol{\theta}_{t}=\boldsymbol{\theta}_{t|T})}\, (\boldsymbol{\theta}_{t}-\boldsymbol{\theta}_{t|T}). \end{align} $$ (11) Under the basic assumptions of the EKF (no model and parameter uncertainty, zero-mean white-noise sequence, known process and measurement models, etc.), the smoothing estimates are unbiased, and the smoothing estimates satisfy
$$ \begin{align} &E(\boldsymbol{\theta}_{t|T})=\boldsymbol{\theta}_{t}. \end{align} $$ (12) So we can get
$$ \begin{align} &E(g(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1}))\approx g(\boldsymbol{\theta}_{t|T}, \boldsymbol{X}_{t-1}). \end{align} $$ (13) Subsequently, we compute the covariance of $g(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1})$
$$ \begin{align} &E\Big[\big(g(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1})-g(\boldsymbol{\theta}_{t|T}, \boldsymbol{X}_{t-1})\big)\nonumber\\ &\qquad\qquad\qquad\quad\quad\quad\big(g(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1})-g(\boldsymbol{\theta}_{t|T}, \boldsymbol{X}_{t-1})\big)^{{T}}\Big]\nonumber\\ &\quad\quad\, \, \approx E\Bigg[\bigg[\dfrac{\partial g(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1})}{\boldsymbol{\theta}_{t}}\bigg|_{(\boldsymbol{\theta}_{t}=\boldsymbol{\theta}_{t|T})}\, (\boldsymbol{\theta}_{t}-\boldsymbol{\theta}_{t|T})\bigg]\nonumber\\ &\qquad\quad\ \, \times\bigg[\dfrac{\partial g(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1})}{\boldsymbol{\theta}_{t}}\bigg|_{(\boldsymbol{\theta}_{t}=\boldsymbol{\theta}_{t|T})}\, (\boldsymbol{\theta}_{t}-\boldsymbol{\theta}_{t|T})\bigg]^{{T}}\Bigg]\nonumber\\ &\quad\quad\, \, =\boldsymbol{G}_{t}\boldsymbol{P}_{t|T}\boldsymbol{G}_{t}^{{T}}. \end{align} $$ (14) Hence, it follows that:
$$ \begin{align} &E\left[g(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1})\big(g(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1})\big)^{{T}}\right]\nonumber\\ &\qquad \approx \boldsymbol{G}_{t}\boldsymbol{P}_{t|T}\boldsymbol{G}_{t}^{{T}}+g(\boldsymbol{\theta}_{t|T}, \boldsymbol{X}_{t-1})\big(g(\boldsymbol{\theta}_{t|T}, \boldsymbol{X}_{t-1})\big)^{{T}} \end{align} $$ (15) where $\boldsymbol{G}_{t}$ corresponds to the Jacobian matrix of the measurement function:
$$ \begin{align} &\boldsymbol{G}_{t}=\dfrac{\partial g(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1})}{\partial \boldsymbol{\theta}_{t}}\bigg|_{(\boldsymbol{\theta}_{t}=\boldsymbol{\theta}_{t|t-1})}\nonumber\\ &\quad\, =\bigg[\dfrac{\partial g(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1})}{\partial \theta_{1, t}}~\dfrac{\partial g(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1})}{\partial \theta_{2, t}}~ \cdots ~\dfrac{\partial g(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1})}{\partial \theta_{l, t}}\bigg] \end{align} $$ (16) and $\boldsymbol{P}_{t|T}$ denotes the conditional covariance of $\boldsymbol{\theta}_{t}$ given $y_{1:T}$ $=$ $\{y_{1}, \ldots, y_{T}\}$ :
$$ \begin{align} &\boldsymbol{P}_{t|T}=E\left[(\boldsymbol{\theta}_{t}-\boldsymbol{\theta}_{t|T})(\boldsymbol{\theta}_{t}-\boldsymbol{\theta}_{t|T})^{{T}}\right]. \end{align} $$ (17) Equations (13)-(15) are substituted in (9), and then we get
$$ \begin{align} &E[\ln p(\boldsymbol{\theta}_{0:T}, y_{1:T}|\boldsymbol{\varphi})] =-\dfrac{1}{2}\ln |\boldsymbol{\Xi}_{0}|-\dfrac{T}{2}\ln |\boldsymbol{Q}| \nonumber\\ &\qquad -\dfrac{T}{2}\ln |R|-\dfrac{1}{2}{\rm tr}\bigg\{\boldsymbol{\Xi}_{0}^{-1}[\boldsymbol{\theta}_{0|T} \boldsymbol{\theta}_{0|T}^{{T}}\nonumber\\ &\qquad-2\boldsymbol{\theta}_{0|T} \boldsymbol{\mu}_{0}^{{T}}+\boldsymbol{\mu}_{0} \boldsymbol{\mu}_{0}^{{T}}+\boldsymbol{P}_{0|T}]\bigg\}\nonumber\\ &-\dfrac{1}{2}\sum\limits_{t=1}^{T}{\rm tr}\bigg\{\boldsymbol{Q}^{-1} \big[\boldsymbol{\theta}_{t|T}\boldsymbol{\theta}_{t|T}^{{T}} +\boldsymbol{P}_{t|T}\nonumber\\ &-2(\boldsymbol{\theta}_{t|T} \boldsymbol{\theta}_{t-1|T}^{{T}}+\boldsymbol{P}_{t, t-1|T})^{{T}}\nonumber\\ &+\boldsymbol{\theta}_{t-1|T}\boldsymbol{\theta}_{t-1|T}^{{T}} +\boldsymbol{P}_{t-1|T}\big]\bigg\}\nonumber\\ &-\dfrac{(l+n)T+l}{2}\ln 2\pi\nonumber\\ &-\dfrac{1}{2}\sum\limits_{t=1}^{T}{\rm tr}\bigg\{R^{-1}\big[(y_{t}-g(\boldsymbol {\theta}_{t|T}, \boldsymbol{X}_{t-1}))\nonumber\\ &\times(y_{t}-g(\boldsymbol{\theta}_{t|T}, \boldsymbol{X}_{t-1}))^{{T}}+\boldsymbol{G}_{t}\boldsymbol{P}_{t|T}\boldsymbol{G}_{t}^{{T}}\big]\bigg\}. \end{align} $$ (18) Smoothing often includes the forward and backward filtering over a segment of data so as to obtain improved estimates. The forward filtering stage involves computing the estimates $\boldsymbol{\theta}_{t|t}$ and $\boldsymbol{P}_{t|t}$ over a segment of $T$ samples, where $\boldsymbol{\theta}_{t|t}$ and $\boldsymbol{P}_{t|t}$ denote the conditional mean and conditional covariance of $\boldsymbol{\theta}_{t}$ given $y_{1:t}=\{y_{1}, \ldots, y_{t}\}$ . Then, the EKF-based forward filtering for estimating model (4) can be derived as follows. The predicted state vector:
$$ \begin{align} &\boldsymbol{\theta}_{t|t-1}=E[\boldsymbol{\theta}_{t}|y_{1:t-1}]=\boldsymbol{\theta}_{t-1|t-1}. \end{align} $$ (19) The predicted conditional covariance of $\boldsymbol{\theta}_{t}$ :
$$ \begin{align} \boldsymbol{P}_{t|t-1}&=E\left[(\boldsymbol{\theta}_{t}-\boldsymbol{\theta}_{t|t-1})(\boldsymbol{\theta}_{t}-\boldsymbol{\theta}_{t|t-1})^{{T}}\right]\notag\\ &=\boldsymbol{P}_{t-1|t-1}+\boldsymbol{Q}. \end{align} $$ (20) The Kalman gain:
$$ \begin{align} &\boldsymbol{K}_{t}=\boldsymbol{P}_{t|t-1}\boldsymbol{G}^{{T}}\, (\boldsymbol{G}_{t}\boldsymbol{P}_{t|t-1}\boldsymbol{G}_{t}^{{T}}+R)^{-1}. \end{align} $$ (21) Updated system state estimate:
$$ \begin{align} \boldsymbol{\theta}_{t|t}&=E[\boldsymbol{\theta}_{t}|y_{1:t}]\notag\\ &=\boldsymbol{\theta}_{t|t-1}+\boldsymbol{K}_{t}[y_{t}-g(\boldsymbol{\theta}_{t|t-1}, \boldsymbol{X}_{t-1})]. \end{align} $$ (22) Updated estimate covariance:
$$ \begin{align} \boldsymbol{P}_{t|t}&=E\left[(\boldsymbol{\theta}_{t}-\boldsymbol{\theta}_{t|t})(\boldsymbol{\theta}_{t}-\boldsymbol{\theta}_{t|t})^{{T}}\right]\notag\\ &=\boldsymbol{P}_{t|t-1}-\boldsymbol{K}_{t}\boldsymbol{G}_{t}\boldsymbol{P}_{t|t-1}. \end{align} $$ (23) To obtain the smoothing estimates $\boldsymbol{\theta}_{t|T}$ and $\boldsymbol{P}_{t|T}$ , we employ the Rauch-Tung-Strieber smoother to do the following backward recursions
$$ \begin{align} \begin{cases} \boldsymbol{J}_{t}=\boldsymbol{P}_{t|t}\boldsymbol{P}_{t+1|t}^{-1}\\[1mm] \boldsymbol{\theta}_{t|T}=\boldsymbol{\theta}_{t|t}+\boldsymbol{J}_{t}\, (\boldsymbol{\theta}_{t+1|T}-\boldsymbol{\theta}_{t+1|t})\\[1mm] \boldsymbol{P}_{t|T}=\boldsymbol{P}_{t|t}+\boldsymbol{J}_{t}\, (\boldsymbol{P}_{t+1|T}-\boldsymbol{P}_{t+1|t})\boldsymbol{J}_{t}^{{T}}. \end{cases} \end{align} $$ (24) We also require the cross-covariance $\boldsymbol{P}_{t, t-1|T}$ , which is defined as follows
$$ \begin{align} &\boldsymbol{P}_{t, t-1|T}=E\left[(\boldsymbol{\theta}_{t}-\boldsymbol{\theta}_{t|T})(\boldsymbol{\theta}_{t}-\boldsymbol{\theta}_{t|T})^{{T}}\right]. \end{align} $$ (25) And it can be obtained through the backward recursions
$$ \begin{align} &\boldsymbol{P}_{t, t-1|T}=\boldsymbol{P}_{t|t}\boldsymbol{J}_{t-1}^{{T}}+\boldsymbol{J}_{t}\, (\boldsymbol{P}_{t+1, t|T}-\boldsymbol{P}_{t|t})\boldsymbol{J}_{t-1}^{{T}}. \end{align} $$ (26) The backward recursions as above are initialized as following
$$ \begin{align} \begin{cases} \boldsymbol{\theta}_{T|T}=\boldsymbol{\theta}_{T}\\[1mm] \boldsymbol{P}_{T|T}=\boldsymbol{P}_{T}\\[1mm] \boldsymbol{P}_{T, T-1|T}=(\boldsymbol{I}-\boldsymbol{K}_{T}\boldsymbol{G}_{T})\boldsymbol{P}_{T-1|T-1}. \end{cases} \end{align} $$ (27) Compared with the filtering algorithm that uses the observations up to time $t$ for estimation of the state $\boldsymbol{\theta}_{t}$ , the smoothing yields a more accurate estimate $\boldsymbol{\theta}_{t|T}$ by using all available data up to time $T$ .
To find the optimal parameters $\boldsymbol{\varphi}$ , we need to maximize the expected value of the log-likelihood with respect to the parameters, and then compute the derivatives with respect to each parameter individually. That is
$$ \begin{align} \begin{cases} \dfrac{\partial }{\partial \boldsymbol{\mu}_{0}}E[\ln p(\boldsymbol{\theta}_{0:T}, y_{1:T}|\boldsymbol{\varphi})]=0\\[3mm] -\dfrac{1}{2}\dfrac{\partial}{\partial \boldsymbol{\mu}_{0}}{\rm tr} \left\{\boldsymbol{\Xi}_{0}^{-1}[(\boldsymbol{\theta}_{0|T}-\boldsymbol{\mu}_{0})(\boldsymbol{\theta}_{0|T}-\boldsymbol{\mu}_{0})^{{T}}+\boldsymbol{P}_{0|T}]\right\}=0\\[3mm] \boldsymbol{\Xi}_{0}^{-1}\, (\boldsymbol{\theta}_{0|T}-\boldsymbol{\mu}_{0})=0\\ \boldsymbol{\mu}_{0}=\boldsymbol{\theta}_{0|T} \end{cases} \end{align} $$ (28) $$ \begin{align} \begin{cases} \dfrac{\partial }{\partial \boldsymbol{\Xi}_{0}^{-1}}E[\ln p(\boldsymbol{\theta}_{0:T}, y_{1:T}|\boldsymbol{\varphi})]=0\\[3mm] \dfrac{1}{2}\dfrac{\partial}{\partial \boldsymbol{\Xi}_{0}^{-1}}\Big\{\ln |\boldsymbol{\Xi}_{0}^{-1}|-{\rm tr}\big[\boldsymbol{\Xi}_{0}^{-1}\, ((\boldsymbol{\theta}_{0|T}-\boldsymbol{\mu}_{0})(\boldsymbol{\theta}_{0|T}-\boldsymbol{\mu}_{0})^{{T}}\\[1mm] \qquad\ \ +~\boldsymbol{P}_{0|T})\big]\Big\}=0\\[2mm] \dfrac{\boldsymbol{\Xi}_{0}}{2}-\dfrac{1}{2}\left((\boldsymbol{\theta}_{0|T}-\boldsymbol{\mu}_{0})(\boldsymbol{\theta}_{0|T}-\boldsymbol{\mu}_{0})^{{T}}+\boldsymbol{P}_{0|T}\right)=0\\[2mm] \boldsymbol{\Xi}_{0}=\boldsymbol{P}_{0|T} \end{cases} \end{align} $$ (29) $$ \begin{align} \begin{cases} \dfrac{\partial }{\partial \boldsymbol{Q}^{-1}}E[\ln p(\boldsymbol{\theta}_{0:T}, y_{1:T}|\boldsymbol{\varphi})]=0\\[3mm] \dfrac{\partial}{\partial \boldsymbol{Q}^{-1}}\left\{\dfrac{T}{2}\ln |\boldsymbol{Q}^{-1}|-\dfrac{1}{2}\big[{\rm tr}(\boldsymbol{Q}^{-1}\, (\boldsymbol{\Gamma}-2\boldsymbol{\Upsilon}^{{T}}+\boldsymbol{\Lambda}))\big]\right\}=0\\[3mm] \dfrac{T}{2}\boldsymbol{Q}-\dfrac{1}{2}\left(\boldsymbol{\Gamma}-2\boldsymbol{\Upsilon}^{{T}}+\boldsymbol{\Lambda}\right)=0\\[1mm] \boldsymbol{Q}=\dfrac{1}{T}\left(\boldsymbol{\Gamma}-2\boldsymbol{\Upsilon}^{{T}}+\boldsymbol{\Lambda}\right) \end{cases} \end{align} $$ (30) $$ \begin{align} \begin{cases} \dfrac{\partial }{\partial R^{-1}}E[\ln p(\boldsymbol{\theta}_{0:T}, y_{1:T}|\boldsymbol{\varphi})]=0\\[2mm] \dfrac{\partial}{\partial R^{-1}}\bigg\{\dfrac{T}{2}\ln |R^{-1}|-\dfrac{1}{2}\sum\limits_{t=1}^{T}{\rm tr}\left[R^{-1}\big((y_{t}-g(\boldsymbol{\theta}_{t|T}, \boldsymbol{X}_{t-1}))\right.\\[1mm] \qquad\times\left.(y_{t}-g(\boldsymbol{\theta}_{t|T}, \boldsymbol{X}_{t-1}))^{{T}}+\boldsymbol{G}_{t}\boldsymbol{P}_{t|T}\boldsymbol{G}_{t}^{{T}}\big)\right]\bigg\}=0\\[2mm] \dfrac{T}{2}R-\dfrac{1}{2}\sum\limits_{t=1}^{T}\bigg[\big(y_{t}-g(\boldsymbol{\theta}_{t|T}, \boldsymbol{X}_{t-1})\big)\big(y_{t}-g(\boldsymbol{\theta}_{t|T}, \boldsymbol{X}_{t-1})\big)^{{T}}\\[1mm] \qquad +~\boldsymbol{G}_{t}\boldsymbol{P}_{t|T}\boldsymbol{G}_{t}^{{T}}\bigg]=0\\[2mm] R=\dfrac{1}{T}\sum\limits_{t=1}^{T}\bigg[\big(y_{t}-g(\boldsymbol{\theta}_{t|T}, \boldsymbol{X}_{t-1})\big)\big(y_{t}-g(\boldsymbol{\theta}_{t|T}, \boldsymbol{X}_{t-1})\big)^{{T}}\\[1mm] \qquad +~\boldsymbol{G}_{t}\boldsymbol{P}_{t|T}\boldsymbol{G}_{t}^{{T}}\bigg] \end{cases} \end{align} $$ (31) where
$$ \begin{align} \begin{cases} \boldsymbol{\Gamma}=\sum\limits_{t=1}^{T}\left(\boldsymbol{\theta}_{t|T}\boldsymbol{\theta}_{t|T}^{{T}}+\boldsymbol{P}_{t|T}\right)\\[2mm] \boldsymbol{\Lambda}=\sum\limits_{t=1}^{T}\left(\boldsymbol{\theta}_{t-1|T}\boldsymbol{\theta}_{t-1|T}^{{T}}+\boldsymbol{P}_{t-1|T}\right)\\[2mm] \boldsymbol{\Upsilon}=\sum\limits_{t=1}^{T}\left(\boldsymbol{\theta}_{t|T}\boldsymbol{\theta}_{t-1|T}^{{T}}+\boldsymbol{P}_{t, t-1|T}\right). \end{cases} \end{align} $$ (32) It is significant to mention that the EM algorithm is applied to obtain maximum likelihood (ML) estimates of the above parameters and states, which can reduce the computational complexity and guarantee the convergence to a stationary point while continuously increasing the ML function. But EM algorithm is computationally expensive when the state dimension is high.
4. Simulation Results
To evaluate the performance of the presented approach, we predict the well-known Mackey-Glass time series through three cases. In the first case, the data is generated from the following Mackey-Glass equation:
$$ \begin{align} \dfrac{dy_{t}}{dt}=\dfrac{ay_{t-\tau_{0}}}{1+y_{t-\tau_{0}}^{c}}-by_{t} \end{align} $$ (33) where the parameters are chosen to be $a=0.2$ , $b=0.1$ , $c$ $=$ $10$ and $\tau_{0}=20$ . A thousand values are sampled and the series is shown in Fig. 2. This chaotic benchmark time series was studied in [1], [2]. We use the RBF-AR $(p, m, d)$ model to predict the nonlinear time series, where the parameters $p$ , $m$ and $d$ are defined as shown in model (2). The first 500 data points are used to train the model, and the last 500 data are used to test the model.
To make the comparisons between the SNPOM, the EKF and the EM-EKF, we predict the value $y_{t}$ from the fixed input vector $[y_{t-5}, y_{t-4}, y_{t-3}, y_{t-2}, y_{t-1}]$ . Thus, an RBF-AR(5, 3, 2) model is used to predict the one-step ahead output of this complex nonlinear time series as follows:
$$ \begin{align} \begin{cases} y_{t}=\phi_{0}\, (\boldsymbol{X}_{t-1})+\sum\limits_{i=1}^{5}\phi_{i}\, (\boldsymbol{X}_{t-1})y_{t-i}+e_{t}\\[3mm] \phi_{i}\, (\boldsymbol{X}_{t-1})=\omega_{i, 0}+\sum\limits_{k=1}^{3}\omega_{i, k}\exp\left\{-\lambda_{k}\|\boldsymbol{X}_{t-1}-\boldsymbol{Z}_{k}\|_{2}^{2}\right\}\\[3mm] \boldsymbol{X}_{t-1}=[y_{t-1}, y_{t-2}]^{{T}}. \end{cases} \end{align} $$ (34) Table Ⅰ reports the comparison results of the proposed EM-EKF, the EKF and the SNPOM. The estimated results are the mean squared errors (MSEs), observation noise variance and the standard deviations (given in the parentheses). It is worth mentioning that Table Ⅰ gives two different estimation results using EKF in the cases of given different measurement noise variances. From Table Ⅰ we can see that the MSEs of the training data using EM-EKF are lesser than those using SNPOM and EKF, while the testing data is only slightly lesser. We attribute this to the fact that the SNPOM has good estimation results for clean data. Besides, the SNPOM obtains slightly better results than does the EKF in these conditions of given different measurement noise variances ( $R=0.002$ or $R=0.0002$ ). Furthermore, by comparison between the two conditions in the EKF, the closer the given measurement noise variance is to the true value ( $R=0$ ), the better the results are. This illustrates that setting the measurement noise appropriately leads to success or failure when using EKF. The estimate of observation noise variance using EM-EKF is close to the true value (zero), while the estimate of that using SNPOM is unknown, and the observation noise variance in the EKF must be given in advance. In Fig. 3, the top plot shows the increase of the log-likelihood at each step, the middle plot shows the convergence of the measurements noise variance $R$ , and the bottom plot shows the trace of the process noise variance $\boldsymbol{Q}$ . The initial conditions for the state $\boldsymbol{\theta}_{0}$ and the measurements noise variance $R$ are randomly chosen with a uniform distribution from the range [0,1] and [0, 0.01], respectively, and matrices $\boldsymbol{Q}$ and $\boldsymbol{P}$ are initialized as identity matrix and 100 times identity matrix, respectively.
表 Ⅰ COMPARISON RESULTS FOR MACKEY-GLASS TIME SERIESTable Ⅰ COMPARISON RESULTS FOR MACKEY-GLASS TIME SERIESMethod MSE (Training) MSE (Testing) $R$ SNPOM 1.0800E-7 1.2600E-7 unknown EKF 1.9560E-7 2.1786E-7 0.002 (given) EKF 1.2559E-7 1.9793E-7 0.0002 (given) EM-EKF 7.1765E-8 1.2008E-7 1.8146E-7 To examine the noise effect in the time series, the EM-EKF, the EKF and the SNPOM are used to estimate the Mackey-Glass chaotic time series corrupted by additive white noise with variance 0.25 and 1 in the following two cases (see Fig. 4). Table Ⅱ gives the performance of the EM-EKF compared to those of the SNPOM and the EKF on estimating the Mackey-Glass chaotic time series corrupted by additive white noise. From Table Ⅱ, one can see that the overall performance obtained by the EM-EKF is better than that of the SNPOM and the EKF. In detail, the EM-EKF has smaller values of MSEs than those of the SNPOM and the EKF for the training data and testing data, although the improvement is not significant. As for the EKF, when the given values of the measurement noise variance ( $R=0.1$ and $R=0.6$ ) deviate relatively far from the true values ( $R=0.25$ and $R=1$ ), the results become worse, even worse than those obtained by the SNPOM. On the contrary, when the given values ( $R=0.2$ and $R=0.8$ ) are relatively close to the true values, the EKF obtains slightly better results than does the SNPOM, but still obtains worse results than does the EM-EKF. This illustrates that setting of the measurement noise plays an important part in the use of the EKF again. In both cases, the values of measurements noise variance $R$ using EM-EKF are 0.25950 and 1.0589, respectively, which are very close to the true values. In contrast, the SNPOM can not estimate the values of measurement noise variance, and the values of measurements noise variance should be given beforehand in the use of the EKF. The initial conditions are the same as case one except that $R$ is randomly chosen with a uniform distribution from the range $[0,1]$ . Figs. 5 and 6 show the convergent process of the log-likelihood and noise variance.
表 Ⅱ COMPARISON RESULTS FOR MACKEY-GLASS TIME SERIES CORRUPTES BY ADDITIVE WHITE NOISETable Ⅱ COMPARISON RESULTS FOR MACKEY-GLASS TIME SERIES CORRUPTES BY ADDITIVE WHITE NOISENoise variance Method MSE (Training) MSE (Testing) R 0.25 SNPOM 0.26606 0.28120 unknown EKF 0.26577 0.28082 0.2 (given) EKF 0.27343 0.28825 0.1 (given) EM-EKF 0.25750 0.27825 0.25950 1 SNPOM 0.97452 1.1644 unknown EKF 0.97215 1.1512 0.8 (given) EKF 0.98262 1.1785 0.6 (given) EM-EKF 0.96590 1.13907 1.0589 Statistical $F$ test is used to judge whether the error variance estimated by the EM-EKF is equal to the variance estimated by the SNPOM. As shown in Table Ⅲ, the values of the computing $F$ test statistic ( $F$ ) are greater than the table value ( $F_{\alpha=0.05}$ ) except for the testing data without additional white noise, so the null hypothesis that two variances are equal is rejected at level $\alpha=0.05$ . This indicates that there are significant differences between the above two methods and the performance of the EM-EKF is superior to the SNPOM.
表 Ⅲ THE RESULTS OF STATISTICAL F TEST AT LEVEL 0.05Table Ⅲ THE RESULTS OF STATISTICAL F TEST AT LEVEL 0.05Case $F_{\alpha=0.05}$ $F$ Results Case 1 (Training) 1.16 1.4642 Reject; Difference Case 1 (Testing) 1.16 1.0105 No reject; No difference Case 2 (Training) 1.16 14.1212 Reject; Significant difference Case 2 (Testing) 1.16 9.2984 Reject; Significant difference Case 3 (Training) 1.16 37.8297 Reject; Significant difference Case 3 (Testing) 1.16 15.5751 Reject; Significant difference To compare the computational complexity, Table Ⅳ lists the computational time for the three methods. The simulations are implemented on a computer (Inter (R) Core (TM)2 Duo CPU E7200 @2.53 GHz, 8 G-RAM). The average running time (100 iterations) for EM-EKF is approximately 30.766 seconds, for SNPOM (100 iterations) 10.690 seconds, and for EKF 0.40572 seconds. Obviously, the EM-EKF is more time-consuming than the other two methods since EM algorithm is computationally expensive for high-dimension states. Also, for a large RBF network, the computational expense of the EKF could be burdensome and it increases linearly with the number of training samples. Instead, the SNPOM is a hybrid method, depending partly on the Levenberg-Marquardt method (LMM) for nonlinear parameter optimization and partly on the least-squares method (LSM) for linear parameter estimation. The SNPOM can greatly accelerate the computational convergence of the parameter search process, especially for the RBF-type models with larger number of linear weights and smaller number of nonlinear parameters. However, the EM-EKF, as an alternative way, is superior to the SNPOM and the EKF and can accurately estimate the noise variance.
表 Ⅳ THE COMPUTATION TIME OF DIFFERENT METHODS (S)Table Ⅳ THE COMPUTATION TIME OF DIFFERENT METHODS (S)Method Time SNPOM 10.690 EKF 0.40572 EM-EKF 30.766 As a whole, the EM-EKF method is capable of estimating the parameters of the RBF-AR model, and the initial conditions and the noise variances are identified by use of the EM algorithm, which can further improve the modeling precision. Comparison results indicate that the RBF-AR model estimated by the EM-EKF makes more accurate predictions than do the SNPOM and the EKF, although the values of the MSEs using EM-EKF are only slightly smaller than those of the SNPOM and EKF in some cases. However, $F$ test shows there is significant difference between results obtained by the SNPOM and the EM-EKF. Furthermore, the estimation of observation noise variance using EM-EKF is close to the true value, while the estimate of that using SNPOM is unknown, and the observation noise variance in the EKF must be given in advance. Therefore, we can conclude that the EM-EKF method is an advisable choice for estimating RBF-AR model and is especially appropriate for signals disturbed by noise.
5. Conclusion
In this paper, the EM-EKF method is developed to estimate the parameter of RBF-AR model. Firstly, the model is reconstructed as a new type of general radial basis function neural networks. Secondly, to circumvent the EKF's limitation of unknown prior knowledge, the EM is proposed to calculate the initial states and the measurement and process noise variance. By combining the EM and extended Kalman filtering and smoothing process, the EM-EKF method is proposed to estimate the parameters of the RBF-AR model, the initial conditions and the noise variances jointly, and can further improve the modeling precision. Comparisons of the performance of the EM-EKF with the SNPOM and the EKF are performed, and the results indicate that the RBF-AR model estimated by the EM-EKF makes more accurate predictions than do the SNPOM and the EKF, although the EM-EKF is more time-consuming. Moreover, the estimate of observation variance converges to the true value. Finally, F test indicates there is significant difference between results obtained by the SNPOM and the EM-EKF. Our future work would develop the EM-EKF algorithm for RBF-ARX (Peng et al. [2]) estimation and apply the RBF-AR model based on the EM-EKF algorithm to other types of time series and complex systems analysis.
-
[1] G. Feng, "A survey on analysis and design of model-based fuzzy control systems, "IEEE Trans. Fuzzy Syst. , vol. 14, no. 5, pp. 676-697, Oct. 2006. [2] C. C. Hua, Q. G. Wang, and X. P. Guan, "Adaptive fuzzy output-feedback controller design for nonlinear time-delay systems with unknown control direction, "IEEE Trans. Syst. Man Cybern. B Cybern. , vol. 39, no. 2, pp. 306-317, Apr. 2009. http://www.ncbi.nlm.nih.gov/pubmed/19095556 [3] K. Tanaka, H. Ohtake, and H. O. Wang, "Guaranteed cost control of polynomial fuzzy systems via a sum of squares approach, "IEEE Trans. Syst. Man Cybern. B Cybern. , vol. 39, no. 2, pp. 561-567, Apr. 2009. http://europepmc.org/abstract/MED/19095549 [4] L. G. Wu and W. X. Zheng, " L2/L∞ control of nonlinear fuzzy Itô stochastic delay systems via dynamic output feedback, "IEEE Trans. Syst. Man Cybern. B Cybern. , vol. 39, no. 5, pp. 1308-1315, Oct. 2009. http://ieeexplore.ieee.org/document/4804613/ [5] H. L. Dong, Z. D. Wang, D. W. C. Ho, and H. J. Gao, "Robust H∞ fuzzy output-feedback control with multiple probabilistic delays and multiple missing measurements, "IEEE Trans. Fuzzy Syst. , vol. 18, no. 4, pp. 712-725, Aug. 2010. http://dl.acm.org/citation.cfm?id=1856597 [6] J. B. Qiu, G. Feng, and H. J. Gao, "Fuzzy-model-based piecewise H∞ static-output-feedback controller design for networked nonlinear system, "IEEE Trans. Fuzzy Syst. , vol. 18, no. 5, pp. 919-934, Oct. 2010. http://dl.acm.org/citation.cfm?id=1892548 [7] J. B. Qiu, G. Feng, and H. J. Gao, "Observer-based piecewise affine output feedback controller synthesis of continuous-time T-S fuzzy affine dynamic systems using quantized measurements, "IEEE Trans. Fuzzy Syst. , vol. 20, no. 6, pp. 1046-1062, Dec. 2012. http://dl.acm.org/citation.cfm?id=2721232 [8] J. B. Qi, G. Feng, and H. J. Gao, "Static-output-feedback H∞ control of continuous-time T-S fuzzy affine systems via piecewise Lyapunov functions, "IEEE Trans. Fuzzy Syst. , vol. 21, no. 2, pp. 245-261, Apr. 2013. doi: 10.1109/tfuzz.2012.2210555 [9] Y. Y. Cao and P. M. Frank, "Analysis and synthesis of nonlinear time-delay systems via fuzzy control approach, "IEEE Trans. Fuzzy Syst. , vol. 8, no. 2, pp. 200-211, Apr. 2000. http://dl.acm.org/citation.cfm?id=2234847 [10] B. Chen, X. P. Liu, S. C. Tong, and C. Lin, "Observer-based stabilization of T-S fuzzy systems with input delay, "IEEE Trans. Fuzzy Syst. , vol. 16, no. 3, pp. 652-663, Jun. 2008. http://ieeexplore.ieee.org/document/4358823/ [11] B. Chen and X. P. Liu, "Delay-dependent robust H∞ control for T-S fuzzy systems with time delay, "IEEE Trans. Fuzzy Syst. , vol. 13, no. 4, pp. 544-556, Aug. 2005. http://dl.acm.org/citation.cfm?id=2235274 [12] H. J. Gao, Y. Zhao, and T. W. Chen, " H∞ fuzzy control of nonlinear systems under unreliable communication links, "IEEE Trans. Fuzzy Syst. , vol. 17, no. 2, pp. 265-278, Apr. 2009. http://ieeexplore.ieee.org/document/4505329/ [13] X. F. Jiang and Q. L. Han, "On designing fuzzy controllers for a class of nonlinear networked control systems, "IEEE Trans. Fuzzy Syst. , vol. 16, no. 4, pp. 1050-1060, Aug. 2008. http://ieeexplore.ieee.org/document/4601110/ [14] C. Lin, Q. G. Wang, T. H. Lee, and Y. He, "Design of observer-based H∞ control for fuzzy time-delay systems, "IEEE Trans. Fuzzy Syst. , vol. 16, no. 2, pp. 534-543, Apr. 2008. http://ieeexplore.ieee.org/document/4358786/ [15] X. W. Liu, "Delay-dependent H∞ control for uncertain fuzzy systems with time-varying delays, "Nonlinear Anal. Theory Methods Appl. , vol. 68, no. 5, pp. 1352-1361, Mar. 2008. http://www.ams.org/mathscinet-getitem?mr=2381676 [16] M. Liu, D. W. C. Ho, and Y. G. Niu, "Stabilization of markovian jump linear system over networks with random communication delay, "Automatica, vol. 45, no. 2, pp. 416-421, Feb. 2009. http://dl.acm.org/citation.cfm?id=1497637.1497905 [17] S. K. Nguang and P. Shi, "Fuzzy H∞ output feedback control of nonlinear systems under sampled measurements, "Automatica, vol. 45, no. 12, pp. 2169-2174, Dec. 2003. http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=980889 [18] S. K. Nguang and P. Shi, "H∞ fuzzy output feedback control design for nonlinear systems: An LMI approach, "IEEE Trans. Fuzzy Syst. , vol. 11, no. 3, pp. 331-340, Jun. 2003. http://ieeexplore.ieee.org/xpls/icp.jsp?arnumber=1203792 [19] M. Sahebsara, T. W. Chen, and S. L. Shah, "Optimal H∞ filtering in networked control systems with multiple packet dropouts, "Syst. Control Lett. , vol. 57, no. 9, pp. 696-702, Sep. 2008. doi: 10.1016/j.sysconle.2008.01.011 [20] P. Seiler and R. Sengupta, "An H∞ approach to networked control, "IEEE Trans. Automat. Control, vol. 50, no. 3, pp. 356-364, Mar. 2005. http://www.ams.org/mathscinet-getitem?mr=2123096 [21] S. L. Sun, L. H. Xie, and W. D. Xiao, "Optimal full-order and reduced-order estimators for discrete-time systems with multiple packet dropouts, "IEEE Trans. Signal Process. , vol. 56, no. 8, pp. 4031-4038, Aug. 2008. http://dl.acm.org/citation.cfm?id=2197942.2201355&coll=DL&dl=GUIDE&CFID=431149417&CFTOKEN=23120143 [22] Z. D. Wang, D. W. C. Ho, Y. R. Liu, and X. H. Liu, "Robust H∞ control for a class of nonlinear discrete time-delay stochastic systems with missing measurements, "Automatica, vol. 45, no. 3, pp. 684-691, Mar. 2009. http://dl.acm.org/citation.cfm?id=1513258 [23] H. N. Wu, "Delay-dependent H∞ fuzzy observer-based control for discrete-time nonlinear systems with state delay, "Fuzzy Sets Syst. , vol. 159, no. 20, pp. 2696-2712, Oct. 2008. http://dl.acm.org/citation.cfm?id=1403334 [24] H. N. Wu and K. Y. Cai, " H2 guaranteed cost fuzzy control for uncertain nonlinear systems via linear matrix inequalities, "Fuzzy Sets Syst. , vol. 148, no. 3, pp. 411-429, Dec. 2004. http://www.ams.org/mathscinet-getitem?mr=2101200 [25] F. W. Yang, Z. D. Wang, Y. S. Hung, and M. Gani, "H∞ control for networked systems with random communication delays, "IEEE Trans. Automat. Control, vol. 51, no. 3, pp. 511-518, Mar. 2006. http://ieeexplore.ieee.org/document/1605414/ [26] D. Yue, E. G. Tian, Y. J. Zhang, and C. Peng, "Delay-distribution-dependent stability and stabilization of T-S fuzzy systems with probabilistic interval delay, "IEEE Trans. Syst. Man Cybern. Part B, Cybern. , vol. 39, no. 2, pp. 503-516, Apr. 2009. [27] H. G. Zhang, M. Li, J. Yang, and D. D. Yang, "Fuzzy model-based robust networked control for a class of nonlinear systems, "IEEE Trans. Syst. Man Cybern. Part A, Syst. Hum. , vol. 39, no. 2, pp. 437-447, Mar. 2009. http://ieeexplore.ieee.org/document/4757263/ [28] L. Q. Zhang, Y. Shi, T. W. Chen, and B. Huang, "A new method for stabilization of networked control systems with random delays, "IEEE Trans. Automat. Control, vol. 50, no. 8, pp. 1177-1181, Aug. 2005. http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=1470028 [29] Y. Zhao, J. Wu, and P. Shi, "H∞ control of non-linear dynamic systems: A new fuzzy delay partitioning approach, "IET Control Theory Appl. , vol. 3, no. 7, pp. 917-928, Jul. 2009. http://www.ams.org/mathscinet-getitem?mr=2537969 [30] M. X. Liu, X. T. Liu, Y. Shi, and S. Q. Wang, "T-S fuzzy-model-based H2 and H∞ filtering for networked control systems with two-channel Markovian random delays, "Dig. Signal Process. , vol. 27, pp. 167-174, Apr. 2014. http://dl.acm.org/citation.cfm?id=2608860.2609119 [31] L. Qiu, Y. Shi, F. Q. Yao, G. Xu, and B. G. Xu, "Network-based robust H2/ H∞ control for linear systems with two-channel random packet dropouts and time delays, "IEEE Trans. Cyber. , vol. 45, no. 8, pp. 1450-1462, Aug. 2015. http://ieeexplore.ieee.org/xpls/icp.jsp?arnumber=6897996 [32] L. X. Zhang, "H∞ estimation for discrete-time piecewise homogeneous Markov jump linear systems, "Automatica, vol. 45, no. 11, pp. 2570-2576, Nov. 2009. http://www.ams.org/mathscinet-getitem?mr=2889314 [33] L. X. Zhang, N. G. Cui, M. Liu, and Y. Zhao, "Asynchronous filtering of discrete-time switched linear systems with average dwell time, "IEEE Trans. Circ. Syst. Ⅰ Regul. Pap., vol. 58, no.5, pp.1109-1118, May2011. doi: 10.1109/TCSI.2010.2092151 [34] W. Assawinchaichote, S. K. Nguang, P. Shi, and E. Boukas, "H∞ fuzzy state-feedback control design for nonlinear systems with D-stability constraints: An LMI approach, "Math. Comput. Simul. , vol. 78, no. 4, pp. 514-531, Aug. 2008. http://www.ams.org/mathscinet-getitem?mr=2424560 [35] X. P. Guan and C. L. Chen, "Delay-dependent guaranteed cost control for T-S fuzzy systems with time delays, "IEEE Trans. Fuzzy Syst. , vol. 12, no. 2, pp. 236-249, Apr. 2004. http://ieeexplore.ieee.org/xpls/icp.jsp?arnumber=1284326 [36] L. El Ghaoui, F. Oustry, and M. A. Rami, "A cone complementarity linearization algorithm for static output-feedback and related problems, "IEEE Trans. Automat. Control, vol. 42, no. 8, pp. 1171-1176, Aug. 1997. http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=618250 [37] H. J. Gao, Z. D. Wang, and C. H. Wang, "Improved H∞ control of discrete-time fuzzy systems: A cone complementarity linearization approach, "Inf. Sci. , vol. 175, no. 1-2, pp. 57-77, Sep. 2005. http://www.sciencedirect.com/science/article/pii/S0020025504002932 -