-
摘要: 根据医疗行业现状,不难发现各医疗机构间共享数据困难,因为医疗数据的校验、保存和同步一直是一个难点.病人、医生以及研究人员在访问和共享医疗数据时存在严格的限制,这一过程需要花费大量的资源和时间用于权限审查和数据校验.本文提出一个基于区块链的医疗数据共享模型,具有去中心化、安全可信、集体维护、不可篡改等特点,适用于解决各医疗机构数据共享的难题.本文详细介绍了模型的组件以及实现原理.将现有医疗机构进行分类,配合使用改进的共识机制实现了方便、安全、快捷的数据共享.此外,通过对比医疗数据共享存在的问题,分析了本模型的优势以及带来的影响.Abstract: According to the status quo of medical industry, verification, storage and synchronization of clinical data are difficult, therefore, clinic data sharing between institutions become a challenging task. There are many restrictions in data access and sharing for patients, doctors and even researchers, which results in a high cost of both resources and time for authority authentication and verification. To solve this problem, we propose a blockchain-based medical data sharing model, with advantages of decentralization, high security, collective maintenance and tamper resistance. We discuss the critical principles and components of this model in detail. Furthermore, we improve the consensus mechanism so as to better match different types of medical institutions for a more convenient, secure and faster data sharing. In addition, the merits and impacts of this model are presented and analyzed by comparisons in terms of existing issues in medical data.
-
Key words:
- Medical data /
- blockchain /
- consensus mechanism /
- sharing model
-
1. Introduction
The radial basis function network-based state-dependent autoregressive (RBF-AR) model, which offers a very flexible structure for nonlinear time-series modeling, has been extensively studied. For example, Shi et al. [1] developed the RBF-AR model to reconstruct the dynamics of given nonlinear time series. Peng et al. [2] extended the RBF-AR model to the case where there are several exogenous variables (RBF-ARX model) for the system. Following this method, Gan et al. [3], [4] successively developed the locally linear radial basis function network-based autoregressive (LLRBF-AR) model and a gradient radial basis function based varying-coefficient autoregressive (GRBF-AR) model. The major feature of the RBF-AR (X) model which is superior to the black-box models based on general function approximations is that: the RBF-AR (X) model may provide some insights into the system dynamics due to its quasi-linear structure, whereas the general function approximations cannot.
The identification of the RBF networks includes the choice of topology (e.g., the number of neurons) and estimation of all the parameters. For selecting the number of neurons, several methods have been proposed, e.g., the Akaike information criterion (AIC), Bayesian information criterion (BIC), and cross validation. Therefore we must first have a good model parameter estimation method, and then we can repeat the method for models with different number of neurons, before finally selecting the best model. In this paper, the majority of our works focuses on determining the parameters of RBF networks.
The estimation of the RBF-AR (X) model is a difficult task but is a key procedure for successfully applying the RBF-AR (X) model. Peng et al. [2] presented a structured nonlinear parameter optimization method (SNPOM) for the RBF-AR (X) model estimation. Researches [2], [5], [6] shown the SNPOM can optimize all of the model parameters and can also accelerate the computational convergence of the optimization search process. However, the primary shortcoming of the SNPOM is that it can be easily trapped in a local optimum because of an initial guess value. To overcome the problem of local minima, Gan et al. [7] proposed the hybrid evolutionary algorithm (EA)-SNPOM and simulation results indicated that the hybrid algorithm provides better results than either method alone (EA and SNPOM), but the complexity and running time increased substantially. Also, Gan et al. [8] proposed a variable projection (VP) approach to efficiently estimate the parameters of the RBF-ARX model. Although these identification methods for RBF-AR (X) have good estimation results, their performance may get deteriorated in case of measurement with noise interference.
As Peng et al. [2] and Gan et al. [7] mentioned that RBF-AR (X) can be regarded as a more general nonlinear model than the RBF neural network and as a generalization of the RBF network. Furthermore, any kind of RBF and the RBF-ARX model parameter estimation procedure must include the selection of appropriate centers and scaling factors, and estimation of all the linear weights of the RBF networks in the model.
Various derivative-based methods have been used to train RBF neural network, including gradient descent [9] and the well-known back propagation [10]. Although gradient descent training for RBF networks has proven to be much more effective than lots of conventional methods, it can be computationally expensive. An alternative to sequentially estimate RBF network is the extended Kalman filter (EKF)-based neural network sequential learning method [11]-[13]. The sequential EKF algorithm is a fast training algorithm, which takes all the network parameters (weights or weights and centers) as a state vector in state-space and adjusts them to adapt to measurements [14]. Because it makes use of second order statistics (covariance), the EKF is an improvement over conventional neural network estimation techniques, such as back-propagation and gradient descent training algorithm. However, a well known limitation of the EKF is the assumption of known a priori statistics to describe the measurement and process noise. Setting these noise levels appropriately often makes the difference between success and failure in the use of the EKF. Especially in environments where the noise statistics change with time, the EKF can lead to large estimation errors and even to divergence. Thus, the selection of the initial states, the measurement noise covariance matrix and the process noise covariance matrix are very important in the control of the convergence of the EKF learning algorithm. To overcome the difficulties above, a number of improved learning methods were proposed, such as EKFQ (the EKF algorithm with evidence maximization and sequentially updated priors) [15], [16], APNCPF (adaptive process noise covariance particle filter) [17], a hybrid EKF and switching PSO (particle swarm optimization) algorithm [18], and a particle filtering algorithm in combination with the kernel smoothing method [19]. The EKFQ and APNCPF algorithms employ adaptive noise parameters, but the problem of choosing the right window length, which is used to update noise covariance matrices, results in a regularization/tracking dilemma because of unknown prior knowledge. Moreover, the above algorithms do not consider the optimal initial values of parameters, which will lead to some imprecise information about the spread or the shape of the posterior [20].
The expectation-maximization (EM) algorithm [21] was developed to learn parameters of statistical models in the presence of incomplete data or hidden variables. Lázaro et al. [22] and Constantinopoulos et al. [23] extended the EM learning strategy for estimation of the neural network weights, such as the multi-layer perception (MLP) and the Gaussian radial basis function (RBF) networks. Research results indicate that the EM method is effective and simple to obtain maximum likelihood (ML) estimates of the parameter and states.
In this paper, the proposed expectation-maximization extended Kalman filter (EM-EKF) method, which combines the expectation-maximization and the extended Kalman filtering, is developed to identify the RBF-AR model. The method is more accurate as it involves extended Kalman smoothing, which provides a minimum variance Gaussian approximation to the posterior probability density function. To use the EM-EKF algorithm for state space learning, RBF-AR model is reconstructed as a general RBF network, which has additional linear output weight layer compared with the traditional three-layer RBF network. Thus, the general RBF network is represented by state-space model, and the centers and the weights of the general RBF network are treated as hidden state variables. The learning algorithm for RBF-AR model possesses advantages of both the expectation-maximization and of the extended Kalman filtering and smoothing. Specifically, the EM algorithm is utilized to estimate parameters of the state-space model, such as the measurement and process noise variance and the initial states, while the extended Kalman filtering and smoothing are used to estimate the approximate state distribution. The proposed algorithm simplifies the optimizing estimation of the maximum likelihood by making the expectation maximal, and EKF and smoothing process realize the more exact estimation of the expectation. This learning technique can improve the performance of the EKF-based neural network sequential learning method by estimating the noise variance and the initial states. The performance and effectiveness of our proposed method are evaluated by the Mackey-Glass time series through three cases.
The contributions of this paper comprise two parts. First, RBF-AR model is reconstructed as a new type of general radial basis function (RBF) neural network, which makes it able to estimate the parameters of RBF-AR model using the EKF. Second, by combining the expectation maximization and extended Kalman filtering and smoothing process, the EM-EKF method is developed to estimate RBF-AR model, which can give joint state and parameter estimates.
The structure of the remaining of this paper is as follows. Section 2 presents the state-space representation of the RBF-AR model. The EM-EKF method is developed to identify the RBF-AR model in Section 3. The cases studies are shown in Section 4. Finally, the paper is concluded in Section 5.
2. The State-space Representation of the RBF-AR Model
We are interested in the nonlinear time series that can be described by the following state-dependent AR (SD-AR) model
$$ \begin{align} \begin{cases} y_{t}=\phi_{0}\, (\boldsymbol{X}_{t-1})+\sum\limits_{i=1}^p\phi_{i}\, (\boldsymbol{X}_{t-1})y_{t-i}+e_{t}\\[1mm] \boldsymbol{X}_{t-1}=[y_{t-1}, y_{t-2}, \ldots, y_{t-d}]^{{T}} \end{cases} \end{align} $$ (1) where $y_{t}$ $(t=1, \ldots, T)$ is the output, $\boldsymbol{X}_{t-1}$ is regarded as the state vector at time $t$ , which contains only the output series in this case (in other cases it may contain the input series or another). $\phi_{i}(\boldsymbol{X}_{t-1})$ $(i=0, 1, \ldots, p)$ are the state-dependent function coefficients of the model, $p$ is the model order, and $d$ is the dimension of the state vector. $e_{t}$ denotes Gaussian white noise.
Although the SD-AR model provides a useful framework for general nonlinear time series modeling, the problem is how to specify the functional form of its state-dependent coefficients. An efficient approach to solve the problem is to use Gaussian RBF neural networks approximations of coefficients of model (1) [2], and then the model derived is called the RBF-AR model, which is given by
$$ \begin{align} \begin{cases} y_{t}=\phi_{0}\, (\boldsymbol{X}_{t-1})+\sum\limits_{i=1}^p\phi_{i}\, (\boldsymbol{X}_{t-1})y_{t-i}+e_{t}\\[1mm] \phi_{0}\, (\boldsymbol{X}_{t-1})=\omega_{0, 0}+\sum\limits_{k=1}^m\omega_{0, k}\exp\left\{-\lambda_{k}\|\boldsymbol{X}_{t-1}-\boldsymbol{Z}_{k}\|_{2}^{2}\right\}\\[1mm] \phi_{i}\, (\boldsymbol{X}_{t-1})=\omega_{i, 0}+\sum\limits_{k=1}^m\omega_{i, k}\exp\left\{-\lambda_{k}\|\boldsymbol{X}_{t-1}-\boldsymbol{Z}_{k}\|_{2}^{2}\right\}\\[1mm] \boldsymbol{Z}_{k}=[z_{k, 1}, \ldots, z_{k, d}]^{{T}} \end{cases} \end{align} $$ (2) where $\boldsymbol{Z}_{k}$ $(k=1, 2, \ldots, m)$ are the centers of the local linear RBF networks, $\lambda_{k}$ $(k=1, 2, \ldots, m)$ are the scaling parameters, $\omega_{i, k}$ $(i=0, 1, 2, \ldots, p; k=0, 1, 2, \ldots, m)$ are the linear weights, $m$ and $d$ are the number of hidden neurons and the dimension of the centers (the dimension of the centers is the same as the dimension of the state vector), respectively, and $\|\cdot\|_{2}$ denotes the vector 2-norm.
In general case, the RBF networks in model (2) may have different centers for different regression variables. However, in some applications, all the RBF networks may be allowed to share the same centers, because model (2) possesses the autoregressive structure, thus, although the RBF centers are the same in that case, the regressive polynomials' coefficients are different. Thus, the RBF-AR model can be seen as a general RBF network, which has two hidden layers with $m$ and $p+1$ neurons, respectively, and the output layer with one output. In this structure, the identification of RBF-AR model is to estimate the centers, the scaling parameters and the weights of the general RBF network. Fig. 1 shows the schematic of the RBF-AR model as a general RBF network.
To simplify the nonlinear optimization steps, the scaling parameters may be selected as [2]
$$ \begin{align} %\begin{cases} \lambda_{k}=\dfrac{-\log\epsilon_{k}}{\max\limits_{t-1}\{\|\boldsymbol{X}_{t-1}-\boldsymbol{Z}_{k}\|^{2}\}}, \quad \epsilon_{k}\in[0.0001, 0.1]. %\end{cases} \end{align} $$ (3) Using this heuristic way, the weights will become bounded and stable when the signal $\boldsymbol{X}_{t-1}$ is far away from the centers $\boldsymbol{Z}_{k}$ .
To apply a filtering algorithm to the RBF network training, the state-space model is established, which is given by
$$ \begin{align} \begin{cases} \boldsymbol {\theta}_{t}=\boldsymbol{\theta}_{t-1}+\boldsymbol{v}_{t}\\[1mm] y_{t}=g(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1})+\xi_{t} \\[1mm] \ \ \ \, =\phi_{0}\, (\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1})+\sum\limits_{i=1}^{p}\phi_{i}\, (\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1})y_{t-i}+\xi_{t} \end{cases} \end{align} $$ (4) where $\boldsymbol{\theta}=[\boldsymbol{\omega}_{0}^{{T}}~ \boldsymbol{\omega}_{1}^{{T}}~ \ldots~ \boldsymbol{\omega}_{p}^{{T}}~ \boldsymbol{Z}_{1}^{{T}}~ \ldots~ \boldsymbol{Z}_{m}^{{T}}]^{{T}}$ ( $\boldsymbol{\omega}_j^{{T}}=[\omega_{0, j}$ $\omega_{1, j}$ $\ldots~\omega_{m, j}]$ , $0 \leq j \leq p$ ; $\boldsymbol{Z}_k=[z_{k, 1}~z_{k, 2}~\ldots~z_{k, d}]^{{T}}$ , $1$ $\leq$ $k$ $\leq$ $m$ ) represents the system state vector, $y_{t}$ is the observation variable, $\boldsymbol{v}_{t}$ and $\xi_{t}$ denote the process noise and observation noise, which are assumed to be zero-mean Gaussian processes with the covariance $\boldsymbol{Q}$ and $R$ , namely $\boldsymbol{v}_{t}$ $\sim$ ${\rm N}(0, \boldsymbol{Q}) $ , $\xi_{t}\sim {\rm N}(0, R) $ . The initial state (parameters) $\boldsymbol{\theta}_{0}$ is normally-distributed with mean $\boldsymbol{\mu}_{0}$ and covariance $\boldsymbol{\Xi}_{0}$ . Obviously, the crux of the matter is that both the system hidden state $\boldsymbol{\theta}_{t}$ and the parameters $\boldsymbol{\varphi}=(\boldsymbol{Q}, R, \boldsymbol{\mu}_{0}, \boldsymbol{\Xi}_{0})$ are unknown.
3. The EM-EKF Method
To simultaneously estimate parameters and hidden states, the expectation maximization (EM) is incorporated with extended Kalman filtering and smoothing, which aims to integrate over the uncertain estimates of the unknown hidden states and optimize the resulting marginal likelihood of the parameters given the observed data. The algorithm realizes the more exact estimation of the posterior distribution by use of the extended Kalman filtering and smoothing.
The EM algorithm is an iterative method for finding a mode of the likelihood function. To derive the EM algorithm for nonlinear state space models, we need to develop an expression for the likelihood of the completed data. We assume that the likelihood of the data given the states, the initial conditions and the evolution of the states can be represented by Gaussian distributions. Thus, if the initial mean and covariance of the states is given by $\boldsymbol{\mu}_{0}$ and $\boldsymbol{\Xi}_{0}$ , then
$$ p(\boldsymbol{\theta}_{0}|\boldsymbol{\varphi})= (2\pi)^{-\frac{l}{2}}|\boldsymbol{\Xi}_{0}|^{-\frac{1}{2}} \nonumber\\ \quad\ \times\exp\left[-\dfrac{1}{2}\, (\boldsymbol{\theta}_{0}- \boldsymbol{\mu}_{0})^{T}\boldsymbol{\Xi}_{0}^{-1}\, (\boldsymbol{\theta}_{0}-\boldsymbol{\mu}_{0})\right] $$ (5) $$ p(\boldsymbol{\theta}_{t}|\boldsymbol{\theta}_{t-1}, \boldsymbol{\varphi}) =(2\pi)^{-\frac{l}{2}}|\boldsymbol{Q}|^{-\frac{1}{2}}\nonumber\\ \quad\ \times\exp\left[-\dfrac{1}{2}\, (\boldsymbol{\theta}_{t}-\boldsymbol{\theta}_{t-1})^{T}\boldsymbol{Q}^{-1}\, (\boldsymbol{\theta}_{t}-\boldsymbol{\theta}_{t-1})\right] $$ (6) $$ p(y_{t}|\boldsymbol{\theta}_{t}, \boldsymbol{\varphi})=(2\pi)^{-\frac{n}{2}}|R|^{-\frac{1}{2}}\nonumber\\ \quad\ \times\exp\left[-\dfrac{1}{2}\, (y_{t}-g(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1}))^{T}R^{-1}\, (y_{t}-g(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1}))\right] $$ (7) where $l=(1+m)(1+p)+md$ is the dimension of the state vector, $n=1$ is the dimension of the observation vector.
Then, the log-likelihood of the complete data is given by:
$$ \begin{align} \ln p(&\boldsymbol{\theta}_{0:T}, y_{1:T}|\boldsymbol{\varphi}) =-\dfrac{1}{2}[\boldsymbol{\theta}_{0}-\boldsymbol{\mu}_{0}]^{T} \boldsymbol{\Xi}_{0}^{-1}[\boldsymbol{\theta}_{0}-\boldsymbol{\mu}_{0}] \nonumber\\ &-\dfrac{1}{2}\ln |\boldsymbol{\Xi}_{0}|-\dfrac{T(l+n)+l}{2}\ln 2\pi-\dfrac{T}{2}\ln |\boldsymbol{Q}|\nonumber\\ &-\dfrac{T}{2}\ln |R|-\sum\limits_{t=1}^{T}\dfrac{1}{2} [\boldsymbol{\theta}_{t}-\boldsymbol{\theta}_{t-1}]^{T} \boldsymbol{Q}^{-1}[\boldsymbol{\theta}_{t}-\boldsymbol{\theta}_{t-1}] \nonumber\\ &-\sum\limits_{t=1}^{T}\dfrac{1}{2}[y_{t}-g(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1})]^{T}R^{-1}[y_{t}-g(\boldsymbol {\theta}_{t}, \boldsymbol{X}_{t-1})]. \end{align} $$ (8) If we take the expectation of the log-likelihood for the complete data, we get the following expression:
$$ \begin{align} & E\ln\left[p(\boldsymbol{\theta}_{0:T}, y_{1:T}|\boldsymbol{\varphi})\right]=-\dfrac{1}{2}\ln |\boldsymbol{\Xi}_{0}|-\dfrac{T}{2}\ln |\boldsymbol{Q}|-\dfrac{T}{2}\ln |R|\nonumber\\ &\ \ \, -\dfrac{1}{2}E\left[\boldsymbol{\theta}_{0}^{{T}}\boldsymbol{\Xi}_{0}^{-1}\boldsymbol{\theta}_{0}-\boldsymbol{\theta}_{0}^{{T}}\boldsymbol{\Xi}_{0}^{-1}\boldsymbol{\mu}_{0}-\boldsymbol{\mu}_{0}^{{T}}\boldsymbol{\Xi}_{0}^{-1}\boldsymbol{\theta}_{0}+\boldsymbol{\mu}_{0}^{{T}}\boldsymbol{\Xi}_{0}^{-1}\boldsymbol{\mu}_{0}\right]\nonumber\\ &\ \ \, -\dfrac{1}{2}\sum\limits_{t=1}^{T}E\left[\boldsymbol{\theta}_{t}^{{T}}\boldsymbol{Q}^{-1}\boldsymbol{\theta}_{t}-\boldsymbol{\theta}_{t}^{{T}}\boldsymbol{Q}^{-1}\boldsymbol{\theta}_{t-1}-\boldsymbol{\theta}_{t-1}^{{T}}\boldsymbol{Q}^{-1}\boldsymbol{\theta}_{t}\right.\nonumber\\ &\ \ \, +\left.\boldsymbol{\theta}_{t-1}^{{T}}\boldsymbol{Q}^{-1}\boldsymbol{\theta}_{t-1}\right]-\dfrac{T(l+n)+l}{2}\ln 2\pi\nonumber\\ &\ \ \, -\dfrac{1}{2}\sum\limits_{t=1}^{T}E\left[(y_{t}-g(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1}))^{{T}}R^{-1}\, (y_{t}-g(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1}))\right]. \end{align} $$ (9) To compute the expectation of the measurement mapping $g(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1})$ , the Taylor expansion of this mapping around the smoothing estimate $\boldsymbol{\theta}_{t|T}$ is given by
$$ \begin{align} &g(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1})=g(\boldsymbol{\theta}_{t|T}, \boldsymbol{X}_{t-1})\nonumber\\ &\qquad +\dfrac{\partial g(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1})}{\partial \boldsymbol{\theta}_{t}}\bigg|_{(\boldsymbol{\theta}_{t}=\boldsymbol{\theta}_{t|T})}\, (\boldsymbol{\theta}_{t}-\boldsymbol{\theta}_{t|T})+\cdots \end{align} $$ (10) where the smoothing estimate $\boldsymbol{\theta}_{t|T}$ denotes the conditional mean of $\boldsymbol{\theta}_{t}$ given $y_{1:T}=\{y_1, \ldots, y_{T}\}$ .
The EKF utilizes only the first-order Taylor expansion, so we can obtain
$$ \begin{align} &g(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1})\approx g(\boldsymbol{\theta}_{t|T}, \boldsymbol{X}_{t-1})\nonumber\\ &\qquad\left. +\dfrac{\partial g(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1})}{\partial \boldsymbol{\theta}_{t}}\right|_{(\boldsymbol{\theta}_{t}=\boldsymbol{\theta}_{t|T})}\, (\boldsymbol{\theta}_{t}-\boldsymbol{\theta}_{t|T}). \end{align} $$ (11) Under the basic assumptions of the EKF (no model and parameter uncertainty, zero-mean white-noise sequence, known process and measurement models, etc.), the smoothing estimates are unbiased, and the smoothing estimates satisfy
$$ \begin{align} &E(\boldsymbol{\theta}_{t|T})=\boldsymbol{\theta}_{t}. \end{align} $$ (12) So we can get
$$ \begin{align} &E(g(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1}))\approx g(\boldsymbol{\theta}_{t|T}, \boldsymbol{X}_{t-1}). \end{align} $$ (13) Subsequently, we compute the covariance of $g(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1})$
$$ \begin{align} &E\Big[\big(g(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1})-g(\boldsymbol{\theta}_{t|T}, \boldsymbol{X}_{t-1})\big)\nonumber\\ &\qquad\qquad\qquad\quad\quad\quad\big(g(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1})-g(\boldsymbol{\theta}_{t|T}, \boldsymbol{X}_{t-1})\big)^{{T}}\Big]\nonumber\\ &\quad\quad\, \, \approx E\Bigg[\bigg[\dfrac{\partial g(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1})}{\boldsymbol{\theta}_{t}}\bigg|_{(\boldsymbol{\theta}_{t}=\boldsymbol{\theta}_{t|T})}\, (\boldsymbol{\theta}_{t}-\boldsymbol{\theta}_{t|T})\bigg]\nonumber\\ &\qquad\quad\ \, \times\bigg[\dfrac{\partial g(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1})}{\boldsymbol{\theta}_{t}}\bigg|_{(\boldsymbol{\theta}_{t}=\boldsymbol{\theta}_{t|T})}\, (\boldsymbol{\theta}_{t}-\boldsymbol{\theta}_{t|T})\bigg]^{{T}}\Bigg]\nonumber\\ &\quad\quad\, \, =\boldsymbol{G}_{t}\boldsymbol{P}_{t|T}\boldsymbol{G}_{t}^{{T}}. \end{align} $$ (14) Hence, it follows that:
$$ \begin{align} &E\left[g(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1})\big(g(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1})\big)^{{T}}\right]\nonumber\\ &\qquad \approx \boldsymbol{G}_{t}\boldsymbol{P}_{t|T}\boldsymbol{G}_{t}^{{T}}+g(\boldsymbol{\theta}_{t|T}, \boldsymbol{X}_{t-1})\big(g(\boldsymbol{\theta}_{t|T}, \boldsymbol{X}_{t-1})\big)^{{T}} \end{align} $$ (15) where $\boldsymbol{G}_{t}$ corresponds to the Jacobian matrix of the measurement function:
$$ \begin{align} &\boldsymbol{G}_{t}=\dfrac{\partial g(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1})}{\partial \boldsymbol{\theta}_{t}}\bigg|_{(\boldsymbol{\theta}_{t}=\boldsymbol{\theta}_{t|t-1})}\nonumber\\ &\quad\, =\bigg[\dfrac{\partial g(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1})}{\partial \theta_{1, t}}~\dfrac{\partial g(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1})}{\partial \theta_{2, t}}~ \cdots ~\dfrac{\partial g(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1})}{\partial \theta_{l, t}}\bigg] \end{align} $$ (16) and $\boldsymbol{P}_{t|T}$ denotes the conditional covariance of $\boldsymbol{\theta}_{t}$ given $y_{1:T}$ $=$ $\{y_{1}, \ldots, y_{T}\}$ :
$$ \begin{align} &\boldsymbol{P}_{t|T}=E\left[(\boldsymbol{\theta}_{t}-\boldsymbol{\theta}_{t|T})(\boldsymbol{\theta}_{t}-\boldsymbol{\theta}_{t|T})^{{T}}\right]. \end{align} $$ (17) Equations (13)-(15) are substituted in (9), and then we get
$$ \begin{align} &E[\ln p(\boldsymbol{\theta}_{0:T}, y_{1:T}|\boldsymbol{\varphi})] =-\dfrac{1}{2}\ln |\boldsymbol{\Xi}_{0}|-\dfrac{T}{2}\ln |\boldsymbol{Q}| \nonumber\\ &\qquad -\dfrac{T}{2}\ln |R|-\dfrac{1}{2}{\rm tr}\bigg\{\boldsymbol{\Xi}_{0}^{-1}[\boldsymbol{\theta}_{0|T} \boldsymbol{\theta}_{0|T}^{{T}}\nonumber\\ &\qquad-2\boldsymbol{\theta}_{0|T} \boldsymbol{\mu}_{0}^{{T}}+\boldsymbol{\mu}_{0} \boldsymbol{\mu}_{0}^{{T}}+\boldsymbol{P}_{0|T}]\bigg\}\nonumber\\ &-\dfrac{1}{2}\sum\limits_{t=1}^{T}{\rm tr}\bigg\{\boldsymbol{Q}^{-1} \big[\boldsymbol{\theta}_{t|T}\boldsymbol{\theta}_{t|T}^{{T}} +\boldsymbol{P}_{t|T}\nonumber\\ &-2(\boldsymbol{\theta}_{t|T} \boldsymbol{\theta}_{t-1|T}^{{T}}+\boldsymbol{P}_{t, t-1|T})^{{T}}\nonumber\\ &+\boldsymbol{\theta}_{t-1|T}\boldsymbol{\theta}_{t-1|T}^{{T}} +\boldsymbol{P}_{t-1|T}\big]\bigg\}\nonumber\\ &-\dfrac{(l+n)T+l}{2}\ln 2\pi\nonumber\\ &-\dfrac{1}{2}\sum\limits_{t=1}^{T}{\rm tr}\bigg\{R^{-1}\big[(y_{t}-g(\boldsymbol {\theta}_{t|T}, \boldsymbol{X}_{t-1}))\nonumber\\ &\times(y_{t}-g(\boldsymbol{\theta}_{t|T}, \boldsymbol{X}_{t-1}))^{{T}}+\boldsymbol{G}_{t}\boldsymbol{P}_{t|T}\boldsymbol{G}_{t}^{{T}}\big]\bigg\}. \end{align} $$ (18) Smoothing often includes the forward and backward filtering over a segment of data so as to obtain improved estimates. The forward filtering stage involves computing the estimates $\boldsymbol{\theta}_{t|t}$ and $\boldsymbol{P}_{t|t}$ over a segment of $T$ samples, where $\boldsymbol{\theta}_{t|t}$ and $\boldsymbol{P}_{t|t}$ denote the conditional mean and conditional covariance of $\boldsymbol{\theta}_{t}$ given $y_{1:t}=\{y_{1}, \ldots, y_{t}\}$ . Then, the EKF-based forward filtering for estimating model (4) can be derived as follows. The predicted state vector:
$$ \begin{align} &\boldsymbol{\theta}_{t|t-1}=E[\boldsymbol{\theta}_{t}|y_{1:t-1}]=\boldsymbol{\theta}_{t-1|t-1}. \end{align} $$ (19) The predicted conditional covariance of $\boldsymbol{\theta}_{t}$ :
$$ \begin{align} \boldsymbol{P}_{t|t-1}&=E\left[(\boldsymbol{\theta}_{t}-\boldsymbol{\theta}_{t|t-1})(\boldsymbol{\theta}_{t}-\boldsymbol{\theta}_{t|t-1})^{{T}}\right]\notag\\ &=\boldsymbol{P}_{t-1|t-1}+\boldsymbol{Q}. \end{align} $$ (20) The Kalman gain:
$$ \begin{align} &\boldsymbol{K}_{t}=\boldsymbol{P}_{t|t-1}\boldsymbol{G}^{{T}}\, (\boldsymbol{G}_{t}\boldsymbol{P}_{t|t-1}\boldsymbol{G}_{t}^{{T}}+R)^{-1}. \end{align} $$ (21) Updated system state estimate:
$$ \begin{align} \boldsymbol{\theta}_{t|t}&=E[\boldsymbol{\theta}_{t}|y_{1:t}]\notag\\ &=\boldsymbol{\theta}_{t|t-1}+\boldsymbol{K}_{t}[y_{t}-g(\boldsymbol{\theta}_{t|t-1}, \boldsymbol{X}_{t-1})]. \end{align} $$ (22) Updated estimate covariance:
$$ \begin{align} \boldsymbol{P}_{t|t}&=E\left[(\boldsymbol{\theta}_{t}-\boldsymbol{\theta}_{t|t})(\boldsymbol{\theta}_{t}-\boldsymbol{\theta}_{t|t})^{{T}}\right]\notag\\ &=\boldsymbol{P}_{t|t-1}-\boldsymbol{K}_{t}\boldsymbol{G}_{t}\boldsymbol{P}_{t|t-1}. \end{align} $$ (23) To obtain the smoothing estimates $\boldsymbol{\theta}_{t|T}$ and $\boldsymbol{P}_{t|T}$ , we employ the Rauch-Tung-Strieber smoother to do the following backward recursions
$$ \begin{align} \begin{cases} \boldsymbol{J}_{t}=\boldsymbol{P}_{t|t}\boldsymbol{P}_{t+1|t}^{-1}\\[1mm] \boldsymbol{\theta}_{t|T}=\boldsymbol{\theta}_{t|t}+\boldsymbol{J}_{t}\, (\boldsymbol{\theta}_{t+1|T}-\boldsymbol{\theta}_{t+1|t})\\[1mm] \boldsymbol{P}_{t|T}=\boldsymbol{P}_{t|t}+\boldsymbol{J}_{t}\, (\boldsymbol{P}_{t+1|T}-\boldsymbol{P}_{t+1|t})\boldsymbol{J}_{t}^{{T}}. \end{cases} \end{align} $$ (24) We also require the cross-covariance $\boldsymbol{P}_{t, t-1|T}$ , which is defined as follows
$$ \begin{align} &\boldsymbol{P}_{t, t-1|T}=E\left[(\boldsymbol{\theta}_{t}-\boldsymbol{\theta}_{t|T})(\boldsymbol{\theta}_{t}-\boldsymbol{\theta}_{t|T})^{{T}}\right]. \end{align} $$ (25) And it can be obtained through the backward recursions
$$ \begin{align} &\boldsymbol{P}_{t, t-1|T}=\boldsymbol{P}_{t|t}\boldsymbol{J}_{t-1}^{{T}}+\boldsymbol{J}_{t}\, (\boldsymbol{P}_{t+1, t|T}-\boldsymbol{P}_{t|t})\boldsymbol{J}_{t-1}^{{T}}. \end{align} $$ (26) The backward recursions as above are initialized as following
$$ \begin{align} \begin{cases} \boldsymbol{\theta}_{T|T}=\boldsymbol{\theta}_{T}\\[1mm] \boldsymbol{P}_{T|T}=\boldsymbol{P}_{T}\\[1mm] \boldsymbol{P}_{T, T-1|T}=(\boldsymbol{I}-\boldsymbol{K}_{T}\boldsymbol{G}_{T})\boldsymbol{P}_{T-1|T-1}. \end{cases} \end{align} $$ (27) Compared with the filtering algorithm that uses the observations up to time $t$ for estimation of the state $\boldsymbol{\theta}_{t}$ , the smoothing yields a more accurate estimate $\boldsymbol{\theta}_{t|T}$ by using all available data up to time $T$ .
To find the optimal parameters $\boldsymbol{\varphi}$ , we need to maximize the expected value of the log-likelihood with respect to the parameters, and then compute the derivatives with respect to each parameter individually. That is
$$ \begin{align} \begin{cases} \dfrac{\partial }{\partial \boldsymbol{\mu}_{0}}E[\ln p(\boldsymbol{\theta}_{0:T}, y_{1:T}|\boldsymbol{\varphi})]=0\\[3mm] -\dfrac{1}{2}\dfrac{\partial}{\partial \boldsymbol{\mu}_{0}}{\rm tr} \left\{\boldsymbol{\Xi}_{0}^{-1}[(\boldsymbol{\theta}_{0|T}-\boldsymbol{\mu}_{0})(\boldsymbol{\theta}_{0|T}-\boldsymbol{\mu}_{0})^{{T}}+\boldsymbol{P}_{0|T}]\right\}=0\\[3mm] \boldsymbol{\Xi}_{0}^{-1}\, (\boldsymbol{\theta}_{0|T}-\boldsymbol{\mu}_{0})=0\\ \boldsymbol{\mu}_{0}=\boldsymbol{\theta}_{0|T} \end{cases} \end{align} $$ (28) $$ \begin{align} \begin{cases} \dfrac{\partial }{\partial \boldsymbol{\Xi}_{0}^{-1}}E[\ln p(\boldsymbol{\theta}_{0:T}, y_{1:T}|\boldsymbol{\varphi})]=0\\[3mm] \dfrac{1}{2}\dfrac{\partial}{\partial \boldsymbol{\Xi}_{0}^{-1}}\Big\{\ln |\boldsymbol{\Xi}_{0}^{-1}|-{\rm tr}\big[\boldsymbol{\Xi}_{0}^{-1}\, ((\boldsymbol{\theta}_{0|T}-\boldsymbol{\mu}_{0})(\boldsymbol{\theta}_{0|T}-\boldsymbol{\mu}_{0})^{{T}}\\[1mm] \qquad\ \ +~\boldsymbol{P}_{0|T})\big]\Big\}=0\\[2mm] \dfrac{\boldsymbol{\Xi}_{0}}{2}-\dfrac{1}{2}\left((\boldsymbol{\theta}_{0|T}-\boldsymbol{\mu}_{0})(\boldsymbol{\theta}_{0|T}-\boldsymbol{\mu}_{0})^{{T}}+\boldsymbol{P}_{0|T}\right)=0\\[2mm] \boldsymbol{\Xi}_{0}=\boldsymbol{P}_{0|T} \end{cases} \end{align} $$ (29) $$ \begin{align} \begin{cases} \dfrac{\partial }{\partial \boldsymbol{Q}^{-1}}E[\ln p(\boldsymbol{\theta}_{0:T}, y_{1:T}|\boldsymbol{\varphi})]=0\\[3mm] \dfrac{\partial}{\partial \boldsymbol{Q}^{-1}}\left\{\dfrac{T}{2}\ln |\boldsymbol{Q}^{-1}|-\dfrac{1}{2}\big[{\rm tr}(\boldsymbol{Q}^{-1}\, (\boldsymbol{\Gamma}-2\boldsymbol{\Upsilon}^{{T}}+\boldsymbol{\Lambda}))\big]\right\}=0\\[3mm] \dfrac{T}{2}\boldsymbol{Q}-\dfrac{1}{2}\left(\boldsymbol{\Gamma}-2\boldsymbol{\Upsilon}^{{T}}+\boldsymbol{\Lambda}\right)=0\\[1mm] \boldsymbol{Q}=\dfrac{1}{T}\left(\boldsymbol{\Gamma}-2\boldsymbol{\Upsilon}^{{T}}+\boldsymbol{\Lambda}\right) \end{cases} \end{align} $$ (30) $$ \begin{align} \begin{cases} \dfrac{\partial }{\partial R^{-1}}E[\ln p(\boldsymbol{\theta}_{0:T}, y_{1:T}|\boldsymbol{\varphi})]=0\\[2mm] \dfrac{\partial}{\partial R^{-1}}\bigg\{\dfrac{T}{2}\ln |R^{-1}|-\dfrac{1}{2}\sum\limits_{t=1}^{T}{\rm tr}\left[R^{-1}\big((y_{t}-g(\boldsymbol{\theta}_{t|T}, \boldsymbol{X}_{t-1}))\right.\\[1mm] \qquad\times\left.(y_{t}-g(\boldsymbol{\theta}_{t|T}, \boldsymbol{X}_{t-1}))^{{T}}+\boldsymbol{G}_{t}\boldsymbol{P}_{t|T}\boldsymbol{G}_{t}^{{T}}\big)\right]\bigg\}=0\\[2mm] \dfrac{T}{2}R-\dfrac{1}{2}\sum\limits_{t=1}^{T}\bigg[\big(y_{t}-g(\boldsymbol{\theta}_{t|T}, \boldsymbol{X}_{t-1})\big)\big(y_{t}-g(\boldsymbol{\theta}_{t|T}, \boldsymbol{X}_{t-1})\big)^{{T}}\\[1mm] \qquad +~\boldsymbol{G}_{t}\boldsymbol{P}_{t|T}\boldsymbol{G}_{t}^{{T}}\bigg]=0\\[2mm] R=\dfrac{1}{T}\sum\limits_{t=1}^{T}\bigg[\big(y_{t}-g(\boldsymbol{\theta}_{t|T}, \boldsymbol{X}_{t-1})\big)\big(y_{t}-g(\boldsymbol{\theta}_{t|T}, \boldsymbol{X}_{t-1})\big)^{{T}}\\[1mm] \qquad +~\boldsymbol{G}_{t}\boldsymbol{P}_{t|T}\boldsymbol{G}_{t}^{{T}}\bigg] \end{cases} \end{align} $$ (31) where
$$ \begin{align} \begin{cases} \boldsymbol{\Gamma}=\sum\limits_{t=1}^{T}\left(\boldsymbol{\theta}_{t|T}\boldsymbol{\theta}_{t|T}^{{T}}+\boldsymbol{P}_{t|T}\right)\\[2mm] \boldsymbol{\Lambda}=\sum\limits_{t=1}^{T}\left(\boldsymbol{\theta}_{t-1|T}\boldsymbol{\theta}_{t-1|T}^{{T}}+\boldsymbol{P}_{t-1|T}\right)\\[2mm] \boldsymbol{\Upsilon}=\sum\limits_{t=1}^{T}\left(\boldsymbol{\theta}_{t|T}\boldsymbol{\theta}_{t-1|T}^{{T}}+\boldsymbol{P}_{t, t-1|T}\right). \end{cases} \end{align} $$ (32) It is significant to mention that the EM algorithm is applied to obtain maximum likelihood (ML) estimates of the above parameters and states, which can reduce the computational complexity and guarantee the convergence to a stationary point while continuously increasing the ML function. But EM algorithm is computationally expensive when the state dimension is high.
4. Simulation Results
To evaluate the performance of the presented approach, we predict the well-known Mackey-Glass time series through three cases. In the first case, the data is generated from the following Mackey-Glass equation:
$$ \begin{align} \dfrac{dy_{t}}{dt}=\dfrac{ay_{t-\tau_{0}}}{1+y_{t-\tau_{0}}^{c}}-by_{t} \end{align} $$ (33) where the parameters are chosen to be $a=0.2$ , $b=0.1$ , $c$ $=$ $10$ and $\tau_{0}=20$ . A thousand values are sampled and the series is shown in Fig. 2. This chaotic benchmark time series was studied in [1], [2]. We use the RBF-AR $(p, m, d)$ model to predict the nonlinear time series, where the parameters $p$ , $m$ and $d$ are defined as shown in model (2). The first 500 data points are used to train the model, and the last 500 data are used to test the model.
To make the comparisons between the SNPOM, the EKF and the EM-EKF, we predict the value $y_{t}$ from the fixed input vector $[y_{t-5}, y_{t-4}, y_{t-3}, y_{t-2}, y_{t-1}]$ . Thus, an RBF-AR(5, 3, 2) model is used to predict the one-step ahead output of this complex nonlinear time series as follows:
$$ \begin{align} \begin{cases} y_{t}=\phi_{0}\, (\boldsymbol{X}_{t-1})+\sum\limits_{i=1}^{5}\phi_{i}\, (\boldsymbol{X}_{t-1})y_{t-i}+e_{t}\\[3mm] \phi_{i}\, (\boldsymbol{X}_{t-1})=\omega_{i, 0}+\sum\limits_{k=1}^{3}\omega_{i, k}\exp\left\{-\lambda_{k}\|\boldsymbol{X}_{t-1}-\boldsymbol{Z}_{k}\|_{2}^{2}\right\}\\[3mm] \boldsymbol{X}_{t-1}=[y_{t-1}, y_{t-2}]^{{T}}. \end{cases} \end{align} $$ (34) Table Ⅰ reports the comparison results of the proposed EM-EKF, the EKF and the SNPOM. The estimated results are the mean squared errors (MSEs), observation noise variance and the standard deviations (given in the parentheses). It is worth mentioning that Table Ⅰ gives two different estimation results using EKF in the cases of given different measurement noise variances. From Table Ⅰ we can see that the MSEs of the training data using EM-EKF are lesser than those using SNPOM and EKF, while the testing data is only slightly lesser. We attribute this to the fact that the SNPOM has good estimation results for clean data. Besides, the SNPOM obtains slightly better results than does the EKF in these conditions of given different measurement noise variances ( $R=0.002$ or $R=0.0002$ ). Furthermore, by comparison between the two conditions in the EKF, the closer the given measurement noise variance is to the true value ( $R=0$ ), the better the results are. This illustrates that setting the measurement noise appropriately leads to success or failure when using EKF. The estimate of observation noise variance using EM-EKF is close to the true value (zero), while the estimate of that using SNPOM is unknown, and the observation noise variance in the EKF must be given in advance. In Fig. 3, the top plot shows the increase of the log-likelihood at each step, the middle plot shows the convergence of the measurements noise variance $R$ , and the bottom plot shows the trace of the process noise variance $\boldsymbol{Q}$ . The initial conditions for the state $\boldsymbol{\theta}_{0}$ and the measurements noise variance $R$ are randomly chosen with a uniform distribution from the range [0,1] and [0, 0.01], respectively, and matrices $\boldsymbol{Q}$ and $\boldsymbol{P}$ are initialized as identity matrix and 100 times identity matrix, respectively.
表 Ⅰ COMPARISON RESULTS FOR MACKEY-GLASS TIME SERIESTable Ⅰ COMPARISON RESULTS FOR MACKEY-GLASS TIME SERIESMethod MSE (Training) MSE (Testing) $R$ SNPOM 1.0800E-7 1.2600E-7 unknown EKF 1.9560E-7 2.1786E-7 0.002 (given) EKF 1.2559E-7 1.9793E-7 0.0002 (given) EM-EKF 7.1765E-8 1.2008E-7 1.8146E-7 To examine the noise effect in the time series, the EM-EKF, the EKF and the SNPOM are used to estimate the Mackey-Glass chaotic time series corrupted by additive white noise with variance 0.25 and 1 in the following two cases (see Fig. 4). Table Ⅱ gives the performance of the EM-EKF compared to those of the SNPOM and the EKF on estimating the Mackey-Glass chaotic time series corrupted by additive white noise. From Table Ⅱ, one can see that the overall performance obtained by the EM-EKF is better than that of the SNPOM and the EKF. In detail, the EM-EKF has smaller values of MSEs than those of the SNPOM and the EKF for the training data and testing data, although the improvement is not significant. As for the EKF, when the given values of the measurement noise variance ( $R=0.1$ and $R=0.6$ ) deviate relatively far from the true values ( $R=0.25$ and $R=1$ ), the results become worse, even worse than those obtained by the SNPOM. On the contrary, when the given values ( $R=0.2$ and $R=0.8$ ) are relatively close to the true values, the EKF obtains slightly better results than does the SNPOM, but still obtains worse results than does the EM-EKF. This illustrates that setting of the measurement noise plays an important part in the use of the EKF again. In both cases, the values of measurements noise variance $R$ using EM-EKF are 0.25950 and 1.0589, respectively, which are very close to the true values. In contrast, the SNPOM can not estimate the values of measurement noise variance, and the values of measurements noise variance should be given beforehand in the use of the EKF. The initial conditions are the same as case one except that $R$ is randomly chosen with a uniform distribution from the range $[0,1]$ . Figs. 5 and 6 show the convergent process of the log-likelihood and noise variance.
表 Ⅱ COMPARISON RESULTS FOR MACKEY-GLASS TIME SERIES CORRUPTES BY ADDITIVE WHITE NOISETable Ⅱ COMPARISON RESULTS FOR MACKEY-GLASS TIME SERIES CORRUPTES BY ADDITIVE WHITE NOISENoise variance Method MSE (Training) MSE (Testing) R 0.25 SNPOM 0.26606 0.28120 unknown EKF 0.26577 0.28082 0.2 (given) EKF 0.27343 0.28825 0.1 (given) EM-EKF 0.25750 0.27825 0.25950 1 SNPOM 0.97452 1.1644 unknown EKF 0.97215 1.1512 0.8 (given) EKF 0.98262 1.1785 0.6 (given) EM-EKF 0.96590 1.13907 1.0589 Statistical $F$ test is used to judge whether the error variance estimated by the EM-EKF is equal to the variance estimated by the SNPOM. As shown in Table Ⅲ, the values of the computing $F$ test statistic ( $F$ ) are greater than the table value ( $F_{\alpha=0.05}$ ) except for the testing data without additional white noise, so the null hypothesis that two variances are equal is rejected at level $\alpha=0.05$ . This indicates that there are significant differences between the above two methods and the performance of the EM-EKF is superior to the SNPOM.
表 Ⅲ THE RESULTS OF STATISTICAL F TEST AT LEVEL 0.05Table Ⅲ THE RESULTS OF STATISTICAL F TEST AT LEVEL 0.05Case $F_{\alpha=0.05}$ $F$ Results Case 1 (Training) 1.16 1.4642 Reject; Difference Case 1 (Testing) 1.16 1.0105 No reject; No difference Case 2 (Training) 1.16 14.1212 Reject; Significant difference Case 2 (Testing) 1.16 9.2984 Reject; Significant difference Case 3 (Training) 1.16 37.8297 Reject; Significant difference Case 3 (Testing) 1.16 15.5751 Reject; Significant difference To compare the computational complexity, Table Ⅳ lists the computational time for the three methods. The simulations are implemented on a computer (Inter (R) Core (TM)2 Duo CPU E7200 @2.53 GHz, 8 G-RAM). The average running time (100 iterations) for EM-EKF is approximately 30.766 seconds, for SNPOM (100 iterations) 10.690 seconds, and for EKF 0.40572 seconds. Obviously, the EM-EKF is more time-consuming than the other two methods since EM algorithm is computationally expensive for high-dimension states. Also, for a large RBF network, the computational expense of the EKF could be burdensome and it increases linearly with the number of training samples. Instead, the SNPOM is a hybrid method, depending partly on the Levenberg-Marquardt method (LMM) for nonlinear parameter optimization and partly on the least-squares method (LSM) for linear parameter estimation. The SNPOM can greatly accelerate the computational convergence of the parameter search process, especially for the RBF-type models with larger number of linear weights and smaller number of nonlinear parameters. However, the EM-EKF, as an alternative way, is superior to the SNPOM and the EKF and can accurately estimate the noise variance.
表 Ⅳ THE COMPUTATION TIME OF DIFFERENT METHODS (S)Table Ⅳ THE COMPUTATION TIME OF DIFFERENT METHODS (S)Method Time SNPOM 10.690 EKF 0.40572 EM-EKF 30.766 As a whole, the EM-EKF method is capable of estimating the parameters of the RBF-AR model, and the initial conditions and the noise variances are identified by use of the EM algorithm, which can further improve the modeling precision. Comparison results indicate that the RBF-AR model estimated by the EM-EKF makes more accurate predictions than do the SNPOM and the EKF, although the values of the MSEs using EM-EKF are only slightly smaller than those of the SNPOM and EKF in some cases. However, $F$ test shows there is significant difference between results obtained by the SNPOM and the EM-EKF. Furthermore, the estimation of observation noise variance using EM-EKF is close to the true value, while the estimate of that using SNPOM is unknown, and the observation noise variance in the EKF must be given in advance. Therefore, we can conclude that the EM-EKF method is an advisable choice for estimating RBF-AR model and is especially appropriate for signals disturbed by noise.
5. Conclusion
In this paper, the EM-EKF method is developed to estimate the parameter of RBF-AR model. Firstly, the model is reconstructed as a new type of general radial basis function neural networks. Secondly, to circumvent the EKF's limitation of unknown prior knowledge, the EM is proposed to calculate the initial states and the measurement and process noise variance. By combining the EM and extended Kalman filtering and smoothing process, the EM-EKF method is proposed to estimate the parameters of the RBF-AR model, the initial conditions and the noise variances jointly, and can further improve the modeling precision. Comparisons of the performance of the EM-EKF with the SNPOM and the EKF are performed, and the results indicate that the RBF-AR model estimated by the EM-EKF makes more accurate predictions than do the SNPOM and the EKF, although the EM-EKF is more time-consuming. Moreover, the estimate of observation variance converges to the true value. Finally, F test indicates there is significant difference between results obtained by the SNPOM and the EM-EKF. Our future work would develop the EM-EKF algorithm for RBF-ARX (Peng et al. [2]) estimation and apply the RBF-AR model based on the EM-EKF algorithm to other types of time series and complex systems analysis.
-
表 1 模型与现有解决方案对比
Table 1 The model is compared to existing solutions
表 2 当前面临的问题以及模型应对的方法
Table 2 The problems faced and the coping methods of the model
类型 面临问题 模型应对方法及分析 隐私和安全 信任和访问控制 提倡医疗数据电子化, 医疗数据由可信的代理负责记录 黑客攻击和医疗数据保护 采用非对称加密技术加密数据 不可抵赖性 周期性将信息锚定到比特币公链, 借助公链实现不可篡改 医疗数据滥用和诈骗 被滥用, 追责困难 模型是轮流责任制, 采用区块链技术方便追责 记录难以辨识 数字化电子病历即时存储, 机器可识别易辨认 不正当收费, 虚假声明等 每种类型客户端均可快捷查询, 获得统一且可信的结果 用户参与度 用户无法管理自身的健康数据 完全由用户管理自己的医疗数据, 代理重加密实现权利委托 公共卫生相关研究与用户无直接关联 开放自己的医疗数据, 做为研究素材推动相关研究 数据分别存储在不同的数据中心, 形成数据孤岛共享困难 模型打通各医疗机构之间的数据孤岛, 实现方便的数据互操作 互操作性, 可访问性, 数据完整性 各机构权限不明确, 数据所属权不明确 医疗联合服务器负责代理记账, 审计服务器负责校验记账.客户端分不同类型, 支持不同的访问限制.另外, 明确各个医疗机构的职责, 方便外部审计. 数据容易丢失, 造成数据不完整 分布式的存储节点, 保证数据的多重备份 规则 不同的数据标准和共享规则 统一的数据查询接口, 统一的数据标准, 可以实时数据共享 -
[1] Charles D, Gabriel M, Furukawa M F. Adoption of electronic health record systems among U.S. non-federal acute care hospitals: 2008-2012. ONC data brief, 2013, 9: 1-9 [2] Nakamoto S. Bitcoin: a peer-to-peer electronic cash system [Online], available: http://bitcoin.org/bitcoin.pdf, June 12, 2016 [3] Healthbank [Online], available: http://www.healthbank.coop, August 18, 2016 [4] Health [Online], available: https://gem.co/health, January 22, 2016 [5] China Blockchain Development Forum. China Blockchain Technology and Application Development White Paper (2016) [Online], available: http://chainb.com/download/工信部—中国区块链技术和应用发展白皮书1014. pdf, October 18, 2016
中国区块链技术和产业发展论坛. 中国区块链技术和应用发展白皮书(2016) [Online], available: http://chainb.com/download/工信部—中国区块链技术和应用发展白皮书1014. pdf, October 18, 2016[6] Lvan D. Moving toward a blockchain-based method for the secure storage of patient records [Online], available: http://www.healthit.gov/sites/default/files/9-16-drew_ivan_20160804_blockchain_for_healthcare_final.pdf, August 4, 2016 [7] Shrier A A, Chang A, Diakun-thibault N, Forni L, Landa F, Mayo J, van Riezen R. Blockchain and Health IT: Algorithms, Privacy, and Data. [Online], available: http://www.truevaluemetrics.org/DBpdfs/Technology/Blockchain/1-78-blockchainandhealthitalgorithmsprivacydata_whitepaper.pdf, September 18, 2016 [8] Kuo T T, Hsu C N, Ohno-Machado L. ModelChain: decentralized privacy-preserving healthcare predictive modeling framework on private blockchain networks [Online], available: https://www.healthit.gov/sites/default/files/10-30-ucsd-dbmi-onc-blockchain-challenge.pdf, January 22, 2016. [9] Ekblaw A, Azaria A, Halamka J D, Lippman A. A Case Study for Blockchain in Healthcare: "MedRec"prototype for electronic health records and medical research data. In: Proceedings of the 2016 IEEE of International Conference on Open and Big Data, 2016. 25-30 [10] Yuan B, Lin W, McDonnell C. Blockchains and electronic health records [Online], available: http://mcdonnell.mit.edu/blockchain_ehr.pdf, May 4, 2016 [11] Witchey N J. Healthcare Transaction Validation Via Blockchain Proof-of-Work, Systems and Methods: U.S. Patent 2015/0332283, November 2015. [12] 袁勇, 王飞跃.区块链技术发展现状与展望.自动化学报, 2016, 42(4): 481-494 http://www.aas.net.cn/CN/abstract/abstract18837.shtmlYuan Yong, Wang Fei-Yue. Blockchain: the state of the art and future trends. Acta Automatica Sinica, 2016, 42(4): 481-494 http://www.aas.net.cn/CN/abstract/abstract18837.shtml [13] Merkle R C. A digital signature based on a conventional encryption function. In: Proceedings of the 1987 Conference on the Theory and Applications of Cryptographic Techniques. Berlin Heidelberg, Germany: Springer, 1987. 369-378 [14] Blaze M, Bleumer G, Strauss M. Divertible protocols and atomic proxy cryptography. In: Proceedings of the 1998 International Conference on the Theory and Applications of Cryptographic Techniques. Berlin Heidelberg, Germany: Springer, 1998. 127-144 [15] Aono Y, Boyen X, Phong L T, Wang L. Key-private proxy re-encryption under LWE. In: Proceedings of the 2013 International Conference on Cryptology in India. Cham, Germany: Springer International Publishing, 2013. 1-18 [16] Blockchain: the chain of trust and its potential to transform healthcare-our point of view [Online], available: http://www.healthit.gov/sites/default/files/8-31-blockchain-ibm_ideation-challenge_aug8.pdf, August 8, 2016. [17] Snow P, Deery B, Lu J, Johnston D, Kirby P. Factom: business processes secured by immutable audit trails on the blockchain [Online], available: http://bravenewcoin.com/assets/Whitepapers/Factom-Whitepaper.pdf, November 17, 2014. -