Combining Domain Knowledge with Statistical Factor Analysis: An Application to Financial Risk Modeling

FENG Xu, YU Wen-Jian, LI Ling

Citation: Feng Xu, Yu Wen-Jian, Li Ling. Combining domain knowledge with statistical factor analysis: An application to financial risk modeling. Acta Automatica Sinica, 2022, 48(1): 121−132 doi: 10.16383/j.aas.c200342

doi: 10.16383/j.aas.c200342
Funds: Supported by National Natural Science Foundation of China (61872206)
    Author Bio:

    FENG Xu  Ph.D. candidate in the Department of Computer Science and Technology, Tsinghua University. He received his bachelor's degree from the Department of Computer Science and Technology, Tsinghua University in 2017. His research interests cover numerical linear algebra algorithms, machine learning, and big-data analytics. E-mail: fx17@mails.tsinghua.edu.cn

    YU Wen-Jian  Tenured professor in the Department of Computer Science and Technology, Tsinghua University. He received his Ph.D. degree from the Department of Computer Science and Technology, Tsinghua University in 2003 and has been on the faculty there since. His research interests cover computer-aided design algorithms for integrated circuits, machine learning, big-data analytics algorithms, and numerical computing and its applications. Corresponding author of this paper. E-mail: yu-wj@tsinghua.edu.cn

    LI Ling  Ph.D. in computer science (minor in electrical engineering) from California Institute of Technology. His research interests cover machine learning, quantitative investing, and automated trading. E-mail: liling@flowam.com

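  • Abstract: Factor analysis is a statistical method widely used in industry. In financial asset management, factor analysis builds a risk model by deriving adaptive statistical factors through maximum likelihood estimation on historical price fluctuations. Compared with fundamental factor models, which build the risk model from predefined factors that carry economic meaning, models produced by factor analysis are more flexible and can discover factors that the fundamental model misses. However, because the statistical factors in such models lack interpretability, the models tend to overfit when the financial data contain significant noise. For the problem of building risk models on Chinese stock market data, this paper proposes a fast factor analysis algorithm and a hybrid factor analysis method that selects fundamental factors and incorporates them into factor analysis, so that the risk model is optimized for both factor discovery and model interpretability. Experimental results show that fast factor analysis achieves a speedup of more than 31 times, and that the new hybrid factor analysis method increases the predicted log-likelihood on both synthetic and real-world datasets. On the real-world dataset, the new method reaches an average log-likelihood of up to 12.00, which is 4.44 higher than the 7.56 obtained by the model built with factor analysis, and the standard deviation of the mean difference between the two algorithms is 1.58, indicating that the new method builds a more accurate risk model.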
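The abstract scores risk models by their predicted log-likelihood. As a minimal sketch of what such a score computes, the snippet below evaluates the Gaussian log-density of one return vector under a covariance of the form $bb^{\rm T} + {\rm diag}(d)$. It assumes zero-mean Gaussian returns and a single statistical factor, whereas the paper uses $s$ factors; the function name `one_factor_loglik` and the example numbers are hypothetical and not taken from the paper.

```python
import math

def one_factor_loglik(x, b, d):
    """Log-density of a zero-mean Gaussian with covariance b b^T + diag(d).

    x: observed return vector, b: factor loadings,
    d: specific (idiosyncratic) variances, all strictly positive.
    This is an illustrative one-factor case, not the paper's algorithm.
    """
    n = len(x)
    # Scalars built from D^{-1}, where D = diag(d).
    bDb = sum(bi * bi / di for bi, di in zip(b, d))
    bDx = sum(bi * xi / di for bi, xi, di in zip(b, x, d))
    xDx = sum(xi * xi / di for xi, di in zip(x, d))
    # Matrix determinant lemma: det(D + b b^T) = det(D) * (1 + b^T D^{-1} b).
    logdet = sum(math.log(di) for di in d) + math.log1p(bDb)
    # Sherman-Morrison: x^T S^{-1} x = x^T D^{-1} x - (b^T D^{-1} x)^2 / (1 + b^T D^{-1} b).
    quad = xDx - bDx * bDx / (1.0 + bDb)
    return -0.5 * (n * math.log(2.0 * math.pi) + logdet + quad)

# Hypothetical two-asset example (values invented for illustration).
example_ll = one_factor_loglik([0.3, -0.2], [1.0, 0.5], [1.0, 2.0])
print(example_ll)
```

Because the covariance is a diagonal matrix plus a rank-one term, the score costs time linear in the number of assets and never forms the full covariance matrix. A higher value on held-out days means the modeled covariance explains the observed returns better.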
  • Fig. 1  The predicted log-likelihood of the risk models estimated by Alg.1, Alg.3 and Alg.6 $(r = 0.9)$ on the real-world dataset over the first 30 days

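    Table 1  Results on first synthetic dataset of Alg.1 and Alg.2

    | Factors | Algorithm | $\delta=5$: Time (s) | $\delta=5$: ${\rm E}\,(LL)$ | $\delta=5$: $\sigma\,({\rm E})$ | $\delta=5$: ${\rm E}\,(iter)$ | $\delta=5$: Speedup | $\delta=10$: Time (s) | $\delta=10$: ${\rm E}\,(LL)$ | $\delta=10$: $\sigma\,({\rm E})$ | $\delta=10$: ${\rm E}\,(iter)$ | $\delta=10$: Speedup |
    |---|---|---|---|---|---|---|---|---|---|---|---|
    | $s=10$ | Alg. 1 (FA) | 444.21 | −1927.63 | 10.01 | 709.02 | | 252.07 | −2059.10 | 6.30 | 405.51 | |
    | $s=10$ | Alg. 2 (FFA) | 14.05 | −1927.63 | 10.01 | 709.01 | 31.6 | 7.80 | −2059.10 | 6.30 | 405.55 | 32.3 |
    | $s=13$ | Alg. 1 (FA) | 653.60 | −1861.02 | 11.48 | 1019.66 | | 313.86 | −2027.42 | 6.29 | 492.91 | |
    | $s=13$ | Alg. 2 (FFA) | 20.38 | −1863.73 | 11.39 | 1020.77 | 32.1 | 9.58 | −2027.08 | 6.24 | 492.79 | 32.8 |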
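The speedup column of Table 1 appears to be the running time of Alg. 1 (FA) divided by that of Alg. 2 (FFA) in the same setting. The short check below recomputes it from the times listed in the table; the dictionary layout and the helper name `speedup` are illustrative choices, not part of the paper.

```python
# Running times in seconds copied from Table 1:
# (factor count s, delta) -> (Alg. 1 FA time, Alg. 2 FFA time)
times = {
    (10, 5): (444.21, 14.05),
    (10, 10): (252.07, 7.80),
    (13, 5): (653.60, 20.38),
    (13, 10): (313.86, 9.58),
}

def speedup(t_baseline, t_fast):
    """Ratio of the baseline running time to the accelerated running time."""
    return t_baseline / t_fast

# Round to one decimal place, matching the precision used in the table.
speedups = {key: round(speedup(fa, ffa), 1) for key, (fa, ffa) in times.items()}
for (s, delta), ratio in speedups.items():
    print(f"s={s}, delta={delta}: {ratio}x")
```

All four ratios come out above 31, which agrees with the speedup claimed in the abstract.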

    Table 2  Results on second synthetic dataset of Alg.1, Alg.2, Alg.3 and Alg.4

    | Factors | Algorithm | $\delta=3$: Time (s) | $\delta=3$: ${\rm E}\,(LL)$ | $\delta=3$: $\sigma\,({\rm E})$ | $\delta=5$: Time (s) | $\delta=5$: ${\rm E}\,(LL)$ | $\delta=5$: $\sigma\,({\rm E})$ | $\delta=10$: Time (s) | $\delta=10$: ${\rm E}\,(LL)$ | $\delta=10$: $\sigma\,({\rm E})$ |
    |---|---|---|---|---|---|---|---|---|---|---|
    | $s=10,f=0$ | Alg. 1 (FA) | 484.94 | −3789.03 | 28.85 | 389.27 | −3866.59 | 24.26 | 227.39 | −4116.68 | 14.46 |
    | $s=10,f=0$ | Alg. 2 (FFA) | 207.17 | −3789.02 | 28.85 | 171.58 | −3866.60 | 24.65 | 99.19 | −4116.68 | 14.46 |
    | $s=0,f=10$ | Alg. 3 (OLS) | 0.23 | −3734.41 | 22.60 | 0.23 | −3815.23 | 19.05 | 0.23 | −4072.60 | 11.39 |
    | $s=13,f=0$ | Alg. 1 (FA) | 779.07 | −3617.34 | 33.82 | 562.08 | −3732.61 | 25.96 | 279.78 | −4046.25 | 13.46 |
    | $s=13,f=0$ | Alg. 2 (FFA) | 331.09 | −3616.64 | 33.72 | 247.17 | −3731.49 | 26.16 | 121.48 | −4045.92 | 13.46 |
    | $s=3,f=10$ | Alg. 4 (HFA) | 123.49 | −3564.72 | 25.94 | 92.44 | −3678.27 | 20.23 | 48.63 | −4002.81 | 10.52 |

    Table 3  Results on third synthetic dataset of Alg.1, Alg.2, Alg.3, Alg.4 and Alg.6

    | Factors | Algorithm | $\delta=3$: Time (s) | $\delta=3$: ${\rm E}\,(LL)$ | $\delta=3$: $\sigma\,({\rm E})$ | $\delta=5$: Time (s) | $\delta=5$: ${\rm E}\,(LL)$ | $\delta=5$: $\sigma\,({\rm E})$ | $\delta=10$: Time (s) | $\delta=10$: ${\rm E}\,(LL)$ | $\delta=10$: $\sigma\,({\rm E})$ |
    |---|---|---|---|---|---|---|---|---|---|---|
    | $s=10,f=0$ | Alg. 1 (FA) | 693.57 | −3594.51 | 26.64 | 507.47 | −3709.94 | 20.81 | 263.41 | −4019.67 | 10.86 |
    | $s=10,f=0$ | Alg. 2 (FFA) | 290.06 | −3593.84 | 26.97 | 215.58 | −3703.39 | 21.36 | 105.50 | −4019.93 | 10.88 |
    | $s=0,f=10$ | Alg. 3 (OLS) | 0.24 | −3712.75 | 22.24 | 0.25 | −3796.19 | 18.78 | 0.24 | −4059.71 | 11.19 |
    | $s+f=10$ | Alg. 6 (HFA+) | 1726.79 | −3561.04 | 25.26 | 1307.14 | −3683.00 | 20.75 | 596.52 | −4005.34 | 11.34 |
    | $s=13,f=0$ | Alg. 1 (FA) | 845.15 | −3389.00 | 30.54 | 721.67 | −3550.65 | 21.55 | 312.41 | −3955.47 | 10.07 |
    | $s=13,f=0$ | Alg. 2 (FFA) | 351.61 | −3389.82 | 30.89 | 304.29 | −3549.53 | 21.45 | 123.93 | −3955.47 | 10.07 |
    | $s=3,f=10$ | Alg. 4 (HFA) | 111.30 | −3536.52 | 28.39 | 81.30 | −3661.63 | 22.11 | 48.15 | −3993.70 | 11.43 |
    | $s+f=13$ | Alg. 6 (HFA+) | 2314.11 | −3378.86 | 32.36 | 1826.26 | −3522.23 | 21.72 | 796.98 | −3933.08 | 10.42 |

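    Table 4  Results on real-world dataset of Alg.1, Alg.2, Alg.3 and Alg.6

    | Factors | Algorithm | $r$ | Time (s) | ${\rm E}\,(LL)$ | $\sigma\,({\rm E})$ | ${\rm E}\,(LL_{Alg.6} - LL_{Alg.2})$ | $\sigma({\rm E}\,(LL_{Alg.6} - LL_{Alg.2}))$ |
    |---|---|---|---|---|---|---|---|
    | $s=15, f=0$ | Alg. 1 (FA) | | 1.61 | 7.56 | 9.06 | | |
    | $s=15, f=0$ | Alg. 2 (FFA) | | 0.55 | 7.54 | 9.06 | | |
    | $s=0, f=22$ | Alg. 3 (OLS) | | 0.01 | −108.01 | 5.50 | | |
    | $s+f=15$ | Alg. 6 (HFA+) | $0.6$ | 10.82 | 11.96 | 8.18 | 4.40 | 1.64 |
    | $s+f=15$ | Alg. 6 (HFA+) | $0.7$ | 10.90 | 11.86 | 8.20 | 4.29 | 1.61 |
    | $s+f=15$ | Alg. 6 (HFA+) | $0.8$ | 10.40 | 11.77 | 8.24 | 4.21 | 1.58 |
    | $s+f=15$ | Alg. 6 (HFA+) | $0.9$ | 10.51 | 12.00 | 8.25 | 4.44 | 1.58 |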
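The last two columns of Table 4 summarize the difference in log-likelihood between two models. This page does not define $\sigma({\rm E}(\cdot))$; one plausible reading, stated here as an assumption, is the standard error of the mean of per-day paired differences. The sketch below shows that computation on hypothetical numbers that are invented for illustration and are not results from the paper; the helper name `mean_and_standard_error` is likewise hypothetical.

```python
import math
import statistics

def mean_and_standard_error(values):
    """Return the sample mean and the standard error of that mean."""
    mean = statistics.fmean(values)
    std_err = statistics.stdev(values) / math.sqrt(len(values))
    return mean, std_err

# Hypothetical per-day predicted log-likelihoods, invented for illustration only.
ll_baseline = [1.0, 4.0, -2.0, 7.0]
ll_hybrid = [2.0, 6.0, -1.0, 9.0]

# Pair the two models day by day before averaging.
daily_gain = [h - b for h, b in zip(ll_hybrid, ll_baseline)]

mean_base, se_base = mean_and_standard_error(ll_baseline)
mean_gain, se_gain = mean_and_standard_error(daily_gain)
print(f"baseline mean {mean_base:.2f}, standard error {se_base:.2f}")
print(f"paired gain mean {mean_gain:.2f}, standard error {se_gain:.2f}")
```

Under this reading, day-to-day swings shared by both models cancel in the pairing, which would be consistent with the spread of the difference in Table 4 (about 1.6) being much smaller than the spread of either model alone (about 8 to 9).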
Publication History
  • Received:  2020-05-22
  • Accepted:  2020-12-31
  • Published online:  2021-02-04
  • Issue date:  2022-01-25
