社会心理学启发的多模态人格评分预测方法研究

李琳; 周阳; 王聪慧; 汪志浩; 田浩

doi:10.16383/j.aas.c250374

社会心理学启发的多模态人格评分预测方法研究

doi: 10.16383/j.aas.c250374 cstr: 32138.14.j.aas.c250374

李琳^1,,
周阳^1,,
王聪慧^1,,
汪志浩^1,,
田浩^2,

1.
武汉理工大学计算机与人工智能学院武汉 430070
2.
湖北经济学院数字金融创新湖北省重点实验室武汉 430205

基金项目: 国家自然科学基金(62276196)资助

详细信息

作者简介:
李琳：武汉理工大学教授. 主要研究方向为信息检索与推荐系统, 数据挖掘与模式识别和多模态机器学习. E-mail: cathylilin@whut.edu.cn

周阳：武汉理工大学硕士研究生. 主要研究方向为自然语言处理和情感计算. E-mail: ychow@whut.edu.cn

王聪慧：武汉理工大学硕士研究生. 主要研究方向为自然语言处理和多模态机器学习. E-mail: wch6606@csepdi.com

汪志浩：武汉理工大学硕士研究生. 主要研究方向为多模态机器学习和信息检索与推荐系统. E-mail: gm2wzh@gmail.com

田浩：湖北经济学院教授. 主要研究方向为金融风险, 服务发现与推荐和机器学习. 本文通信作者. E-mail: th@hbue.edu.cn

计量
- 文章访问数: 7
- HTML全文浏览量: 5
- 被引次数: 0
出版历程
- 收稿日期: 2025-08-14
- 录用日期: 2026-05-13
- 网络出版日期: 2026-07-02

Multimodal Personality Rating Prediction Method Inspired by Social Psychology

LI Lin^1
,,
ZHOU Yang^1
,,
WANG Cong-Hui^1
,,
WANG Zhi-Hao^1
,,
TIAN Hao^2
,

1.
School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan 430070, China
2.
Hu-bei Key Laboratory of Digital Finance Innovation, Hubei University of Economics, Wuhan 430205, China

Funds: Supported by National Natural Science Foundation of China (62276196)

More Information

Author Bio:
LI Lin　Professor at School of Computer Science and Artificial Intelligence, Wuhan University of Technology. Her research interest covers information retrieval and recommender systems, data mining and pattern recognition, and multimodal machine learning

ZHOU Yang　Master student at School of Computer Science and Artificial Intelligence, Wuhan University of Technology. His research interest covers natural language processing and affective computing

WANG Cong-Hui　Master student at School of Computer Science and Artificial Intelligence, Wuhan University of Technology. Her research interest covers natural language processing and multimodal machine learning

WANG Zhi-Hao　Master student at School of Computer Science and Artificial Intelligence, Wuhan University of Technology. His research interest covers multimodal machine learning, information retrieval, and recommender systems

TIAN Hao　Professor at Hubei Key Laboratory of Digital Finance Innovation, Hubei University of Economics. His research interest covers financial risk, service discovery and recommendation, and machine learning. Corresponding author of this paper

摘要

摘要: 人格特质作为个体在思想、情感和行为模式上独特且相对稳定的心理特征, 是理解和预测人类行为的重要维度. 多模态人格评分预测研究已成为心理学、社会学与计算科学交叉融合的前沿热点. 然而, 现有评分预测方法在捕捉个体稳定人格特质时, 常因行为表现中的非典型成分(如停顿、思考或环境噪声)而产生偏差, 影响了人格特质多维度评分预测的准确性. 针对这一问题, 受认知−情感人格系统(Cognitive-Affective Personality System, CAPS)理论启发, 提出一种多模态人格评分预测框架EBPNet(Emotion-Behavior-based Personality Network). 该框架充分利用社会情境对人格表现的调节作用, 通过构建上下文情境感知模块, 系统整合视频数据中的动态情境发展过程, 减少了非典型行为对人格特质评分预测的影响. 同时, 框架融合视觉大模型的细粒度情感分析能力, 精确提取情绪演变轨迹与微表情特征, 并与语音转录文本形成多类型数据的协同评分预测, 提升了对个体情感-行为时序模式的建模能力. 通过显式建模社会情境与多模态行为数据的交互关系, 该框架实现了人格特质的多维度评分预测. 实验结果表明, EBPNet在目前广泛认可的多模态人格分析数据集First Impressions V2上的表现优于现有基线模型, 验证了社会心理学启发的多维度评分预测方法的有效性.
- 多模态人格评分 /
- 认知-情感人格系统 /
- 情感分析 /
- 情境感知
Abstract: Personality traits, as the unique and relatively stable psychological characteristics of individuals in thought, emotion, and behavioral patterns, are crucial dimensions for understanding and predicting human behavior. Multimodal personality rating prediction has become a frontier hotspot in the interdisciplinary integration of psychology, sociology, and computing. However, existing methods often suffer from biases when capturing stable personality traits due to atypical behavioral components in behavioral expressions, such as pauses, hesitations, or environmental noise, affecting the accuracy of multi-dimensional personality trait rating. Inspired by the Cognitive-Affective Personality System (CAPS) theory, EBPNet is proposed as a multimodal personality rating prediction framework. It leverages social context's regulatory effect on personality expression through a context-aware module that integrates dynamic contextual processes in video data, reducing interference from atypical behaviors on personality trait rating. Meanwhile, the framework integrates the fine-grained emotional analysis capabilities of vision large models to extract emotional evolution trajectories and micro-expression features, enabling collaborative rating across multiple data modalities with speech transcription text and improving the modeling of individual emotion-behavior temporal patterns. Through explicit modeling of the interaction between social contexts and multi-modal behavioral data, the framework achieves multi-dimensional precise rating prediction of personality traits. Experimental results demonstrate that EBPNet outperforms existing baseline models on the widely used multimodal personality analysis dataset First Impressions V2, validating the effectiveness of the social psychology-inspired multi-dimensional rating.
- Multimodal personality rating /
- cognitive-affective personality system /
- emotional analysis /
- context-aware
注释:

1) 1¹https://chalearnlap.cvc.uab.cat/challenge/14/description/

2) 2²https://huggingface.co/openai/clip-vit-large-patch14

HTML全文

图 1 个体特定行为模式生成过程

Fig. 1 The generation process of individual-specific behavioral patterns

下载: 全尺寸图片幻灯片

图 2 EBPNet整体框架图

Fig. 2 Overview of our EBPNet

下载: 全尺寸图片幻灯片

图 3 VideoLLaMA3生成摘要文本示例

Fig. 3 An example of summary text generated by VideoLLaMA3

下载: 全尺寸图片幻灯片

图 4 人格特质空间对齐模块可解释性分析图

Fig. 4 Interpretability analysis of the personality trait space alignment module

下载: 全尺寸图片幻灯片

表 1 First Impressions V2数据集统计信息

Table 1 Statistics of the First Impressions V2 dataset

统计项	数量/信息
数据集样本总数	10 000个
训练集样本总数	6 000个
验证集样本总数	2 000个
测试集样本总数	2 000个
标签个数	5个
数据模态	视频和文本
采集YouTube视频数	约3 000条
同源视频片段上限	6条
视频时长	15秒
平均转录文本单词个数	43个

下载: 导出CSV

表 2 软硬件实验环境

Table 2 Software and hardware experimental configuration

实验环境	参数	配置
硬件环境	GPU	NVIDIA TITAN Xp
硬件环境	CPU	Intel^(R) Xeon^(R) CPU E5-2650 V4 @ 2.20GHz
硬件环境	内存容量	512GB
硬件环境	显存容量	12GB
软件环境	操作系统	CentOS 7.2.1511(Core)
软件环境	Python	3.10
软件环境	PyTorch	2.4.0
软件环境	CUDA	12.1
软件环境	cuDNN	8.9.6

下载: 导出CSV

表 3 EBPNet框架与其他基线模型在Acc上效果对比表

Table 3 A performance comparison among EBPNet and baselines at Acc

模型	开放性$\uparrow$	尽责性$\uparrow$	外倾性$\uparrow$	宜人性$\uparrow$	神经质性$\uparrow$	平均结果$\uparrow$
NJU-LAMDA^[14]	91.23	91.66	91.33	91.26	91.00	91.30
Evolgen^[15]	91.17	91.19	91.50	91.19	90.99	91.21
DRN^[16]	91.11	91.38	91.07	91.02	90.89	91.09
CR-Net^[12]	91.95	92.18	92.02	91.77	91.46	91.88
EMP^[20]	91.72	92.05	92.10	91.52	91.68	91.81
PCENet^[13]	92.15	92.33	92.21	92.38	92.19	92.25
AMIF-Net^[17]	92.04	91.79	92.01	92.24	92.19	92.05
EBPNet(ours)	92.20	92.84	92.67	92.28	92.25	92.45

下载: 导出CSV

表 5 EBPNet框架与其他基线模型在PCC上效果对比表

Table 5 A performance comparison among EBPNet and baselines at PCC

模型	开放性$\uparrow$	尽责性$\uparrow$	外倾性$\uparrow$	宜人性$\uparrow$	神经质性$\uparrow$	平均结果$\uparrow$
NJU-LAMDA^[14]	0.36	0.45	0.43	0.37	0.34	0.39
DRN^[16]	0.25	0.20	0.36	0.12	0.25	0.24
CR-Net^[12]	0.62	0.60	0.59	0.51	0.47	0.56
EMP^[20]	0.52	0.58	0.63	0.42	0.55	0.54
PCENet^[13]	0.65	0.65	0.69	0.57	0.68	0.65
AMIF-Net^[17]	0.58	0.61	0.61	0.49	0.61	0.58
EBPNet(ours)	0.77	0.69	0.73	0.67	0.72	0.72

下载: 导出CSV

表 4 Acc多种子统计比较

Table 4 Acc comparison across multiple seeds

模型	均值$\uparrow$	标准差	差值(vs PCENet)	p值(vs PCENet)
EBPNet(ours)	92.45	$\pm$0.08	-	-
PCENet^[13]	92.25	$\pm$0.12	0.20	0.084

下载: 导出CSV

表 6 PCC多种子统计比较

Table 6 PCC comparison across multiple seeds

模型	均值$\uparrow$	标准差	差值(vs PCENet)	p值(vs PCENet)
EBPNet(ours)	0.720	$\pm$0.008	-	-
PCENet^[13]	0.650	$\pm$0.013	0.070	0.002

下载: 导出CSV

表 7 EBPNet框架在Acc上消融实验结果

Table 7 Ablation experimental results of EBPNet at Acc

模型	开放性$\uparrow$	尽责性$\uparrow$	外倾性$\uparrow$	宜人性$\uparrow$	神经质性$\uparrow$	平均结果$\uparrow$
EBPNet(ours)	92.20	92.84	92.67	92.28	92.25	92.45
w/o co-label	92.10	92.68	92.50	92.15	92.05	92.30
zero-context	91.98	92.45	92.20	92.05	91.82	92.06
w/o caption	91.88	92.42	92.18	91.98	91.75	92.04
w/o context	91.50	92.18	91.95	91.85	91.50	91.80

下载: 导出CSV

表 8 EBPNet框架在PCC上消融实验结果

Table 8 Ablation experimental results of EBPNet at PCC

模型	开放性$\uparrow$	尽责性$\uparrow$	外倾性$\uparrow$	宜人性$\uparrow$	神经质性$\uparrow$	平均结果$\uparrow$
EBPNet(ours)	0.77	0.69	0.73	0.67	0.72	0.72
w/o co-label	0.64	0.60	0.58	0.59	0.66	0.61
zero-context	0.65	0.57	0.59	0.52	0.66	0.60
w/o caption	0.60	0.55	0.55	0.50	0.64	0.57
w/o context	0.56	0.50	0.47	0.45	0.61	0.52

下载: 导出CSV

表 9 上下文模块引入前后的预测一致性对比

Table 9 Intra-person prediction consistency comparison with and without the context-aware module

模型	开放性$\downarrow$	尽责性$\downarrow$	外倾性$\downarrow$	宜人性$\downarrow$	神经质性$\downarrow$	平均结果$\downarrow$
w/o context	0.067	0.070	0.077	0.065	0.080	0.072
EBPNet(ours)	0.041	0.038	0.036	0.043	0.034	0.038

下载: 导出CSV

表 10 典型心理学预期关系的预测一致性验证

Table 10 Verification of prediction consistency for typical psychological expected relationships

心理学预期	预期方向	真实标签	模型预测	是否一致
外倾性(E) $\leftrightarrow$神经质性(N)	负相关	−0.22	−0.27	一致
尽责性(C) $\leftrightarrow$开放性(O)	正相关	+0.18	+0.22	一致
宜人性(A) $\leftrightarrow$外倾性(E)	正相关	+0.18	+0.22	一致

下载: 导出CSV

表 11 基于不同Prompt的视觉大模型实验结果

Table 11 Experimental results of visual large models based on different prompts

模型	Prompt	Micro-F1
模型	Prompt	开放性$ \uparrow $	尽责性$ \uparrow $	外倾性$ \uparrow $	宜人性$ \uparrow $	神经质性$ \uparrow $	平均结果$ \uparrow $
Ovis2-8B^[36]	P1提示策略	0.39	0.38	0.35	0.42	0.36	0.38
Ovis2-8B^[36]	P2提示策略	0.46	0.46	0.43	0.50	0.39	0.45
Ovis2-8B^[36]	P3提示策略	0.48	0.48	0.45	0.51	0.42	0.46
InternVL2_5-8B^[37]	P1提示策略	0.48	0.49	0.45	0.52	0.43	0.48
InternVL2_5-8B^[37]	P2提示策略	0.63	0.64	0.61	0.64	0.49	0.61
InternVL2_5-8B^[37]	P3提示策略	0.67	0.65	0.63	0.67	0.51	0.63
VideoLLaMA3-7B^[33]	P1提示策略	0.44	0.43	0.41	0.48	0.40	0.43
VideoLLaMA3-7B^[33]	P2提示策略	0.53	0.54	0.52	0.57	0.47	0.53
VideoLLaMA3-7B^[33]	P3提示策略	0.58	0.59	0.56	0.60	0.48	0.56
Qwen2.5-VL-7B^[38]	P1提示策略	0.56	0.48	0.41	0.63	0.46	0.51
Qwen2.5-VL-7B^[38]	P2提示策略	0.65	0.68	0.63	0.66	0.46	0.62
Qwen2.5-VL-7B^[38]	P3提示策略	0.76	0.71	0.69	0.70	0.51	0.67

下载: 导出CSV

A1 基于CAPS理论的三种Prompt提示策略设计示例

A1 Design Examples of Three Prompt Strategies Based on CAPS Theory

提示策略	核心Prompt内容	设计意图
P1基础预测	You are a personality assessment expert. Please watch this video carefully and predict the speaker's Big Five personality traits (Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism). For each trait, assign a level (Low / Medium / High) and briefly explain your reasoning based on what you observe in the video. Make sure to provide a rationale for each trait prediction.	基准提示策略, 不引入情境或行为提示, 仅要求模型从视频中自主提取特征进行人格预测, 评估基础预测能力
P2情境−行为引导	You are a personality assessment expert. Please watch this video carefully. Pay close attention to the following aspects: (1) the social context and environment in which the speaker is situated; (2) the speaker's behavior patterns, including gestures, facial expressions, and speech style; (3) the speaker's responses to the context and any dynamic interactions. Based on these observations, assign a level (Low / Medium / High) for each of the Big Five personality traits and explain how each piece of evidence supports your prediction.	在P1基础上增加显式情境与行为分析, 引导模型同时考虑情境和行为特征, 体现CAPS理论中情境对人格表现调节作用
P3多轮渐进式分析	Stage 1–Contextual Analysis: Please describe the social and environmental context of this video in detail. Identify the setting, the social situation, and any relevant background cues. Stage 2– Behavioral Analysis: Based on the context, analyze the speaker's behavior in detail. Include gestures, facial expressions, speech patterns, and how the speaker interacts with the environment or other individuals. Stage 3–Overall Personality Prediction: Integrate your analysis of both context and behavior to predict the speaker's Big Five personality traits (Low / Medium / High for each trait). For each trait, provide a detailed rationale explaining how the observed evidence supports your prediction.	分三轮逐步引导模型分析: 第一轮提取情境特征, 第二轮分析行为, 第三轮整合情境和行为给出人格预测及详细依据, 较完整体现CAPS理论中情境–行为–人格推断链条

下载: 导出CSV

参考文献(38)

[1]	Masumura R, Orihashi S, Ihori M, Tanaka T, Makishima N, Suzuki S, et al. Multimodal fine-grained apparent personality trait recognition: Joint modeling of Big Five and questionnaire item-level scores. In: Proceedings of the 39th AAAI Conference on Artificial Intelligence. Philadelphia, USA: AAAI Press, 2025. 1456−1464
[2]	Alves G, Jannach D, Soares de Souza L, Garcia Manzato M. Towards personality-aware explanations for music recommendations using generative AI. In: Proceedings of the 19th ACM Conference on Recommender Systems. Prague, Czech Republic: ACM, 2025. 684−689
[3]	Wang X L, Li B, Dong J T, Lin Z J, Xing X J. PTDLRec: A recommendation model integrating personality traits and deep learning. Neurocomputing, 2025, 652: Article No. 131083 doi: 10.1016/j.neucom.2025.131083
[4]	Bi W H, Kou F F, Shi L, Li Y W, Li H S, Chen J P, et al. Leveraging the dual capabilities of LLM: LLM-enhanced text mapping model for personality detection. In: Proceedings of the 39th AAAI Conference on Artificial Intelligence. Philadelphia, USA: AAAI Press, 2025. 23487−23495
[5]	Zhang T Y, Qi T H, Koutsoumpis A, Zong Y, Zheng W M, Oostrom J K, et al. Assessing personality traits and interview performance from asynchronous video interviews. In: Proceedings of the 33rd ACM International Conference on Multimedia. Dublin, Ireland: ACM, 2025. 13895−13900
[6]	Li J, Wang Y, Qian W H, Hu J L, Hu Z Z, Hong R C, Wang M. Listening to the unspoken: Exploring “365” aspects of multimodal interview performance assessment. In: Proceedings of the 33rd ACM International Conference on Multimedia. Dublin, Ireland: ACM, 2025. 13909−13916
[7]	Carlyn M. An assessment of the Myers-Briggs type indicator. Journal of Personality Assessment, 1977, 41(5): 461−473
[8]	Matise M. The enneagram: An innovative approach. Journal of Professional Counseling: Practice, Theory & Research, 2007, 35(1): 38−58 doi: 10.1080/15566382.2007.12033832
[9]	Fiske D W. Consistency of the factorial structures of personality ratings from different sources. The Journal of Abnormal and Social Psychology, 1949, 44(3): 329−344 doi: 10.1037/h0057198
[10]	Tupes E C, Christal R E. Recurrent personality factors based on trait ratings. Journal of Personality, 1992, 60(2): 225−251 doi: 10.21236/ad0267778
[11]	Mocnik G, Rehberger A, Smogavc Z, Mlakar I, Smrke U, Mocnik S. Multimodal observable cues in mood, anxiety, and borderline personality disorders: A review of reviews to inform explainable AI in mental health. Frontiers in Artificial Intelligence, 2025, 8: Article No. 1696448 doi: 10.3389/frai.2025.1696448
[12]	Li Y N, Wan J, Miao Q G, Escalera S, Fang H J, Chen H Z, et al. CR-Net: A deep classification-regression network for multimodal apparent personality analysis. International Journal of Computer Vision, 2020, 128(12): 2763−2780 doi: 10.1007/s11263-020-01309-y
[13]	Zhu Y F, Wei Y T, Li M L, Zhang T T, Wei S Q, Wu B. PCENet: Psychological clues exploration network for multimodal personality assessment. In: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management. Birmingham, United Kingdom: ACM, 2023. 3667−3676
[14]	Zhang C L, Zhang H, Wei X S, Wu J X. Deep bimodal regression for apparent personality analysis. In: Proceedings of Computer Vision–ECCV 2016 Workshops. Cham: Springer, 2016. 311−324
[15]	Subramaniam A, Patel V, Mishra A, Balasubramanian P, Mittal A. Bi-modal first impressions recognition using temporally ordered deep audio and stochastic visual features. In: Proceedings of Computer Vision–ECCV 2016 Workshops. Cham: Springer, 2016. 337−348
[16]	Güçlütürk Y, Güçlü U, van Gerven M A J, van Lier R. Deep impression: Audiovisual deep residual networks for multimodal apparent personality trait recognition. In: Proceedings of Computer Vision–ECCV 2016 Workshops. Cham: Springer, 2016. 349−358
[17]	Bao Y T, Liu X, Qi Y, Liu R J, Li H J. Adaptive information fusion network for multi-modal personality recognition. Computer Animation and Virtual Worlds, 2024, 35(3): Article No. e2268 doi: 10.1002/cav.2268
[18]	Costa P T, McCrae R R. The revised NEO Personality Inventory (NEO-PI-R). The SAGE Handbook of Personality Theory and Assessment: Volume 2–Personality Measurement and Testing. Thousand Oaks, USA: SAGE Publications, 2008. 179−198
[19]	Digman J M. Higher-order factors of the Big Five. Journal of Personality and Social Psychology, 1997, 73(6): 1246−1256
[20]	Wang Y S, Li D Y, Funakoshi K, Okumura M. EMP: Emotion-guided multi-modal fusion and contrastive learning for personality traits recognition. In: Proceedings of the 2023 ACM International Conference on Multimedia Retrieval. Thessaloniki, Greece: ACM, 2023. 243−252
[21]	Wang R Q, Zhao X L, Xu X Y, Hao Y. A multimodal personality prediction framework based on adaptive graph transformer network and multi-task learning. Computer Graphics Forum, 2025, 44(2): Article No. e70030 doi: 10.1111/cgf.70030
[22]	Yang L J, Yu C, Huang C X, Zhang F Y, Liu R, Wen Z F, et al. Enhancing multimodal personality assessment with LLM-augmented hierarchical fusion. In: Proceedings of the 33rd ACM International Conference on Multimedia. Dublin, Ireland: ACM, 2025. 13917−13923
[23]	郭浩, 李欣奕, 唐九阳, 郭延明, 赵翔. 自适应特征融合的多模态实体对齐研究. 自动化学报, 2024, 50(4): 758−770 doi: 10.16383/j.aas.c210518 Guo Hao, Li Xin-Yi, Tang Jiu-Yang, Guo Yan-Ming, Zhao Xiang. Adaptive feature fusion for multi-modal entity alignment. Acta Automatica Sinica, 2024, 50(4): 758−770 doi: 10.16383/j.aas.c210518
[24]	Zhang L, Peng S, Winkler S. PersEmoN: A deep network for joint analysis of apparent personality, emotion and their relationship. IEEE Transactions on Affective Computing, 2022, 13(1): 298−305 doi: 10.1109/TAFFC.2019.2951656
[25]	Principi R D P, Palmero C, Junior J C S J, Escalera S. On the effect of observed subject biases in apparent personality analysis from audio-visual signals. IEEE Transactions on Affective Computing, 2021, 12(3): 607−621 doi: 10.1109/taffc.2019.2956030
[26]	Tang B, Pan K Q, Zheng M, Zhou N, Sui J L, Zhu D D, et al. Pose as a modality: A psychology-inspired network for personality recognition with a new multimodal dataset. In: Proceedings of the 39th AAAI Conference on Artificial Intelligence. Philadelphia, USA: AAAI Press, 2025. 1538−1546
[27]	Zatarain Cabada R, Cardenas Lopez H M, Escalante H J. Multimodal personality recognition for affective computing. Multimodal Affective Computing: Technologies and Applications in Learning Environments. Cham: Springer, 2023. 173−208
[28]	Sun X, Huang J, Zheng S X, Rao X H, Wang M. Personality assessment based on multimodal attention network learning with category-based mean square error. IEEE Transactions on Image Processing, 2022, 31: 2162−2174 doi: 10.1109/tip.2022.3152049
[29]	张重生, 陈杰, 李岐龙, 邓斌权, 王杰, 陈承功. 深度对比学习综述. 自动化学报, 2023, 49(1): 15−39 doi: 10.16383/j.aas.c220421 Zhang Chong-Sheng, Chen Jie, Li Qi-Long, Deng Bin-Quan, Wang Jie, Chen Cheng-Gong. Deep contrastive learning: A survey. Acta Automatica Sinica, 2023, 49(1): 15−39 doi: 10.16383/j.aas.c220421
[30]	蒲志强, 易建强, 刘振, 丘腾海, 孙金林, 李飞漠. 知识与数据协同驱动的群体智能决策方法研究综述. 自动化学报, 2022, 48(3): 627−643 doi: 10.16383/j.aas.c210118 Pu Zhi-Qiang, Yi Jian-Qiang, Liu Zhen, Qiu Teng-Hai, Sun Jin-Lin, Li Fei-Mo. Knowledge-based and data-driven integrating methodologies for collective intelligence decision making: A survey. Acta Automatica Sinica, 2022, 48(3): 627−643 doi: 10.16383/j.aas.c210118
[31]	李霞, 卢官明, 闫静杰, 张正言. 多模态维度情感预测综述. 自动化学报, 2018, 44(12): 2142−2159 Li Xia, Lu Guan-Ming, Yan Jing-Jie, Zhang Zheng-Yan. A survey of dimensional emotion prediction by multimodal cues. Acta Automatica Sinica, 2018, 44(12): 2142−2159
[32]	权学良, 曾志刚, 蒋建华, 张亚倩, 吕宝粮, 伍冬睿. 基于生理信号的情感计算研究综述. 自动化学报, 2021, 47(8): 1769−1784 doi: 10.16383/j.aas.c200783 Quan Xue-Liang, Zeng Zhi-Gang, Jiang Jian-Hua, Zhang Ya-Qian, Lv Bao-Liang, Wu Dong-Rui. Physiological signals based affective computing: A systematic review. Acta Automatica Sinica, 2021, 47(8): 1769−1784 doi: 10.16383/j.aas.c200783
[33]	Zhang B Q, Li K H, Cheng Z S, Hu Z Q, Yuan Y Q, Chen G Z, et al. VideoLLaMA 3: Frontier multimodal foundation models for image and video understanding. arXiv preprint arXiv: 2501.13106, 2025
[34]	John O P, Naumann L P, Soto C J. Paradigm shift to the integrative Big Five trait taxonomy. Handbook of Personality: Theory and Research. New York: Guilford Press, 2008. 114−158
[35]	Shen P, Wang D D, Xu Y Y, Zhang S Q, Zhao X M. PACMR: Progressive adaptive crossmodal reinforcement for multimodal apparent personality traits analysis. IEEE Signal Processing Letters, 2025, 32: 161−165 doi: 10.1109/LSP.2024.3505799
[36]	Lu S Y, Li Y, Chen Q G, Xu Z, Luo W H, Zhang K F, Ye H J. Ovis: Structural embedding alignment for multimodal large language model. arXiv preprint arXiv: 2405.20797, 2024
[37]	Chen Z, Wang W Y, Cao Y, Liu Y Z, Gao Z W, Cui E F, et al. Expanding performance boundaries of open-source multimodal models with model, data, and test-time scaling. arXiv preprint arXiv: 2412.05271, 2024
[38]	Bai S, Chen K Q, Liu X J, Wang J L, Ge W B, Song S B, et al. Qwen2.5-VL technical report. arXiv preprint arXiv: 2502.13923, 2025