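Classifying text from several specific content areas of district-level People's Congress reports lets congress staff quickly distinguish between different kinds of work content, and it is a necessary component of a system that assists in generating such reports. To classify these content areas, a classification model is built by combining TF-IDF (term frequency-inverse document frequency) with the knowledge-enhanced semantic representation model ERNIE (Enhanced Representation through Knowledge Integration). ERNIE models semantic knowledge units directly, and TF-IDF is added on top of it to improve model performance. Experimental results show that the method performs well in classification accuracy and recall and speeds up the convergence of the ERNIE model, so the model can classify the text of People's Congress reports well.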
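As an illustration of the TF-IDF component only, the following is a minimal sketch of the standard TF-IDF weighting in plain Python. The function name `tf_idf`, the toy documents, and the unsmoothed `log(N / df)` form of the inverse document frequency are assumptions made for this example. The abstract does not state which TF-IDF variant is used or how the weights are combined with ERNIE, so that step is not shown.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute TF-IDF weights for a list of tokenized documents.

    docs: list of token lists.
    Returns one {term: weight} dict per document.
    Assumption: tf = count / document length, idf = log(N / df), no smoothing.
    """
    n_docs = len(docs)

    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    weights = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        if total == 0:
            weights.append({})
            continue
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus standing in for segmented report sentences.
docs = [
    ["budget", "review", "budget"],
    ["legislation", "review"],
    ["supervision", "inspection"],
]
weights = tf_idf(docs)
```

In this toy corpus, "budget" occurs twice in the first document and nowhere else, so it receives a higher weight there than "review", which also appears in the second document. Terms with high weights are the kind of category-specific signal that TF-IDF could contribute alongside a pretrained encoder.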
Objective: To propose a multi-feature fusion question generation model for traditional Chinese medicine (MFFQG) that addresses two shortcomings of existing automatic generation techniques in specialized domains: missing domain keyword information and non-standard phrasing of the generated questions. Method: RoBERTa vectors and Wubi vectors are used to capture the semantic features and glyph features of the input sequence. These are fused with syntactic information and with the primary and secondary keyword information constructed for the traditional Chinese medicine domain, and the resulting multi-feature vectors are fed into the UniLM generation model to automatically generate questions in this domain. Result: By fusing multiple features, the MFFQG model reaches 64.93%, 34.57%, and 63.05% on the Rouge-1, Rouge-2, and Rouge-L metrics, respectively. Limitation: The data come mainly from the traditional Chinese medicine domain, and effectiveness in other domains remains to be verified. Conclusion: Compared with the baseline models, the MFFQG model significantly improves the quality of generated traditional Chinese medicine questions.
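For readers unfamiliar with the reported metrics, the following is a minimal sketch of how Rouge-N and Rouge-L can be computed for a single candidate and reference pair. It assumes the recall form of Rouge-N and an F1 form of Rouge-L based on the longest common subsequence. The abstract does not say which variant or toolkit was used, so the function names and the toy sequences here are illustrative only.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n):
    """Rouge-N recall: clipped n-gram overlap divided by reference n-gram count."""
    ref = ngrams(reference, n)
    cand = ngrams(candidate, n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((ref & cand).values())
    return overlap / total


def lcs_length(a, b):
    """Length of the longest common subsequence, using a rolling DP row."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l(candidate, reference):
    """Rouge-L as the F1 of LCS-based precision and recall (an assumed variant)."""
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return 0.0
    precision = lcs / len(candidate)
    recall = lcs / len(reference)
    return 2 * precision * recall / (precision + recall)


# Toy character-level sequences; for Chinese text, characters are a common token unit.
reference = list("abcde")
candidate = list("abxde")
r1 = rouge_n(candidate, reference, 1)
r2 = rouge_n(candidate, reference, 2)
rl = rouge_l(candidate, reference)
```

Corpus-level scores such as those reported above are typically obtained by averaging per-pair scores over the test set.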