Is the Web of Science Classification Reasonable? A Classification Prediction Method Based on Gradient-Saliency Text Feature Extraction

Authors: Zhang Tao (张涛)¹, Zhai Mengting (翟梦婷)¹, Ma Haiqun (马海群)¹, Jiang Lei (姜磊)¹, Li Zheng (李峥)¹
Affiliation: 1. School of Information Management, Heilongjiang University
Corresponding author: Zhang Tao
Submitted: 2024-09-09

Abstract: Web of Science is one of the most important databases for obtaining academic information and has a complex subject classification system. The rationality and accuracy of that classification matter for retrieving academic resources and for promoting research within disciplines. This study selects a dataset from the "multi-disciplinary category" of the Web of Science database. Starting from a derivation based on maximum likelihood theory, and combining it with the interpretability theory of gradient saliency in large models, it mines the distributional features of text, quantifies category features, and measures the similarity between categories. On this basis it proposes a method for text feature extraction and classification prediction. Using this method, the paper re-predicts single-category labels in Web of Science, improving data quality by raising the accuracy of text classification labels. The experiments also show that the method can predict multiple categories effectively, which provides a basis for decisions on document classification. By quantifying category features and computing category similarity, the study identifies why predicted labels repeatedly fall within a few specific sets of categories. The method can guide document classification, assess how reasonable a database's category divisions are, and, by analyzing the papers a journal has published, judge how well those papers match the journal's assigned category.
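The abstract names three steps (gradient saliency, category feature quantification, category similarity) without giving formulas. The following is a minimal sketch of how such a pipeline could look, not the authors' implementation. It assumes a small linear softmax classifier over term counts in place of the large model, gradient-times-input as the saliency measure, per-category mean saliency as the category feature, and cosine similarity between categories. All function names, weights, and data are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def saliency(W, x, label):
    """Gradient-times-input saliency of log p(label | x) for a linear softmax model.

    W: (n_classes, n_terms) weights; x: (n_terms,) term counts.
    For logits z = W x, d log p_label / d x = W[label] - sum_k p_k W[k].
    """
    p = softmax(W @ x)
    grad = W[label] - p @ W
    return np.abs(grad * x)

def category_profiles(W, X, y, n_classes):
    """Average the per-term saliency over the documents of each category."""
    profiles = np.zeros((n_classes, X.shape[1]))
    for c in range(n_classes):
        docs = X[y == c]
        if len(docs):
            profiles[c] = np.mean([saliency(W, d, c) for d in docs], axis=0)
    return profiles

def cosine_similarity_matrix(P):
    """Pairwise cosine similarity between the rows of P."""
    norms = np.linalg.norm(P, axis=1, keepdims=True)
    unit = P / np.where(norms == 0, 1.0, norms)  # avoid division by zero
    return unit @ unit.T

# Toy data (assumed): 3 hypothetical categories, 4 vocabulary terms.
W = np.array([[2.0, 0.5, -1.0, 0.0],
              [1.5, 1.0, -0.5, 0.0],
              [-1.0, 0.0, 2.0, 1.0]])
X = np.array([[3, 1, 0, 0],
              [2, 2, 0, 0],
              [0, 0, 3, 1]], dtype=float)
y = np.array([0, 1, 2])

profiles = category_profiles(W, X, y, n_classes=3)
sim = cosine_similarity_matrix(profiles)
print(np.round(sim, 3))
```

In this toy setup, categories 0 and 1 draw their saliency from the same terms and so come out more similar to each other than either is to category 2. That is the kind of overlap that could explain predicted labels recurring within a small set of categories.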

Keywords: gradient saliency; category feature quantification; Web of Science; text feature extraction; classification prediction

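Submitted by: zhangtao2668
Category: Information Resource Management >> Information Science
Manuscript status: submitted to a conference
Conference: Digital-Intelligence Empowerment for the Innovative Development of Academic Journals: Call for Papers of the 4th Forum on the Development of Information Resource Management Journals (July 2024 to October 2024)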

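Cite as: PSSXiv:202409.00539 (or this version, PSSXiv:202409.00539V1)
DOI: 10.12451/202409.00539
CSTR: 32012.36.PSSXiv.202409.00539
Recommended citation: Zhang Tao, Zhai Mengting, Ma Haiqun, Jiang Lei, Li Zheng. Is the Web of Science Classification Reasonable? A Classification Prediction Method Based on Gradient-Saliency Text Feature Extraction. Philosophy and Social Sciences Preprint Platform: https://zsyyb.cn/abs/202409.00539. [PSSXiv:202409.00539V1]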

Version history: [V1] 2024-09-09 22:40:09, PSSXiv:202409.00539V1
