Methodological Considerations for Using Bibliometrics in Creating and Maintaining Thesauri
Bibliometrics is the field of study concerned with the quantitative analysis of bibliographic information. The term “bibliometrics” was proposed by Alan Pritchard in 1969 to describe the application of statistical and mathematical methods to the analysis of scientific literature (Pritchard, 1969). Bibliometrics has since become a valuable tool for researchers in disciplines including library and information science, scientometrics, and information retrieval. One application of bibliometrics is the creation and maintenance of thesauri. A thesaurus is a controlled vocabulary, with defined relationships between its terms, that is used to index and retrieve documents in a specific domain. Thesauri are widely used in the information retrieval systems of libraries, archives, and digital repositories. A thesaurus can be built manually or with automatic support, and in either case bibliometric methods can aid the process. This article discusses the methodological considerations involved in applying bibliometric methods to the creation and maintenance of thesauri.
- Identify the Domain and Scope of the Thesaurus: The first step in creating a thesaurus is to identify its domain and scope. The domain is the subject area or discipline the thesaurus will cover, while the scope is the extent of coverage within that domain. Together, domain and scope determine which types of terms and relationships the thesaurus will include, so both should be well defined and consistent with the thesaurus's intended use.
Bibliometric methods can help delineate the domain and scope. One such method is bibliometric mapping, which analyzes the relationships among terms or publications in a corpus, for example the co-occurrence of terms (co-word analysis) or the citation links between publications. Such maps can reveal the key topics and concepts within a domain and the relationships between them (Van Eck & Waltman, 2017). Another method is bibliometric clustering, which groups documents according to the similarity of the words and phrases they contain; it can be used to identify sub-domains within a larger domain (Ding et al., 2009).
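To make the co-word idea concrete, the following is a minimal Python sketch, assuming each document has already been reduced to a set of index terms (for example, author keywords). It counts how often each pair of terms is assigned to the same document and normalizes the counts with Salton's cosine measure, c_ij / sqrt(c_i * c_j), one of several normalizations used in co-word studies. The keyword sets and the function names `co_word_counts` and `cosine_strength` are hypothetical, chosen only for illustration; dedicated mapping tools apply their own normalization and layout steps.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def co_word_counts(docs):
    """Count term occurrences and pairwise co-occurrences across documents.

    Each document is a collection of index terms; two terms co-occur when
    both are assigned to the same document.
    """
    term_freq = Counter()
    pair_freq = Counter()
    for terms in docs:
        unique = sorted(set(terms))  # ignore repeats; sorting gives each pair one canonical order
        term_freq.update(unique)
        pair_freq.update(combinations(unique, 2))  # pairs (a, b) with a < b
    return term_freq, pair_freq


def cosine_strength(term_freq, pair_freq):
    """Normalize raw co-occurrence counts with Salton's cosine: c_ij / sqrt(c_i * c_j)."""
    return {
        (a, b): c / sqrt(term_freq[a] * term_freq[b])
        for (a, b), c in pair_freq.items()
    }


# Hypothetical keyword sets for four documents (illustrative data only).
docs = [
    {"query expansion", "thesaurus", "relevance feedback"},
    {"thesaurus", "controlled vocabulary", "indexing"},
    {"query expansion", "thesaurus"},
    {"indexing", "controlled vocabulary"},
]

term_freq, pair_freq = co_word_counts(docs)
strength = cosine_strength(term_freq, pair_freq)
for pair, score in sorted(strength.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 2))
```

The normalization matters because raw counts favor frequent terms: here "thesaurus" appears in three documents, so its pairings are discounted relative to a pair such as "controlled vocabulary" and "indexing", which always appear together.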
- Identify Relevant Literature: The second step in creating a thesaurus is to identify the relevant literature within the domain. The search should cover both formally published sources, such as journal articles and books, and grey literature, such as conference proceedings and technical reports. It should be comprehensive and up to date, including both seminal works and recent developments in the field.
Bibliometric methods can help identify this literature. One such method is citation analysis, which examines the references cited by the documents in a corpus. Citation analysis can be used to identify the most influential works within a domain and to reveal patterns of citation behavior (Garfield, 1979). Another method is co-citation analysis. Two works are co-cited when a later document cites both of them, and the number of documents that cite a pair together is taken as a measure of how closely the two works are related. Co-citation analysis can therefore be used to identify relationships between works within a domain and to find clusters of related works (Small, 1973).
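Both measures reduce to tallying reference lists. The sketch below assumes the corpus has already been parsed into a mapping from each citing document to the identifiers of the works it cites; the document and work IDs are hypothetical, and real data from a citation index would typically need cleaning of variant citations before counting.

```python
from collections import Counter
from itertools import combinations


def citation_counts(reference_lists):
    """Times cited: number of citing documents whose references include each work."""
    counts = Counter()
    for refs in reference_lists.values():
        counts.update(set(refs))  # count a work at most once per citing document
    return counts


def cocitation_counts(reference_lists):
    """Co-citation frequency: number of citing documents that cite both works of a pair."""
    counts = Counter()
    for refs in reference_lists.values():
        # Sorting gives each pair one canonical order, so (a, b) and (b, a) are not split.
        counts.update(combinations(sorted(set(refs)), 2))
    return counts


# Hypothetical corpus: citing document ID -> IDs of the works in its reference list.
reference_lists = {
    "P1": ["W1", "W2", "W3"],
    "P2": ["W1", "W2"],
    "P3": ["W2", "W1", "W4"],
    "P4": ["W4"],
}

cited = citation_counts(reference_lists)
cocited = cocitation_counts(reference_lists)
print("Most cited:", cited.most_common(2))
print("Most co-cited:", cocited.most_common(1))
```

In this toy corpus W1 and W2 are cited together by three of the four papers, so a co-citation map would likely group them in the same cluster, while W3 and W4 are never cited together and have no co-citation link at all.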
- Extract Terms and Relationships: The third step in creating a thesaurus is to extract terms and relationships from the relevant literature. Terms are the words and phrases that express the key concepts within the domain, while relationships describe the connections between these concepts. Extraction can be done manually or automatically.
Bibliometric and computational linguistic methods can support this extraction. One such method is automatic term extraction, which identifies candidate terms in a corpus of documents. It can rely on statistical methods such as frequency analysis, on linguistic methods such as part-of-speech tagging, or on a combination of the two (Frantzi et al., 1998). Another method is network analysis, which links terms according to their co-occurrence in the literature. Co-occurrence networks can suggest candidate semantic relationships such as synonymy, antonymy, and hierarchy (Liu et al., 2010). However, co-occurrence alone does not reveal which type of relationship holds between two terms, so candidate relationships generally need to be reviewed by domain experts.
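As an illustration of the statistical side of term extraction, here is a minimal sketch of the C-value score of Frantzi et al. (1998), assuming the candidate multi-word terms and their corpus frequencies have already been extracted (in the original method, by a part-of-speech filter that is not reproduced here). For a candidate a of length |a| words and frequency f(a), the score is log2|a| * f(a); if a is nested inside longer candidates, the average frequency of those longer candidates is subtracted from f(a) first. The candidate list and frequencies are hypothetical.

```python
from math import log2


def is_nested(shorter, longer):
    """True if the word sequence `shorter` occurs contiguously inside the longer sequence."""
    n = len(shorter)
    return len(longer) > n and any(
        longer[i:i + n] == shorter for i in range(len(longer) - n + 1)
    )


def c_value(freqs):
    """C-value score for each multi-word candidate term.

    freqs maps a candidate (tuple of words) to its corpus frequency.
    Candidates nested inside longer candidates are penalized by the
    average frequency of those longer candidates.
    """
    scores = {}
    for a, f_a in freqs.items():
        containers = [b for b in freqs if is_nested(a, b)]
        if containers:
            nested_freq = sum(freqs[b] for b in containers) / len(containers)
            scores[a] = log2(len(a)) * (f_a - nested_freq)
        else:
            scores[a] = log2(len(a)) * f_a
    return scores


# Hypothetical candidates and frequencies, as if produced by a linguistic filter.
freqs = {
    ("information", "retrieval"): 40,
    ("information", "retrieval", "system"): 12,
    ("information", "retrieval", "evaluation"): 8,
    ("controlled", "vocabulary"): 15,
}

scores = c_value(freqs)
for term, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(" ".join(term), round(score, 2))
```

The subtraction keeps a phrase such as "information retrieval" from being credited for occurrences that belong to longer terms. A full implementation would also apply the linguistic filter and a frequency threshold, and the NC-value extension adds context-word information that this sketch omits.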
- Classify and Structure the Thesaurus: The fourth step in creating a thesaurus is to organize the terms and relationships into a hierarchical or non-hierarchical structure. In a hierarchical structure, terms are arranged in a tree, with broader terms at the top and narrower terms below them. In a non-hierarchical structure, terms are arranged as a network without a single top-down hierarchy. In practice, most thesauri combine the two, pairing hierarchical (broader/narrower) relationships with associative “related term” links.
Bibliometric methods can help classify and structure the thesaurus. One such method is cluster analysis, which groups terms according to the similarity of their co-occurrence patterns in the literature; the resulting groups of related terms can serve as categories within the thesaurus (Boyack et al., 2005). Another method is multidimensional scaling (MDS), which places terms in a low-dimensional space, usually two dimensions, so that the distances between them approximate their dissimilarities. The resulting map shows which terms lie close together and thus suggests semantically related terms and clusters of terms (Borg & Groenen, 2005).
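A minimal sketch of the clustering step follows, using average-linkage agglomerative clustering, one common choice among several linkage methods. It assumes a pairwise similarity between terms has already been computed, for example a normalized co-occurrence score; the terms, the similarity values, and the 0.5 threshold are hypothetical, invented for illustration.

```python
from itertools import combinations


def average_linkage(terms, sim, threshold):
    """Agglomerative clustering with average linkage.

    sim(a, b) returns a similarity for two terms. Clusters are merged while
    the most similar pair of clusters has average similarity >= threshold.
    """
    clusters = [[t] for t in terms]

    def linkage(c1, c2):
        # Average similarity over all cross-cluster term pairs.
        return sum(sim(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        candidates = (
            ((i, j), linkage(clusters[i], clusters[j]))
            for i, j in combinations(range(len(clusters)), 2)
        )
        (i, j), best = max(candidates, key=lambda item: item[1])
        if best < threshold:
            break  # no remaining pair is similar enough to merge
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Hypothetical similarities (e.g. normalized co-occurrence); unlisted pairs count as 0.
pair_sim = {
    frozenset({"thesaurus", "controlled vocabulary"}): 0.8,
    frozenset({"thesaurus", "indexing"}): 0.6,
    frozenset({"controlled vocabulary", "indexing"}): 0.7,
    frozenset({"citation analysis", "co-citation"}): 0.9,
    frozenset({"indexing", "citation analysis"}): 0.1,
}


def sim(a, b):
    return pair_sim.get(frozenset({a, b}), 0.0)


terms = ["thesaurus", "controlled vocabulary", "indexing", "citation analysis", "co-citation"]
clusters = average_linkage(terms, sim, threshold=0.5)
print(clusters)
```

The threshold controls granularity: a higher value yields smaller, tighter groups, a lower value yields broader ones. Either way, the groups are candidates for editors to review rather than finished thesaurus categories.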
- Evaluate and Maintain the Thesaurus: The final step in creating a thesaurus is to evaluate and maintain it. Evaluation assesses the effectiveness and usefulness of the thesaurus in terms of its coverage, structure, and retrieval performance. Maintenance involves updating the thesaurus to reflect changes in the domain and to improve its effectiveness and usefulness over time.
Evaluation and maintenance draw on both user-centered and bibliometric methods. User testing, in which end users assess the usability and effectiveness of the thesaurus, can identify areas for improvement and confirm that the thesaurus meets the needs of its intended users (Hjørland, 2017). Bibliometric analysis, which tracks the impact and use of the thesaurus over time, can reveal trends and patterns in its use and point to areas where its coverage and structure should be improved (Zhou et al., 2017).
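One way bibliometric data can point to coverage gaps is by flagging index terms that appear in a growing share of recent publications but are not yet in the thesaurus. The sketch below assumes documents from two periods have been reduced to sets of index terms; the data, the `min_recent` cutoff, and the function name are hypothetical, and the simple lowercase string match stands in for proper matching against both preferred and non-preferred (entry) terms.

```python
from collections import Counter


def candidate_new_terms(earlier_docs, recent_docs, thesaurus_terms, min_recent=2):
    """Flag terms absent from the thesaurus whose share of documents is rising.

    Each document is a collection of index terms. Returns tuples of
    (term, earlier_share, recent_share), highest recent share first.
    """
    earlier = Counter(t for doc in earlier_docs for t in set(doc))
    recent = Counter(t for doc in recent_docs for t in set(doc))
    known = {t.lower() for t in thesaurus_terms}  # naive match; ignores entry terms
    flagged = []
    for term, df in recent.items():
        if term.lower() in known or df < min_recent:
            continue
        # Compare shares of documents rather than raw counts, since the two
        # periods may contain different numbers of documents.
        share_before = earlier[term] / len(earlier_docs)
        share_now = df / len(recent_docs)
        if share_now > share_before:
            flagged.append((term, round(share_before, 2), round(share_now, 2)))
    return sorted(flagged, key=lambda row: (-row[2], row[0]))


# Hypothetical data: an existing thesaurus and keyword sets from two periods.
thesaurus_terms = ["Thesauri", "Controlled vocabulary", "Indexing"]
earlier_docs = [{"thesauri", "indexing"}, {"controlled vocabulary"}, {"indexing", "linked data"}]
recent_docs = [
    {"linked data", "thesauri"},
    {"linked data", "knowledge graphs"},
    {"knowledge graphs", "indexing"},
    {"word embeddings"},
]

flagged = candidate_new_terms(earlier_docs, recent_docs, thesaurus_terms)
for row in flagged:
    print(row)
```

Flagged terms are only signals: an editor still decides whether each one deserves a new descriptor, an entry term pointing to an existing descriptor, or no change.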
In conclusion, bibliometric methods can be valuable tools for creating and maintaining thesauri. They can help define the domain and scope, identify relevant literature, extract terms and relationships, classify and structure the vocabulary, and evaluate and maintain it over time. Used in this way, they can improve the effectiveness and usefulness of thesauri and help ensure that thesauri continue to meet the needs of their intended users.
References:
- Borg, I., & Groenen, P. (2005). Modern multidimensional scaling: Theory and applications. Springer.
- Boyack, K. W., Klavans, R., & Börner, K. (2005). Mapping the backbone of science. Scientometrics, 64(3), 351-374.
- Ding, Y., Chowdhury, G. G., & Foo, S. (2009). Bibliometric cartography of information retrieval research by using co-word analysis. Information Processing & Management, 45(1), 38-55.
- Frantzi, K. T., Ananiadou, S., & Tsujii, J. (1998). The C-value/NC-value method of automatic recognition for multi-word terms. In Proceedings of the 17th International Conference on Computational Linguistics, Volume 1 (pp. 1036-1042). Association for Computational Linguistics.
- Garfield, E. (1979). Citation indexing: Its theory and application in science, technology, and humanities. Wiley.
- Hjørland, B. (2017). Thesauri and other information retrieval tools. In Encyclopedia of Library and Information Sciences (pp. 1-13). Taylor & Francis.
- Liu, X., Bollen, J., Nelson, M. L., & Van de Sompel, H. (2010). Co-authorship networks in the digital library research community. Information Processing & Management, 46(1), 43-50.
- Pritchard, A. (1969). Statistical bibliography or bibliometrics? Journal of Documentation, 25(4), 348-349.
- Small, H. (1973). Co-citation in the scientific literature: A new measure of the relationship between two documents. Journal of the American Society for Information Science, 24(4), 265-269.
- Van Eck, N. J., & Waltman, L. (2017). Citation-based clustering of publications using CitNetExplorer and VOSviewer. Scientometrics, 111(2), 1053-1070.
- Zhou, W., Shen, Z., Shen, B., Wang, Z., & Wu, Y. (2017). Mapping knowledge domains of library and information science: An author co-citation analysis, 2010-2014. Library & Information Science Research, 39(4), 294-302.
Former Student at Rajshahi University