PUBLICLY AVAILABLE DATASETS AND METRICS TO ADVANCE RESEARCH ON QUESTION-ANSWERING SYSTEMS

Authors

DOI:

https://doi.org/10.32851/tnv-tech.2023.1.2

Keywords:

question-answering systems, text analytics, natural language processing, training datasets, evaluation metrics.

Abstract

Question-answering (QA) systems are information systems designed to answer questions posed in natural language. In recent years, researchers' attention to the development of QA systems has grown, and with it the number of publicly available test datasets created to facilitate research in this area of natural language text processing. Research on open datasets is important because it enables the development of better systems that can accurately answer a wide range of questions. In this paper, we review publicly available, large, original, and widely used training datasets employed in research on QA systems, and describe the metrics used to compare models of these systems. The research was conducted using a systems approach and the methods of abstraction, system analysis, comparison, and synthesis. As a result, a relevant scientific task was solved: determining the current state of the methodological base for developing information systems capable of answering user questions on the basis of information represented by unstructured textual data collections, and reviewing existing publicly available datasets and evaluation metrics for QA system models in light of recent publications in this field. The practical significance of the research lies in the possibility of applying its scientific provisions and conclusions to the development and implementation of question-answering systems; in teaching natural language processing disciplines in higher educational institutions; in writing manuals on natural language processing; and in applied research on search engines and question-answering systems.
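
As an illustration of the kind of evaluation metrics surveyed in the paper, the minimal sketch below computes the exact-match (EM) and token-level F1 scores commonly used for extractive QA benchmarks such as SQuAD. The normalization steps (lowercasing, stripping punctuation and English articles) are assumptions that mirror common practice and are not the exact implementation of any particular benchmark.

```python
# Minimal sketch of SQuAD-style QA evaluation metrics (exact match and token F1).
# Normalization choices here are illustrative assumptions, not a specific benchmark's code.
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized prediction equals the normalized reference, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Token-level F1 between the normalized prediction and reference answers."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    common = Counter(pred_tokens) & Counter(ref_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(exact_match("the Eiffel Tower", "Eiffel Tower"))  # 1.0 after normalization
    print(round(f1_score("in Paris, France", "Paris"), 2))  # partial overlap -> 0.5
```

Generative and long-form QA settings typically rely instead on n-gram overlap metrics such as BLEU, ROUGE, and METEOR, which are cited in the reference list below.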

References

Recent trends in deep learning based open-domain textual question answering systems / Z. Huang et al. IEEE access. 2020. Vol. 8. P. 94341–94356. URL: https://doi.org/10.1109/access.2020.2988903 (date of access: 03.03.2023).

Dimitrakis E., Sgontzos K., Tzitzikas Y. A survey on question answering systems over linked data and documents. Journal of intelligent information systems. 2019. Vol. 55, no. 2. P. 233–259. URL: https://doi.org/10.1007/s10844-019-00584-7 (date of access: 03.03.2023).

A review of public datasets in question answering research / B. B. Cambazoglu et al. ACM SIGIR Forum. 2020. Vol. 54, no. 2. P. 1–23. URL: https://doi.org/10.1145/3483382.3483389 (date of access: 03.03.2023).

Wang Y. B. More Than Reading Comprehension: A Survey on Datasets and Metrics of Textual Question Answering. Computation and Language. 2021.

Wang Z. Modern Question Answering Datasets and Benchmarks: A Survey. CoRR. 2022.

BLEU: a method for automatic evaluation of machine translation / K. Papineni et al. Proceedings of the 40th annual meeting of the Association for Computational Linguistics, Philadelphia, Pennsylvania, 7–12 July 2002. P. 311–318. URL: https://doi.org/10.3115/1073083.1073135 (date of access: 03.03.2023).

Lin C. Y. ROUGE: A Package for Automatic Evaluation of Summaries. Text Summarization Branches Out. 2004.

Banerjee S., Lavie A. METEOR: an automatic metric for MT evaluation with improved correlation with human judgments. Proceedings of the ACL workshop on intrinsic and extrinsic evaluation measures for machine translation and/or summarization. 2005. P. 65–72.

QuAC: question answering in context / E. Choi et al. Proceedings of the 2018 conference on empirical methods in natural language processing, Brussels, Belgium. Stroudsburg, PA, USA, 2018. URL: https://doi.org/10.18653/v1/d18-1241 (date of access: 03.03.2023).

RACE: large-scale reading comprehension dataset from examinations / G. Lai et al. Proceedings of the 2017 conference on empirical methods in natural language processing, Copenhagen, Denmark. Stroudsburg, PA, USA, 2017. URL: https://doi.org/10.18653/v1/d17-1082 (date of access: 03.03.2023).

Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge / P. Clark et al. ArXiv, 3 March 2018.

Beat the AI: investigating adversarial human annotation for reading comprehension / M. Bartolo et al. Transactions of the association for computational linguistics. 2020. Vol. 8. P. 662–678. URL: https://doi.org/10.1162/tacl_a_00338 (date of access: 03.03.2023).

The Goldilocks Principle: Reading Children's Books with Explicit Memory Representations / F. Hill et al. ICLR. 2016.

Teaching Machines to Read and Comprehend Tibetan Text / Y. Sun et al. Journal of Computer and Communications. 2021. Vol. 09, no. 09. P. 143–152. URL: https://doi.org/10.4236/jcc.2021.99011 (date of access: 03.03.2023).

ComQA: question answering over knowledge base via semantic matching / H. Jin et al. IEEE access. 2019. Vol. 7. P. 75235–75246. URL: https://doi.org/10.1109/access.2019.2918675 (date of access: 03.03.2023).

Reddy S., Chen D., Manning C. D. CoQA: a conversational question answering challenge. Transactions of the association for computational linguistics. 2019. Vol. 7. P. 249–266. URL: https://doi.org/10.1162/tacl_a_00266 (date of access: 03.03.2023).

Cosmos QA: machine reading comprehension with contextual commonsense reasoning / L. Huang et al. Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP), Hong Kong, China. Stroudsburg,PA, USA, 2019. URL: https://doi.org/10.18653/v1/d19-1243 (date of access: 03.03.2023).

DROP: a reading comprehension benchmark requiring discrete reasoning over paragraphs / D. Dua et al. Proceedings of the 2019 conference of the North American chapter of the Association for Computational Linguistics: human language technologies, Minneapolis, Minnesota. Stroudsburg, PA, USA, 2019. URL: https://doi.org/10.18653/v1/n19-1246 (date of access: 03.03.2023).

DuoRC: towards complex language understanding with paraphrased reading comprehension / A. Saha et al. Proceedings of the 56th annual meeting of the association for computational linguistics (volume 1: long papers), Melbourne, Australia. Stroudsburg, PA, USA, 2018. URL: https://doi.org/10.18653/v1/p18-1156 (date of access: 03.03.2023).

ELI5: long form question answering / A. Fan et al. Proceedings of the 57th annual meeting of the association for computational linguistics, Florence, Italy. Stroudsburg, PA, USA, 2019. URL: https://doi.org/10.18653/v1/p19-1346 (date of access: 03.03.2023).

HotpotQA: a dataset for diverse, explainable multi-hop question answering / Z. Yang et al. Proceedings of the 2018 conference on empirical methods in natural language processing, Brussels, Belgium. Stroudsburg, PA, USA, 2018. URL: https://doi.org/10.18653/v1/d18-1259 (date of access: 03.03.2023).

MS MARCO: benchmarking ranking models in the large-data regime / N. Craswell et al. SIGIR '21: the 44th international ACM SIGIR conference on research and development in information retrieval, Virtual Event Canada. New York, NY, USA, 2021. URL: https://doi.org/10.1145/3404835.3462804 (date of access: 03.03.2023).

Looking beyond the surface: a challenge set for reading comprehension over multiple sentences / D. Khashabi et al. Proceedings of the 2018 conference of the north american chapter of the association for computational linguistics: human language technologies, volume 1 (long papers), New Orleans, Louisiana. Stroudsburg, PA, USA, 2018. URL: https://doi.org/10.18653/v1/n18-1023 (date of access: 03.03.2023).

The narrativeqa reading comprehension challenge / T. Kočiský et al. Transactions of the association for computational linguistics. 2018. Vol. 6. P. 317–328. URL: https://doi.org/10.1162/tacl_a_00023 (date of access: 03.03.2023).

Natural questions: a benchmark for question answering research / T. Kwiatkowski et al. Transactions of the association for computational linguistics. 2019. Vol. 7. P. 453–466. URL: https://doi.org/10.1162/tacl_a_00276 (date of access: 03.03.2023).

NewsQA: a machine comprehension dataset / A. Trischler et al. Proceedings of the 2nd workshop on representation learning for NLP, Vancouver, Canada. Stroudsburg, PA, USA, 2017. URL: https://doi.org/10.18653/v1/w17-2623 (date of access: 03.03.2023).

ReClor: a reading comprehension dataset requiring logical reasoning / W. Yu et al. International conference on learning representations (ICLR). 2020.

SearchQA: A New Q&A Dataset Augmented with Context from a Search Engine / M. Dunn et al. CoRR. 2017.

SQuAD: 100,000+ questions for machine comprehension of text / P. Rajpurkar et al. Proceedings of the 2016 conference on empirical methods in natural language processing, Austin, Texas. Stroudsburg, PA, USA, 2016. URL: https://doi.org/10.18653/v1/d16-1264 (date of access: 03.03.2023).

LSDSem 2017 shared task: the story cloze test / N. Mostafazadeh et al. Proceedings of the 2nd workshop on linking models of lexical, sentential and discourse-level semantics, Valencia, Spain. Stroudsburg, PA, USA, 2017. URL: https://doi.org/10.18653/v1/w17-0906 (date of access: 03.03.2023).

TriviaQA: a large scale distantly supervised challenge dataset for reading comprehension / M. Joshi et al. Proceedings of the 55th annual meeting of the association for computational linguistics (volume 1: long papers), Vancouver, Canada. Stroudsburg, PA, USA, 2017. URL: https://doi.org/10.18653/v1/p17-1147 (date of access: 03.03.2023).

TWEETQA: a social media focused question answering dataset / W. Xiong et al. Proceedings of the 57th annual meeting of the association for computational linguistics, Florence, Italy. Stroudsburg, PA, USA, 2019. URL: https://doi.org/10.18653/v1/p19-1496 (date of access: 03.03.2023).

Published

2023-04-07

How to Cite

Вишняк, М. Ю., & Пироженко, М. Ю. (2023). PUBLICLY AVAILABLE DATASETS AND METRICS TO ADVANCE RESEARCH ON QUESTION-ANSWERING SYSTEMS. Таuridа Scientific Herald. Series: Technical Sciences, (1), 13-24. https://doi.org/10.32851/tnv-tech.2023.1.2

Issue

Section

COMPUTER SCIENCE AND INFORMATION TECHNOLOGY