Please use this identifier to cite or link to this item:
http://hdl.handle.net/10174/40225
| Title: | Domain Adaptation in Transformer Models: Question Answering of Dutch Government Policies |
| Authors: | Blom, Berry; Pereira, L. M. |
| Editors: | Quaresma, Paulo; Camacho, David; Yin, Hujun; Gonçalves, Teresa; Julian, Vicente; Tallón-Ballesteros, Antonio J. |
| Keywords: | Natural Language Processing; Question answering; Transformers; Domain adaptation; Dutch |
| Issue Date: | 15-Nov-2023 |
| Publisher: | Springer, Cham |
| Abstract: | Automatic question answering helps users find information efficiently, in contrast with web search engines, which require keywords to be provided and return large texts to be processed. The first Dutch Question Answering (QA) system used basic natural language processing techniques based on text similarity between the question and the answer. After the introduction of pre-trained transformer-based models such as BERT, higher scores were achieved, with an improvement of over 7.7% on the General Language Understanding Evaluation (GLUE) benchmark.
Pre-trained transformer-based models tend to over-generalize when applied to a specific domain, leading to less precise context-specific outputs. There is a marked research gap in experimental strategies for adapting these models effectively to domain-specific applications. Additionally, there is a lack of Dutch resources for automatic question answering: the only existing dataset, Dutch SQuAD, is a translation of the English SQuAD dataset.
We propose a new dataset, PolicyQA, containing questions and answers about Dutch government policies, and we use domain adaptation techniques to address the generalizability problem of transformer-based models.
The experimental setup includes a baseline neural network, the Long Short-Term Memory (LSTM), and three BERT-based models with domain adaptation: mBERT, RobBERT, and BERTje. The datasets used for testing are the proposed PolicyQA dataset and the existing Dutch SQuAD.
From the results, we found that the multilingual BERT model, mBERT, outperforms the Dutch BERT-based models (RobBERT and BERTje) on both datasets. By introducing fine-tuning, a domain adaptation technique, the mBERT model improved to an F1-score of 94.10%, a gain of 226% compared to its performance without fine-tuning. |
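The abstract describes the domain adaptation step only at a high level. As an illustration, the sketch below shows how fine-tuning mBERT for extractive QA is typically done with the Hugging Face Transformers library. The `bert-base-multilingual-cased` checkpoint is the standard mBERT release; the PolicyQA file name, the SQuAD-style field layout, and all hyperparameters are assumptions made for this sketch, not details taken from the paper.

```python
# Minimal sketch (not the authors' code): fine-tuning mBERT for
# extractive QA with Hugging Face Transformers. The dataset path and
# hyperparameters below are illustrative assumptions.
from datasets import load_dataset
from transformers import (AutoModelForQuestionAnswering, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_NAME = "bert-base-multilingual-cased"  # mBERT; RobBERT or BERTje would swap in here

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForQuestionAnswering.from_pretrained(MODEL_NAME)

# Hypothetical SQuAD-style JSON with "question", "context", "answers" fields.
raw = load_dataset("json", data_files={"train": "policyqa_train.json"})

def preprocess(examples):
    # Tokenize question/context pairs and map character-level answer
    # spans onto token-level start/end positions.
    enc = tokenizer(examples["question"], examples["context"],
                    truncation="only_second", max_length=384,
                    padding="max_length", return_offsets_mapping=True)
    starts, ends = [], []
    for i, offsets in enumerate(enc["offset_mapping"]):
        answer = examples["answers"][i]      # {"text": [...], "answer_start": [...]}
        start_char = answer["answer_start"][0]
        end_char = start_char + len(answer["text"][0])
        seq_ids = enc.sequence_ids(i)
        start_tok = end_tok = 0              # fall back to [CLS] if the span was truncated
        for idx, (s, e) in enumerate(offsets):
            if seq_ids[idx] != 1:            # only look at context tokens
                continue
            if s <= start_char < e:
                start_tok = idx
            if s < end_char <= e:
                end_tok = idx
        starts.append(start_tok)
        ends.append(end_tok)
    enc["start_positions"] = starts
    enc["end_positions"] = ends
    enc.pop("offset_mapping")
    return enc

train = raw["train"].map(preprocess, batched=True,
                         remove_columns=raw["train"].column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="mbert-policyqa",
                           num_train_epochs=2,
                           per_device_train_batch_size=8),
    train_dataset=train,
)
trainer.train()
```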
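The 94.10% result is an F1-score. For extractive QA this is conventionally the SQuAD-style token-overlap F1 between the predicted and gold answer strings; the paper's exact evaluation script is not reproduced here, so the short sketch below is an assumed, standard formulation (SQuAD's official script additionally strips punctuation and articles).

```python
# SQuAD-style token-overlap F1 between a predicted and a gold answer.
from collections import Counter

def qa_f1(prediction: str, gold: str) -> float:
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    common = Counter(pred_tokens) & Counter(gold_tokens)  # multiset intersection
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

# Example: one extra predicted token costs some precision.
print(qa_f1("de minister van financien", "minister van financien"))  # ~0.857
```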
| URI: | https://link.springer.com/chapter/10.1007/978-3-031-48232-8_19 http://hdl.handle.net/10174/40225 |
| ISBN: | 978-3-031-48232-8 |
| Type: | article |
| Appears in Collections: | INF - Artigos em Livros de Actas/Proceedings |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.