Please use this identifier to cite or link to this item: http://hdl.handle.net/10174/40225

Title: Domain Adaptation in Transformer Models: Question Answering of Dutch Government Policies
Authors: Blom, Berry
Pereira, L. M.
Editors: Quaresma, Paulo
Camacho, David
Yin, Hujun
Gonçalves, Teresa
Julian, Vicente
Tallón-Ballesteros, Antonio J.
Keywords: Natural Language Processing
Question answering
Transformers
Domain adaptation
Dutch
Issue Date: 15-Nov-2023
Publisher: Springer, Cham
Abstract: Automatically answering questions helps users find information efficiently, in contrast with web search engines, which require keywords to be provided and large texts to be processed. The first Dutch Question Answering (QA) system used basic natural language processing techniques based on text similarity between the question and the answer. After the introduction of pre-trained transformer-based models such as BERT, higher scores were achieved, with an improvement of over 7.7% on the General Language Understanding Evaluation (GLUE) score. Pre-trained transformer-based models tend to over-generalize when applied to a specific domain, leading to less precise context-specific outputs. There is a marked research gap in experimental strategies to adapt these models effectively for domain-specific applications. Additionally, there is a lack of Dutch resources for automatic question answering, as the only existing dataset, Dutch SQuAD, is a translation of the English SQuAD dataset. We propose a new dataset, PolicyQA, containing questions and answers about Dutch government policies, and use domain adaptation techniques to address the generalizability problem of transformer-based models. The experimental setup includes a Long Short-Term Memory (LSTM) network as a baseline and three BERT-based models, mBERT, RobBERT, and BERTje, with domain adaptation. The datasets used for testing are the proposed PolicyQA dataset and the existing Dutch SQuAD. The results show that the multilingual BERT model, mBERT, outperforms the Dutch BERT-based models (RobBERT and BERTje) on both datasets. With fine-tuning, a domain adaptation technique, the mBERT model reached an F1-score of 94.10%, a gain of 226% compared to its performance without fine-tuning.
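
The fine-tuning step described in the abstract can be illustrated with a minimal sketch using the Hugging Face Transformers library. The model checkpoint, file names, and hyperparameters below are illustrative assumptions, not the authors' exact configuration; the PolicyQA data is assumed to be available as JSON Lines records with SQuAD-style "question", "context", and "answers" fields.

# Minimal sketch: fine-tuning mBERT for extractive QA (domain adaptation).
# All names and hyperparameters are assumptions for illustration only.
from datasets import load_dataset
from transformers import (AutoModelForQuestionAnswering, AutoTokenizer,
                          Trainer, TrainingArguments, default_data_collator)

model_name = "bert-base-multilingual-cased"  # mBERT checkpoint (assumption)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForQuestionAnswering.from_pretrained(model_name)

# Hypothetical SQuAD-style files for the in-domain policy QA data.
raw = load_dataset("json", data_files={"train": "policyqa_train.jsonl",
                                       "validation": "policyqa_dev.jsonl"})

def preprocess(examples):
    # Tokenize question/context pairs and map character-level answer spans
    # to token-level start/end positions for the QA head.
    enc = tokenizer(examples["question"], examples["context"],
                    max_length=384, truncation="only_second",
                    padding="max_length", return_offsets_mapping=True)
    starts, ends = [], []
    for i, offsets in enumerate(enc["offset_mapping"]):
        answer = examples["answers"][i]
        start_char = answer["answer_start"][0]
        end_char = start_char + len(answer["text"][0])
        seq_ids = enc.sequence_ids(i)
        ctx_start = seq_ids.index(1)
        ctx_end = len(seq_ids) - 1 - seq_ids[::-1].index(1)
        if offsets[ctx_start][0] > start_char or offsets[ctx_end][1] < end_char:
            # Answer was truncated away: point both positions at [CLS].
            starts.append(0)
            ends.append(0)
        else:
            idx = ctx_start
            while idx <= ctx_end and offsets[idx][0] <= start_char:
                idx += 1
            starts.append(idx - 1)
            idx = ctx_end
            while idx >= ctx_start and offsets[idx][1] >= end_char:
                idx -= 1
            ends.append(idx + 1)
    enc["start_positions"] = starts
    enc["end_positions"] = ends
    enc.pop("offset_mapping")
    return enc

tokenized = raw.map(preprocess, batched=True,
                    remove_columns=raw["train"].column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments("mbert-policyqa", learning_rate=3e-5,
                           num_train_epochs=2, per_device_train_batch_size=16),
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    data_collator=default_data_collator,
    tokenizer=tokenizer,
)
trainer.train()

The same recipe would apply to the Dutch checkpoints mentioned in the abstract (e.g. RobBERT or BERTje) by swapping the model checkpoint name.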
URI: https://link.springer.com/chapter/10.1007/978-3-031-48232-8_19
http://hdl.handle.net/10174/40225
ISBN: 978-3-031-48232-8
Type: article
Appears in Collections:INF - Artigos em Livros de Actas/Proceedings

Files in This Item:

File: Thesis_Transformers_LNCS.pdf (223.5 kB, Adobe PDF)

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
