Please use this identifier to cite or link to this item:
http://hdl.handle.net/10174/40225
| Title: | Domain Adaptation in Transformer Models: Question Answering of Dutch Government Policies |
| Authors: | Blom, Berry; Pereira, L. M. |
| Editors: | Quaresma, Paulo; Camacho, David; Yin, Hujun; Gonçalves, Teresa; Julian, Vicente; Tallón-Ballesteros, Antonio J. |
| Keywords: | Natural Language Processing; Question answering; Transformers; Domain adaptation; Dutch |
| Issue Date: | 15-Nov-2023 |
| Publisher: | Springer, Cham |
| Abstract: | Automatic question answering helps users find information efficiently, in contrast with web search engines, which require keywords to be provided and return large texts to be processed. The first Dutch Question Answering (QA) system used basic natural language processing techniques based on text similarity between the question and the answer. After the introduction of pre-trained transformer-based models such as BERT, higher scores were achieved, with an improvement of over 7.7% on the General Language Understanding Evaluation (GLUE) benchmark.
Pre-trained transformer-based models tend to over-generalize when applied to a specific domain, leading to less precise context-specific outputs. There is a marked research gap in experimental strategies for adapting these models effectively to domain-specific applications. Additionally, there is a lack of Dutch resources for automatic question answering: the only existing dataset, Dutch SQuAD, is a translation of the English SQuAD dataset.
We propose a new dataset, PolicyQA, containing questions and answers about Dutch government policies, and we use domain adaptation techniques to address the generalizability problem of transformer-based models.
The experimental setup includes a baseline neural network, the Long Short-Term Memory (LSTM), and three BERT-based models with domain adaptation: mBERT, RobBERT, and BERTje. The datasets used for testing are the proposed PolicyQA dataset and the existing Dutch SQuAD.
From the results, we found that the multilingual BERT model, mBERT, outperforms the Dutch BERT-based models (RobBERT and BERTje) on both datasets. By introducing fine-tuning, a domain adaptation technique, the mBERT model improved to an F1-score of 94.10%, a gain of 226% compared to its performance without fine-tuning. |
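The abstract describes the domain adaptation step only at a high level. As an illustration, the sketch below shows how fine-tuning mBERT for extractive QA is typically done with the Hugging Face Transformers library. The `bert-base-multilingual-cased` checkpoint is the standard mBERT release; the PolicyQA file name, the SQuAD-style field layout, and all hyperparameters are assumptions made for this sketch, not details taken from the paper.

```python
# Minimal sketch (not the authors' code): fine-tuning mBERT for
# extractive QA with Hugging Face Transformers. The dataset path and
# hyperparameters below are illustrative assumptions.
from datasets import load_dataset
from transformers import (AutoModelForQuestionAnswering, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_NAME = "bert-base-multilingual-cased"  # mBERT; RobBERT or BERTje would swap in here

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForQuestionAnswering.from_pretrained(MODEL_NAME)

# Hypothetical SQuAD-style JSON with "question", "context", "answers" fields.
raw = load_dataset("json", data_files={"train": "policyqa_train.json"})

def preprocess(examples):
    # Tokenize question/context pairs and map character-level answer
    # spans onto token-level start/end positions.
    enc = tokenizer(examples["question"], examples["context"],
                    truncation="only_second", max_length=384,
                    padding="max_length", return_offsets_mapping=True)
    starts, ends = [], []
    for i, offsets in enumerate(enc["offset_mapping"]):
        answer = examples["answers"][i]      # {"text": [...], "answer_start": [...]}
        start_char = answer["answer_start"][0]
        end_char = start_char + len(answer["text"][0])
        seq_ids = enc.sequence_ids(i)
        start_tok = end_tok = 0              # fall back to [CLS] if the span was truncated
        for idx, (s, e) in enumerate(offsets):
            if seq_ids[idx] != 1:            # only look at context tokens
                continue
            if s <= start_char < e:
                start_tok = idx
            if s < end_char <= e:
                end_tok = idx
        starts.append(start_tok)
        ends.append(end_tok)
    enc["start_positions"] = starts
    enc["end_positions"] = ends
    enc.pop("offset_mapping")
    return enc

train = raw["train"].map(preprocess, batched=True,
                         remove_columns=raw["train"].column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="mbert-policyqa",
                           num_train_epochs=2,
                           per_device_train_batch_size=8),
    train_dataset=train,
)
trainer.train()
```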
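The 94.10% result is an F1-score. For extractive QA this is conventionally the SQuAD-style token-overlap F1 between the predicted and gold answer strings; the paper's exact evaluation script is not reproduced here, so the short sketch below is an assumed, standard formulation (SQuAD's official script additionally strips punctuation and articles).

```python
# SQuAD-style token-overlap F1 between a predicted and a gold answer.
from collections import Counter

def qa_f1(prediction: str, gold: str) -> float:
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    common = Counter(pred_tokens) & Counter(gold_tokens)  # multiset intersection
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

# Example: one extra predicted token costs some precision.
print(qa_f1("de minister van financien", "minister van financien"))  # ~0.857
```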
| URI: | https://link.springer.com/chapter/10.1007/978-3-031-48232-8_19 http://hdl.handle.net/10174/40225 |
| ISBN: | 978-3-031-48232-8 |
| Type: | article |
| Appears in Collections: | INF - Artigos em Livros de Actas/Proceedings |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.