An automatic semantic role labeler for the Portuguese language

Falci, Daniel Henrique Mourão

dc.contributor.advisor	Parreiras, Fernando Silva
dc.contributor.author	Falci, Daniel Henrique Mourão
dc.date.accessioned	2019-06-04T21:27:38Z
dc.date.available	2019-06-04T21:27:38Z
dc.date.issued	2018
dc.identifier.uri	https://repositorio.fumec.br/xmlui/handle/123456789/178
dc.description.abstract	Semantic Role Labeling (SRL) is a Natural Language Processing task that provides the means to analyze, from a semantic point of view, the information expressed through text or speech. Its purpose is to capture and represent the participants and circumstances of events or situations described at the sentence level. It is considered a major step towards natural language understanding. Most existing SRL research focuses on the English language and thus reflects its syntactic and semantic particularities, which prevents a direct transposition of its results to other languages. For Portuguese, only a small number of studies are dedicated to the task, and none of them has achieved performance comparable to that obtained for English. Moreover, to the best of our knowledge, there is only one publicly available system capable of performing automated SRL on raw text, which hampers research and holds back the innovative potential for the language. The objective of this thesis is to evaluate the performance of an automatic semantic role labeler for the Portuguese language built using techniques addressed in the literature. To achieve this goal, the first step consisted of a systematic literature review of the SRL task, intended to identify the most accurate techniques. Based on its results, we developed and evaluated a semantic role labeler of raw text for the Portuguese language. Our approach is independent of syntactic parsing and relies on a deep bidirectional recurrent neural network architecture. The network predictions are used as the input to a global recursive neural parsing algorithm that was tailored to the SRL task. Our method consistently outperformed the previous state-of-the-art system for the Portuguese language on the PropBank-Br corpus by a margin of 3.05 F1-score points, reducing the relative error by 8.74%. 
The model presented in this research is publicly available under a BSD license and may help future studies focused on the Portuguese language in tasks that typically depend on content analysis, ranging from Machine Translation to Question Answering systems.	pt_BR
dc.language.iso	en	pt_BR
dc.rights	Open access	pt_BR
dc.subject	Linguistics - Data processing	pt_BR
dc.subject	Neural networks (Computer science) - Analysis	pt_BR
dc.subject	Algorithms - Analysis	pt_BR
dc.title	An automatic semantic role labeler for the Portuguese language	pt_BR
dc.type	Dissertation	pt_BR
dc.publisher.program	Mestrado em Sistemas de Informação e Gestão do Conhecimento	pt_BR
dc.publisher.initials	FUMEC	pt_BR
dc.publisher.departament	Faculdade de Ciências Empresariais	pt_BR

Files in this item

Name: daniel_falci_mes_sigc_2018.pdf
Size: 1.149Mb
Format: PDF
Description: Mestrado em Sistemas de Informação ...


This item appears in the following collection(s)

Dissertations
