NAACL SemEval 2022 [Paper]
This research presents a solution to SemEval-2022 Task 2: Multilingual Idiomaticity Detection and Sentence Embedding. The task is to determine, in a multilingual setting, whether a Multi-Word Expression (MWE) in a sentence is used idiomatically or literally. Identifying idiomatic expressions with Large Language Models (LLMs) remains a challenging problem. This study explores how to construct sentence embeddings that address this task effectively.
The expression “wet blanket,” read through the literal meanings of its individual words, refers to “a soaked cover.” In context, however, it is often used idiomatically to mean “a person who spoils the mood.” In other words, when the meaning implied by the sentence context differs from the meaning obtained by simply combining the individual words, the expression can be identified as idiomatic. Building on this idea, given an MWE, idiomaticity may be captured effectively by generating semantic embeddings that combine the contextual embeddings of each word with their static embeddings (which represent the literal combination of word meanings).
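As a rough illustration of the intuition above (not the method from the paper), the sketch below composes a "literal" MWE vector by averaging static word vectors, composes an "in-context" vector by averaging contextual token vectors, and scores idiomaticity as one minus their cosine similarity. Mean pooling, the scoring function, and all vectors are hypothetical toy choices; in practice the static vectors might come from something like word2vec/fastText and the contextual vectors from a transformer encoder.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def mean_vector(vectors: Sequence[Vector]) -> list[float]:
    """Average equal-length vectors component-wise (simple mean pooling)."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def idiomaticity_score(static_vecs: Sequence[Vector], contextual_vecs: Sequence[Vector]) -> float:
    """Hypothetical score: 1 - cos(literal composition, contextual composition)."""
    literal = mean_vector(static_vecs)        # meaning from combining the words
    in_context = mean_vector(contextual_vecs)  # meaning the sentence assigns
    return 1.0 - cosine(literal, in_context)


# Toy 3-d vectors for "wet" and "blanket" (assumed, for illustration only).
static = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
# Contextual vectors when the sentence uses the phrase literally vs. idiomatically.
literal_ctx = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0]]
idiomatic_ctx = [[0.0, 0.1, 1.0], [0.1, 0.0, 1.0]]

literal = idiomaticity_score(static, literal_ctx)
idiomatic = idiomaticity_score(static, idiomatic_ctx)
print(f"literal use:   {literal:.3f}")
print(f"idiomatic use: {idiomatic:.3f}")
```

A larger gap between the two compositions yields a higher score, which mirrors the "wet blanket" reasoning: the idiomatic context pulls the MWE away from what its words mean on their own.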
We propose a framework for embedding MWEs and their related sentences that utilizes both contextualized and static representations to maximize semantic information. For more details, please refer to the paper.
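One simple way to let a downstream classifier see both kinds of information at once is to fuse the contextualized and static vectors into a single feature vector. The sketch below assumes a hypothetical fusion of the form [contextual; static; contextual − static]; the actual combination used in the framework is described in the paper and may differ.

```python
from typing import Sequence


def fuse_embeddings(contextual: Sequence[float], static: Sequence[float]) -> list[float]:
    """Hypothetical fusion: concatenate contextual and static MWE vectors
    together with their element-wise difference, so a classifier can see
    how far the in-context meaning drifts from the literal one."""
    if len(contextual) != len(static):
        raise ValueError("contextual and static vectors must have the same dimension")
    diff = [c - s for c, s in zip(contextual, static)]
    return list(contextual) + list(static) + diff


# Toy pooled vectors for one MWE occurrence (assumed values).
contextual_mwe = [0.05, 0.05, 1.0]
static_mwe = [0.5, 0.5, 0.0]

features = fuse_embeddings(contextual_mwe, static_mwe)
print(len(features))  # 9
```

The difference term is a common design choice for exposing disagreement between two representations explicitly, rather than leaving the classifier to learn it from the concatenation alone.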
You can download the data from here.
python main.py