NAACL SemEval 2022: Multilingual Idiomaticity Detection and Sentence Embedding

ojoo-J/Multilingual-Idiomaticity-Detection

Effective Idiomaticity Detection with Consideration at Different Levels of Contextualization

NAACL SemEval 2022 [Paper]

Summary

This repository presents a solution to SemEval-2022 Task 2: Multilingual Idiomaticity Detection and Sentence Embedding. The task is to determine whether a Multi-Word Expression (MWE) within a sentence is used idiomatically or literally, in a multilingual setting. Identifying idiomatic expressions in sentences with Large Language Models (LLMs) remains a challenging problem. This study explores how to build sentence embeddings that address the task effectively.

Core Idea

image

Read through the literal meanings of its individual words, the expression “wet blanket” refers to “a soaked cover.” In the context of a sentence, however, it is often used idiomatically to mean “a person who spoils the mood.” In other words, an expression can be identified as idiomatic when the meaning derived from the sentence context differs from the meaning obtained by simply combining the individual words. Building on this idea, we ask whether, given an MWE, idiomaticity can be captured effectively by generating semantic embeddings that combine the contextual embedding of each word with its static embedding (which represents the literal combination of word meanings).
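
The intuition above can be illustrated with a minimal sketch. It is not the implementation from this repository or the paper: the three-dimensional vectors are made-up toy values, and the names `STATIC`, `combined_embedding`, and `literalness` are hypothetical. It assumes that averaging static word vectors approximates the literal reading, and that a low cosine similarity between that average and the contextual vectors hints at idiomatic use.

```python
import math

# Toy static (context-independent) word vectors; the values are invented
# for illustration and do not come from any trained model.
STATIC = {
    "wet": [0.9, 0.1, 0.0],
    "blanket": [0.8, 0.2, 0.1],
}


def mean_vector(vectors):
    """Average a list of equal-length vectors component-wise."""
    n = len(vectors)
    return [sum(components) / n for components in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def combined_embedding(mwe_tokens, contextual_vectors):
    """Concatenate the contextual view of the MWE with its static view.

    Assumption: both views are mean-pooled over the MWE tokens.
    """
    static_part = mean_vector([STATIC[token] for token in mwe_tokens])
    contextual_part = mean_vector(contextual_vectors)
    return contextual_part + static_part  # list concatenation


def literalness(mwe_tokens, contextual_vectors):
    """Similarity between the literal (static) and in-context meanings."""
    static_part = mean_vector([STATIC[token] for token in mwe_tokens])
    contextual_part = mean_vector(contextual_vectors)
    return cosine(static_part, contextual_part)


mwe = ["wet", "blanket"]
# Hypothetical contextual vectors for the two tokens in two sentences.
literal_context = [[0.85, 0.15, 0.05], [0.8, 0.25, 0.1]]   # a soaked cover
idiomatic_context = [[0.1, 0.2, 0.9], [0.05, 0.3, 0.85]]   # a mood spoiler

print("literal use:  ", round(literalness(mwe, literal_context), 3))
print("idiomatic use:", round(literalness(mwe, idiomatic_context), 3))
print("combined dims:", len(combined_embedding(mwe, idiomatic_context)))
```

In practice the contextual vectors would come from a multilingual encoder and the static ones from a pretrained embedding table. A classifier trained on the concatenated vector can then learn the decision boundary, which avoids choosing a similarity threshold by hand.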

Methods

image

We propose a framework for embedding MWEs and their related sentences that utilizes both contextualized and static representations to maximize semantic information. For more details, please refer to the paper.

Results

image

Run

You can download the data from here.

python main.py
