@twosom commented Jul 20, 2025

Description

Summary

Adds metadata support to the Nori Korean analyzer, allowing users to attach additional information to dictionary words.

Changes

  • Adds a MetadataAttribute interface and implementation
  • Extends the user dictionary format to support the word >> metadata syntax
  • Preserves metadata during compound word decomposition
  • Maintains backward compatibility with existing dictionaries

Example

Dictionary:

자바 >> computer language
엘라스틱서치 엘라스틱 서치 >> search engine

Result:

  • 자바 → Term: "자바", Metadata: "computer language"
  • 엘라스틱서치 → All decomposed terms ("엘라스틱서치", "엘라스틱", "서치") carry "search engine" metadata
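
To make the intended mapping concrete, here is a minimal, standalone sketch of how a line in the extended format could be split into terms and metadata. It is an illustration only, not the code in this PR: the class name UserDictMetadataSketch and the method parseLine are hypothetical, it does not touch any Lucene API, and it assumes that everything after the first ">>" is metadata and that the part before it is the surface form followed by optional whitespace-separated decomposition segments.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration; not part of Lucene or of this PR's code. */
final class UserDictMetadataSketch {
  // Separator between the dictionary entry and its metadata (from the PR description).
  private static final String SEPARATOR = ">>";

  /**
   * Parses one user-dictionary line. Returns every term on the line (the surface
   * form plus any decomposition segments) mapped to the entry's metadata, or to
   * null when the line has no ">>" part (the existing, metadata-free format).
   */
  static Map<String, String> parseLine(String line) {
    String entry = line;
    String metadata = null;
    int sep = line.indexOf(SEPARATOR);
    if (sep >= 0) {
      entry = line.substring(0, sep);
      metadata = line.substring(sep + SEPARATOR.length()).trim();
    }
    // LinkedHashMap keeps the terms in the order they appear on the line.
    Map<String, String> termToMetadata = new LinkedHashMap<>();
    for (String term : entry.trim().split("\\s+")) {
      if (!term.isEmpty()) {
        termToMetadata.put(term, metadata);
      }
    }
    return termToMetadata;
  }

  public static void main(String[] args) {
    System.out.println(parseLine("자바 >> computer language"));
    System.out.println(parseLine("엘라스틱서치 엘라스틱 서치 >> search engine"));
    // A line without ">>" still parses, which mirrors the backward-compatibility goal.
    System.out.println(parseLine("세종시 세종 시"));
  }
}
```

In this sketch the compound entry yields three terms that all share the same metadata string, matching the second bullet above, while a line without the separator yields terms with no metadata.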

Fixes #14940

This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR.

@github-actions bot added this to the 11.0.0 milestone Jul 20, 2025
@twosom force-pushed the add_nori_metadata branch from 7be08b0 to 81ce2c8 Compare July 20, 2025 06:46
github-actions bot commented Aug 4, 2025

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!

@github-actions bot added the Stale label Aug 4, 2025
Development

Successfully merging this pull request may close these issues.

[Nori] Add metadata support for Korean analyzer tokens