Skip to content

Update McpToolUtils.java #4192

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

shishuiwuhen2009
Copy link
Contributor

fix McpToolUtils.prefixedToolName formatted in order to support Chinese characters in toolName

fix McpToolUtils.prefixedToolName formatted in order to support Chinese characters in toolName

Signed-off-by: shishuiwuhen2009 <[email protected]>
fix McpToolUtils.prefixedToolName formatted in order to support Chinese characters in toolName.

Signed-off-by: shishuiwuhen2009 <[email protected]>
spring-builds pushed a commit that referenced this pull request Aug 21, 2025
Replace hard-coded Unicode range with comprehensive Unicode property approach
to fix incomplete Han character coverage in MCP tool name formatting.

Changes:
- Replace \u4e00-\u9fa5 range with union of Unicode Script and Block properties
- Use \p{IsHan} + \p{InCJK_Unified_Ideographs} + \p{InCJK_Compatibility_Ideographs}
- Fix boundary case where \u9fff was incorrectly excluded by script-only approach
- Add comprehensive test coverage for all Han character blocks and edge cases

Technical details:
- Addresses Unicode Script vs Block classification differences across JDK versions
- \u9fff (鿿) is in CJK Unified Ideographs block but not Han script in some JDKs
- Union approach ensures complete coverage while maintaining exclusion of other scripts
- Future-proof solution that automatically includes new Han characters in Unicode updates

Test coverage added:
- CJK Unified Ideographs boundary cases (\u4e00, \u9fff)
- CJK Extension A characters (\u3400)
- CJK Compatibility Ideographs (\uf900)
- Mixed character block scenarios
- Proper exclusion verification for non-Han scripts (Hiragana, Emoji, etc.)

Fixes incomplete Chinese character support while maintaining backward compatibility
and minimal risk profile of the original change.

Signed-off-by: shishuiwuhen2009
Signed-off-by: Mark Pollack <[email protected]>

Fixes #4192

(cherry picked from commit 35486e9)
@markpollack
Copy link
Member

Thanks!!! had some help from ai! let me know what you think of the final changes @shishuiwuhen2009 . it is also backported.

@markpollack markpollack self-assigned this Aug 21, 2025
@markpollack markpollack added this to the 1.1.0.M1 milestone Aug 21, 2025
@shishuiwuhen2009
Copy link
Contributor Author

Thank you so much for your detailed modification suggestions! This Unicode property-based approach for Chinese character support is much more standardized and comprehensive than my original hard-coded range method. I’ve learned a lot about robust Unicode handling—especially the nuances of Script vs. Block classification across JDK versions and how to ensure full Han character coverage. It’s really helpful for improving the quality of my code. Great work on pointing out these key details! 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants