-
-
Notifications
You must be signed in to change notification settings - Fork 2.8k
Improve KeywordList parsing to support escaped characters and nested structures #12888
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Improve KeywordList parsing to support escaped characters and nested structures #12888
Conversation
can someone help me on this. is this a valid comment from Bot ? I have never used DisplayName annotation and I changed method name to be more comprehensive. Am I missing anything here ? |
Seems like a false positive from the bot @koppor |
@krishnagjsForGit please change the PR title to contain text summarizing the fix. |
Done. is this fine @koppor ? |
keywordList.add(chainRoot); | ||
List<String> tokens = splitRespectingEscapes(keywordString, delimiter); | ||
|
||
for (String token : tokens) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe there's an opportunity to eliminate this loop. In the _splitRespectingEscapes_
method, all tokens are already available and accessible. It feels inefficient to first collect all tokens in a List
and then iterate over it.
Sketch:
public static KeywordList parse(String keywordString, Character delimiter, Character hierarchicalDelimiter) {
if (StringUtil.isBlank(keywordString)) {
return new KeywordList();
}
Objects.requireNonNull(delimiter);
Objects.requireNonNull(hierarchicalDelimiter);
KeywordList keywordList = new KeywordList();
List<String> hierarchy = new ArrayList<>();
StringBuilder currentToken = new StringBuilder();
boolean isEscaping = false;
for (int i = 0; i < keywordString.length(); i++) {
char currentChar = keywordString.charAt(i);
if (isEscaping) {
currentToken.append(currentChar);
isEscaping = false;
} else if (currentChar == '\\') {
isEscaping = true;
} else if (currentChar == hierarchicalDelimiter) {
hierarchy.add(currentToken.toString().trim());
currentToken.setLength(0);
} else if (currentChar == delimiter) {
hierarchy.add(currentToken.toString().trim());
keywordList.add(Keyword.of(hierarchy.toArray(new String[0])));
hierarchy.clear();
currentToken.setLength(0);
} else {
currentToken.append(currentChar);
}
}
// Handle final token
if (currentToken.length() > 0 || !hierarchy.isEmpty()) {
hierarchy.add(currentToken.toString().trim());
keywordList.add(Keyword.of(hierarchy.toArray(new String[0])));
}
return keywordList;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@krishnagjsForGit I see this marked as resolved. Was this discussed further?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I see this loop still present)
String chain = tok.nextToken(); | ||
Keyword chainRoot = Keyword.of(chain.split(hierarchicalDelimiter.toString())); | ||
keywordList.add(chainRoot); | ||
List<String> tokens = splitRespectingEscapes(keywordString, delimiter); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Yubo-Cao What do you think about this? Maybe an ANTLR grammar would help here, too?
I'm sorry for not getting back to you sooner. I have two suggestions:
|
@koppor you want me to change the code accordingly ? |
I think the current code is good. Is a result of current JabRefs way of parsing. If it was done "correctly", the Bib parser would do the escape handling - and all other data structures would use tokens. But this is much much effort to change. |
@krishnagjsForGit please add a CHANGELOG.md entry |
@ungerts thank you for helping with reviews! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes requested at #12888 (comment) need to be addressed.
Can you please check if I updated correctly ? |
Seems like it was not a clean merge. |
How did you merge, by the way? |
Just checking the output of GitHub here, I also suggest |
Alternatively, one could work with |
442cca3
to
869e80e
Compare
@@ -2,6 +2,7 @@ | |||
|
|||
import java.io.IOException; | |||
import java.nio.file.Path; | |||
import java.util.Collections; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The import statement for Collections is unnecessary because the patch replaces its usage with List.of(), which is preferred in JabRef for creating empty lists.
@@ -431,7 +432,7 @@ private void initBindings() { | |||
// All databases are closed or an unknown tab is selected | |||
stateManager.setActiveDatabase(null); | |||
stateManager.activeTabProperty().set(Optional.empty()); | |||
stateManager.setSelectedEntries(List.of()); | |||
stateManager.setSelectedEntries(Collections.emptyList()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In JabRef, to create an empty list, List.of() should be used instead of Collections.emptyList() for consistency with the project's coding standards.
@@ -151,7 +151,7 @@ public void listen(SharedEntriesNotPresentEvent event) { | |||
+ "\n" | |||
+ Localization.lang("You can restore the entry using the \"Undo\" operation.")); | |||
|
|||
stateManager.setSelectedEntries(List.of()); | |||
stateManager.setSelectedEntries(Collections.emptyList()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In JabRef, to create an empty list we use List.of() instead of Collections.emptyList(). This aligns with the project's conventions and ensures consistency across the codebase.
Something went amiss and PR got closed. Can I create a separate Clean PR and fix this ? @koppor |
@@ -51,7 +53,7 @@ public void onStart(Stage stage) { | |||
when(stateManager.activeSearchQuery(SearchType.NORMAL_SEARCH)).thenReturn(OptionalObjectProperty.empty()); | |||
when(stateManager.searchQueryProperty()).thenReturn(mock(StringProperty.class)); | |||
when(stateManager.activeTabProperty()).thenReturn(OptionalObjectProperty.empty()); | |||
KeyBindingRepository keyBindingRepository = new KeyBindingRepository(List.of(), List.of()); | |||
KeyBindingRepository keyBindingRepository = new KeyBindingRepository(Collections.emptyList(), Collections.emptyList()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In JabRef, to create an empty list, List.of() should be used instead of Collections.emptyList() to adhere to the project's coding standards.
private static Path downloadLtwaFile() throws IOException, URISyntaxException { | ||
LOGGER.info("Downloading LTWA file from {}.", LtwaListMvGenerator.LTWA_URL); | ||
var in = new URI(LTWA_URL).toURL().openStream(); | ||
var path = Files.writeString( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use of 'Files.createTempFile' should be replaced with '@tempdir' JUnit5 annotation for better test management and cleanup.
if (word.startsWith("-")) { | ||
String key = word.substring(1); | ||
suffixMap.computeIfAbsent(key, k -> | ||
Stream.<LtwaEntry>builder().build().collect(Collectors.toList()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use of 'collect(Collectors.toList())' should be replaced with 'toList()' for simplicity and modern Java practices.
@@ -80,7 +82,7 @@ private StringWriter getOutputWriter() throws IOException { | |||
StringWriter outputWriter = new StringWriter(); | |||
BibWriter bibWriter = new BibWriter(outputWriter, OS.NEWLINE); | |||
SelfContainedSaveConfiguration saveConfiguration = new SelfContainedSaveConfiguration(SaveOrder.getDefaultSaveOrder(), false, BibDatabaseWriter.SaveType.WITH_JABREF_META_DATA, false); | |||
FieldPreferences fieldPreferences = new FieldPreferences(true, List.of(), List.of()); | |||
FieldPreferences fieldPreferences = new FieldPreferences(true, Collections.emptyList(), Collections.emptyList()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use List.of() instead of Collections.emptyList() for creating empty lists, as per the project's conventions.
Yes, please. |
081b9dc
to
85cb7c7
Compare
@trag-bot didn't find any issues in the code! ✅✨ |
I restored the state as of April, 16th - commit 081b9dc. Mayb,e this helps. If you do a Attention: The command |
Your code currently does not meet JabRef's code guidelines. We use Checkstyle to identify issues. You can see which checks are failing by locating the box "Some checks were not successful" on the pull request page. To see the test output, locate "Tests / Checkstyle (pull_request)" and click on it. In case of issues with the import order, double check that you activated Auto Import. You can trigger fixing imports by pressing Ctrl+Alt+O to trigger Optimize Imports. Please carefully follow the setup guide for the codestyle. Afterwards, please run checkstyle locally and fix the issues, commit, and push. |
Hey, we noticed that you force-pushed your changes. Force pushing is a bad practice when working together on a project (mainly because it is not supported well by GitHub itself). Commits are lost and comments on commits lose their context, thus making it harder to review changes. At the end, all commits will be squashed anyway before being merged into the In future, please avoid that. For now, you can continue working. |
Closes #12810
What I changed
Implemented escape handling in KeywordList.parse(String, Character, Character) to support escaping the keyword separator using backslash (). This prevents incorrect splitting of keywords when delimiters appear within the keyword itself.
Refactored the parsing logic for clarity, including renaming loop variables for better readability and intent.
Added test cases in KeywordListTest to cover escape scenarios such as:
Escaped delimiter: "one\,two" → ["one,two"]
Escaped backslash: "one\\two" → ["one\two"]
Mixed escaped and unescaped delimiters.
Where the changes are
KeywordList.java: Modified the parse method to include escape handling logic using a character-by-character loop.
KeywordListTest.java: Added JUnit test cases to ensure parsing behaves correctly with escaped delimiters and backslashes.
Why I made these changes
Fixes issue #12810: Current parsing breaks when delimiters appear inside keywords (e.g., in MeSH terms). There was no way to escape the delimiter character, leading to incorrect keyword splitting.
Improves consistency: Provides a more robust, predictable behavior for keyword parsing, especially when users or importers use delimiters in keyword values.
Next Steps
Awaiting review and feedback.
If the community prefers a different escape character or behavior, I’m happy to adjust the implementation.
After this is merged, similar logic could be extracted or reused where keyword parsing happens elsewhere (e.g., importers).
Mandatory checks
CHANGELOG.md
described in a way that is understandable for the average user (if change is visible to the user)