Embedding LanguageTool in Java applications

LanguageTool requires Java 8 or later.

Get LanguageTool by adding a dependency like this to your Maven pom.xml:

    <dependency>
      <groupId>org.languagetool</groupId>
      <artifactId>language-en</artifactId>
      <version>6.3</version>
    </dependency>

This pulls in the dependencies needed to check English. Use language-de as the artifactId for German, and so on (show all artifacts). If you want to use all languages that LanguageTool supports, use language-all.

If you don't use Maven (or a similar system), download the stand-alone ZIP instead. You will need the JAR files in the libs directory, the org directory, and the META-INF directory in your classpath. We strongly recommend using Maven or Gradle instead.

To use LanguageTool, you just need to create a JLanguageTool object and use that to check your text. Also see the API documentation (Javadoc). For example:

    import java.util.List;
    import org.languagetool.JLanguageTool;
    import org.languagetool.Languages;
    import org.languagetool.rules.RuleMatch;

    JLanguageTool langTool = new JLanguageTool(Languages.getLanguageForShortCode("en-GB"));
    // uncomment to use statistical ngram data (needs java.io.File):
    //langTool.activateLanguageModelRules(new File("/data/google-ngram-data"));
    // the errors in this sample sentence are intentional
    List<RuleMatch> matches = langTool.check("A sentence with a error in the Hitchhiker's Guide tot he Galaxy");
    for (RuleMatch match : matches) {
      System.out.println("Potential error at characters " +
          match.getFromPos() + "-" + match.getToPos() + ": " +
          match.getMessage());
      System.out.println("Suggested correction(s): " +
          match.getSuggestedReplacements());
    }

Use activateLanguageModelRules() to enable the ngram-based rules described in "Finding errors using Big Data".
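Once you have the matches, a common next step is to apply a suggestion to the text. The following is a minimal sketch of that step in plain Java, not part of the LanguageTool API: `Fix` is a hypothetical holder for the values you would copy from a RuleMatch (getFromPos(), getToPos(), and one entry of getSuggestedReplacements()), and the end position is assumed to be exclusive. Fixes are applied from the end of the text backwards so that earlier offsets stay valid.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class ApplySuggestions {

  /** Hypothetical holder for values copied out of a RuleMatch. */
  static final class Fix {
    final int fromPos;        // from match.getFromPos()
    final int toPos;          // from match.getToPos(), assumed to be exclusive
    final String replacement; // e.g. the first suggested replacement

    Fix(int fromPos, int toPos, String replacement) {
      this.fromPos = fromPos;
      this.toPos = toPos;
      this.replacement = replacement;
    }
  }

  /** Applies all fixes, last position first, so earlier offsets are not shifted. */
  static String apply(String text, List<Fix> fixes) {
    List<Fix> sorted = new ArrayList<>(fixes);
    sorted.sort(Comparator.comparingInt((Fix f) -> f.fromPos).reversed());
    StringBuilder sb = new StringBuilder(text);
    for (Fix f : sorted) {
      sb.replace(f.fromPos, f.toPos, f.replacement);
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    List<Fix> fixes = new ArrayList<>();
    fixes.add(new Fix(0, 1, "an"));
    fixes.add(new Fix(8, 14, "to the"));
    System.out.println(apply("a error tot he", fixes));
  }
}
```

This sketch does not handle overlapping matches, and it blindly trusts the chosen suggestion; in a real application you would probably let the user confirm each change.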

Checking Text with Markup

LanguageTool usually works on plain text. If you need to check text with markup (HTML, XML, LaTeX, ...) you usually cannot just remove the markup, as that would mess up the position information that LanguageTool returns. In this case, use AnnotatedTextBuilder to create an AnnotatedText object that tells LanguageTool which parts of your text are markup and which are text. There's a check method in JLanguageTool that accepts the AnnotatedText object.
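The part you have to write yourself is the step that decides what is markup and what is text. As a rough sketch, assuming very simple HTML-like input where everything between `<` and `>` counts as markup, the splitter below produces the segments in order. `MarkupSegments` and `Segment` are hypothetical names, and this is not a real HTML parser. With LanguageTool on the classpath you would pass each segment to addMarkup(...) or addText(...) on an AnnotatedTextBuilder and hand the result of build() to the check method.

```java
import java.util.ArrayList;
import java.util.List;

class MarkupSegments {

  /** One piece of the input: either markup or text that should be checked. */
  static final class Segment {
    final boolean isMarkup;
    final String content;

    Segment(boolean isMarkup, String content) {
      this.isMarkup = isMarkup;
      this.content = content;
    }
  }

  /**
   * Naive splitter: everything from '<' to the next '>' is markup.
   * Concatenating all segments gives back the input, so no characters are lost.
   */
  static List<Segment> split(String input) {
    List<Segment> segments = new ArrayList<>();
    int pos = 0;
    while (pos < input.length()) {
      int open = input.indexOf('<', pos);
      if (open == -1) {
        segments.add(new Segment(false, input.substring(pos)));
        break;
      }
      if (open > pos) {
        segments.add(new Segment(false, input.substring(pos, open)));
      }
      int close = input.indexOf('>', open);
      if (close == -1) {
        // unterminated tag: treat the rest as plain text
        segments.add(new Segment(false, input.substring(open)));
        break;
      }
      segments.add(new Segment(true, input.substring(open, close + 1)));
      pos = close + 1;
    }
    return segments;
  }

  public static void main(String[] args) {
    for (Segment s : split("<p>A sentence with <b>a error</b>.</p>")) {
      // With LanguageTool: s.isMarkup ? builder.addMarkup(s.content) : builder.addText(s.content)
      System.out.println((s.isMarkup ? "markup: " : "text:   ") + s.content);
    }
  }
}
```

For real-world HTML, XML, or LaTeX, a proper parser for that format is a safer source of segments than this character scan.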

Multi-Threading

The JLanguageTool class is not thread safe. Create one instance of JLanguageTool per thread, but create the language only once (e.g. Languages.getLanguageForShortCode("en-GB")) and use that for all instances of JLanguageTool. The same is true for MultiThreadedJLanguageTool - its name refers to the fact that it uses threads internally, but it's not thread safe itself.
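One way to get "one instance per thread, one shared language" is a ThreadLocal. The sketch below shows only the pattern, using hypothetical stand-ins so that it runs without LanguageTool: `SHARED_LANGUAGE` plays the role of the object returned by Languages.getLanguageForShortCode("en-GB"), and `Checker` plays the role of JLanguageTool.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class PerThreadChecker {

  /** Hypothetical stand-in for the language object: created once, shared by all threads. */
  static final Object SHARED_LANGUAGE = new Object();

  /** Hypothetical stand-in for JLanguageTool: must not be shared between threads. */
  static final class Checker {
    final Object language;

    Checker(Object language) {
      this.language = language;
    }
  }

  /** Each thread lazily creates its own Checker; all of them reuse SHARED_LANGUAGE. */
  static final ThreadLocal<Checker> CHECKER =
      ThreadLocal.withInitial(() -> new Checker(SHARED_LANGUAGE));

  /** Starts the given number of threads and returns the distinct Checker instances they used. */
  static Set<Checker> checkersSeenBy(int threadCount) {
    Set<Checker> seen = ConcurrentHashMap.newKeySet();
    Thread[] threads = new Thread[threadCount];
    for (int i = 0; i < threadCount; i++) {
      threads[i] = new Thread(() -> seen.add(CHECKER.get()));
      threads[i].start();
    }
    try {
      for (Thread t : threads) {
        t.join();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for worker threads", e);
    }
    return seen;
  }

  public static void main(String[] args) {
    System.out.println(checkersSeenBy(4).size() + " checker instances for 4 threads");
  }
}
```

With a thread pool, each pooled thread keeps its instance for as long as the thread lives, so the cost of creating it is paid once per thread rather than once per request.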

Spell Checking

If you want spell checking and the language you're working with has variants, you will need to specify that variant in the JLanguageTool constructor, e.g. Languages.getLanguageForShortCode("en-US") instead of Languages.getLanguageForShortCode("en").

To ignore words, i.e. exclude them from spell checking, call the addIgnoreTokens(...) method of the spell checking rule you're using. You first have to find the rule by iterating over all active rules. Example:

    JLanguageTool lt = new JLanguageTool(Languages.getLanguageForShortCode("en-US"));
    for (Rule rule : lt.getAllActiveRules()) {
      if (rule instanceof SpellingCheckRule) {
        List<String> wordsToIgnore = Arrays.asList("specialword", "myotherword");
        ((SpellingCheckRule)rule).addIgnoreTokens(wordsToIgnore);
      }
    }

You can also ignore phrases with the acceptPhrases() method:

    for (Rule rule : lt.getAllActiveRules()) {
      if (rule instanceof SpellingCheckRule) {
        ((SpellingCheckRule)rule).acceptPhrases(Arrays.asList("foo bar", "producct namez"));
      }
    }

Supported Languages

LanguageTool determines which languages it supports at runtime by reading them from a file META-INF/org/languagetool/language-module.properties in the classpath. The file may look like this:

languageClasses=org.languagetool.language.it.Italian
languageClasses=org.languagetool.language.pl.Polish
languageClasses=org.languagetool.language.en.English,org.languagetool.language.en.AmericanEnglish

You either build that file yourself, adapted to the languages you support, or you take it from the LanguageTool stand-alone distribution. Of course, the classes referenced in that file actually need to be in your classpath.
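If a language is unexpectedly missing at runtime, it can help to look at what is actually declared on your classpath. The following diagnostic sketch uses only the JDK and is not part of LanguageTool; `LanguageModuleLister` is a hypothetical name. It reads every copy of the file it can find and lists the class names after languageClasses=, reading line by line so that a repeated key is not lost (an assumption based on the example above).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.stream.Collectors;

class LanguageModuleLister {

  static final String RESOURCE = "META-INF/org/languagetool/language-module.properties";
  static final String KEY = "languageClasses";

  /** Collects the class names from every "languageClasses=..." line; the key may repeat. */
  static List<String> parseLines(Iterable<String> lines) {
    List<String> classNames = new ArrayList<>();
    for (String rawLine : lines) {
      String line = rawLine.trim();
      if (!line.startsWith(KEY + "=")) {
        continue; // skip comments, blank lines, and other keys
      }
      for (String name : line.substring(KEY.length() + 1).split(",")) {
        if (!name.trim().isEmpty()) {
          classNames.add(name.trim());
        }
      }
    }
    return classNames;
  }

  public static void main(String[] args) throws IOException {
    Enumeration<URL> files =
        LanguageModuleLister.class.getClassLoader().getResources(RESOURCE);
    if (!files.hasMoreElements()) {
      System.out.println("No " + RESOURCE + " found on the classpath");
    }
    while (files.hasMoreElements()) {
      URL url = files.nextElement();
      try (BufferedReader in = new BufferedReader(
          new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
        List<String> lines = in.lines().collect(Collectors.toList());
        System.out.println(url + " -> " + parseLines(lines));
      }
    }
  }
}
```

Run it with the same classpath as your application. A class name that is listed here but cannot be loaded points to a missing language JAR.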

Updating to a new Release

LanguageTool releases a new version every three months. While our API is generally quite stable, we sometimes make small changes that might affect your usage of the API. We recommend upgrading without skipping a version. So if you are using LanguageTool 3.2 and want to upgrade to 3.5, we recommend updating to 3.3 first, then to 3.4, then to 3.5. With each update, carefully read the "API" section in our change log and recompile the code that uses the LanguageTool API to see if there are deprecation warnings. If there are warnings, the Javadoc we provide usually suggests which method to use instead of the deprecated one.

Using a remote LanguageTool server

If you prefer using a LanguageTool server over running LanguageTool in-process, use org.languagetool.remote.RemoteLanguageTool from the languagetool-http-client module. It takes care of sending the HTTP(S) request and parsing the JSON result. For testing, feel free to use our public HTTPS API server. The advantage of using a remote server is lower CPU, memory, and disk usage for your local process, especially if you'd like to use ngram data for detecting commonly confused words.
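To give an idea of the kind of request involved, here is a rough sketch that posts a text to a server using only the JDK and prints the raw JSON response. It is not how RemoteLanguageTool is implemented, and in practice RemoteLanguageTool is the more convenient choice because it also parses the result. The endpoint URL and the form parameters text and language are assumptions about the server's HTTP API; `RawCheckRequest` is a hypothetical name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

class RawCheckRequest {

  // Assumed endpoint of the public server; replace it with your own server's URL.
  static final String CHECK_URL = "https://api.languagetool.org/v2/check";

  /** Builds the form-encoded request body from the text and a language code such as "en-US". */
  static String formBody(String text, String languageCode) {
    try {
      return "text=" + URLEncoder.encode(text, "UTF-8")
          + "&language=" + URLEncoder.encode(languageCode, "UTF-8");
    } catch (UnsupportedEncodingException e) {
      throw new IllegalStateException("UTF-8 should always be available", e);
    }
  }

  /** Sends the body as an HTTP POST and returns the response as a string. */
  static String post(String url, String body) throws IOException {
    HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
    conn.setRequestMethod("POST");
    conn.setDoOutput(true);
    conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
    try (OutputStream out = conn.getOutputStream()) {
      out.write(body.getBytes(StandardCharsets.UTF_8));
    }
    try (InputStream in = conn.getInputStream();
         Scanner scanner = new Scanner(in, "UTF-8").useDelimiter("\\A")) {
      return scanner.hasNext() ? scanner.next() : "";
    }
  }

  public static void main(String[] args) throws IOException {
    // Needs network access; prints the server's JSON answer unparsed.
    System.out.println(post(CHECK_URL, formBody("A sentence with a error", "en-US")));
  }
}
```

The sketch has no timeouts, retries, or error handling for non-200 responses, all of which a production client would need.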