LanguageTool requires Java 8 or later.
Get LanguageTool by adding a dependency like this to your Maven pom.xml:
<dependency>
  <groupId>org.languagetool</groupId>
  <artifactId>language-en</artifactId>
  <version>6.3</version>
</dependency>
This will get the dependencies needed to check English. Use language-de as the artifactId for German, and so on for the other languages.
If you want to use all languages that LanguageTool supports, use language-all.
If you don't use Maven (or a similar system), download the stand-alone ZIP instead.
You will need the JAR files in the libs directory, the org directory, and the META-INF directory in your classpath. We strongly recommend using Maven or Gradle instead.
To use LanguageTool, you just need to create a JLanguageTool object and use it to check your text. Also see the API documentation (Javadoc). For example:
import java.io.File;
import java.io.IOException;
import java.util.List;
import org.languagetool.JLanguageTool;
import org.languagetool.Languages;
import org.languagetool.rules.RuleMatch;

public class Example {
  public static void main(String[] args) throws IOException {
    JLanguageTool langTool = new JLanguageTool(Languages.getLanguageForShortCode("en-GB"));
    // uncomment to use statistical ngram data:
    //langTool.activateLanguageModelRules(new File("/data/google-ngram-data"));
    List<RuleMatch> matches = langTool.check("A sentence with a error in the Hitchhiker's Guide tot he Galaxy");
    for (RuleMatch match : matches) {
      System.out.println("Potential error at characters " +
          match.getFromPos() + "-" + match.getToPos() + ": " +
          match.getMessage());
      System.out.println("Suggested correction(s): " +
          match.getSuggestedReplacements());
    }
  }
}
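The offsets returned by getFromPos() and getToPos() are character positions in the checked string, so they can be used to apply a suggestion directly. The following is a minimal sketch, not a LanguageTool feature: applyFirstSuggestions() is a hypothetical helper that blindly takes the first suggestion of each match, which is rarely what you want for real text.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import org.languagetool.JLanguageTool;
import org.languagetool.Languages;
import org.languagetool.rules.RuleMatch;

public class AutoCorrectExample {

  // Hypothetical helper: replaces each matched span with its first suggestion.
  // Matches are applied from the end of the text backwards so that the offsets
  // of earlier matches stay valid; matches overlapping an applied one are skipped.
  static String applyFirstSuggestions(JLanguageTool lt, String text) throws IOException {
    List<RuleMatch> matches = new ArrayList<>(lt.check(text));
    matches.sort(Comparator.comparingInt(RuleMatch::getFromPos).reversed());
    StringBuilder sb = new StringBuilder(text);
    int limit = text.length();  // start offset of the last applied replacement
    for (RuleMatch match : matches) {
      List<String> suggestions = match.getSuggestedReplacements();
      if (suggestions.isEmpty() || match.getToPos() > limit) {
        continue;  // nothing to apply, or overlaps a replacement we already made
      }
      sb.replace(match.getFromPos(), match.getToPos(), suggestions.get(0));
      limit = match.getFromPos();
    }
    return sb.toString();
  }

  public static void main(String[] args) throws IOException {
    JLanguageTool lt = new JLanguageTool(Languages.getLanguageForShortCode("en-GB"));
    System.out.println(applyFirstSuggestions(lt, "A sentence with a error."));
  }
}
```

Working backwards through the text is what keeps this simple: a replacement can change the length of the string, but it never shifts the offsets of matches that start before it.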
Use activateLanguageModelRules() to make use of statistical ngram data, as described in "Finding errors using Big Data".
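If the ngram data is not available on every machine that runs your code, you can guard the call. In this sketch, activateNgramsIfPresent() is a hypothetical helper and "/data/google-ngram-data" is just the example path used above; only activateLanguageModelRules() is LanguageTool API.

```java
import java.io.File;
import java.io.IOException;
import org.languagetool.JLanguageTool;
import org.languagetool.Languages;

public class NgramSetupExample {

  // Hypothetical helper: activates the ngram-based rules only if the data
  // directory exists, so the same code also runs where the data is missing.
  static boolean activateNgramsIfPresent(JLanguageTool lt, File ngramDir) throws IOException {
    if (!ngramDir.isDirectory()) {
      return false;
    }
    lt.activateLanguageModelRules(ngramDir);
    return true;
  }

  public static void main(String[] args) throws IOException {
    JLanguageTool lt = new JLanguageTool(Languages.getLanguageForShortCode("en-GB"));
    // Example path only; point this at your own copy of the ngram data.
    boolean active = activateNgramsIfPresent(lt, new File("/data/google-ngram-data"));
    System.out.println("ngram rules active: " + active);
  }
}
```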
LanguageTool usually works on plain text. If you need to check text with markup (HTML, XML, LaTeX, ...), you usually cannot just remove the markup, as that would mess up the position information that LanguageTool returns. In this case, use AnnotatedTextBuilder to create an AnnotatedText object that tells LanguageTool which parts of your text are markup and which are text. JLanguageTool has a check() method that accepts this AnnotatedText object.
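A minimal sketch of this approach, assuming the language-en dependency: the HTML snippet is split into markup and text by hand with addMarkup() and addText(), and the resulting AnnotatedText is passed to check(). The class and method names are only for illustration; in practice your own parser would drive the builder calls.

```java
import java.io.IOException;
import java.util.List;
import org.languagetool.JLanguageTool;
import org.languagetool.Languages;
import org.languagetool.markup.AnnotatedText;
import org.languagetool.markup.AnnotatedTextBuilder;
import org.languagetool.rules.RuleMatch;

public class MarkupCheckExample {

  // Checks the HTML snippet "<p>A sentence with a <b>error</b>.</p>".
  // The split into markup and text is done by hand for this example.
  static List<RuleMatch> checkHtmlSnippet(JLanguageTool lt) throws IOException {
    AnnotatedText text = new AnnotatedTextBuilder()
        .addMarkup("<p>")
        .addText("A sentence with a ")
        .addMarkup("<b>")
        .addText("error")
        .addMarkup("</b>")
        .addText(".")
        .addMarkup("</p>")
        .build();
    return lt.check(text);
  }

  public static void main(String[] args) throws IOException {
    JLanguageTool lt = new JLanguageTool(Languages.getLanguageForShortCode("en-US"));
    for (RuleMatch match : checkHtmlSnippet(lt)) {
      // Positions are meant to refer to the original snippet, markup included.
      System.out.println(match.getFromPos() + "-" + match.getToPos() + ": " + match.getMessage());
    }
  }
}
```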
The JLanguageTool class is not thread safe. Create one instance of JLanguageTool per thread, but create the language only once (e.g. Languages.getLanguageForShortCode("en-GB")) and use that for all instances of JLanguageTool. The same is true for MultiThreadedJLanguageTool: its name refers to the fact that it uses threads internally, but it is not thread safe itself.
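One way to follow this advice, sketched under our own assumptions: the Language object is a shared constant, and a ThreadLocal gives each worker thread its own JLanguageTool. The pool size, the ThreadLocal, and the class and method names are choices made for this example, not requirements of LanguageTool.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import org.languagetool.JLanguageTool;
import org.languagetool.Language;
import org.languagetool.Languages;

public class ParallelCheckExample {

  // Created once and shared by all threads.
  private static final Language LANGUAGE = Languages.getLanguageForShortCode("en-GB");

  // One JLanguageTool per thread, created the first time that thread needs it.
  private static final ThreadLocal<JLanguageTool> LANG_TOOL =
      ThreadLocal.withInitial(() -> new JLanguageTool(LANGUAGE));

  // Checks each text on a pool of worker threads and returns the match count per text.
  static List<Integer> countMatches(List<String> texts, int threads) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<Integer>> futures = new ArrayList<>();
      for (String text : texts) {
        futures.add(pool.submit(() -> LANG_TOOL.get().check(text).size()));
      }
      List<Integer> counts = new ArrayList<>();
      for (Future<Integer> future : futures) {
        counts.add(future.get());
      }
      return counts;
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    List<String> texts = Arrays.asList("A sentence with a error.", "Another sentence with a error.");
    System.out.println(countMatches(texts, 2));
  }
}
```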
If you want spell checking and the language you're working with has variants, you will need to specify that variant in the JLanguageTool constructor, e.g. Languages.getLanguageForShortCode("en-US") instead of Languages.getLanguageForShortCode("en").
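To see the effect of the variant, this sketch counts the active rules that are SpellingCheckRule instances. Given the paragraph above, en-US should report at least one such rule; the exact counts depend on your LanguageTool version, so the example only prints them.

```java
import java.util.List;
import org.languagetool.JLanguageTool;
import org.languagetool.Languages;
import org.languagetool.rules.Rule;
import org.languagetool.rules.spelling.SpellingCheckRule;

public class VariantSpellingExample {

  // Counts the active spell checking rules for the given language code.
  static long countSpellingRules(String langCode) {
    JLanguageTool lt = new JLanguageTool(Languages.getLanguageForShortCode(langCode));
    List<Rule> rules = lt.getAllActiveRules();
    return rules.stream().filter(rule -> rule instanceof SpellingCheckRule).count();
  }

  public static void main(String[] args) {
    System.out.println("en-US: " + countSpellingRules("en-US"));
    System.out.println("en:    " + countSpellingRules("en"));
  }
}
```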
To ignore words, i.e. exclude them from spell checking, call the addIgnoreTokens(...)
method of the spell checking rule you're using. You first have to find the rule by
iterating over all active rules. Example:
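import java.util.Arrays;
import java.util.List;
import org.languagetool.JLanguageTool;
import org.languagetool.Languages;
import org.languagetool.rules.Rule;
import org.languagetool.rules.spelling.SpellingCheckRule;

JLanguageTool lt = new JLanguageTool(Languages.getLanguageForShortCode("en-US"));
for (Rule rule : lt.getAllActiveRules()) {
  if (rule instanceof SpellingCheckRule) {
    List<String> wordsToIgnore = Arrays.asList("specialword", "myotherword");
    ((SpellingCheckRule) rule).addIgnoreTokens(wordsToIgnore);
  }
}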
You can also ignore phrases with the acceptPhrases() method:
for (Rule rule : lt.getAllActiveRules()) {
  if (rule instanceof SpellingCheckRule) {
    ((SpellingCheckRule) rule).acceptPhrases(Arrays.asList("foo bar", "producct namez"));
  }
}
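A small sketch that shows the effect of addIgnoreTokens(): it counts the matches on a made-up word before and after the word is ignored. The helpers ignoreWords() and matchesOn() are hypothetical names for this example; matchesOn() only looks at the first occurrence of the word.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import org.languagetool.JLanguageTool;
import org.languagetool.Languages;
import org.languagetool.rules.Rule;
import org.languagetool.rules.spelling.SpellingCheckRule;

public class IgnoreWordsExample {

  // Adds the words to every active spell checking rule.
  static void ignoreWords(JLanguageTool lt, List<String> words) {
    for (Rule rule : lt.getAllActiveRules()) {
      if (rule instanceof SpellingCheckRule) {
        ((SpellingCheckRule) rule).addIgnoreTokens(words);
      }
    }
  }

  // Counts the matches whose span overlaps the first occurrence of the word.
  static long matchesOn(JLanguageTool lt, String text, String word) throws IOException {
    int start = text.indexOf(word);
    int end = start + word.length();
    return lt.check(text).stream()
        .filter(m -> m.getFromPos() < end && m.getToPos() > start)
        .count();
  }

  public static void main(String[] args) throws IOException {
    JLanguageTool lt = new JLanguageTool(Languages.getLanguageForShortCode("en-US"));
    String text = "This is a specialword.";
    System.out.println("matches before: " + matchesOn(lt, text, "specialword"));
    ignoreWords(lt, Arrays.asList("specialword"));
    System.out.println("matches after:  " + matchesOn(lt, text, "specialword"));
  }
}
```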
LanguageTool determines which languages it supports at runtime by reading
them from a file META-INF/org/languagetool/language-module.properties
in the classpath. The file may look like this:
languageClasses=org.languagetool.language.it.Italian
languageClasses=org.languagetool.language.pl.Polish
languageClasses=org.languagetool.language.en.English,org.languagetool.language.en.AmericanEnglish
You either build that file yourself, adapted to the languages you support, or you take it from the LanguageTool stand-alone distribution. Of course, the classes referenced in that file actually need to be in your classpath.
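To see which languages were actually picked up from the language-module.properties files on your classpath, you can list them at runtime. A minimal sketch using Languages.get(); with only the language-en dependency, you would expect to see just the English variants.

```java
import java.util.List;
import java.util.stream.Collectors;
import org.languagetool.Language;
import org.languagetool.Languages;

public class ListLanguagesExample {

  // Returns the codes (e.g. "en-US") of all languages detected on the classpath.
  static List<String> availableLanguageCodes() {
    return Languages.get().stream()
        .map(Language::getShortCodeWithCountryAndVariant)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    for (Language lang : Languages.get()) {
      System.out.println(lang.getShortCodeWithCountryAndVariant() + "  " + lang.getName());
    }
  }
}
```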
LanguageTool releases a new version every three months. While our API is generally quite stable, we sometimes make small changes that might affect your usage of the API. We recommend upgrading without skipping a version. For example, if you are using LanguageTool 3.2 and want to upgrade to 3.5, we recommend updating to 3.3 first, then to 3.4, and then to 3.5. With each update, carefully look at the "API" section in our change log and re-compile the code that uses the LanguageTool API to see if there are deprecation warnings. If there are, our Javadoc usually suggests which method to use instead of the deprecated one.
If you prefer using a LanguageTool server over running LanguageTool in-process, use org.languagetool.remote.RemoteLanguageTool from the languagetool-http-client module. It takes care of sending the HTTP(S) request and parsing the JSON result. For testing, feel free to use our public HTTPS API server. The advantage of using a remote server is lower CPU, memory, and disk usage for your local process, especially if you'd like to use ngram data for detecting commonly confused words.
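A minimal sketch of a remote check, assuming the languagetool-http-client module is on your classpath and a LanguageTool server is listening on http://localhost:8081 (an assumption for this example; change SERVER to the server you actually use). The RemoteRuleMatch accessors getRuleId(), getErrorOffset(), and getMessage() are as we recall them from the client; please double-check them against the Javadoc of your version.

```java
import java.net.URL;
import org.languagetool.remote.RemoteLanguageTool;
import org.languagetool.remote.RemoteResult;
import org.languagetool.remote.RemoteRuleMatch;

public class RemoteCheckExample {

  // Assumption: a LanguageTool server runs locally on port 8081.
  static final String SERVER = "http://localhost:8081";

  public static void main(String[] args) throws Exception {
    RemoteLanguageTool lt = new RemoteLanguageTool(new URL(SERVER));
    // The client sends the HTTP request and parses the JSON response for us.
    RemoteResult result = lt.check("A sentence with a error.", "en-US");
    for (RemoteRuleMatch match : result.getMatches()) {
      System.out.println(match.getRuleId() + " at " + match.getErrorOffset() + ": " + match.getMessage());
    }
  }
}
```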