Skip to content

translate-tools/core

Repository files navigation

CodeQL

The AnyLang kit is purposed to build your own translation system.

This package provides next translation primitives:

  • Translators. Bindings to a popular translation APIs, LLMs, and free translation services like Google Translate, Bing, Yandex, etc.
  • Translation schedulers, to batch many calls to translator into single request.
  • Text-to-speech modules, to make audio of text.
  • Language utils, to manage language codes, and aliases between different ISO standards and formats
  • Processing utils like text multiplexing tools, cache management, etc.
  • Consistent interfaces. You may use presets, like ChatGPTLLMTranslator or you may compose configuration yourself, from your own implementation of LLM fetcher and LLM translator, and use it natively with other primitives.

This package is a part of TranslateTools project.

Usage

Star the project repo, help us to make it popular

Install this package via npm install anylang

Translate text with one of translators

import { MicrosoftTranslator } from 'anylang/translators';

const translator = new MicrosoftTranslator();

// You can translate single text
translator.translate('Hello world', 'en', 'de').then(console.log);

// will print string: "Hallo Welt"

// and you can translate array of texts
translator
  .translateBatch(['I am Anthony', 'How are you?'], 'en', 'de')
  .then(console.log);

// will print array of translated strings: ["Ich bin Anthony", "Wie sind Sie?"]

See a Live Demo at StackBlitz

ESM modules

This package provides both CJS and ESM modules.

In this docs all examples uses CJS modules, if you need to use ESM modules, just add /esm prefix for any paths, after a package name like that:

// Example with import a CommonJS module
import { GoogleTranslator } from 'anylang/translators';

// Example with import an ECMAScript module
import { GoogleTranslator } from 'anylang/esm/translators';

Migrations

If you update package to a new major version, check migration guides to do it smooth.

Translators list

Directory translators contains a set of translators and its types and interfaces.

Translator purpose is translate text from one language to another.

Translators in this package uses 2-letter ISO 639-1 language codes like en, es, ru, ja, etc.

If you want to use another format for language codes, for example ISO 639-2 language codes, you may implement your own adapter for Translator that map languages.

Translators have 2 public methods:

  • translate for translate single text
  • translateBatch for translate a multiple texts per one request

Method translateBatch may have better translation quality for some use cases, because some translator implementations may see a context of all strings.

Package includes translators implementations for most popular services.

Free translators list

  • GoogleTranslator and GoogleTranslatorTokenFree - both uses a free google translation API. The difference in details of implementation, so you may use any
  • MicrosoftTranslator - uses a free Microsoft's translation service that is used in Microsoft Edge browser
  • YandexTranslator - uses a free API of service https://translate.yandex.ru. This service have aggressive bots protection, so it sometimes show page with captcha challenge and blocks API requests until you go to page and will pass challenge or change your IP
  • TartuNLPTranslator - Uses a free API https://github.com/TartuNLP/translation-api built with TartuNLP's public NMT engines. Translator have good quality, but strict rate limits. See API docs: https://api.tartunlp.ai/translation/docs and Demo: https://translate.ut.ee/

Example of use

import { YandexTranslator } from 'anylang/translators';

const translator = new YandexTranslator();

// You can translate single text
translator
  .translate('Hello world', 'en', 'de')
  .then(console.log);

// will print "Hallo Welt" in console

Unstable translators

Unstable translators is placed in unstable subdirectory, because it is not purposed for high intensive use because of strict rate limits or other problems.

Unstable translators may be removed at any time.

Example of use

import { DuckDuckGoLLMTranslator } from 'anylang/translators/unstable';

const translator = new DuckDuckGoLLMTranslator();

// You can translate single text
translator
  .translate('Hello world', 'en', 'de')
  .then(console.log);

// will print "Hallo Welt" in console

Translators that require API keys

LLM translators

There are few LLM translators configuration is provided

  • ChatGPTLLMTranslator
  • GeminiLLMTranslator

Example of use is

import { ChatGPTLLMTranslator } from 'anylang/translators';

const translator = new ChatGPTLLMTranslator({
	apiKey: 'YOUR_SECRET_TOKEN_HERE',
	// Optional config
	apiOrigin: 'https://api.openai.com',
	model: 'gpt-4o-mini',
});

// You can translate single text
translator
  .translate('Hello world', 'en', 'de')
  .then(console.log);

// will print "Hallo Welt" in console

If you want, you may build your own LLM translator from primitives in subdirectory LLMTranslators.

DeepLTranslator

Uses API of https://www.deepl.com/translator

This translator requires to provide API key

import { DeepLTranslator } from 'anylang/translators';

const translator = new DeepLTranslator({
	apiKey: '820c5d18-365b-289c-e63b6fc7e1cb:fx',
});

translator
	.translate('Hello world', 'en', 'de')
	.then((translate) => console.log('Translate result', translate));

LibreTranslateTranslator

Uses API of https://github.com/LibreTranslate/LibreTranslate

This translator requires to provide API endpoint and API key for some not free API instances. See an instances list to find a public instances.

Unstable: this translator are not stable and placed in unstable subdirectory. Keep in mind the not stable translators may be removed any time.

Options

  • apiHost optional - url to API endpoint. Default: https://translate.terraprint.co/translate
  • apiKey optional - API key
import { LibreTranslateTranslator } from 'anylang/translators/unstable';

const freeTranslator = new LibreTranslateTranslator({
	apiHost: 'https://translate.argosopentech.com/translate',
});

const localDeployedTranslator = new LibreTranslateTranslator({
	apiHost: 'http://localhost:9077/translate',
});

const paidTranslator = new LibreTranslateTranslator({
	apiHost: 'https://libretranslate.com/translate',
	apiKey: '76c1d41c-0a9a-5667-e9d169746e1e',
});

Special translators

You may use FakeTranslator for test purposes. This translator is just returns original strings with added prefix

Advanced use of translators

Use in NodeJS

For use with nodejs you have to specify user agent. In most cases for nodejs, translator will work incorrectly with not set User-Agent header.

import { GoogleTranslator } from 'anylang/translators';

const translator = new GoogleTranslator({
	headers: {
		'User-Agent':
			'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36',
	},
});

Custom fetcher

Sometimes you may want to use custom fetcher, to send requests through proxy server, implement retries or cache, etc.

For example, some translators API is not available in browser, due to CSP policy, for this cases you may use some CORS proxy to fix the problem. With this option, translator will use proxy to send requests, keep in mind that those who own the proxy server you use, can see your requests content.

You may pass your implementation of Fetcher function to option fetcher.

import { GoogleTranslator } from 'anylang/translators';

import { Fetcher, basicFetcher } from 'anylang/utils';

// Extend `basicFetcher` with use CORS proxy for all requests
const fetcher: Fetcher = async (url, options) => {
	// Use some CORS proxy service as prefix
	return basicFetcher('https://corsproxy.io/?' + url, options);
};

const translator = new GoogleTranslator({ fetcher });

You may use any fetcher under the hood, like Axios, Got, or even postMessage.

Scheduling

Directory scheduling contains primitives to schedule translation, that allows to batch few translation requests to a single API request.

Scheduler are extremely useful for cases when your code intensive calls translation methods, to avoid API rate limit errors and to collect few requests for short time to one request.

Usage

import { GoogleTranslator } from 'anylang/translators';
import { Scheduler } from 'anylang/scheduling/Scheduler';

// We use google translator API for translate requests
const translator = new GoogleTranslator();
const scheduler = new Scheduler(translator, { translatePoolDelay: 100 });

// This text will not be translated immediately,
// instead it will be added to a batch in queue,
// and will be translated after 100ms from last update on this batch
scheduler.translate('Some text for translate', 'en', 'de', { priority: 1 });

// The same with this request, but this text will be translated first, since it have most greater priority (the greater the more important)
scheduler.translate('Text that must be translated ASAP', 'en', 'de', { priority: 10 });

// Texts with same priority will be batched
scheduler.translate('Some text for translate', 'en', 'de', { priority: 1 });

Scheduler API

Scheduler have standard interface, to allow you implement your own scheduler for you use case

export interface ISchedulerTranslateOptions {
	/**
	 * Identifier to grouping requests
	 */
	context?: string;

	/**
	 * Priority index for translate queue
	 */
	priority?: number;

	/**
	 * Use direct translate for this request if it possible
	 */
	directTranslate?: boolean;
}

export interface IScheduler {
	/**
	 * Translate text
	 *
	 * @param text text for translation
	 * @param from text language code
	 * @param to target language code for translation
	 * @param options {ISchedulerTranslateOptions}
	 */
	translate(
		text: string,
		from: string,
		to: string,
		options?: ISchedulerTranslateOptions,
	): Promise<string>;

	/**
	 * Abort translation for all requests with provided context
	 * - All delayed requests will be immediately rejected
	 * - If exception thrown for in-flight requests, they will be rejected immediately with no retries
	 *
	 * @param context unique name for group of requests that must be aborted
	 */
	abort(context: string): Promise<void>;
}

Scheduler API looks like translator with one public method translate. User calls this method any times with any frequency, and scheduler executes requests by its own plan.

API

interface Scheduler {
	new (translator: TranslatorInstanceMembers, config?: SchedulerConfig): IScheduler;
}

interface SchedulerConfig {
	/**
	 * Number of attempts for retry request
	 */
	translateRetryAttemptLimit?: number;

	/**
	 * If true - rejected requests will use direct translate
	 */
	isAllowDirectTranslateBadChunks?: boolean;

	/**
	 * Length of string for direct translate.
	 *
	 * null for disable the condition
	 */
	directTranslateLength?: number | null;

	/**
	 * Delay for translate a chunk. The bigger the more requests will collect
	 */
	translatePoolDelay?: number;

	/**
	 * When chunk collect this size, it's will be instant add to a translate queue
	 *
	 * null for disable the condition
	 */
	chunkSizeForInstantTranslate?: number | null;

	/**
	 * Pause between handle task batches
	 *
	 * It may be useful to await accumulating a task batches in queue to consider priority better and don't translate first task batch immediately
	 *
	 * WARNING: this option must be used only for consider priority better! Set small value always (10-50ms)
	 *
	 * When this option is disabled (by default) and you call translate method for texts with priority 1 and then immediately for text with priority 2, first request will have less delay for translate and will translate first, even with lower priority, because worker will translate first task immediately after delay defined by option `translatePoolDelay`
	 */
	taskBatchHandleDelay?: null | number;
}

Scheduler with cache

Scheduler with cache placed at scheduling/SchedulerWithCache and useful for cases when you needs to cache your translations.

This class is just a decorator over Scheduler that allow you to use standard object ICache with scheduler.

import { GoogleTranslator } from 'anylang/translators';
import { Scheduler } from 'anylang/scheduling/Scheduler';
import { SchedulerWithCache } from 'anylang/scheduling/SchedulerWithCache';
import { ICache } from 'anylang/utils';

// Dummy implementation of cache
class CacheInRam implements ICache {
	private cache: Record<string, string> = {};
	private buildKey(text: string, from: string, to: string) {
		return [text, from, to].join('-');
	}

	async get(text: string, from: string, to: string) {
		const cacheKey = this.buildKey(text, from, to);
		return this.cache[cacheKey] ?? null;
	}

	async set(text: string, translate: string, from: string, to: string) {
		const cacheKey = this.buildKey(text, from, to);
		this.cache[cacheKey] = translate;
	}

	async clear() {
		this.cache = {};
	}
}

// We use google translator API for translate requests
const translator = new GoogleTranslator();
const scheduler = new Scheduler(translator, { translatePoolDelay: 100 });

const cache = new CacheInRam();
const schedulerWithCache = new SchedulerWithCache(scheduler, cache);

schedulerWithCache.translate('Some text for translate', 'en', 'de');
schedulerWithCache.translate('Some another text', 'en', 'de');

API

interface SchedulerWithCache {
	new (scheduler: Scheduler, cache: ICache): IScheduler;
}

/**
 * Interface of cache
 */
export interface ICache {
	/**
	 * Get translation
	 */
	get: (text: string, from: string, to: string) => Promise<string | null>;

	/**
	 * Set translation
	 */
	set: (text: string, translate: string, from: string, to: string) => Promise<void>;

	/**
	 * Clear cache
	 */
	clear: () => Promise<void>;
}

Text to speech (TTS)

Directory tts contains text to speech modules, to make audio from a text.

TTS modules returns buffer with audio.

TTS usage

import { GoogleTTS } from 'anylang/tts/GoogleTTS';

const tts = new GoogleTTS();
tts.getAudioBuffer('Some demo text to speak', 'en').then(({ buffer, type }) => {
	// Play audio from fetched `ArrayBuffer`
	const audio = new Audio();
	audio.src = URL.createObjectURL(new Blob([buffer], { type }));
	audio.play();
});

TTS list

GoogleTTS

Uses a free API version of service translate.google.com

LingvaTTS

Uses API of https://github.com/thedaviddelta/lingva-translate

This module supports option apiHost to provide API endpoint, see an instances list to find a public instances. Default API host: https://translate.plausibility.cloud.

TTS API

/**
 * Object with audio data
 */
export type TTSAudioBuffer = {
	/**
	 * Audio mimetype
	 */
	type: string;
	/**
	 * Buffer contains audio bytes
	 */
	buffer: ArrayBuffer;
};

/**
 * TTS instance members
 */
export interface TTSProviderProps {
	/**
	 * Get blob with audio
	 * @param text text to speak
	 * @param language text language
	 * @param options optional map with preferences to generate audio
	 */
	getAudioBuffer(
		text: string,
		language: string,
		options?: Record<string, string>,
	): Promise<TTSAudioBuffer>;
}

/**
 * TTS constructor members
 */
export interface TTSProviderStaticProps {
	getSupportedLanguages(): string[];
}

/**
 * Text to speech module
 */
export type TTSProvider = TTSProviderStaticProps & {
	new (...args: any[]): TTSProviderProps;
};

Languages

Module languages contains utils to work with language codes.

isLanguageCodeISO639v1(code: string): boolean

Predicate to check is language code are valid ISO-639v1 code or not

isLanguageCodeISO639v2(code: string): boolean

Predicate to check is language code are valid ISO-639v2 code or not

getLanguageCodesISO639(set: 'v1' | 'v2'): string[]

Returns language codes for selected version of ISO-639

API

Translator API

Module translators/Translator contains translator interfaces.

TranslatorInstanceMembers

Interface describes instance members for a translator

interface TranslatorInstanceMembers {
	/**
	 * Translate text
	 * @returns Translated string
	 */
	translate(text: string, langFrom: string, langTo: string): Promise<string>;

	/**
	 * Translate texts array
	 * @returns Array with translated strings and same length as input array
	 * @returns Texts that did not translated will replaced to `null`
	 */
	translateBatch(
		text: string[],
		langFrom: string,
		langTo: string,
	): Promise<Array<string | null>>;

	/**
	 * Check string or array of stings to exceeding a limit
	 *
	 * It need for modules with complexity logic of encoding a strings.
	 * For example, when in `translateBatch` text merge to string
	 * and split to chunks by tokens with ID: `<id1>Text 1</id1><id2>Text 2</id2>`
	 *
	 * Here checked result of encoding a input data
	 * @returns number of extra chars
	 */
	checkLimitExceeding(text: string | string[]): number;

	/**
	 * Check supporting of translate direction
	 */
	checkDirection?: (langFrom: string, langTo: string) => boolean;

	/**
	 * Max length of string for `translate` or total length of strings from array for `translateBatch`
	 */
	getLengthLimit(): number;

	/**
	 * Delay between requests that required by translator to a correct work
	 */
	getRequestsTimeout: () => number;
}

TranslatorStaticMembers

Interface describes static members, useful to choose most suitable translator in a list by features

interface TranslatorStaticMembers {
	/**
	 * Public translator name to displaying
	 */
	readonly translatorName: string;

	/**
	 * Is required API key for this module
	 */
	isRequiredKey(): boolean;

	/**
	 * Is it supported value `auto` in `langFrom` argument of `translate` and `translateBatch` methods
	 */
	isSupportedAutoFrom(): boolean;

	/**
	 * Array of supported languages as ISO 639-1 codes
	 */
	getSupportedLanguages(): string[];
}

TranslatorConstructor

Interface describes translator constructor, that have members of TranslatorStaticMembers and construct an object with type TranslatorInstanceMembers

TranslatorOptions

Interface describes default options for any translator constructor.

Some translators may ignore some options.

type TranslatorOptions<O extends Record<any, any> = {}> = O & {
	/**
	 * Access key for requests to translator API
	 */
	apiKey?: string;

	/**
	 * Union text array to 1 request (or more, but less than usualy anyway).
	 *
	 * Option for reduce the number of requests, but it can make artefacts in translated text.
	 *
	 * Some modules may not support this feature.
	 */
	useMultiplexing?: boolean;

	/**
	 * Additional headers for requests
	 */
	headers?: Record<string, string>;

	/**
	 * Custom fetcher
	 */
	fetcher?: Fetcher;
};

Help to us

Star the project repo, help us to make it popular

The most valuable thing you can do to support the project if you like it, is to spread the word about this package. Tell about anylang to your team and start use it.

The more users project have, the better its quality, since we can find edge cases faster and improve code design.

You also could suggest in issues a new ideas, features and elegant ways to make package better.

AnyLang is an open source project, so we are opened for your pull requests.