Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace rust-i18n with i18n-embed #1426

Open
wants to merge 7 commits into
base: master
Choose a base branch
from
Open

Conversation

nc7s
Copy link
Contributor

@nc7s nc7s commented Dec 11, 2024

As discussed in #1416. From there:

I'm also tempted to just do something akin to your patch, as doing this translation is simple enough that pulling in dependencies may be overkill.

Probably, if strings are to be fairly simple and keep that way. Optimize with phf, even.

Anyway, the new code is here for comparison.

This is a very crude draft with quite some points up for debate: code style and placement, workaround for locale override with config value (there's a comment about this), structure of i18n resources, and MSRV (enabling std::sync::LazyLock, but actually not very relevant to i18n).

Good news is this does not significantly affect compiled size (even down a few kilos actually).

@fujiapple852
Copy link
Owner

fujiapple852 commented Dec 12, 2024

Thanks, I'll review this properly later today.

One thing worth doing is re-introducing the locale::t! "wrapper" macro such that no changes are needed at the call site (i.e. everything in frontend::render).

I tested this locally and it seemed to work ok:

#[macro_export]
macro_rules! t {
    ($($all:tt)*) => {
        i18n_embed_fl::fl!($crate::locale::LANGUAGE_LOADER, $($all)*)
    }
}

@nc7s
Copy link
Contributor Author

nc7s commented Dec 12, 2024

Indeed, that'll remove quite some noise. Though fl! returns String, besides does not support the => style invocation, so some invocations still need to be adjusted.

@nc7s nc7s force-pushed the i18n-embed branch 12 times, most recently from 5a5a5ac to cec26eb Compare December 12, 2024 06:44
Comment on lines 274 to 278
if self == &Self::Ttl {
return "#".into();
}
match self {
Self::Ttl => Cow::Borrowed("#"),
Self::Ttl => unreachable!(),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoids introducing .into() to all t!() invocations at the cost of extra statements other than the match. I'll switch if you'd prefer.

Copy link
Owner

@fujiapple852 fujiapple852 Dec 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This got me thinking:

The ColumnType::name method here returns Cow<'_, str> only because that is what the i18n-rust crate macro returned. Easy to fix, we just make the return type here (and a few other places) a String.

However, does that mean that i18n-embed allocates an owned string every time? After digging through the code of the i18n_embed_fl::fl! macro it seems it ultimate calls this:

https://github.com/kellpossible/cargo-i18n/blob/master/i18n-embed/src/fluent.rs#L133-L136

/// Get a localized message referenced by the `message_id`.
pub fn get(&self, message_id: &str) -> String {
    self.get_args_fluent(message_id, None)
}

So the answer appears to be that it does allocate a new String for every translation, every time. Caveat that I only looked at it briefly so I may be missing something.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i18n_embed::fluent::FluentLanguageLoader::get_args_fluent() ultimately calls fluient::bundle::FluentBundle::format_pattern(), which actually returns a Cow<_, str>. It then turns it .into() a String. This is a bit frustrating.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I added rust-i18n I was a bit concerned about the extra CPU load as it did involved adding several string allocations per-frame (every 100ms by default). I did some (very unscientific) testing, I built in release mode and eyeballing the CPU graph, and it did have a small but noticeable impact. This would likely add even more so i'd want to repeat that eyeball test to see if the change is meaningful.

@fujiapple852
Copy link
Owner

fujiapple852 commented Dec 12, 2024

@nc7s I've pushed some WIP commits to this PR (we can refine these and squash later).

  • wip: remove unused sys-locale crate

No longer used so I removed it. I haven't reviewed the replacement logic yet.

  • wip: remove Cow from several methods

See my comment above re: removing Cow and what that reveals about the way the i18n_embed crate works.

  • wip: use OnceLock instead of LazyLock

We can use OnceLock (stable on 1.70, Jun 2023) instead of LazyLock (1.80, Jul 2024) to avoid the MSRV bump, as the latter is a little bit too recent (thinking about package maintainers here!). After adding this commit I dropped the earlier MSRV bump commit.

  • wip: move i18n into crates/trippy-tui and delete old locales

As the translations are strictly a TUI concern I'd prefer this to live in the trippy-tui crate. It was quite fiddly to move, but it seems to work. It seems rust-embed handles differently between debug and release builds but has a "debug-embed" to align them, which I briefly tried but didn't use.

I'm find it confusing to have a i18n.toml config file which defines the assets_dir and having to specify it in #[folder = "i18n"]. I did look briefly if there was a way to avoid this but didn't see anything.

I also removed the old locale directory in this commit.

@fujiapple852
Copy link
Owner

@nc7s on the fluent ftl files, do you know if it is possible to combine them into a single multi-language file? This has pros and cons but is handy for generating placeholder translations for new terms for all languages quickly as it plays well with AI tools.

If not, can we name all file as ${lang}_ftl and place them in a single directory? (aside: is there any significance to the name trippy-tui.ftl or is it arbitrary?)

Cargo.lock Outdated Show resolved Hide resolved
@nc7s
Copy link
Contributor Author

nc7s commented Dec 13, 2024

Thanks for the detailed review and code.

  • wip: move i18n into crates/trippy-tui and delete old locales
    As the translations are strictly a TUI concern I'd prefer this to live in the trippy-tui crate. It was quite fiddly to move, but it seems to work. It seems rust-embed handles differently between debug and release builds but has a "debug-embed" to align them, which I briefly tried but didn't use.

I regret to admit it's because I simply followed its docs, which told me to put it in the root. :>

I'm find it confusing to have a i18n.toml config file which defines the assets_dir and having to specify it in #[folder = "i18n"]. I did look briefly if there was a way to avoid this but didn't see anything.

The former is for the i18n-embed crate, the latter for rust-embed. The former does not actually embed files itself, even with that word in the name.

@nc7s on the fluent ftl files, do you know if it is possible to combine them into a single multi-language file? This has pros and cons but is handy for generating placeholder translations for new terms for all languages quickly as it plays well with AI tools.

As shown in rust-i18n docs:

# Use the fluent localization system.
[fluent]
# (Required) The path to the assets directory.
# The paths inside the assets directory should be structured like so:
# `assets_dir/{language}/{domain}.ftl`
assets_dir = "i18n"

I'm afraid we have to manually reimplement some of the set code to support an alternative layout, e.g. building FluentBundles ourselves.

If not, can we name all file as ${lang}_ftl and place them in a single directory? (aside: is there any significance to the name trippy-tui.ftl or is it arbitrary?)

As shown above, trippy-tui.ftl is just {domain}.ftl, which we do have control by manually building a FluentLanguageLoader (not using the fluent_language_loader! macro).

@c-git
Copy link
Collaborator

c-git commented Dec 13, 2024

Maybe this is a good use case to implement a few bench marks?

PS. If we don't allow the language to change at runtime maybe we can use LazyCell or something like that to drop the runtime cost. I'm using it right now at work and it's pretty simple to use. You only pay for what you use and you only pay once and since it derefs into &T most code doesn't need to change.

@nc7s
Copy link
Contributor Author

nc7s commented Dec 13, 2024

I used its thread safe version, std::sync::LazyLock, but it requires bumping MSRV to 1.80, @fujiapple852 worried it's too new and changed "back" to std::sync::OnceLock, which was introduced in 1.70, lower than current MSRV 1.76. It has the cost of lock, but it seems we need it for Sync:

error[E0277]: `OnceCell<FluentLanguageLoader>` cannot be shared between threads safely
  --> crates/trippy-tui/src/locale.rs:12:29
   |
12 | pub static LANGUAGE_LOADER: OnceCell<FluentLanguageLoader> = OnceCell::new();
   |                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ `OnceCell<FluentLanguageLoader>` cannot be shared between threads safely
   |
   = help: the trait `Sync` is not implemented for `OnceCell<FluentLanguageLoader>`
   = note: if you want to do aliasing and mutation between multiple threads, use `std::sync::OnceLock` instead
   = note: shared static variables must have a type that implements `Sync`

For more information about this error, try `rustc --explain E0277`.
error: could not compile `trippy-tui` (lib) due to 1 previous error

@c-git
Copy link
Collaborator

c-git commented Dec 13, 2024

Oh sorry I guess I should have remember it needed to go across threads. Yes I think the lock cost is less than creating the string each time but we'd probably need to benchmark it to really know. And yes using OnceLock would also be fine.

@nc7s
Copy link
Contributor Author

nc7s commented Dec 13, 2024

Arguably we could .leak() a String into it, since the locale is almost always set only once per run, but we still need to figure out how to write into a static without OnceLock or the like.

@c-git
Copy link
Collaborator

c-git commented Dec 13, 2024

I think I missed something, why can't we use OnceLock?

@nc7s
Copy link
Contributor Author

nc7s commented Dec 13, 2024

Uh that was intended as an idea for further optimization, OnceLock is fine per se.

@fujiapple852
Copy link
Owner

@nc7s I pushed another wip commit, this is mostly about making the i18n impl hidden (to the extent possible) inside the locale module. I also added some tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants