Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace rust-i18n with phf string maps and simple replaces #1428

Merged
merged 1 commit into from
Dec 16, 2024

Conversation

nc7s
Copy link
Contributor

@nc7s nc7s commented Dec 13, 2024

Closes #1416
Closes #1431

Alternative to #1426 for comparison. Strings are now &'static str as phf::Map entries, placeholder substitution is simply str::replace, no complex logic. Executable size is down 1MB (!).

Downsides are 1. build.rs, 2. str::replace is not enough for complex translations.

Currently both the $kt:ident = $kt:expr and $kt:literal => $kv:expr forms of t! are still supported, but I doubt if they really differ.

@nc7s
Copy link
Contributor Author

nc7s commented Dec 13, 2024

Failure of the build-docker job seems to be a result of the way Dockerfile is written. The "caching" somehow resulted in build.rs being skipped.

@fujiapple852
Copy link
Owner

@nc7s I pushed a commit which fixes the Dockerfile. I've not had time to review the rest of this yet, I will over the weekend.

@fujiapple852
Copy link
Owner

fujiapple852 commented Dec 13, 2024

aside: I think this may be broken on master as well, as the locales directory is not copied into the docker image.

edit: yes, broken:

$ docker run -it fujiapple/trippy 1.1.1.1 --print-locales
TUI locales:

bug: #1431

@fujiapple852
Copy link
Owner

@nc7s i've rebased and pushed a WIP commit that:

  • replaces phf (and the need for the build.rs) with a simple include_str
  • replaces the RwLock with ArcSwap
  • Fixed an issue where the provided locale, rather than the actual locale, was displayed
  • (re)added some tests

LTM what you think.

Cargo.toml Outdated Show resolved Hide resolved
@nc7s
Copy link
Contributor Author

nc7s commented Dec 15, 2024

I used phf because it's more performant in this small key scenario than SipHash currently used by std::collections::HashMap IIRC. Did a more or less also unscientific benchmark with bench code like this:

mod locale;

use locale::*;
use rand::distributions::{Distribution, Uniform};

fn main() {
    let mut rng = rand::thread_rng();

    let all_keys: Vec<_> = data().0.keys().collect();
    let all_locales: Vec<_> = available_locales();

    let keys: Vec<_> = Uniform::new(0, all_keys.len())
        .sample_iter(&mut rng)
        .take(100)
        .map(|i| all_keys[i])
        .collect();
    let locales: Vec<_> = Uniform::new(0, all_locales.len())
        .sample_iter(&mut rng)
        .take(100)
        .map(|i| all_locales[i])
        .collect();

    for locale in locales {
        set_locale(Some(locale));
        for key in &keys {
            println!("{locale}:{key} = '{}'", t!(key));
        }
    }
}

and the result is ("s" means std i.e. std::collections::HashMap, "p" means phf::Map):

/tmp/std-phf-comp                                                                           took 5s
❯ hyperfine --warmup 50 --runs 500 s/target/release/s p/target/release/p
Benchmark 1: s/target/release/s
  Time (mean ± σ):      12.6 ms ±   0.6 ms    [User: 8.7 ms, System: 3.8 ms]
  Range (min … max):    11.5 ms …  14.7 ms    500 runs

Benchmark 2: p/target/release/p
  Time (mean ± σ):       8.5 ms ±   0.5 ms    [User: 4.9 ms, System: 3.5 ms]
  Range (min … max):     7.8 ms …  11.2 ms    500 runs

Summary
  p/target/release/p ran
    1.49 ± 0.11 times faster than s/target/release/s

phf is significantly faster by these numbers, but then again, probably negligible as it's on the level of nanoseconds per read.

So, ultimately your call. The code structure is kinda new to me but OK :>

Side note: I implemented "language-region to language fallback" by cutting the first two chars of locale: &str, assuming all language codes are ISO 639-1 i.e. two-letter. You may or may not think it's better than .split('-').next().

@fujiapple852
Copy link
Owner

fujiapple852 commented Dec 16, 2024

phf is significantly faster by these numbers, but then again, probably negligible as it's on the level of nanoseconds per read.

@nc7s can phf be used in a similar way to toml::from_str(include_str!("../locales.toml"))? I'm fine to use phf, especially if it is faster, but would prefer to avoid the complexity of generating source code in build.rs for this.

aside: we may want a buld.rs anyway, just to have cargo::rerun-if-changed for crates/trippy-tui/locales.toml.

Side note: I implemented "language-region to language fallback" by cutting the first two chars of locale: &str, assuming all language codes are ISO 639-1 i.e. two-letter. You may or may not think it's better than .split('-').next().

I'd like to swap out the adhoc string parsing for a more robust and compliant Locale type, similar to the ones some of the i18n crates use. Trippy should also validate the given locale string much earlier, somewhere in the config.rs module. For now i'll stick with the overly forgiving parsing in 0.12.0 for consistency. I've raise a follow up issue (#1432)

@fujiapple852
Copy link
Owner

@nc7s I replaced OnceLock<ArcSwap<String>> with a simple RefCell<String> using thread_local! to get single-threaded-and-mutable semantics. The UI is single threaded and so there should be no need for ArcSwap and/or RwLock.

@c-git
Copy link
Collaborator

c-git commented Dec 16, 2024

@fujiapple852 but if it never changes then maybe OnceCell might be a better fit?

@nc7s
Copy link
Contributor Author

nc7s commented Dec 16, 2024

phf is significantly faster by these numbers, but then again, probably negligible as it's on the level of nanoseconds per read.

@nc7s can phf be used in a similar way to toml::from_str(include_str!("../locales.toml"))? I'm fine to use phf, especially if it is faster, but would prefer to avoid the complexity of generating source code in build.rs for this.

I don't think so, phf is "a library to generate efficient lookup tables", "at compile time". I understand build.rs is more or less ugly, but the cost of the perfect hash values might actually hurt performance if computed at runtime.

aside: we may want a buld.rs anyway, just to have cargo::rerun-if-changed for crates/trippy-tui/locales.toml.

That would ease accepting a build.rs I guess ;)

Like said earlier, though, this ultimately boils down to a difference of tens of nanoseconds per read.

Side note: I implemented "language-region to language fallback" by cutting the first two chars of locale: &str, assuming all language codes are ISO 639-1 i.e. two-letter. You may or may not think it's better than .split('-').next().

I'd like to swap out the adhoc string parsing for a more robust and compliant Locale type, similar to the ones some of the i18n crates use. Trippy should also validate the given locale string much earlier, somewhere in the config.rs module. For now i'll stick with the overly forgiving parsing in 0.12.0 for consistency. I've raise a follow up issue (#1432)

This smells like reinventing (some aspects of) the wheel.. though I too lean towards correctness on this matter.

@nc7s
Copy link
Contributor Author

nc7s commented Dec 16, 2024

@nc7s I replaced OnceLock<ArcSwap<String>> with a simple RefCell<String> using thread_local! to get single-threaded-and-mutable semantics. The UI is single threaded and so there should be no need for ArcSwap and/or RwLock.

I don't have a reason to object to removing the lock in single thread environments, but then you take the responsibility to make sure either it does not become multi-threaded, or if it does, use locks again ;)

@fujiapple852
Copy link
Owner

@c-git the problem with using OnceLock in this case is the initial value is not known until runtime. Even though it will only be set once at startup and not read until after it is set, __translate is forced to specify an initial value (which it does not know) just in case it runs first. In practise that can't actually happen, but it is a bit brittle.

@fujiapple852 fujiapple852 force-pushed the i18n-phf branch 2 times, most recently from 9e04e04 to 5310209 Compare December 16, 2024 14:18
@fujiapple852 fujiapple852 merged commit 4be6f82 into fujiapple852:master Dec 16, 2024
19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Locale data not copied into docker image Get rid of (serde_)y(a)ml
3 participants