[GR-18163] Support the POSIX locale, the default locale in Docker images #3907

Status: Open. Wants to merge 4 commits into master.

1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -34,6 +34,7 @@ Compatibility:
* Adjust a `FrozenError`'s message and add a receiver when a frozen module or class is modified (e.g. by defining or undefining an instance method or by defining a nested module (@andrykonchin).
* Fix `Kernel#sprintf` and `%p` format specification to produce `"nil"` for `nil` argument (#3846, @andrykonchin).
* Reimplement `Data#with` to not call `Data.new` that can be removed or redefined (#3890, @andrykonchin).
* TruffleRuby now supports the `POSIX` locale, the default locale in Docker images (@eregon).

Performance:

2 changes: 0 additions & 2 deletions README.md
@@ -110,8 +110,6 @@ environment, for example, by unmounting system filesystems such as `/dev/shm`.
Without these dependencies, many libraries including RubyGems will not work.
TruffleRuby will try to print a nice error message if a dependency is missing, but this can only be done on a best effort basis.

You also need to set up a [UTF-8 locale](doc/user/utf8-locale.md) if not already done.

See the [contributor workflow](doc/contributor/workflow.md) document if you wish to build TruffleRuby from source.

## Current Status
23 changes: 13 additions & 10 deletions ci.jsonnet
@@ -245,38 +245,41 @@ local part_definitions = {

platform: {
local common_deps = common.deps.truffleruby + common.deps.sulong,
local locale = {
# We want to test with the POSIX locale, which is the default locale for Docker images.
# We need to override all locale-related env vars set by the CI.
environment+: {
LANG: "POSIX",
LC_ALL: "POSIX",
LC_CTYPE: "POSIX",
},
},

linux: common.linux_amd64 + common_deps + {
linux: common.linux_amd64 + common_deps + locale + {
platform_name:: "LinuxAMD64",
"$.cap":: {
normal_machine: [],
bench_machine: ["x52"] + self.normal_machine + ["no_frequency_scaling"],
},
},
linux_aarch64: common.linux_aarch64 + common_deps + {
linux_aarch64: common.linux_aarch64 + common_deps + locale + {
platform_name:: "LinuxAArch64",
"$.cap":: {
normal_machine: [],
},
},
darwin_amd64: common.darwin_amd64 + common_deps + {
darwin_amd64: common.darwin_amd64 + common_deps + locale + {
platform_name:: "DarwinAMD64",
"$.cap":: {
# GR-45839, GR-46279: exclude macmini_late_2014_8gb, they are too slow, have too little RAM and cause various timeouts
normal_machine: ["darwin_bigsur", "!macmini_late_2014_8gb"],
},
environment+: {
LANG: "en_US.UTF-8",
},
},
darwin_aarch64: common.darwin_aarch64 + common_deps + {
darwin_aarch64: common.darwin_aarch64 + common_deps + locale + {
platform_name:: "DarwinAArch64",
"$.cap":: {
normal_machine: ["darwin_bigsur"],
},
environment+: {
LANG: "en_US.UTF-8",
},
},
},

17 changes: 12 additions & 5 deletions doc/user/utf8-locale.md
@@ -6,21 +6,28 @@ permalink: /reference-manual/ruby/UTF8Locale/
---
# Setting Up a UTF-8 Locale

You need a UTF-8 locale to run some Ruby applications.
For example, we have found that RubyGems and ruby/spec need such a locale.
Since TruffleRuby 25.0, TruffleRuby supports the `POSIX` locale, the default locale in Docker images.
**So there is no need to set up a locale anymore.**

This is not needed if the `$LANG` environment variable is already set and:
Some Ruby applications however require setting up a proper locale (same on CRuby).
The instructions below explain how to do that.

You can check the current locale using:

```bash
locale
```

shows no `="C"` and no warning.
Instead, all values should be `"en_US.UTF-8"` or other regions but still `.UTF-8`.
If that shows warnings, it probably means `LANG` is set to a locale which is not installed.

These docs explain how to setup the `en_US.UTF-8` locale.

As a note, the `C.UTF-8` locale also exists on Linux (but not on macOS) and might be more convenient as it does not require installing extra packages.

### Fedora-based: RHEL, Oracle Linux, etc

```bash
sudo dnf install glibc-langpack-en
export LANG=en_US.UTF-8
```
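
Besides the `locale` command shown in the document above, the locale can also be inspected from inside Ruby. The removed launcher spec in this PR relied on `Encoding.find('locale')`; the sketch below prints that value next to the related settings. It is only an illustration and assumes nothing beyond core Ruby; the output depends on the environment it runs in.

```ruby
# Encoding.find('locale') returns the encoding derived from the system locale.
locale_encoding = Encoding.find('locale')

puts "locale encoding:  #{locale_encoding}"
puts "default_external: #{Encoding.default_external}"
# Locale-related environment variables; unset ones print as nil.
puts "LANG=#{ENV['LANG'].inspect} LC_ALL=#{ENV['LC_ALL'].inspect} LC_CTYPE=#{ENV['LC_CTYPE'].inspect}"
```

Running it once with `LANG=POSIX` and once with `LANG=en_US.UTF-8` should make the difference visible, provided the second locale is installed.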

10 changes: 6 additions & 4 deletions spec/truffle/include_all_c_header_spec.rb
@@ -10,8 +10,9 @@

describe 'lib/cext/include/internal_all.h' do
it 'includes each *.h file from lib/cext/include/internal/' do
filenames = Dir.glob('internal/**/*.h', base: 'lib/cext/include', sort: true)
content = File.read('lib/cext/include/internal_all.h')
ruby_home = RbConfig::CONFIG['prefix']
filenames = Dir.glob('internal/**/*.h', base: "#{ruby_home}/lib/cext/include", sort: true)
content = File.read("#{ruby_home}/lib/cext/include/internal_all.h")

filenames.should_not be_empty

@@ -20,8 +21,9 @@
end

it 'includes each *.h file from lib/cext/include/stubs/internal/' do
filenames = Dir.glob('internal/**/*.h', base: 'lib/cext/include/stubs', sort: true)
content = File.read('lib/cext/include/internal_all.h')
ruby_home = RbConfig::CONFIG['prefix']
filenames = Dir.glob('internal/**/*.h', base: "#{ruby_home}/lib/cext/include/stubs", sort: true)
content = File.read("#{ruby_home}/lib/cext/include/internal_all.h")

filenames.should_not be_empty
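
Both hunks in this spec replace paths relative to the working directory with paths built from `RbConfig::CONFIG['prefix']`. A minimal sketch of the difference follows; the `lib/cext/include` suffix is the TruffleRuby layout taken from the diff and will not exist under other Ruby installations.

```ruby
require 'rbconfig'

# Installation root of the running Ruby, independent of the current directory.
ruby_home = RbConfig::CONFIG['prefix']
include_dir = File.join(ruby_home, 'lib/cext/include')

# A relative path is resolved against Dir.pwd, so it only points at the
# headers when the process is started from the right directory.
relative_dir = File.expand_path('lib/cext/include')

puts "from prefix: #{include_dir}"
puts "from cwd:    #{relative_dir}"
```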

6 changes: 0 additions & 6 deletions spec/truffle/launcher_spec.rb
@@ -404,12 +404,6 @@ def should_print_full_java_command(options, env: {})
end
end

it "warns if the locale is not set properly" do
err = ruby_exe("Encoding.find('locale')", args: "2>&1", env: { "LC_ALL" => "C" })
err.should.include? "[ruby] WARNING: Encoding.find('locale') is US-ASCII (due to nl_langinfo(CODESET) which returned "
err.should.include? "), this often indicates that the system locale is not set properly"
end

['RUBYOPT', 'TRUFFLERUBYOPT'].each do |var|
it "should recognize ruby --vm options in #{var}" do
out = ruby_exe('print Truffle::System.get_java_property("foo")', env: { var => "#{ENV[var]} --vm.Dfoo=bar" }, args: @redirect)
16 changes: 0 additions & 16 deletions src/main/java/org/truffleruby/core/encoding/EncodingManager.java
@@ -127,7 +127,6 @@ public void initializeDefaultEncodings(TruffleNFIPlatform nfi, NativeConfigurati

private void initializeLocaleEncoding(TruffleNFIPlatform nfi, NativeConfiguration nativeConfiguration) {
final String localeEncodingName;
final String detector;
if (nfi != null) {
final int codeset = (int) nativeConfiguration.get("platform.langinfo.CODESET");

@@ -146,10 +145,8 @@ private void initializeLocaleEncoding(TruffleNFIPlatform nfi, NativeConfiguratio
context,
InteropLibrary.getUncached(),
0);
detector = "nl_langinfo(CODESET)";
localeEncodingName = new String(bytes, StandardCharsets.US_ASCII);
} else {
detector = "Charset.defaultCharset()";
localeEncodingName = Charset.defaultCharset().name();
}

@@ -158,19 +155,6 @@ private void initializeLocaleEncoding(TruffleNFIPlatform nfi, NativeConfiguratio
rubyEncoding = Encodings.US_ASCII;
}

if (context.getOptions().WARN_LOCALE && rubyEncoding == Encodings.US_ASCII) {
String firstLine = "Encoding.find('locale') is US-ASCII (due to " + detector + " which returned " +
localeEncodingName + "), this often indicates that the system locale is not set properly. ";
if ("C".equals(System.getenv("LANG")) && "C".equals(System.getenv("LC_ALL"))) {
// The parent process seems to explicitly want a C locale (e.g. EnvUtil#invoke_ruby in the MRI test harness), so only warn at config level in this case.
RubyLanguage.LOGGER.config(firstLine + "Warning at level=CONFIG because LANG=C and LC_ALL=C are set. " +
"Set LANG=en_US.UTF-8 and see https://www.graalvm.org/dev/reference-manual/ruby/UTF8Locale/ for details.");
} else {
RubyLanguage.LOGGER.warning(firstLine +
"Set LANG=en_US.UTF-8 and see https://www.graalvm.org/dev/reference-manual/ruby/UTF8Locale/ for details.");
}
}

localeEncoding = rubyEncoding;
}

5 changes: 0 additions & 5 deletions src/main/java/org/truffleruby/options/Options.java
@@ -75,8 +75,6 @@ public final class Options {
public final boolean VIRTUAL_THREAD_FIBERS;
/** --log-subprocess=false */
public final boolean LOG_SUBPROCESS;
/** --warn-locale=true */
public final boolean WARN_LOCALE;
/** --exceptions-store-java=false */
public final boolean EXCEPTIONS_STORE_JAVA;
/** --exceptions-print-java=false */
@@ -236,7 +234,6 @@ public Options(Env env, OptionValues options, LanguageOptions languageOptions) {
HASHING_DETERMINISTIC = options.get(OptionsCatalog.HASHING_DETERMINISTIC_KEY);
VIRTUAL_THREAD_FIBERS = options.get(OptionsCatalog.VIRTUAL_THREAD_FIBERS_KEY);
LOG_SUBPROCESS = options.get(OptionsCatalog.LOG_SUBPROCESS_KEY);
WARN_LOCALE = options.get(OptionsCatalog.WARN_LOCALE_KEY);
EXCEPTIONS_STORE_JAVA = options.get(OptionsCatalog.EXCEPTIONS_STORE_JAVA_KEY);
EXCEPTIONS_PRINT_JAVA = options.get(OptionsCatalog.EXCEPTIONS_PRINT_JAVA_KEY);
EXCEPTIONS_PRINT_UNCAUGHT_JAVA = options.get(OptionsCatalog.EXCEPTIONS_PRINT_UNCAUGHT_JAVA_KEY);
@@ -357,8 +354,6 @@ public Object fromDescriptor(OptionDescriptor descriptor) {
return VIRTUAL_THREAD_FIBERS;
case "ruby.log-subprocess":
return LOG_SUBPROCESS;
case "ruby.warn-locale":
return WARN_LOCALE;
case "ruby.exceptions-store-java":
return EXCEPTIONS_STORE_JAVA;
case "ruby.exceptions-print-java":
2 changes: 1 addition & 1 deletion src/main/ruby/truffleruby/core/dir.rb
@@ -303,7 +303,7 @@ def glob(pattern, flags = 0, base: nil, sort: true, &block)

total = matches.size
while index < total
matches[index] = matches[index].encode(enc) unless matches[index].encoding == enc
matches[index] = matches[index].force_encoding(enc) unless matches[index].encoding == enc
index += 1
end
end
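
The one-line change above replaces `encode(enc)` with `force_encoding(enc)`. The PR does not spell out the reason here, so the sketch below only illustrates the general difference between the two calls in plain Ruby, using a made-up file name: `force_encoding` relabels the bytes, while `encode` transcodes them and can raise when no conversion is defined.

```ruby
# Hypothetical file name given as raw bytes; "\xC3\xA9" is "é" in UTF-8.
raw = "caf\xC3\xA9.h".b            # String#b returns a copy tagged ASCII-8BIT

# force_encoding only swaps the encoding tag; the bytes stay the same.
relabeled = raw.dup.force_encoding(Encoding::UTF_8)
puts relabeled.encoding            # UTF-8
puts relabeled.bytes == raw.bytes  # true

# encode transcodes, so it needs a conversion for every byte. Binary bytes
# >= 0x80 have no defined conversion to UTF-8, so this raises.
transcode_error =
  begin
    raw.encode(Encoding::UTF_8)
    nil
  rescue EncodingError => e
    e.class
  end
puts transcode_error.inspect
```
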
6 changes: 5 additions & 1 deletion src/main/ruby/truffleruby/core/env.rb
@@ -366,7 +366,11 @@ def set_encoding(value)
if Encoding.default_internal && value.ascii_only?
value = value.encode Encoding.default_internal, Encoding::LOCALE
elsif value.encoding != Encoding::LOCALE
value = value.dup.force_encoding(Encoding::LOCALE)
if Encoding::LOCALE == Encoding::US_ASCII && !value.ascii_only?
value = value.b
else
value = value.dup.force_encoding(Encoding::LOCALE)
end
end
value.freeze
end
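
The new branch in `set_encoding` treats a US-ASCII locale specially: a value with non-ASCII bytes becomes a binary string via `String#b` instead of being tagged US-ASCII. A rough standalone sketch of that branch follows. `tag_env_value` is a hypothetical helper, the locale encoding is passed in as a parameter rather than read from `Encoding::LOCALE`, and the `Encoding.default_internal` branch from the diff is left out.

```ruby
# Hypothetical helper mirroring the added branch; `locale` stands in for
# the Encoding::LOCALE constant used in the diff.
def tag_env_value(value, locale)
  return value if value.encoding == locale

  if locale == Encoding::US_ASCII && !value.ascii_only?
    value.b                           # binary copy, still a valid string
  else
    value.dup.force_encoding(locale)  # relabel only, bytes unchanged
  end
end

plain   = tag_env_value("/usr/bin", Encoding::US_ASCII)
accents = tag_env_value("/home/jos\u00E9", Encoding::US_ASCII)

puts plain.encoding
puts accents.encoding
# Tagging the same bytes as US-ASCII would give a string with an invalid encoding.
puts "/home/jos\u00E9".dup.force_encoding(Encoding::US_ASCII).valid_encoding?
```
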
1 change: 0 additions & 1 deletion src/options.yml
@@ -106,7 +106,6 @@ EXPERT:
HASHING_DETERMINISTIC: [hashing-deterministic, boolean, false, Produce deterministic hash values]
VIRTUAL_THREAD_FIBERS: [virtual-thread-fibers, boolean, false, 'Use VirtualThread for Fibers']
LOG_SUBPROCESS: [log-subprocess, boolean, false, 'Log whenever a subprocess is created'] # Also see --log-process-args
WARN_LOCALE: [warn-locale, boolean, true, 'Warn when the system locale is not set properly']

# Options to tweak backtraces
EXCEPTIONS_STORE_JAVA: [exceptions-store-java, boolean, false, Store the Java exception with the Ruby backtrace]
12 changes: 0 additions & 12 deletions src/shared/java/org/truffleruby/shared/options/OptionsCatalog.java
@@ -53,7 +53,6 @@ public final class OptionsCatalog {
public static final OptionKey<Boolean> HASHING_DETERMINISTIC_KEY = new OptionKey<>(false);
public static final OptionKey<Boolean> VIRTUAL_THREAD_FIBERS_KEY = new OptionKey<>(false);
public static final OptionKey<Boolean> LOG_SUBPROCESS_KEY = new OptionKey<>(false);
public static final OptionKey<Boolean> WARN_LOCALE_KEY = new OptionKey<>(true);
public static final OptionKey<Boolean> EXCEPTIONS_STORE_JAVA_KEY = new OptionKey<>(false);
public static final OptionKey<Boolean> EXCEPTIONS_PRINT_JAVA_KEY = new OptionKey<>(false);
public static final OptionKey<Boolean> EXCEPTIONS_PRINT_UNCAUGHT_JAVA_KEY = new OptionKey<>(false);
@@ -429,14 +428,6 @@ public final class OptionsCatalog {
.usageSyntax("")
.build();

public static final OptionDescriptor WARN_LOCALE = OptionDescriptor
.newBuilder(WARN_LOCALE_KEY, "ruby.warn-locale")
.help("Warn when the system locale is not set properly")
.category(OptionCategory.EXPERT)
.stability(OptionStability.EXPERIMENTAL)
.usageSyntax("")
.build();

public static final OptionDescriptor EXCEPTIONS_STORE_JAVA = OptionDescriptor
.newBuilder(EXCEPTIONS_STORE_JAVA_KEY, "ruby.exceptions-store-java")
.help("Store the Java exception with the Ruby backtrace")
@@ -1385,8 +1376,6 @@ public static OptionDescriptor fromName(String name) {
return VIRTUAL_THREAD_FIBERS;
case "ruby.log-subprocess":
return LOG_SUBPROCESS;
case "ruby.warn-locale":
return WARN_LOCALE;
case "ruby.exceptions-store-java":
return EXCEPTIONS_STORE_JAVA;
case "ruby.exceptions-print-java":
@@ -1647,7 +1636,6 @@ public static OptionDescriptor[] allDescriptors() {
HASHING_DETERMINISTIC,
VIRTUAL_THREAD_FIBERS,
LOG_SUBPROCESS,
WARN_LOCALE,
EXCEPTIONS_STORE_JAVA,
EXCEPTIONS_PRINT_JAVA,
EXCEPTIONS_PRINT_UNCAUGHT_JAVA,
14 changes: 0 additions & 14 deletions tool/docker-configs.yaml
@@ -6,57 +6,43 @@ rpm: &rpm
yaml: libyaml-devel
cext: gcc make
c++: gcc-c++
set-locale:
- ENV LANG=en_US.UTF-8

deb: &deb
locale: locales
tar:
specs: netbase
zlib: libz-dev
openssl: libssl-dev
yaml: libyaml-dev
cext: gcc make
c++: g++
set-locale:
# Uncomment the en_US.UTF-8 line in /etc/locale.gen
- RUN sed -i '/en_US.UTF-8/s/^# //g' /etc/locale.gen
# locale-gen generates locales for all uncommented locales in /etc/locale.gen
- RUN locale-gen
- ENV LANG=en_US.UTF-8

# Too old g++
#ol7:
# base: oraclelinux:7-slim
# # --enablerepo needed for libyaml-devel
# install: RUN yum install --enablerepo=ol7_optional_latest -y
# locale:
# <<: *rpm

ol8:
base: oraclelinux:8-slim
# --enablerepo needed for libyaml-devel
install: RUN microdnf install --enablerepo=ol8_codeready_builder -y
locale: glibc-langpack-en
<<: *rpm

ol9:
base: oraclelinux:9-slim
# --enablerepo needed for libyaml-devel
install: RUN microdnf install --enablerepo=ol9_codeready_builder -y
locale: glibc-langpack-en
<<: *rpm

fedora37:
base: fedora:37
install: RUN dnf install -y
locale: glibc-langpack-en
<<: *rpm

fedora38:
base: fedora:38
install: RUN dnf install -y
locale: glibc-langpack-en
<<: *rpm

ubuntu1804:
6 changes: 0 additions & 6 deletions tool/docker.rb
@@ -116,8 +116,6 @@ def docker(*args)
run_post_install_hook = rebuild_openssl

packages = []
packages << distro.fetch('locale')

packages << distro.fetch('tar')
packages << distro.fetch('specs') if full_test

@@ -140,12 +138,8 @@ def docker(*args)
"FROM #{distro.fetch('base')}",
*proxy_vars,
[distro.fetch('install'), *packages.compact].join(' '),
*distro.fetch('set-locale'),
]

# Check the locale is properly generated
lines << 'RUN locale -a | grep en_US.utf8'

lines << 'WORKDIR /test'

lines << 'RUN useradd -ms /bin/bash test'