GitHub - iobnc/e2k-ports: Performance patches and build fixes for Elbrus 2000 (e2k) architecture.

e2k-ports

Performance patches and build fixes for Elbrus (e2k) architecture.

This is my personal repository so that patches won't get lost.

Elbrus porting cheat sheet:

Elbrus 2000 (aka e2k) is a 64-bit little-endian architecture.
The compiler (LCC) is mostly GCC-compatible (it defines __GNUC__) and uses the EDG frontend.

detection

shell: uname -m returns e2k
cmake: if(CMAKE_SYSTEM_PROCESSOR STREQUAL "e2k")
C preprocessor: #if defined(__e2k__)
compiler version: if __LCC__ == 125 and __LCC_MINOR__ == 9, the compiler is "LCC 1.25.09"
architecture version: defined in __iset__ (versions below 3 are obsolete; 6 is the latest at the time of writing)
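The preprocessor checks above can be combined into a small helper that reports what the code was built with. This is a minimal sketch: the function names format_lcc_version and describe_compiler are hypothetical, and the assumption that __LCC__ packs major*100 + minor is inferred only from the single "LCC 1.25.09" example above.

```c
#include <stdio.h>
#include <stddef.h>

/* Formats an LCC version the way the cheat sheet spells it:
 * __LCC__ = 125, __LCC_MINOR__ = 9  ->  "LCC 1.25.09".
 * Assumption: __LCC__ packs major * 100 + minor. */
static int format_lcc_version(int lcc, int lcc_minor, char *buf, size_t size)
{
    return snprintf(buf, size, "LCC %d.%02d.%02d",
                    lcc / 100, lcc % 100, lcc_minor);
}

/* Writes a short description of the target and compiler into buf. */
static int describe_compiler(char *buf, size_t size)
{
#if defined(__e2k__) && defined(__LCC__)
    char ver[32];
    format_lcc_version(__LCC__, __LCC_MINOR__, ver, sizeof ver);
    return snprintf(buf, size, "e2k (iset v%d), %s", __iset__, ver);
#elif defined(__e2k__)
    return snprintf(buf, size, "e2k, unknown compiler");
#else
    return snprintf(buf, size, "not e2k");
#endif
}
```

On a non-e2k host the helper simply reports "not e2k", so the same source builds everywhere.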

intrinsics

MMX, SSE2, SSSE3, SSE4.1* - native support
AVX, AVX2 - supported but not recommended; they use too many CPU registers
SSE4.2 and _mm_dp_ps (from SSE4.1) - emulated and slow; do not use

The compiler enables everything from MMX to AVX2 by default. If the code selects its SIMD path from predefined macros (e.g. #if defined(__AVX2__)), pass -mno-avx (and -mno-sse4.2) so that the slow paths are not chosen.
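When changing compiler flags is not an option, the same effect can be approximated in the source by capping the SIMD level on e2k. A minimal sketch; the MYLIB_SIMD_LEVEL macro and mylib_simd_level function are hypothetical names, not part of any patched project.

```c
/* Pick the SIMD level at compile time.
 * On e2k the AVX/AVX2 macros are defined by default, but those paths
 * are not recommended there, so AVX2 is only selected on other targets. */
#if defined(__AVX2__) && !defined(__e2k__)
#define MYLIB_SIMD_LEVEL "avx2"
#elif defined(__SSE4_1__)
#define MYLIB_SIMD_LEVEL "sse4.1"
#elif defined(__SSE2__)
#define MYLIB_SIMD_LEVEL "sse2"
#else
#define MYLIB_SIMD_LEVEL "scalar"
#endif

/* Returns the name of the code path chosen above. */
static const char *mylib_simd_level(void)
{
    return MYLIB_SIMD_LEVEL;
}
```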

builtins

__sync*, __atomic* - supported by the compiler
count leading/trailing zeros - supported (__builtin_clz, __builtin_ctz)
memory fence - supported (need to include x86intrin.h first)
- __builtin_ia32_mfence, __builtin_ia32_lfence, __builtin_ia32_sfence
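As an illustration of the bit-counting and atomic builtins listed above, here is a short sketch that should compile unchanged with GCC-compatible compilers. The helper names are hypothetical.

```c
#include <limits.h>
#include <stdint.h>

/* Index of the highest set bit; x must be non-zero
 * (__builtin_clz is undefined for 0). */
static unsigned floor_log2_u32(uint32_t x)
{
    unsigned bits = (unsigned)(sizeof(unsigned) * CHAR_BIT);
    return bits - 1u - (unsigned)__builtin_clz((unsigned)x);
}

/* Index of the lowest set bit; x must be non-zero. */
static unsigned lowest_set_bit(uint32_t x)
{
    return (unsigned)__builtin_ctz((unsigned)x);
}

static long id_counter;

/* Thread-safe increment using the __atomic builtins. */
static long next_id(void)
{
    return __atomic_add_fetch(&id_counter, 1, __ATOMIC_SEQ_CST);
}
```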

cpuid

Use compile-time CPU detection and select the best SIMD level up to SSE4.1.
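A sketch of compile-time selection for a 4-element dot product. The SSE path deliberately uses a multiply plus shuffles instead of _mm_dp_ps, which the intrinsics list marks as emulated; the function name dot4 is hypothetical.

```c
#if defined(__SSE2__)
#include <emmintrin.h>

/* SSE path: multiply, then reduce the four lanes with shuffles. */
static float dot4(const float *a, const float *b)
{
    __m128 p    = _mm_mul_ps(_mm_loadu_ps(a), _mm_loadu_ps(b));
    __m128 shuf = _mm_shuffle_ps(p, p, _MM_SHUFFLE(2, 3, 0, 1));
    __m128 sums = _mm_add_ps(p, shuf);      /* lane 0: p0+p1, lane 2: p2+p3 */
    shuf = _mm_movehl_ps(shuf, sums);       /* move lane 2 down to lane 0 */
    sums = _mm_add_ss(sums, shuf);          /* lane 0: p0+p1+p2+p3 */
    return _mm_cvtss_f32(sums);
}
#else
/* Scalar fallback for targets without SSE2. */
static float dot4(const float *a, const float *b)
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}
#endif
```

Because the choice is made by the preprocessor, no cpuid-style run-time dispatch is needed.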

rdtsc

#include <stdint.h>
#include <x86intrin.h>
uint64_t time = __rdtsc();
// equivalent: unsigned aux; uint64_t time = __rdtscp(&aux);
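A slightly fuller sketch of timing a piece of code with __rdtsc(). The read_ticks and ticks_for_sum names are hypothetical, and the clock_gettime fallback for other architectures is an addition for portability, not something the cheat sheet prescribes.

```c
#include <stdint.h>

#if defined(__e2k__) || defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
static uint64_t read_ticks(void)
{
    return __rdtsc();
}
#else
#include <time.h>
/* Fallback: nanoseconds from a monotonic clock. */
static uint64_t read_ticks(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}
#endif

/* Sums n integers and returns the number of ticks the loop took. */
static uint64_t ticks_for_sum(const int *data, int n, long *sum_out)
{
    uint64_t t0 = read_ticks();
    long sum = 0;
    for (int i = 0; i < n; i++)
        sum += data[i];
    uint64_t t1 = read_ticks();
    *sum_out = sum;
    return t1 - t0;
}
```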

useful pragmas

_Pragma("name") - use this form inside macros.

Place these before a loop:

#pragma ivdep - ignore data dependencies inside the loop
#pragma unroll(n) - unroll the loop n times
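The _Pragma form makes it possible to hide these pragmas behind macros that expand to nothing on other compilers. A minimal sketch; the LOOP_IVDEP and LOOP_UNROLL macro names are hypothetical, and gating on __LCC__ is an assumption about how a project might want to scope them.

```c
#include <stddef.h>

#define DO_PRAGMA(x) _Pragma(#x)

#if defined(__LCC__)
#define LOOP_IVDEP      _Pragma("ivdep")
#define LOOP_UNROLL(n)  DO_PRAGMA(unroll(n))
#else
#define LOOP_IVDEP
#define LOOP_UNROLL(n)
#endif

/* y[i] += a * x[i]; ivdep is only valid if x and y do not overlap. */
static void saxpy(float a, const float *x, float *y, size_t n)
{
    LOOP_IVDEP
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Sum with a requested unroll factor of 4. */
static long sum_i32(const int *v, size_t n)
{
    long s = 0;
    LOOP_UNROLL(4)
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}
```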

restrict

Using the restrict keyword is good for performance, but LCC ignores it when the code uses vector load/store intrinsics such as _mm_load_si128(). For code with vector intrinsics, use #pragma ivdep instead.
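A sketch of how the two hints might be combined: restrict covers the scalar tail, and ivdep covers the loop that uses vector intrinsics. The add_i32 name and the LOOP_IVDEP macro are hypothetical, and the caller is assumed to guarantee that the arrays do not overlap.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

#if defined(__LCC__)
#define LOOP_IVDEP _Pragma("ivdep")
#else
#define LOOP_IVDEP
#endif

/* dst[i] = a[i] + b[i]; the arrays must not overlap. */
static void add_i32(int32_t *restrict dst, const int32_t *restrict a,
                    const int32_t *restrict b, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    size_t vec_n = n & ~(size_t)3;   /* largest multiple of 4 <= n */
    LOOP_IVDEP
    for (i = 0; i < vec_n; i += 4) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_add_epi32(va, vb));
    }
#endif
    for (; i < n; i++)               /* scalar tail, or whole loop without SSE2 */
        dst[i] = a[i] + b[i];
}
```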

makecontext

Use makecontext_e2k(ctx, ...) instead of makecontext(ctx, ...). It returns a negative integer on error and allocates extra resources that must be freed with freecontext_e2k(ctx).
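A sketch of a wrapper that runs one function on its own context and picks the right call per architecture. It assumes makecontext_e2k takes the same arguments as makecontext, as the text above suggests, and that both e2k functions are declared by ucontext.h; run_coroutine_once and co_entry are hypothetical names.

```c
#include <stdlib.h>
#include <ucontext.h>

static ucontext_t main_ctx, co_ctx;
static int co_ran;

/* Runs on the new context; returning resumes main_ctx via uc_link. */
static void co_entry(void)
{
    co_ran = 1;
}

/* Returns 0 if the coroutine ran to completion, -1 on any error. */
static int run_coroutine_once(void)
{
    size_t stack_size = 64 * 1024;
    void *stack = malloc(stack_size);
    if (stack == NULL)
        return -1;
    if (getcontext(&co_ctx) != 0) {
        free(stack);
        return -1;
    }
    co_ctx.uc_stack.ss_sp = stack;
    co_ctx.uc_stack.ss_size = stack_size;
    co_ctx.uc_link = &main_ctx;
#if defined(__e2k__)
    if (makecontext_e2k(&co_ctx, co_entry, 0) < 0) {
        free(stack);
        return -1;
    }
#else
    makecontext(&co_ctx, co_entry, 0);
#endif
    int rc = swapcontext(&main_ctx, &co_ctx);
#if defined(__e2k__)
    freecontext_e2k(&co_ctx);   /* release the extra e2k resources */
#endif
    free(stack);
    return (rc == 0 && co_ran) ? 0 : -1;
}
```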

nop

Use __asm__ __volatile__ ("nop") or _mm_pause() for a short delay.
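A typical place for such a delay is a bounded spin-wait. A minimal sketch; cpu_relax and spin_wait are hypothetical names, and the choice of _mm_pause() on x86 and e2k versus a plain nop elsewhere is an assumption made for portability.

```c
#if defined(__e2k__) || defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
#define cpu_relax() _mm_pause()
#else
#define cpu_relax() __asm__ __volatile__("nop")
#endif

/* Spins until *flag becomes non-zero or max_spins iterations pass.
 * Returns 1 if the flag was seen set, 0 otherwise. */
static int spin_wait(const volatile int *flag, unsigned max_spins)
{
    for (unsigned i = 0; i < max_spins; i++) {
        if (*flag)
            return 1;
        cpu_relax();
    }
    return *flag != 0;
}
```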

clearing the instruction cache

The GCC function __clear_cache(char *begin, char *end) works correctly since LCC 1.25.18 and LCC 1.26.04. It is available in earlier versions, but does nothing there.
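Code that generates instructions at run time may want to refuse to run on a compiler where the flush is a no-op. The sketch below reads the version note above as "1.25.18 or later within the 1.25 branch, 1.26.04 or later otherwise", which is an interpretation; it also uses the GCC builtin spelling __builtin___clear_cache and assumes LCC accepts it. Both function names are hypothetical.

```c
#include <stddef.h>

/* 1 if this LCC version has a working cache flush, per the note above.
 * lcc and lcc_minor correspond to __LCC__ and __LCC_MINOR__. */
static int lcc_clear_cache_works(int lcc, int lcc_minor)
{
    if (lcc > 126)
        return 1;
    if (lcc == 126)
        return lcc_minor >= 4;
    if (lcc == 125)
        return lcc_minor >= 18;
    return 0;
}

/* Flushes the instruction cache for [begin, begin + len).
 * Returns -1 when built with an LCC whose flush does nothing. */
static int flush_code(void *begin, size_t len)
{
#if defined(__LCC__)
    if (!lcc_clear_cache_works(__LCC__, __LCC_MINOR__))
        return -1;
#endif
    __builtin___clear_cache((char *)begin, (char *)begin + len);
    return 0;
}
```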

inline

If inlining is crucial to performance, use __attribute__((__always_inline__)) inline rather than plain inline, because LCC may decide not to inline large or complicated functions.
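A common way to apply this consistently is a single macro. A minimal sketch; FORCE_INLINE, clamp_u8 and clamp_row are hypothetical names.

```c
/* Forces inlining on GCC-compatible compilers, including LCC. */
#define FORCE_INLINE static inline __attribute__((__always_inline__))

/* Clamps v to the range 0..255. */
FORCE_INLINE unsigned clamp_u8(int v)
{
    return v < 0 ? 0u : (v > 255 ? 255u : (unsigned)v);
}

/* Hot loop that relies on clamp_u8 being inlined. */
static void clamp_row(const int *src, unsigned char *dst, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = (unsigned char)clamp_u8(src[i]);
}
```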

avoid if possible

The GNU C extension Labels as Values is available in LCC, but it performs worse than a simple switch/case.
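For interpreters that usually rely on computed goto, the dispatch loop can be written with a plain switch instead. A toy sketch of a stack machine; the opcode set and the run function are hypothetical and only illustrate the dispatch style.

```c
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

/* Executes a tiny bytecode program with switch-based dispatch.
 * Returns the top of the stack at OP_HALT, or -1 on a malformed program. */
static int run(const int *code)
{
    int stack[16];
    int sp = 0;
    int pc = 0;

    for (;;) {
        switch (code[pc++]) {
        case OP_PUSH:
            if (sp >= 16)
                return -1;
            stack[sp++] = code[pc++];
            break;
        case OP_ADD:
            if (sp < 2)
                return -1;
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_MUL:
            if (sp < 2)
                return -1;
            sp--;
            stack[sp - 1] *= stack[sp];
            break;
        case OP_HALT:
            return sp > 0 ? stack[sp - 1] : -1;
        default:
            return -1;
        }
    }
}
```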

The GNU C Vector Extension is also available in LCC, but it is poorly implemented and performs very badly.

Files in this repository:

Python.md
README.md
aften-0.0.8-e2k.patch
benchmark-1.7.1-e2k.patch
blender-3.6.2-e2k.patch
boost-1.76-e2k.patch
cppcrypto-0.17-e2k.patch
emacs-28.2-e2k.patch
embree-4.3.0-e2k.patch
ffmpeg-4.4-e2k.patch
ffmpeg-6.1-e2k.patch
fftw-3.3.8-e2k.patch
glib-2.70.3-e2k.patch
kicad-6.0.0-e2k.patch
libaom-3.0.0-e2k.patch
libbotan-2.19.1-e2k.patch
libdc1394-2.2.5-e2k.patch
libffcall-2.4-e2k.patch
libffi-3.4.2-e2k.patch
libgc-7.6.8-e2k.patch
libgc-8.0.2-e2k.patch
libjpeg-turbo-3.0.2-e2k.patch
libpcl-1.12-e2k.patch
libpng-1.6.37-e2k.patch
libvpx-1.11.0-e2k.patch
manticore-3.6.0-e2k.patch
mysql-8.0.35-e2k.patch
onetbb-2021.9-e2k.patch
openblas-0.3.19-e2k.patch
opencv-4.7.0-e2k.patch
php-8.1.1-e2k.patch
postgres-e2k.patch
qt5-base-5.15.2-e2k.patch
qt6-base-6.4.2-e2k.patch
tbb-2020.3-e2k.patch
telegram-desktop-4.3.1-e2k.patch
vdo-6.2.4-e2k.patch
webkitgtk-2.34.3-e2k.patch
x264-164-e2k.patch
zlib-ng-2.1.4-e2k.patch
zstd-1.5.2_e2k.patch
