Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

C-API bindings for IFRT #164

Draft
wants to merge 89 commits into
base: main
Choose a base branch
from
Draft

C-API bindings for IFRT #164

wants to merge 89 commits into from

Conversation

mofeing
Copy link
Collaborator

@mofeing mofeing commented Oct 4, 2024

This time for real.

  •  IFRT
    • Value
      •  correctly manage it with the RCReference capture map
    • Tuple
      • Unpack
      •  correctly manage it with the RCReference capture map
    • Memory
    • Device
      • Memories
      • Attributes
    • Sharding
      • Sharding::devices
      • add devices argument to ifrt_sharding_with_device_assignment
      • OpaqueSharding
      • ConcreteSharding
      • ConcreteEvenSharding
      • ShardingParamsSharding
      • DeserializeShardingOptions
    • Array
    • Topology
      • implement C API
        •  GetDefaultLayout
        •  Attributes
      • implement Julia API for descriptions (ifrt_topology_device_descriptions)
    • Client
      • add on_done_with_host_buffer argument to ifrt_client_make_array_from_host_buffer
      • add RemapArrays, Attributes, GetAllDevices, GetDefaultDeviceAssignment method
    • HostCallback
    • LoadedHostCallback
    • Executable
      • GetCompiledMemoryStats
      •  GetCostAnalysis
    • LoadedExecutable
      • fix type conversion on the following functions
        • ifrt_loadedexecutable_parameter_shardings
        • ifrt_loadedexecutable_output_shardings
        • ifrt_loadedexecutable_parameter_layouts
        • ifrt_loadedexecutable_output_layouts
      • implement in C-API
        •  GetOutputMemoryKinds
        • GetCostAnalysis
        • Execute
        •  more options for CompileOptions constructor
    • Compiler
      • check that DeserializeExecutableOptions and CompileOptions are really deprecated
    • DType
    • Shape
    • DynamicShape
    • Index
    • IndexDomain
    • MemoryKind
  • IFRT-PjRt backend
    • PjRtArray
    • PjRtClient
      • full interface to PjRtClient::CompileOptions
    • PjRtCompiler
    • PjRtDevice
    • PjRtExecutable
    • PjRtHostSendAndRecvLoadedHostCallback
    • PjRtLoadedExecutable
    • PjRtMemory
    • PjRtTopology
    • PjRtTuple
      •  correctly manage it with the RCReference capture map

This comment was marked as off-topic.

github-actions[bot]

This comment was marked as off-topic.

@mofeing mofeing changed the title Write Bazel rule for IFRT dialect Support IFRT Dec 30, 2024
@mofeing
Copy link
Collaborator Author

mofeing commented Jan 2, 2025

i added some utilities for Julia-like conversion in C++ and better communication between Julia and C.

  • Type<T> is just a placeholder for representing a target type
  • convert converts the passing object to the object we can pass through FFI (it helps a lot deduplicating the code and avoiding bugs)
  • span<T> is a small std::span-like object but for C++ <20 and ABI-stable so we can view it from Julia as a Tuple{Csize, Ptr{Cvoid}}

for example, imagine a function that returns a absl::Span<tsl::RCReference<Device>>. then our C binding would be like:

extern "C" span<Device*> the_function(...) {
    return convert(Type<span<Device*>>(), some_func_that_returns_a_absl_Span(...))
}

conceptually works, in practice a need to add some particular implementations so it works (i'm sure it can be done more generically but i don't have the time nor the energy to learn the terribly complicated C++)

@mofeing mofeing force-pushed the ss/ifrt branch 2 times, most recently from 93f73a3 to c00ac05 Compare January 20, 2025 23:35
@mofeing
Copy link
Collaborator Author

mofeing commented Jan 22, 2025

@ftynse given this is a priority for the distributed working group, do you mind adding this to PRONTOLab/GB-25#1?

@ftynse ftynse mentioned this pull request Jan 24, 2025
6 tasks
@ftynse
Copy link
Collaborator

ftynse commented Jan 24, 2025

Sure, we are also still figuring out the project management stack

@mofeing mofeing marked this pull request as ready for review February 2, 2025 19:10
@mofeing mofeing requested a review from wsmoses February 2, 2025 19:14
Copy link
Collaborator Author

@mofeing mofeing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

build fails, so it's not yet ready to merge but it can be reviewed while we fix it

Copy link
Member

@giordano giordano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned already on Slack, the changes to ReactantExtra should be done in a separate PR, to be merged before this one, to allow a build on Yggdrasil

src/IFRT/IFRT.jl Outdated
Comment on lines 17 to 22
"""
backend(x)

Returns the `Backend` trait of the given IFRT object.
"""
function backend end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

┌ Error: 1 docstring not included in the manual:
│ 
│     Reactant.IFRT.backend
│ 
│ These are docstrings in the checked modules (configured with the modules keyword)
│ that are not included in canonical @docs or @autodocs blocks.
└ @ Documenter ~/.julia/packages/Documenter/Bs999/src/utilities/utilities.jl:44

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mmm weird? there's nothing fancy around this. do you have an idea of what can be happening?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You simply need to add this docstring to the documentation

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, misunderstood the problem

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bump

@giordano
Copy link
Member

giordano commented Feb 7, 2025

More errors in BB:

[07:55:57] ERROR: /workspace/srcdir/Reactant.jl/deps/ReactantExtra/BUILD:355:11: Compiling src/ifrt/Program.cpp failed: (Exit 1): aarch64-apple-darwin20-cc failed: error executing command (from target //:ReactantExtraLib)
[...]
[07:55:57] # Configuration: 733f0a17cb611a25065571f976b63834782b2e1741e54e6f530060798c5acd3e
[07:55:57] # Execution platform: @local_execution_config_platform//:platform
[07:55:57]
[07:55:57] Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
[07:55:57] In file included from src/ifrt/Program.cpp:1:
[07:55:57] In file included from ./src/type_conversion.hpp:8:
[07:55:57] ./src/memory_management.hpp:144:29: error: no member named 'reinterpret_pointer_cast' in namespace 'std'
[07:55:57]   144 |         capture_shared(std::reinterpret_pointer_cast<void>(std::const_pointer_cast<G>(ptr)))
[07:55:57]       |                        ~~~~~^
[07:55:57] ./src/memory_management.hpp:144:58: error: expected '(' for function-style cast or type construction
[07:55:57]   144 |         capture_shared(std::reinterpret_pointer_cast<void>(std::const_pointer_cast<G>(ptr)))
[07:55:57]       |                                                      ~~~~^
[07:55:57] ./src/memory_management.hpp:175:17: error: no member named 'reinterpret_pointer_cast' in namespace 'std'
[07:55:57]   175 |     return std::reinterpret_pointer_cast<T>(get_shared(ptr));
[07:55:57]       |            ~~~~~^
[07:55:57] ./src/memory_management.hpp:175:42: error: 'T' does not refer to a value
[07:55:57]   175 |     return std::reinterpret_pointer_cast<T>(get_shared(ptr));
[07:55:57]       |                                          ^
[07:55:57] ./src/memory_management.hpp:171:19: note: declared here
[07:55:57]   171 | template<typename T>
[07:55:57]       |                   ^
[07:55:57] 4 errors generated.

@mofeing
Copy link
Collaborator Author

mofeing commented Feb 7, 2025

mmm std::reinterpret_pointer_cast was introduced in C++17, which I think is the version we are using in ReactantExtra. do the logs show the used compiler flags?

@giordano
Copy link
Member

giordano commented Feb 7, 2025

Sounds like std::reinterpret_pointer_cast was introduced in SDK 11.3, based on a research in https://github.com/phracker/MacOSX-SDKs and https://github.com/roblabla/MacOSX-SDKs.

#164 (comment) still needs to be resolved though.

@mofeing
Copy link
Collaborator Author

mofeing commented Feb 7, 2025

Sounds like std::reinterpret_pointer_cast was introduced in SDK 11.3, based on a research in https://github.com/phracker/MacOSX-SDKs and https://github.com/roblabla/MacOSX-SDKs.

mmm that explains why it works in my macbook but not on yggy

#164 (comment) still needs to be resolved though.

sure. If it's not giving linking problems, I can set it as just extern "C".


i'm still having this issue though when building it with clang

dlopen(/Users/mofeing/Developer/Reactant.jl/deps/ReactantExtra/bazel-bin/libReactantExtra.so, 0x0001): symbol not found in flat namespace '__ZNSt13exception_ptr31__from_native_exception_pointerEPv'

@giordano
Copy link
Member

giordano commented Feb 7, 2025

i'm still having this issue though when building it with clang

dlopen(/Users/mofeing/Developer/Reactant.jl/deps/ReactantExtra/bazel-bin/libReactantExtra.so, 0x0001): symbol not found in flat namespace '__ZNSt13exception_ptr31__from_native_exception_pointerEPv'

I don't see that symbol in the BinaryBuilder build:

sandbox:${WORKSPACE} # c++filt -_ __ZNSt13exception_ptr31__from_native_exception_pointerEPv
std::exception_ptr::__from_native_exception_pointer(void*)
sandbox:${WORKSPACE} # nm libReactantExtra.dylib | grep __ZNSt13exception_ptr31__from_native_exception_pointerEPv
sandbox:${WORKSPACE} # nm libReactantExtra.dylib | grep exception | /opt/${MACHTYPE}/bin/${MACHTYPE}-c++filt -_
000000000687eec8 t llvm::ARMAttributeParser::ABI_FP_exceptions(llvm::ARMBuildAttrs::AttrType)
000000000687eee4 t llvm::ARMAttributeParser::ABI_FP_user_exceptions(llvm::ARMBuildAttrs::AttrType)
                 U std::exception::what() const
                 U std::exception_ptr::exception_ptr(std::exception_ptr const&)
                 U std::exception_ptr::~exception_ptr()
                 U std::exception_ptr::operator=(std::exception_ptr const&)
                 U std::__1::__assoc_sub_state::set_exception(std::exception_ptr)
                 U std::exception::~exception()
                 U std::current_exception()
                 U std::rethrow_exception(std::exception_ptr)
0000000001d6cf58 t std::exception_ptr std::make_exception_ptr<std::__1::future_error>(std::__1::future_error)
0000000001d073cc t std::exception_ptr std::make_exception_ptr<std::__1::future_error>(std::__1::future_error)
                 U typeinfo for std::exception
000000000845cee0 s llvm::ARMAttributeParser::ABI_FP_exceptions(llvm::ARMBuildAttrs::AttrType)::strings
000000000845cef0 s llvm::ARMAttributeParser::ABI_FP_user_exceptions(llvm::ARMBuildAttrs::AttrType)::strings
                 U ___cxa_allocate_exception
                 U ___cxa_free_exception
                 U _task_get_exception_ports
                 U _task_set_exception_ports

Maybe it's a problem with your toolchain?

@mofeing
Copy link
Collaborator Author

mofeing commented Feb 7, 2025

Maybe it's a problem with your toolchain?

probably, just found out that it links to /usr/lib/libc++.1.dylib but... there is no such file 🫠

this would explain the "mixing between libstdc++ and libc++": there's no such mix, it's just that in my local build, it expects libc++ to provide that symbol and can't link it because it doesn't find it.

i was building with Homebrew's LLVM 19, but trying again with LLVM 16.

@giordano
Copy link
Member

giordano commented Feb 7, 2025

probably, just found out that it links to /usr/lib/libc++.1.dylib but... there is no such file 🫠

That's normal, macOS hides system libraries and header files from the filesystem: https://en.wikipedia.org/wiki/System_Integrity_Protection

@giordano
Copy link
Member

giordano commented Feb 7, 2025

Also: #164 (review)

As mentioned already on Slack, the changes to ReactantExtra should be done in a separate PR, to be merged before this one, to allow a build on Yggdrasil

☝️

@mofeing mofeing changed the title Support IFRT C-API bindings for IFRT Feb 8, 2025
@mofeing
Copy link
Collaborator Author

mofeing commented Feb 8, 2025

As mentioned already on Slack, the changes to ReactantExtra should be done in a separate PR, to be merged before this one, to allow a build on Yggdrasil

i'm leaving this PR just for the changes in ReactantExtra and gonna move the Julia part of the bindings to a new PR

Maybe it's a problem with your toolchain?

it seems like so. using --cc gcc --gcc_host_compiler_path gcc makes the error disappear. still trying to understand what happened these last months because it worked before, but it could be on XLA's side and not on ours.


template<>
void* capture_shared(std::shared_ptr<void> ptr) {
captured_shared_ptr[ptr.get()] = ptr;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should use the new wrapper type as we talked about earlier (the global can work, but will cause concurrency issues that won't happen with the wrapper)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, using the new wrapper type solves concurrency issues but AFAIK the logic in Reactant.jl works on single thread as of today right?

since this seems to work as of now, i wanted to get this merged first and then switch to the new holder types on the next PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, Reactant.jl logic isn't limited to a single thread (and user code can allocate in multiple threads)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, then let me prototype sth

mofeing and others added 26 commits February 8, 2025 18:22
* rcref wrapper + fixed some compile errors

* fixed compile errors
* replace capture maps for `Holded` wrapper

* Remove commented code

* Write destructors for concrete sharding classes
@mofeing mofeing marked this pull request as draft February 9, 2025 12:06
@mofeing mofeing added the IFRT label Feb 15, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants