Skip to content

Add regex unit tests and enable shared linkage in fbcode #78

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 29, 2025

Conversation

larryliu0820
Copy link
Contributor

@larryliu0820 larryliu0820 commented May 26, 2025

Summary:

As titled. We can't use weak symbol in fbcode since by default libraries are built in shared libraries and weak symbol doesn't work well with shared libraries.

Basically the dynamic linker will resolve to the first definition it finds, so strong symbol will be ignored.

There's an environment variable LD_DYNAMIC_WEAK to let ld fallback to the old behavior which respects the strong symbol but that's non-standard and support can be dropped.

See https://man7.org/linux/man-pages/man8/ld.so.8.html

As a result, this PR changes the pattern to be using static initializer to override the create_fallback_regex() function.

                                                                               
                                                                               
                                                                               
                          regex_lookahead.so                                   
                         ┌───────────────────────────────────────────────────┐ 
                         │  ┌────────────────────────────────────┐┌─────────┐│ 
                         │  │   override_regex_fn(fallback_fn)   ││         ││ 
 ┌─────────────┐         │  │                                    ││         ││ 
 │             │         │  │          regex_lookahead.cpp       ││  pcre2  ││ 
 │             │         │  │                                    ││         ││ 
 │             │         │  │                                    ││         ││ 
 │ application ┼─────────►  └────────────────────────────────────┘└─────────┘│ 
 │             │         └──────────────────────────┬────────────────────────┘ 
 │             │                                    │                          
 │             │                                    │                          
 │             │                                    │link to                   
 └─────┬───────┘                                    │                          
       │                  regex.so                  │                          
       │                 ┌──────────────────────────▼────────────────────────┐ 
       │                 │  ┌───────────────┐   ┌──────────────────────────┐ │ 
       │                 │  │               │   │    regex_fn = default    │ │ 
       │                 │  │               │   │    override_regex_fn()   │ │ 
       └─────────────────►  │   regex.h     │   │                          │ │ 
                         │  │               │   │        regex.cpp         │ │ 
                         │  │               │   │                          │ │ 
                         │  │               │   │                          │ │ 
                         │  └───────────────┘   └──────────────────────────┘ │ 
                         └───────────────────────────────────────────────────┘ 
                                                                               
                                                                               

With this setup, an application can link to regex.so (which doesn't have all the pcre2 symbols and thus having a smaller binary size). If the application wants to add lookahead support, they can additionally link to regex_lookahead.so and override the create_fallback_regex() function.

Differential Revision: D75391108

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Meta Open Source bot. label May 26, 2025
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D75391108

larryliu0820 added a commit that referenced this pull request May 26, 2025
Summary:

As titled. After some investigation it looks like we need `--dynamic-list` to expose the weak symbol `create_fallback_regex` in `regex.so`. Without `--dynamic-list`, linker will only look for locally defined symbol (the weak implementation) and ignore the strong implementation.

This way, when linking `regex_lookahead.so` the exported dynamic symbol can be bound to the strong implementation in `regex_lookahead.cpp`.

This makes the unit tests work in fbcode, where by default we are using `mode/dev` which links everything in shared linkage.

If using `mode/opt` we don't need `--dynamic-list` because the linking order doesn't matter.

Differential Revision: D75391108
@larryliu0820 larryliu0820 force-pushed the export-D75391108 branch 2 times, most recently from fb67eae to ef519a2 Compare May 26, 2025 08:35
larryliu0820 added a commit that referenced this pull request May 26, 2025
Summary:
Pull Request resolved: #78

As titled. After some investigation it looks like we need `--dynamic-list` to expose the weak symbol `create_fallback_regex` in `regex.so`. Without `--dynamic-list`, linker will only look for locally defined symbol (the weak implementation) and ignore the strong implementation.

This way, when linking `regex_lookahead.so` the exported dynamic symbol can be bound to the strong implementation in `regex_lookahead.cpp`.

This makes the unit tests work in fbcode, where by default we are using `mode/dev` which links everything in shared linkage.

If using `mode/opt` we don't need `--dynamic-list` because the linking order doesn't matter.

Differential Revision: D75391108
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D75391108

larryliu0820 added a commit that referenced this pull request May 27, 2025
Summary:

As titled. After some investigation it looks like we need `--dynamic-list` to expose the weak symbol `create_fallback_regex` in `regex.so`. Without `--dynamic-list`, linker will only look for locally defined symbol (the weak implementation) and ignore the strong implementation.

This way, when linking `regex_lookahead.so` the exported dynamic symbol can be bound to the strong implementation in `regex_lookahead.cpp`.

This makes the unit tests work in fbcode, where by default we are using `mode/dev` which links everything in shared linkage.

If using `mode/opt` we don't need `--dynamic-list` because the linking order doesn't matter.

Differential Revision: D75391108
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D75391108

larryliu0820 added a commit that referenced this pull request May 27, 2025
Summary:
Pull Request resolved: #78

As titled. After some investigation it looks like we need `--dynamic-list` to expose the weak symbol `create_fallback_regex` in `regex.so`. Without `--dynamic-list`, linker will only look for locally defined symbol (the weak implementation) and ignore the strong implementation.

This way, when linking `regex_lookahead.so` the exported dynamic symbol can be bound to the strong implementation in `regex_lookahead.cpp`.

This makes the unit tests work in fbcode, where by default we are using `mode/dev` which links everything in shared linkage.

If using `mode/opt` we don't need `--dynamic-list` because the linking order doesn't matter.

Differential Revision: D75391108
facebook-github-bot pushed a commit that referenced this pull request May 28, 2025
Summary:

As titled. We can't use weak symbol in fbcode since by default libraries are built in shared libraries and weak symbol doesn't work well with shared libraries.

Basically the dynamic linker will resolve to the first definition it finds, so strong symbol will be ignored.

There's an environment variable `LD_DYNAMIC_WEAK` to let `ld` fallback to the old behavior which respects the strong symbol but that's non-standard and support can be dropped.

See https://man7.org/linux/man-pages/man8/ld.so.8.html

As a result, this PR changes the pattern to be using static initializer to override the `create_fallback_regex()` function. 

```
                                                                               
                                                                               
                                                                               
                          regex_lookahead.so                                   
                         ┌───────────────────────────────────────────────────┐ 
                         │  ┌────────────────────────────────────┐┌─────────┐│ 
                         │  │   override_regex_fn(fallback_fn)   ││         ││ 
 ┌─────────────┐         │  │                                    ││         ││ 
 │             │         │  │          regex_lookahead.cpp       ││  pcre2  ││ 
 │             │         │  │                                    ││         ││ 
 │             │         │  │                                    ││         ││ 
 │ application ┼─────────►  └────────────────────────────────────┘└─────────┘│ 
 │             │         └──────────────────────────┬────────────────────────┘ 
 │             │                                    │                          
 │             │                                    │                          
 │             │                                    │link to                   
 └─────┬───────┘                                    │                          
       │                  regex.so                  │                          
       │                 ┌──────────────────────────▼────────────────────────┐ 
       │                 │  ┌───────────────┐   ┌──────────────────────────┐ │ 
       │                 │  │               │   │    regex_fn = default    │ │ 
       │                 │  │               │   │    override_regex_fn()   │ │ 
       └─────────────────►  │   regex.h     │   │                          │ │ 
                         │  │               │   │        regex.cpp         │ │ 
                         │  │               │   │                          │ │ 
                         │  │               │   │                          │ │ 
                         │  └───────────────┘   └──────────────────────────┘ │ 
                         └───────────────────────────────────────────────────┘ 
                                                                               
                                                                               
```

With this setup, an application can link to `regex.so` (which doesn't have all the pcre2 symbols and thus having a smaller binary size). If the application wants to add lookahead support, they can additionally link to `regex_lookahead.so` and override the `create_fallback_regex()` function.

Differential Revision: D75391108
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D75391108

@larryliu0820 larryliu0820 requested a review from jackzhxng May 28, 2025 04:44
larryliu0820 added a commit that referenced this pull request May 28, 2025
Summary:

As titled. We can't use weak symbol in fbcode since by default libraries are built in shared libraries and weak symbol doesn't work well with shared libraries.

Basically the dynamic linker will resolve to the first definition it finds, so strong symbol will be ignored.

There's an environment variable `LD_DYNAMIC_WEAK` to let `ld` fallback to the old behavior which respects the strong symbol but that's non-standard and support can be dropped.

See https://man7.org/linux/man-pages/man8/ld.so.8.html

As a result, this PR changes the pattern to be using static initializer to override the `create_fallback_regex()` function. 

```
                                                                               
                                                                               
                                                                               
                          regex_lookahead.so                                   
                         ┌───────────────────────────────────────────────────┐ 
                         │  ┌────────────────────────────────────┐┌─────────┐│ 
                         │  │   override_regex_fn(fallback_fn)   ││         ││ 
 ┌─────────────┐         │  │                                    ││         ││ 
 │             │         │  │          regex_lookahead.cpp       ││  pcre2  ││ 
 │             │         │  │                                    ││         ││ 
 │             │         │  │                                    ││         ││ 
 │ application ┼─────────►  └────────────────────────────────────┘└─────────┘│ 
 │             │         └──────────────────────────┬────────────────────────┘ 
 │             │                                    │                          
 │             │                                    │                          
 │             │                                    │link to                   
 └─────┬───────┘                                    │                          
       │                  regex.so                  │                          
       │                 ┌──────────────────────────▼────────────────────────┐ 
       │                 │  ┌───────────────┐   ┌──────────────────────────┐ │ 
       │                 │  │               │   │    regex_fn = default    │ │ 
       │                 │  │               │   │    override_regex_fn()   │ │ 
       └─────────────────►  │   regex.h     │   │                          │ │ 
                         │  │               │   │        regex.cpp         │ │ 
                         │  │               │   │                          │ │ 
                         │  │               │   │                          │ │ 
                         │  └───────────────┘   └──────────────────────────┘ │ 
                         └───────────────────────────────────────────────────┘ 
                                                                               
                                                                               
```

With this setup, an application can link to `regex.so` (which doesn't have all the pcre2 symbols and thus having a smaller binary size). If the application wants to add lookahead support, they can additionally link to `regex_lookahead.so` and override the `create_fallback_regex()` function.

Differential Revision: D75391108
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D75391108

larryliu0820 added a commit that referenced this pull request May 28, 2025
Summary:
Pull Request resolved: #78

As titled. We can't use weak symbol in fbcode since by default libraries are built in shared libraries and weak symbol doesn't work well with shared libraries.

Basically the dynamic linker will resolve to the first definition it finds, so strong symbol will be ignored.

There's an environment variable `LD_DYNAMIC_WEAK` to let `ld` fallback to the old behavior which respects the strong symbol but that's non-standard and support can be dropped.

See https://man7.org/linux/man-pages/man8/ld.so.8.html

As a result, this PR changes the pattern to be using static initializer to override the `create_fallback_regex()` function.

```

                          regex_lookahead.so
                         ┌───────────────────────────────────────────────────┐
                         │  ┌────────────────────────────────────┐┌─────────┐│
                         │  │   override_regex_fn(fallback_fn)   ││         ││
 ┌─────────────┐         │  │                                    ││         ││
 │             │         │  │          regex_lookahead.cpp       ││  pcre2  ││
 │             │         │  │                                    ││         ││
 │             │         │  │                                    ││         ││
 │ application ┼─────────►  └────────────────────────────────────┘└─────────┘│
 │             │         └──────────────────────────┬────────────────────────┘
 │             │                                    │
 │             │                                    │
 │             │                                    │link to
 └─────┬───────┘                                    │
       │                  regex.so                  │
       │                 ┌──────────────────────────▼────────────────────────┐
       │                 │  ┌───────────────┐   ┌──────────────────────────┐ │
       │                 │  │               │   │    regex_fn = default    │ │
       │                 │  │               │   │    override_regex_fn()   │ │
       └─────────────────►  │   regex.h     │   │                          │ │
                         │  │               │   │        regex.cpp         │ │
                         │  │               │   │                          │ │
                         │  │               │   │                          │ │
                         │  └───────────────┘   └──────────────────────────┘ │
                         └───────────────────────────────────────────────────┘

```

With this setup, an application can link to `regex.so` (which doesn't have all the pcre2 symbols and thus having a smaller binary size). If the application wants to add lookahead support, they can additionally link to `regex_lookahead.so` and override the `create_fallback_regex()` function.

Differential Revision: D75391108
facebook-github-bot pushed a commit that referenced this pull request May 28, 2025
Summary:

As titled. We can't use weak symbol in fbcode since by default libraries are built in shared libraries and weak symbol doesn't work well with shared libraries.

Basically the dynamic linker will resolve to the first definition it finds, so strong symbol will be ignored.

There's an environment variable `LD_DYNAMIC_WEAK` to let `ld` fallback to the old behavior which respects the strong symbol but that's non-standard and support can be dropped.

See https://man7.org/linux/man-pages/man8/ld.so.8.html

As a result, this PR changes the pattern to be using static initializer to override the `create_fallback_regex()` function. 

```
                                                                               
                                                                               
                                                                               
                          regex_lookahead.so                                   
                         ┌───────────────────────────────────────────────────┐ 
                         │  ┌────────────────────────────────────┐┌─────────┐│ 
                         │  │   override_regex_fn(fallback_fn)   ││         ││ 
 ┌─────────────┐         │  │                                    ││         ││ 
 │             │         │  │          regex_lookahead.cpp       ││  pcre2  ││ 
 │             │         │  │                                    ││         ││ 
 │             │         │  │                                    ││         ││ 
 │ application ┼─────────►  └────────────────────────────────────┘└─────────┘│ 
 │             │         └──────────────────────────┬────────────────────────┘ 
 │             │                                    │                          
 │             │                                    │                          
 │             │                                    │link to                   
 └─────┬───────┘                                    │                          
       │                  regex.so                  │                          
       │                 ┌──────────────────────────▼────────────────────────┐ 
       │                 │  ┌───────────────┐   ┌──────────────────────────┐ │ 
       │                 │  │               │   │    regex_fn = default    │ │ 
       │                 │  │               │   │    override_regex_fn()   │ │ 
       └─────────────────►  │   regex.h     │   │                          │ │ 
                         │  │               │   │        regex.cpp         │ │ 
                         │  │               │   │                          │ │ 
                         │  │               │   │                          │ │ 
                         │  └───────────────┘   └──────────────────────────┘ │ 
                         └───────────────────────────────────────────────────┘ 
                                                                               
                                                                               
```

With this setup, an application can link to `regex.so` (which doesn't have all the pcre2 symbols and thus having a smaller binary size). If the application wants to add lookahead support, they can additionally link to `regex_lookahead.so` and override the `create_fallback_regex()` function.

Differential Revision: D75391108
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D75391108

Copy link
Contributor

@jackzhxng jackzhxng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, few nits. Thanks for the detailed PR description and diagram

Summary:

As titled. We can't use weak symbol in fbcode since by default libraries are built in shared libraries and weak symbol doesn't work well with shared libraries.

Basically the dynamic linker will resolve to the first definition it finds, so strong symbol will be ignored.

There's an environment variable `LD_DYNAMIC_WEAK` to let `ld` fallback to the old behavior which respects the strong symbol but that's non-standard and support can be dropped.

See https://man7.org/linux/man-pages/man8/ld.so.8.html

As a result, this PR changes the pattern to be using static initializer to override the `create_fallback_regex()` function. 

```
                                                                               
                                                                               
                                                                               
                          regex_lookahead.so                                   
                         ┌───────────────────────────────────────────────────┐ 
                         │  ┌────────────────────────────────────┐┌─────────┐│ 
                         │  │   override_regex_fn(fallback_fn)   ││         ││ 
 ┌─────────────┐         │  │                                    ││         ││ 
 │             │         │  │          regex_lookahead.cpp       ││  pcre2  ││ 
 │             │         │  │                                    ││         ││ 
 │             │         │  │                                    ││         ││ 
 │ application ┼─────────►  └────────────────────────────────────┘└─────────┘│ 
 │             │         └──────────────────────────┬────────────────────────┘ 
 │             │                                    │                          
 │             │                                    │                          
 │             │                                    │link to                   
 └─────┬───────┘                                    │                          
       │                  regex.so                  │                          
       │                 ┌──────────────────────────▼────────────────────────┐ 
       │                 │  ┌───────────────┐   ┌──────────────────────────┐ │ 
       │                 │  │               │   │    regex_fn = default    │ │ 
       │                 │  │               │   │    override_regex_fn()   │ │ 
       └─────────────────►  │   regex.h     │   │                          │ │ 
                         │  │               │   │        regex.cpp         │ │ 
                         │  │               │   │                          │ │ 
                         │  │               │   │                          │ │ 
                         │  └───────────────┘   └──────────────────────────┘ │ 
                         └───────────────────────────────────────────────────┘ 
                                                                               
                                                                               
```

With this setup, an application can link to `regex.so` (which doesn't have all the pcre2 symbols and thus having a smaller binary size). If the application wants to add lookahead support, they can additionally link to `regex_lookahead.so` and override the `create_fallback_regex()` function.

Reviewed By: billmguo

Differential Revision: D75391108
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D75391108

@facebook-github-bot facebook-github-bot merged commit ea21dc7 into main May 29, 2025
4 of 6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Meta Open Source bot. fb-exported
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants