Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cpp11 unwind_protect() + ALTREP conflicts #274

Open
DavisVaughan opened this issue Jun 3, 2022 · 4 comments
Open

cpp11 unwind_protect() + ALTREP conflicts #274

DavisVaughan opened this issue Jun 3, 2022 · 4 comments

Comments

@DavisVaughan
Copy link
Member

CC @jennybc and @sbearrows

This is a long post, and is intended to outline a rather complex problem in a way that we can come back to in the future. The short story is this:

If you are writing ALTREP methods, such as an _Elt method, then you currently have to be extremely careful about using any C++ or cpp11 inside those methods. This is because your methods can be called directly by base R's C code, which means it can be called without being wrapped in a try/catch block, unlike the standard methods of calling cpp11 code, which goes through .Call and is automatically wrapped in a try/catch by BEGIN_CPP11 and END_CPP11. In particular, if your ALTREP method calls something that triggers unwind_protect() and a longjmp() occurs inside the code that unwind_protect() calls, then R will crash because unwind_protect() will throw an unwind_exception that can't be caught. A practical example of this is calling cpp11::stop() inside your ALTREP method.

The problem

Ok, now for a more detailed explanation. First off, you can reproduce what I am going to discuss yourself using this minimal repo https://github.com/DavisVaughan/cpp11altrepbug

pak::pak("DavisVaughan/cpp11altrepbug")

Then call cpp11altrepbug::works() and cpp11altrepbug::fails(). The fails() call should nuke your R session.

I recommend calling fails() from a terminal R session (not in RStudio), which gives you this useful error message:

> x <- cpp11altrepbug::fails()
> x[1]
Error: oh no
libc++abi: terminating with uncaught exception of type cpp11::unwind_exception: std::exception
Abort trap: 6

Here is what is happening:

There is an ALTREP type named fails_t. It has an _Elt method that simply calls cpp11::stop(). The following chain of events causes that combination to nuke R.

  • The [ function is called when x[1] is run
  • R's internal [ C code sees that x is an ALTREP object. It calls out to fails_Elt() and runs that. Note that this is NOT our typical cpp11 use case. We have not gone through .Call() here to get to our cpp11 code, so there is no try/catch set up for us.
  • fails_Elt() calls cpp11::stop()
  • cpp11::stop() is set up as safe[Rf_errorcall](). The safe bit forces unwind_protect() to be called, and unwind_protect() will be put in charge of calling Rf_errorcall()
  • Before actually calling Rf_errorcall(), unwind_protect() sets up a place to jump back to if a longjmp() is triggered when Rf_errorcall() is run. It does this with setjmp(), which is the place to jump back to. When a jump happens, we return to the place where setjmp() was called, it returns true, causing that if branch to run which throws an unwind exception. Cpp11 makes the assumption that unwind_protect() will only be called from cpp11 code that has gone through the "normal" path of execution - i.e. through .Call(), so that BEGIN_CPP11 and END_CPP11 are set up to catch that unwind exception.
    • The setjmp() and throw location:
      if (setjmp(jmpbuf)) {
      should_unwind_protect = TRUE;
      throw unwind_exception(token);
      }
    • The definitions of BEGIN_CPP11 and END_CPP11:
      #define BEGIN_CPP11 \
      SEXP err = R_NilValue; \
      char buf[CPP11_ERROR_BUFSIZE] = ""; \
      try {
      #define END_CPP11 \
      } \
      catch (cpp11::unwind_exception & e) { \
      err = e.token; \
      } \
      catch (std::exception & e) { \
      strncpy(buf, e.what(), sizeof(buf) - 1); \
      } \
      catch (...) { \
      strncpy(buf, "C++ error (unknown cause)", sizeof(buf) - 1); \
      } \
      if (buf[0] != '\0') { \
      Rf_errorcall(R_NilValue, "%s", buf); \
      } else if (err != R_NilValue) { \
      CPP11_UNWIND \
      } \
      return R_NilValue;
  • When Rf_errorcall() is eventually called, that triggers a longjmp(), so we jump back to the setjmp() location and do throw unwind_exception(token)
  • BUT this happened outside of a try/catch block because of how it was called through base R's C code. The uncaught exception causes R to crash.

Potential solution

The basic idea of the solution is that somehow we need to ensure that all ALTREP methods that we write in C++ with cpp11 are wrapped in a try/catch.

After some thinking, we think that this problem of "someone else" calling our cpp11 code from C can really only occur through ALTREP. That is likely the only place where base R lets us "hook into" their C code. So it is ok if our solution is very specific to ALTREP.

The fix we have come up with is to suggest that cpp11 should provide wrappers around R's ALTREP registration methods, like R_set_altreal_Elt_method(class, fun), that "wrap" the fun with something like:

double wrapper_Elt(SEXP x, R_xlen_t i) {
  BEGIN_CPP11
    return fun(x, i);
  END_CPP11 
}

You'd then register the ALTREP method with something like cpp11::set_altreal_Elt_method(class, fun).

For that to work, END_CPP11 would have to be tweaked because it currently returns R_NilValue unconditionally, which won't work here since the return type is double. So possibly END_CPP11 shouldn't handle the return part, so we can do return 0; manually at the end of this wrapper (which should never be called).

Backstory

First found in vroom, which makes great use of ALTREP. This crash could be triggered by catching the warning thrown by warn_for_errors() and promoting it to an error, triggering the longjmp() from inside unwind_protect() while we aren't in a try/catch. To make the crash occur, you also have to call rlang::warn() inside warn_for_errors () through the cpp11::package() machinery to make sure unwind_protect() was used (rather than what it currently does, which is to only use base R machinery to avoid usage of unwind_protect()).

Discussed here:
tidyverse/vroom#441 (comment)

Which links to this original commit, which mentions that we don't want to use unwind protect here:
tidyverse/vroom@984a3e5

@DavisVaughan
Copy link
Member Author

Partly related to #219

But note that calling cpp11::stop() is not the only way this can occur.

The original vroom issue was calling cpp11::package("rlang")["warn"] and then on the R side we were promoting that to an rlang::abort() by catching the class warning with withCallingHandlers(), so just fixing #219 wouldn't be enough.

@paleolimbot
Copy link
Contributor

Alternatively, you could also just make sure that everything in the frame is trivially destructible and does not throw exceptions? If those two conditions are met, you can just use the R API directly (although it sounds like vroom's methods are doing a few more complex things).

In Arrow we have the advantage that Arrow C++ does not use exceptions for error handling, so perhaps this is only a reasonable option for us 🙂 . (Or maybe my approach in apache/arrow#14271 is very very wrong!)

@DavisVaughan
Copy link
Member Author

cpp11 is the one throwing the exception though?

@paleolimbot
Copy link
Contributor

Yes...part of what I did in that PR is to not use cpp11 in ALTREP methods. I didn't find it all that hard to avoid but perhaps that is unique to Arrow's ALTREP classes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants