-
Notifications
You must be signed in to change notification settings - Fork 46
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix the construction of SimpleArray from slicing ndarray #438
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of allocating the contiguous memory and copying the elements from py::array, I try to fix the bug by passing the stride vector.
for (ssize_t i = 0; i < arr_in.ndim(); ++i) | ||
{ | ||
shape.push_back(arr_in.shape(i)); | ||
stride.push_back(arr_in.strides(i) / itemsize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The unit of the py::array::strides()
is bytes, and the unit of the SimpleArray::strides()
is the elements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code is clear enough. It is correct to use no comment.
cpp/modmesh/buffer/SimpleArray.hpp
Outdated
explicit SimpleArray(small_vector<size_t> const & shape, small_vector<size_t> const & stride, std::shared_ptr<buffer_type> const & buffer) | ||
: SimpleArray(buffer) | ||
{ | ||
if (buffer) | ||
{ | ||
m_shape = shape; | ||
m_stride = stride; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I create this function according to the function explicit SimpleArray(small_vector<size_t> const & shape, std::shared_ptr<buffer_type> const & buffer)
, but I'm not sure whether it would create the potential bug without checking nbytes == buffer->nbytes()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should check the shape and stride are within the buffer range, in the same way it is done in the overload explicit SimpleArray(small_vector<size_t> const & shape, std::shared_ptr<buffer_type> const & buffer);
tests/test_buffer.py
Outdated
def test_SimpleArray_from_ndarray_slice(self): | ||
ndarr = np.arange(1000, dtype='float64').reshape((10, 10, 10)) | ||
parr = ndarr[1:7, 2:6, 3:9] | ||
sarr = modmesh.SimpleArrayFloat64(array=ndarr[1:7, 2:6, 3:9]) | ||
for i in range(6): | ||
for j in range(4): | ||
for k in range(6): | ||
self.assertEqual(parr[i, j, k], sarr[i, j, k]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the unittest of constructing SimpleArray from slicing ndarray. When the dimension is less than 4, the constructor should work regularly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is why you filed issue #437.
@yungyuc @tigercosmos Could you please review the pull request? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of allocating the contiguous memory and copying the elements from
py::array
, I try to fix the bug by passing the stride vector.
This is a good fix. When taking a memory buffer, SimpleArray
should reuse instead of copying it. Sanity check should be done during the simple array construction so that the array operating code can follow the assumption made by the class template SimpleArray
.
Some enhancements are needed in the PR:
-
SimpleArray.hpp:310
: Check the shape and stride are within the buffer range, in the same way it is done in the overloadexplicit SimpleArray(small_vector<size_t> const & shape, std::shared_ptr<buffer_type> const & buffer);
-
wrap_SimpleArray.cpp:89
: The wrapper should check if the input array is contiguous. You can use the ndarray flags and/or the source striding value.
cpp/modmesh/buffer/SimpleArray.hpp
Outdated
explicit SimpleArray(small_vector<size_t> const & shape, small_vector<size_t> const & stride, std::shared_ptr<buffer_type> const & buffer) | ||
: SimpleArray(buffer) | ||
{ | ||
if (buffer) | ||
{ | ||
m_shape = shape; | ||
m_stride = stride; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should check the shape and stride are within the buffer range, in the same way it is done in the overload explicit SimpleArray(small_vector<size_t> const & shape, std::shared_ptr<buffer_type> const & buffer);
tests/test_buffer.py
Outdated
def test_SimpleArray_from_ndarray_slice(self): | ||
ndarr = np.arange(1000, dtype='float64').reshape((10, 10, 10)) | ||
parr = ndarr[1:7, 2:6, 3:9] | ||
sarr = modmesh.SimpleArrayFloat64(array=ndarr[1:7, 2:6, 3:9]) | ||
for i in range(6): | ||
for j in range(4): | ||
for k in range(6): | ||
self.assertEqual(parr[i, j, k], sarr[i, j, k]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is why you filed issue #437.
for (ssize_t i = 0; i < arr_in.ndim(); ++i) | ||
{ | ||
shape.push_back(arr_in.shape(i)); | ||
stride.push_back(arr_in.strides(i) / itemsize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code is clear enough. It is correct to use no comment.
} | ||
std::shared_ptr<ConcreteBuffer> const buffer = ConcreteBuffer::construct( | ||
arr_in.nbytes(), | ||
arr_in.mutable_data(), | ||
std::make_unique<ConcreteBufferNdarrayRemover>(arr_in)); | ||
return wrapped_type(shape, buffer); | ||
return wrapped_type(shape, stride, buffer); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The wrapper should check if the input array is contiguous. You can use the ndarray flags and/or the source striding value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why should the wrapper check if the input array is contiguous?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because only the wrapper sees the PyObject
of the incoming nearray
, on which the contiguous bit is available. When the buffer pointer is passed in the SimpleArray
C++ code, no information of memory contiguity is available. The constructor can only trust the shape and stride passed in blindly.
Checking for contiguity in the wrapper is more robust.
1c47b73
to
653d264
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
SimpleArray.hpp:310
: Check the shape and stride are within the buffer range, in the same way it is done in the overloadexplicit SimpleArray(small_vector<size_t> const & shape, std::shared_ptr<buffer_type> const & buffer);
-
wrap_SimpleArray.cpp:89
: The wrapper should check if the input array is contiguous. You can use the ndarray flags and/or the source striding value.
cpp/modmesh/buffer/SimpleArray.hpp
Outdated
const size_t nbytes = std::accumulate(m_shape.begin(), | ||
m_shape.end(), | ||
static_cast<size_t>(1), | ||
std::multiplies<size_t>()) * | ||
ITEMSIZE; | ||
if (nbytes != buffer->nbytes()) | ||
{ | ||
throw std::runtime_error(Formatter() << "SimpleArray: shape byte count " << nbytes | ||
<< " differs from buffer " << buffer->nbytes()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check the nbytes()
.
} | ||
std::shared_ptr<ConcreteBuffer> const buffer = ConcreteBuffer::construct( | ||
arr_in.nbytes(), | ||
arr_in.mutable_data(), | ||
std::make_unique<ConcreteBufferNdarrayRemover>(arr_in)); | ||
return wrapped_type(shape, buffer); | ||
return is_c_contiguous ? wrapped_type(shape, buffer) : wrapped_type(shape, stride, buffer); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the array is c contiguous, the wrapper use the original constructor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
f_contiguous also needs to be handled.
parr = ndarr[1:7:3, 6:2:-1, 3:9] | ||
sarr = modmesh.SimpleArrayFloat64(array=ndarr[1:7:3, 6:2:-1, 3:9]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I modify the step to make the testcase complicated.
653d264
to
da3f887
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more point to address.
-
wrap_SimpleArray.cpp:90
: f_contiguous also needs to be handled.
} | ||
std::shared_ptr<ConcreteBuffer> const buffer = ConcreteBuffer::construct( | ||
arr_in.nbytes(), | ||
arr_in.mutable_data(), | ||
std::make_unique<ConcreteBufferNdarrayRemover>(arr_in)); | ||
return wrapped_type(shape, buffer); | ||
return is_c_contiguous ? wrapped_type(shape, buffer) : wrapped_type(shape, stride, buffer); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
f_contiguous also needs to be handled.
cpp/modmesh/buffer/SimpleArray.hpp
Outdated
m_shape.end(), | ||
static_cast<size_t>(1), | ||
std::multiplies<size_t>()) * | ||
ITEMSIZE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would the expression look nicer by placing ITEMSIZE
before std::accumulate
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good idea.
da3f887
to
756010a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Check the F contiguous array.
tests/test_buffer.py
Outdated
def test_SimpleArray_from_ndarray_transpose(self): | ||
ndarr = np.arange(350, dtype='float64').reshape((5, 7, 10)) | ||
# The following array is F contiguous. | ||
parr = ndarr[2:4].T | ||
sarr = modmesh.SimpleArrayFloat64(array=ndarr[2:4].T) | ||
|
||
for i in range(10): | ||
for j in range(7): | ||
for k in range(2): | ||
self.assertEqual(parr[i, j, k], sarr[i, j, k]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unittest for F contiguous ndarray.
if (is_c_contiguous) | ||
{ | ||
if (stride[stride.size() - 1] != 1) | ||
{ | ||
throw std::runtime_error("SimpleArray: C contiguous stride must end with 1"); | ||
} | ||
for (size_t it = 0; it < shape.size() - 1; ++it) | ||
{ | ||
if (stride[it] != shape[it + 1] * stride[it + 1]) | ||
{ | ||
throw std::runtime_error("SimpleArray: C contiguous stride must match shape"); | ||
} | ||
} | ||
} | ||
if (is_f_contiguous) | ||
{ | ||
if (stride[0] != 1) | ||
{ | ||
throw std::runtime_error("SimpleArray: Fortran contiguous stride must start with 1"); | ||
} | ||
for (size_t it = 0; it < shape.size() - 1; ++it) | ||
{ | ||
if (stride[it + 1] != shape[it] * stride[it]) | ||
{ | ||
throw std::runtime_error("SimpleArray: Fortran contiguous stride must match shape"); | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check the C contiguous and F contiguous array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current implementation is good.
We should refactor it to use a distinct helper (class or function) in a later PR (may use a separate issue to track).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way I have a questions:
- How could we pass and receive the
bool is_c_contiguous
andbool is_f_contiguous
elegantly?
I have considered passing the numpy flags directly, but the constructor may not only deal with the numpy array. Is it a good way to design the customed flags byenum class types
to deal with every possible input?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the class template SimpleArray
needs to know f/c continuity. The sanity check may happen in the Python wrapper which has the information from Numpy. Standalone helper (static or free function) allows you to check in C++ and pybind11 wrapper. It could make sense to create a stride object to organize the logics.
It is ok to add more metadata in SimpleArray
, including f/c continuity, but it is a separate topic.
756010a
to
8470ec1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
if (is_c_contiguous) | ||
{ | ||
if (stride[stride.size() - 1] != 1) | ||
{ | ||
throw std::runtime_error("SimpleArray: C contiguous stride must end with 1"); | ||
} | ||
for (size_t it = 0; it < shape.size() - 1; ++it) | ||
{ | ||
if (stride[it] != shape[it + 1] * stride[it + 1]) | ||
{ | ||
throw std::runtime_error("SimpleArray: C contiguous stride must match shape"); | ||
} | ||
} | ||
} | ||
if (is_f_contiguous) | ||
{ | ||
if (stride[0] != 1) | ||
{ | ||
throw std::runtime_error("SimpleArray: Fortran contiguous stride must start with 1"); | ||
} | ||
for (size_t it = 0; it < shape.size() - 1; ++it) | ||
{ | ||
if (stride[it + 1] != shape[it] * stride[it]) | ||
{ | ||
throw std::runtime_error("SimpleArray: Fortran contiguous stride must match shape"); | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current implementation is good.
We should refactor it to use a distinct helper (class or function) in a later PR (may use a separate issue to track).
explicit SimpleArray(small_vector<size_t> const & shape, | ||
small_vector<size_t> const & stride, | ||
std::shared_ptr<buffer_type> const & buffer, | ||
bool is_c_contiguous = true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not use an enum here? @ThreeMonth03
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realized I could use an enum class inside the class SimpleArray
after I asked the question.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ThreeMonth03 Maybe you can consider opening another PR to fix this.
In this pull request, I fix the issue #432 by passing the stride vector to SimpleArray.