Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the construction of SimpleArray from slicing ndarray #438

Merged
merged 1 commit into from
Nov 30, 2024

Conversation

ThreeMonth03
Copy link
Collaborator

In this pull request, I fix the issue #432 by passing the stride vector to SimpleArray.

Copy link
Collaborator Author

@ThreeMonth03 ThreeMonth03 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of allocating the contiguous memory and copying the elements from py::array, I try to fix the bug by passing the stride vector.

for (ssize_t i = 0; i < arr_in.ndim(); ++i)
{
shape.push_back(arr_in.shape(i));
stride.push_back(arr_in.strides(i) / itemsize);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The unit of the py::array::strides() is bytes, and the unit of the SimpleArray::strides() is the elements.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code is clear enough. It is correct to use no comment.

Comment on lines 310 to 369
explicit SimpleArray(small_vector<size_t> const & shape, small_vector<size_t> const & stride, std::shared_ptr<buffer_type> const & buffer)
: SimpleArray(buffer)
{
if (buffer)
{
m_shape = shape;
m_stride = stride;
}
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I create this function according to the function explicit SimpleArray(small_vector<size_t> const & shape, std::shared_ptr<buffer_type> const & buffer), but I'm not sure whether it would create the potential bug without checking nbytes == buffer->nbytes().

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should check the shape and stride are within the buffer range, in the same way it is done in the overload explicit SimpleArray(small_vector<size_t> const & shape, std::shared_ptr<buffer_type> const & buffer);

Comment on lines 453 to 461
def test_SimpleArray_from_ndarray_slice(self):
ndarr = np.arange(1000, dtype='float64').reshape((10, 10, 10))
parr = ndarr[1:7, 2:6, 3:9]
sarr = modmesh.SimpleArrayFloat64(array=ndarr[1:7, 2:6, 3:9])
for i in range(6):
for j in range(4):
for k in range(6):
self.assertEqual(parr[i, j, k], sarr[i, j, k])
Copy link
Collaborator Author

@ThreeMonth03 ThreeMonth03 Nov 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the unittest of constructing SimpleArray from slicing ndarray. When the dimension is less than 4, the constructor should work regularly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is why you filed issue #437.

@ThreeMonth03
Copy link
Collaborator Author

Instead of allocating the contiguous memory and copying the elements from py::array, I try to fix the bug by passing the stride vector.

@yungyuc @tigercosmos Could you please review the pull request?

@yungyuc yungyuc changed the title Fix: Fix the construction of SimpleArray from slicing ndarray Fix the construction of SimpleArray from slicing ndarray Nov 29, 2024
Copy link
Member

@yungyuc yungyuc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of allocating the contiguous memory and copying the elements from py::array, I try to fix the bug by passing the stride vector.

This is a good fix. When taking a memory buffer, SimpleArray should reuse instead of copying it. Sanity check should be done during the simple array construction so that the array operating code can follow the assumption made by the class template SimpleArray.

Some enhancements are needed in the PR:

Comment on lines 310 to 369
explicit SimpleArray(small_vector<size_t> const & shape, small_vector<size_t> const & stride, std::shared_ptr<buffer_type> const & buffer)
: SimpleArray(buffer)
{
if (buffer)
{
m_shape = shape;
m_stride = stride;
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should check the shape and stride are within the buffer range, in the same way it is done in the overload explicit SimpleArray(small_vector<size_t> const & shape, std::shared_ptr<buffer_type> const & buffer);

Comment on lines 453 to 461
def test_SimpleArray_from_ndarray_slice(self):
ndarr = np.arange(1000, dtype='float64').reshape((10, 10, 10))
parr = ndarr[1:7, 2:6, 3:9]
sarr = modmesh.SimpleArrayFloat64(array=ndarr[1:7, 2:6, 3:9])
for i in range(6):
for j in range(4):
for k in range(6):
self.assertEqual(parr[i, j, k], sarr[i, j, k])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is why you filed issue #437.

for (ssize_t i = 0; i < arr_in.ndim(); ++i)
{
shape.push_back(arr_in.shape(i));
stride.push_back(arr_in.strides(i) / itemsize);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code is clear enough. It is correct to use no comment.

}
std::shared_ptr<ConcreteBuffer> const buffer = ConcreteBuffer::construct(
arr_in.nbytes(),
arr_in.mutable_data(),
std::make_unique<ConcreteBufferNdarrayRemover>(arr_in));
return wrapped_type(shape, buffer);
return wrapped_type(shape, stride, buffer);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The wrapper should check if the input array is contiguous. You can use the ndarray flags and/or the source striding value.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why should the wrapper check if the input array is contiguous?

Copy link
Member

@yungyuc yungyuc Nov 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because only the wrapper sees the PyObject of the incoming nearray, on which the contiguous bit is available. When the buffer pointer is passed in the SimpleArray C++ code, no information of memory contiguity is available. The constructor can only trust the shape and stride passed in blindly.

Checking for contiguity in the wrapper is more robust.

@yungyuc yungyuc added bug Something isn't working array Multi-dimensional array implementation labels Nov 29, 2024
Copy link
Collaborator Author

@ThreeMonth03 ThreeMonth03 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines 319 to 364
const size_t nbytes = std::accumulate(m_shape.begin(),
m_shape.end(),
static_cast<size_t>(1),
std::multiplies<size_t>()) *
ITEMSIZE;
if (nbytes != buffer->nbytes())
{
throw std::runtime_error(Formatter() << "SimpleArray: shape byte count " << nbytes
<< " differs from buffer " << buffer->nbytes());
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check the nbytes().

}
std::shared_ptr<ConcreteBuffer> const buffer = ConcreteBuffer::construct(
arr_in.nbytes(),
arr_in.mutable_data(),
std::make_unique<ConcreteBufferNdarrayRemover>(arr_in));
return wrapped_type(shape, buffer);
return is_c_contiguous ? wrapped_type(shape, buffer) : wrapped_type(shape, stride, buffer);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the array is c contiguous, the wrapper use the original constructor.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

f_contiguous also needs to be handled.

Comment on lines +455 to +456
parr = ndarr[1:7:3, 6:2:-1, 3:9]
sarr = modmesh.SimpleArrayFloat64(array=ndarr[1:7:3, 6:2:-1, 3:9])
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I modify the step to make the testcase complicated.

Copy link
Member

@yungyuc yungyuc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more point to address.

}
std::shared_ptr<ConcreteBuffer> const buffer = ConcreteBuffer::construct(
arr_in.nbytes(),
arr_in.mutable_data(),
std::make_unique<ConcreteBufferNdarrayRemover>(arr_in));
return wrapped_type(shape, buffer);
return is_c_contiguous ? wrapped_type(shape, buffer) : wrapped_type(shape, stride, buffer);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

f_contiguous also needs to be handled.

m_shape.end(),
static_cast<size_t>(1),
std::multiplies<size_t>()) *
ITEMSIZE;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would the expression look nicer by placing ITEMSIZE before std::accumulate?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good idea.

Copy link
Collaborator Author

@ThreeMonth03 ThreeMonth03 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Check the F contiguous array.

Comment on lines 463 to 472
def test_SimpleArray_from_ndarray_transpose(self):
ndarr = np.arange(350, dtype='float64').reshape((5, 7, 10))
# The following array is F contiguous.
parr = ndarr[2:4].T
sarr = modmesh.SimpleArrayFloat64(array=ndarr[2:4].T)

for i in range(10):
for j in range(7):
for k in range(2):
self.assertEqual(parr[i, j, k], sarr[i, j, k])
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unittest for F contiguous ndarray.

Comment on lines +326 to +353
if (is_c_contiguous)
{
if (stride[stride.size() - 1] != 1)
{
throw std::runtime_error("SimpleArray: C contiguous stride must end with 1");
}
for (size_t it = 0; it < shape.size() - 1; ++it)
{
if (stride[it] != shape[it + 1] * stride[it + 1])
{
throw std::runtime_error("SimpleArray: C contiguous stride must match shape");
}
}
}
if (is_f_contiguous)
{
if (stride[0] != 1)
{
throw std::runtime_error("SimpleArray: Fortran contiguous stride must start with 1");
}
for (size_t it = 0; it < shape.size() - 1; ++it)
{
if (stride[it + 1] != shape[it] * stride[it])
{
throw std::runtime_error("SimpleArray: Fortran contiguous stride must match shape");
}
}
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check the C contiguous and F contiguous array.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current implementation is good.

We should refactor it to use a distinct helper (class or function) in a later PR (may use a separate issue to track).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way I have a questions:

  1. How could we pass and receive the bool is_c_contiguous and bool is_f_contiguous elegantly?
    I have considered passing the numpy flags directly, but the constructor may not only deal with the numpy array. Is it a good way to design the customed flags by enum class types to deal with every possible input?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the class template SimpleArray needs to know f/c continuity. The sanity check may happen in the Python wrapper which has the information from Numpy. Standalone helper (static or free function) allows you to check in C++ and pybind11 wrapper. It could make sense to create a stride object to organize the logics.

It is ok to add more metadata in SimpleArray, including f/c continuity, but it is a separate topic.

Copy link
Member

@yungyuc yungyuc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

Comment on lines +326 to +353
if (is_c_contiguous)
{
if (stride[stride.size() - 1] != 1)
{
throw std::runtime_error("SimpleArray: C contiguous stride must end with 1");
}
for (size_t it = 0; it < shape.size() - 1; ++it)
{
if (stride[it] != shape[it + 1] * stride[it + 1])
{
throw std::runtime_error("SimpleArray: C contiguous stride must match shape");
}
}
}
if (is_f_contiguous)
{
if (stride[0] != 1)
{
throw std::runtime_error("SimpleArray: Fortran contiguous stride must start with 1");
}
for (size_t it = 0; it < shape.size() - 1; ++it)
{
if (stride[it + 1] != shape[it] * stride[it])
{
throw std::runtime_error("SimpleArray: Fortran contiguous stride must match shape");
}
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current implementation is good.

We should refactor it to use a distinct helper (class or function) in a later PR (may use a separate issue to track).

@yungyuc yungyuc merged commit 965d3e6 into solvcon:master Nov 30, 2024
12 checks passed
explicit SimpleArray(small_vector<size_t> const & shape,
small_vector<size_t> const & stride,
std::shared_ptr<buffer_type> const & buffer,
bool is_c_contiguous = true,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not use an enum here? @ThreeMonth03

Copy link
Collaborator Author

@ThreeMonth03 ThreeMonth03 Dec 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realized I could use an enum class inside the class SimpleArray after I asked the question.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ThreeMonth03 Maybe you can consider opening another PR to fix this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
array Multi-dimensional array implementation bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants