Fix the construction of SimpleArray from slicing ndarray #438

ThreeMonth03 · 2024-11-28T18:32:53Z

In this pull request, I fix issue #432 by passing the stride vector to SimpleArray.

ThreeMonth03

Instead of allocating the contiguous memory and copying the elements from py::array, I try to fix the bug by passing the stride vector.

ThreeMonth03 · 2024-11-28T18:35:19Z

cpp/modmesh/buffer/pymod/wrap_SimpleArray.cpp

                        for (ssize_t i = 0; i < arr_in.ndim(); ++i)
                        {
                            shape.push_back(arr_in.shape(i));
+                            stride.push_back(arr_in.strides(i) / itemsize);


py::array::strides() is measured in bytes, whereas SimpleArray::strides() is measured in elements, so each byte stride is divided by itemsize.

The code is clear enough; no comment is needed.
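
A minimal sketch of the byte-to-element conversion, for illustration only. It assumes signed strides (a negative slice step yields a negative stride) and uses `std::vector` in place of modmesh's `small_vector`; `to_element_strides` is a hypothetical name. It also rejects byte strides that are not a whole number of items, a case the plain division would silently truncate.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Convert NumPy-style byte strides into element strides.
// Strides are signed because a slice such as 6:2:-1 produces a negative stride.
std::vector<std::ptrdiff_t> to_element_strides(std::vector<std::ptrdiff_t> const & byte_strides,
                                               std::ptrdiff_t itemsize)
{
    if (itemsize <= 0)
    {
        throw std::invalid_argument("itemsize must be positive");
    }
    std::vector<std::ptrdiff_t> element_strides;
    element_strides.reserve(byte_strides.size());
    for (std::ptrdiff_t bytes : byte_strides)
    {
        // A byte stride that is not a multiple of itemsize cannot be
        // expressed as a whole number of elements.
        if (bytes % itemsize != 0)
        {
            throw std::invalid_argument("byte stride is not a multiple of itemsize");
        }
        element_strides.push_back(bytes / itemsize);
    }
    return element_strides;
}
```

For a C-ordered `(10, 10, 10)` float64 array, the byte strides `(800, 80, 8)` become element strides `(100, 10, 1)`.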

ThreeMonth03 · 2024-11-28T18:38:29Z

cpp/modmesh/buffer/SimpleArray.hpp

+    explicit SimpleArray(small_vector<size_t> const & shape, small_vector<size_t> const & stride, std::shared_ptr<buffer_type> const & buffer)
+        : SimpleArray(buffer)
+    {
+        if (buffer)
+        {
+            m_shape = shape;
+            m_stride = stride;
+        }
+    }


I modeled this constructor on the overload explicit SimpleArray(small_vector<size_t> const & shape, std::shared_ptr<buffer_type> const & buffer), but I'm not sure whether omitting the nbytes == buffer->nbytes() check could introduce a bug.

We should check that the shape and stride are within the buffer range, as is done in the overload explicit SimpleArray(small_vector<size_t> const & shape, std::shared_ptr<buffer_type> const & buffer).
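
One way to express a range check for a strided view is sketched below. It is an illustration under stated assumptions, not the check the PR uses: strides are signed and in elements, the buffer size is counted in elements, and the element index of the view's origin within the buffer is known. `fits_in_buffer` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Return true if every element addressed by (shape, stride) lies inside a
// buffer of `buffer_size` elements, with the array origin at index `origin`.
bool fits_in_buffer(std::vector<std::size_t> const & shape,
                    std::vector<std::ptrdiff_t> const & stride,
                    std::ptrdiff_t origin,
                    std::ptrdiff_t buffer_size)
{
    if (shape.size() != stride.size())
    {
        return false;
    }
    std::ptrdiff_t lo = origin; // lowest element index reachable
    std::ptrdiff_t hi = origin; // highest element index reachable
    for (std::size_t i = 0; i < shape.size(); ++i)
    {
        if (shape[i] == 0)
        {
            return true; // an empty array addresses no element at all
        }
        // The last index along axis i moves the offset by (shape[i] - 1) * stride[i].
        std::ptrdiff_t const span = static_cast<std::ptrdiff_t>(shape[i] - 1) * stride[i];
        if (span < 0)
        {
            lo += span;
        }
        else
        {
            hi += span;
        }
    }
    return lo >= 0 && hi < buffer_size;
}
```

Tracking the lowest and highest reachable index separately keeps the check valid for negative strides.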

ThreeMonth03 · 2024-11-28T18:42:04Z

tests/test_buffer.py

+    def test_SimpleArray_from_ndarray_slice(self):
+        ndarr = np.arange(1000, dtype='float64').reshape((10, 10, 10))
+        parr = ndarr[1:7, 2:6, 3:9]
+        sarr = modmesh.SimpleArrayFloat64(array=ndarr[1:7, 2:6, 3:9])
+        for i in range(6):
+            for j in range(4):
+                for k in range(6):
+                    self.assertEqual(parr[i, j, k], sarr[i, j, k])


This is the unit test for constructing a SimpleArray from a sliced ndarray. When the dimension is less than 4, the constructor should work correctly.

This is why you filed issue #437.

ThreeMonth03 · 2024-11-28T18:46:52Z

Instead of allocating the contiguous memory and copying the elements from py::array, I try to fix the bug by passing the stride vector.

@yungyuc @tigercosmos Could you please review the pull request?

yungyuc

Instead of allocating the contiguous memory and copying the elements from py::array, I try to fix the bug by passing the stride vector.

This is a good fix. When taking a memory buffer, SimpleArray should reuse it instead of copying it. Sanity checks should be done during SimpleArray construction so that code operating on the array can rely on the assumptions made by the class template SimpleArray.

Some enhancements are needed in the PR:

SimpleArray.hpp:310: Check that the shape and stride are within the buffer range, as is done in the overload explicit SimpleArray(small_vector<size_t> const & shape, std::shared_ptr<buffer_type> const & buffer).
wrap_SimpleArray.cpp:89: The wrapper should check whether the input array is contiguous. You can use the ndarray flags and/or the source stride values.

yungyuc · 2024-11-29T00:50:41Z

cpp/modmesh/buffer/SimpleArray.hpp

+    explicit SimpleArray(small_vector<size_t> const & shape, small_vector<size_t> const & stride, std::shared_ptr<buffer_type> const & buffer)
+        : SimpleArray(buffer)
+    {
+        if (buffer)
+        {
+            m_shape = shape;
+            m_stride = stride;
+        }
+    }


We should check that the shape and stride are within the buffer range, as is done in the overload explicit SimpleArray(small_vector<size_t> const & shape, std::shared_ptr<buffer_type> const & buffer).

yungyuc · 2024-11-29T00:53:04Z

tests/test_buffer.py

+    def test_SimpleArray_from_ndarray_slice(self):
+        ndarr = np.arange(1000, dtype='float64').reshape((10, 10, 10))
+        parr = ndarr[1:7, 2:6, 3:9]
+        sarr = modmesh.SimpleArrayFloat64(array=ndarr[1:7, 2:6, 3:9])
+        for i in range(6):
+            for j in range(4):
+                for k in range(6):
+                    self.assertEqual(parr[i, j, k], sarr[i, j, k])


This is why you filed issue #437.

yungyuc · 2024-11-29T00:56:37Z

cpp/modmesh/buffer/pymod/wrap_SimpleArray.cpp

                        for (ssize_t i = 0; i < arr_in.ndim(); ++i)
                        {
                            shape.push_back(arr_in.shape(i));
+                            stride.push_back(arr_in.strides(i) / itemsize);


The code is clear enough; no comment is needed.

yungyuc · 2024-11-29T00:59:53Z

cpp/modmesh/buffer/pymod/wrap_SimpleArray.cpp

                        }
                        std::shared_ptr<ConcreteBuffer> const buffer = ConcreteBuffer::construct(
                            arr_in.nbytes(),
                            arr_in.mutable_data(),
                            std::make_unique<ConcreteBufferNdarrayRemover>(arr_in));
-                        return wrapped_type(shape, buffer);
+                        return wrapped_type(shape, stride, buffer);


The wrapper should check whether the input array is contiguous. You can use the ndarray flags and/or the source stride values.

Why should the wrapper check whether the input array is contiguous?

Because only the wrapper sees the PyObject of the incoming ndarray, on which the contiguity flags are available. Once the buffer pointer is passed into the SimpleArray C++ code, no information about memory contiguity is available, and the constructor can only trust the shape and stride it is given.

Checking for contiguity in the wrapper is more robust.
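
When the wrapper works from stride values instead of the ndarray flags, contiguity can be derived by comparing the strides with those of a packed array of the same shape. A minimal sketch, assuming element strides (already divided by itemsize); `Contiguity` and `infer_contiguity` are hypothetical names, and `std::vector` stands in for `small_vector`. Length-1 axes are skipped because their stride never affects addressing; empty arrays are not special-cased.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Contiguity
{
    bool c = false; // packed, last axis varies fastest
    bool f = false; // packed, first axis varies fastest
};

Contiguity infer_contiguity(std::vector<std::size_t> const & shape,
                            std::vector<std::ptrdiff_t> const & stride)
{
    Contiguity result;
    if (shape.size() != stride.size())
    {
        return result;
    }
    // C order: walk from the last axis to the first.
    std::ptrdiff_t expected = 1;
    result.c = true;
    for (std::size_t i = shape.size(); i-- > 0;)
    {
        if (shape[i] != 1 && stride[i] != expected)
        {
            result.c = false;
            break;
        }
        expected *= static_cast<std::ptrdiff_t>(shape[i]);
    }
    // F order: walk from the first axis to the last.
    expected = 1;
    result.f = true;
    for (std::size_t i = 0; i < shape.size(); ++i)
    {
        if (shape[i] != 1 && stride[i] != expected)
        {
            result.f = false;
            break;
        }
        expected *= static_cast<std::ptrdiff_t>(shape[i]);
    }
    return result;
}
```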

ThreeMonth03

SimpleArray.hpp:310: Check that the shape and stride are within the buffer range, as is done in the overload explicit SimpleArray(small_vector<size_t> const & shape, std::shared_ptr<buffer_type> const & buffer).
wrap_SimpleArray.cpp:89: The wrapper should check whether the input array is contiguous. You can use the ndarray flags and/or the source stride values.

ThreeMonth03 · 2024-11-29T12:25:17Z

cpp/modmesh/buffer/SimpleArray.hpp

+            const size_t nbytes = std::accumulate(m_shape.begin(),
+                                                  m_shape.end(),
+                                                  static_cast<size_t>(1),
+                                                  std::multiplies<size_t>()) *
+                                  ITEMSIZE;
+            if (nbytes != buffer->nbytes())
+            {
+                throw std::runtime_error(Formatter() << "SimpleArray: shape byte count " << nbytes
+                                                     << " differs from buffer " << buffer->nbytes());
+            }


Check the nbytes().

ThreeMonth03 · 2024-11-29T12:26:55Z

cpp/modmesh/buffer/pymod/wrap_SimpleArray.cpp

                        }
                        std::shared_ptr<ConcreteBuffer> const buffer = ConcreteBuffer::construct(
                            arr_in.nbytes(),
                            arr_in.mutable_data(),
                            std::make_unique<ConcreteBufferNdarrayRemover>(arr_in));
-                        return wrapped_type(shape, buffer);
+                        return is_c_contiguous ? wrapped_type(shape, buffer) : wrapped_type(shape, stride, buffer);


If the array is C-contiguous, the wrapper uses the original constructor.

f_contiguous also needs to be handled.

ThreeMonth03 · 2024-11-29T12:28:33Z

tests/test_buffer.py

+        parr = ndarr[1:7:3, 6:2:-1, 3:9]
+        sarr = modmesh.SimpleArrayFloat64(array=ndarr[1:7:3, 6:2:-1, 3:9])


I changed the slice steps to make the test case more complex.
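
The stride arithmetic behind the new slice can be worked out by hand. The sketch below assumes each slice has already been normalized (non-negative, in-range `start` and `stop`, non-zero `step`); `Slice`, `View`, `slice_length`, and `slice_view` are hypothetical names. For the `(10, 10, 10)` parent with element strides `(100, 10, 1)`, `ndarr[1:7:3, 6:2:-1, 3:9]` has shape `(2, 4, 6)`, element strides `(300, -10, 1)`, and starts at element 163 of the parent buffer. The negative stride is why signed integers are used here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A normalized basic slice start:stop:step (hypothetical type).
struct Slice
{
    std::ptrdiff_t start;
    std::ptrdiff_t stop;
    std::ptrdiff_t step;
};

struct View
{
    std::vector<std::ptrdiff_t> shape;
    std::vector<std::ptrdiff_t> stride; // in elements, may be negative
    std::ptrdiff_t offset;              // parent element index of the view's first element
};

// Number of indices produced by start:stop:step.
std::ptrdiff_t slice_length(Slice const & s)
{
    assert(s.step != 0);
    std::ptrdiff_t const n = s.step > 0 ? (s.stop - s.start + s.step - 1) / s.step
                                        : (s.start - s.stop - s.step - 1) / (-s.step);
    return n > 0 ? n : 0;
}

// Apply one slice per axis to a parent described by its element strides.
View slice_view(std::vector<std::ptrdiff_t> const & stride, std::vector<Slice> const & slices)
{
    assert(stride.size() == slices.size());
    View v{{}, {}, 0};
    for (std::size_t i = 0; i < slices.size(); ++i)
    {
        v.shape.push_back(slice_length(slices[i]));
        v.stride.push_back(stride[i] * slices[i].step); // the step scales the stride
        v.offset += slices[i].start * stride[i];       // the start moves the origin
    }
    return v;
}
```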

yungyuc

One more point to address.

wrap_SimpleArray.cpp:90: f_contiguous also needs to be handled.

yungyuc · 2024-11-29T12:46:29Z

cpp/modmesh/buffer/pymod/wrap_SimpleArray.cpp

                        }
                        std::shared_ptr<ConcreteBuffer> const buffer = ConcreteBuffer::construct(
                            arr_in.nbytes(),
                            arr_in.mutable_data(),
                            std::make_unique<ConcreteBufferNdarrayRemover>(arr_in));
-                        return wrapped_type(shape, buffer);
+                        return is_c_contiguous ? wrapped_type(shape, buffer) : wrapped_type(shape, stride, buffer);


f_contiguous also needs to be handled.

yungyuc · 2024-11-29T12:47:53Z

cpp/modmesh/buffer/SimpleArray.hpp

+                                                  m_shape.end(),
+                                                  static_cast<size_t>(1),
+                                                  std::multiplies<size_t>()) *
+                                  ITEMSIZE;


Would the expression look nicer by placing ITEMSIZE before std::accumulate?

That's a good idea.
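
The suggested ordering, with ITEMSIZE leading the product, might look like the sketch below. It is written as free function templates so that it stands alone; `packed_nbytes` and `check_nbytes` are hypothetical names, `std::vector` replaces `small_vector`, and `std::to_string` replaces modmesh's `Formatter`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <stdexcept>
#include <string>
#include <vector>

// Byte count of a packed array: ITEMSIZE times the product of the extents.
template <std::size_t ITEMSIZE>
std::size_t packed_nbytes(std::vector<std::size_t> const & shape)
{
    return ITEMSIZE * std::accumulate(shape.begin(),
                                      shape.end(),
                                      static_cast<std::size_t>(1),
                                      std::multiplies<std::size_t>());
}

// Throw if the shape does not account for exactly the bytes held by the buffer.
template <std::size_t ITEMSIZE>
void check_nbytes(std::vector<std::size_t> const & shape, std::size_t buffer_nbytes)
{
    std::size_t const nbytes = packed_nbytes<ITEMSIZE>(shape);
    if (nbytes != buffer_nbytes)
    {
        throw std::runtime_error("SimpleArray: shape byte count " + std::to_string(nbytes) +
                                 " differs from buffer " + std::to_string(buffer_nbytes));
    }
}
```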

ThreeMonth03

Check the F contiguous array.

ThreeMonth03 · 2024-11-29T14:49:43Z

tests/test_buffer.py

+    def test_SimpleArray_from_ndarray_transpose(self):
+        ndarr = np.arange(350, dtype='float64').reshape((5, 7, 10))
+        # The following array is F contiguous.
+        parr = ndarr[2:4].T 
+        sarr = modmesh.SimpleArrayFloat64(array=ndarr[2:4].T)
+
+        for i in range(10):
+            for j in range(7):
+                for k in range(2):
+                    self.assertEqual(parr[i, j, k], sarr[i, j, k])


Unit test for an F-contiguous ndarray.
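
Why the transposed slice is F-contiguous can be checked with a little arithmetic. `ndarr[2:4]` of the `(5, 7, 10)` C-ordered array has shape `(2, 7, 10)` and element strides `(70, 10, 1)`; a full transpose reverses both, giving shape `(10, 7, 2)` and strides `(1, 10, 70)`, which are exactly the packed F-order strides. A small sketch with hypothetical names (`Strided`, `transpose`, `f_order_strides`), using unsigned strides since no step is negative here:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Strided
{
    std::vector<std::size_t> shape;
    std::vector<std::size_t> stride; // in elements
};

// Full transpose (like NumPy's .T): reverse the order of both shape and strides.
Strided transpose(Strided a)
{
    std::reverse(a.shape.begin(), a.shape.end());
    std::reverse(a.stride.begin(), a.stride.end());
    return a;
}

// Packed F-order strides: stride[0] == 1 and stride[i + 1] == shape[i] * stride[i].
std::vector<std::size_t> f_order_strides(std::vector<std::size_t> const & shape)
{
    std::vector<std::size_t> stride(shape.size());
    std::size_t running = 1;
    for (std::size_t i = 0; i < shape.size(); ++i)
    {
        stride[i] = running;
        running *= shape[i];
    }
    return stride;
}
```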

ThreeMonth03 · 2024-11-29T14:51:32Z

cpp/modmesh/buffer/SimpleArray.hpp

+            if (is_c_contiguous)
+            {
+                if (stride[stride.size() - 1] != 1)
+                {
+                    throw std::runtime_error("SimpleArray: C contiguous stride must end with 1");
+                }
+                for (size_t it = 0; it < shape.size() - 1; ++it)
+                {
+                    if (stride[it] != shape[it + 1] * stride[it + 1])
+                    {
+                        throw std::runtime_error("SimpleArray: C contiguous stride must match shape");
+                    }
+                }
+            }
+            if (is_f_contiguous)
+            {
+                if (stride[0] != 1)
+                {
+                    throw std::runtime_error("SimpleArray: Fortran contiguous stride must start with 1");
+                }
+                for (size_t it = 0; it < shape.size() - 1; ++it)
+                {
+                    if (stride[it + 1] != shape[it] * stride[it])
+                    {
+                        throw std::runtime_error("SimpleArray: Fortran contiguous stride must match shape");
+                    }
+                }
+            }


Check C-contiguous and F-contiguous arrays.

The current implementation is good.

We should refactor it into a distinct helper (class or function) in a later PR (a separate issue may be used to track it).

By the way, I have a question:

How can we pass and receive bool is_c_contiguous and bool is_f_contiguous elegantly?
I considered passing the NumPy flags directly, but the constructor may deal with inputs other than NumPy arrays. Would it be a good design to define custom flags as an enum class to handle every possible input?

I don't think the class template SimpleArray needs to know about F/C contiguity. The sanity check may happen in the Python wrapper, which has the information from NumPy. A standalone helper (static or free function) lets you check in both C++ and the pybind11 wrapper. It could make sense to create a stride object to organize the logic.

It is OK to add more metadata to SimpleArray, including F/C contiguity, but that is a separate topic.
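
A standalone free function along the lines suggested above could carry the same C/F checks as the diff, callable from both C++ code and a pybind11 wrapper. This is a hypothetical sketch (`validate_stride` is not an existing modmesh function), with `std::vector` standing in for `small_vector`. It also returns early for a 0-d array, for which a loop bound of `shape.size() - 1` would wrap around in unsigned arithmetic.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Throw if the element strides contradict the claimed contiguity.
void validate_stride(std::vector<std::size_t> const & shape,
                     std::vector<std::size_t> const & stride,
                     bool is_c_contiguous,
                     bool is_f_contiguous)
{
    if (shape.size() != stride.size())
    {
        throw std::runtime_error("SimpleArray: shape and stride must have the same length");
    }
    std::size_t const ndim = shape.size();
    if (ndim == 0)
    {
        return; // a 0-d array has no stride to check
    }
    if (is_c_contiguous)
    {
        if (stride[ndim - 1] != 1)
        {
            throw std::runtime_error("SimpleArray: C contiguous stride must end with 1");
        }
        for (std::size_t it = 0; it + 1 < ndim; ++it)
        {
            if (stride[it] != shape[it + 1] * stride[it + 1])
            {
                throw std::runtime_error("SimpleArray: C contiguous stride must match shape");
            }
        }
    }
    if (is_f_contiguous)
    {
        if (stride[0] != 1)
        {
            throw std::runtime_error("SimpleArray: Fortran contiguous stride must start with 1");
        }
        for (std::size_t it = 0; it + 1 < ndim; ++it)
        {
            if (stride[it + 1] != shape[it] * stride[it])
            {
                throw std::runtime_error("SimpleArray: Fortran contiguous stride must match shape");
            }
        }
    }
}
```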

yungyuc

Looks good!

yungyuc · 2024-11-30T07:16:49Z

cpp/modmesh/buffer/SimpleArray.hpp

+            if (is_c_contiguous)
+            {
+                if (stride[stride.size() - 1] != 1)
+                {
+                    throw std::runtime_error("SimpleArray: C contiguous stride must end with 1");
+                }
+                for (size_t it = 0; it < shape.size() - 1; ++it)
+                {
+                    if (stride[it] != shape[it + 1] * stride[it + 1])
+                    {
+                        throw std::runtime_error("SimpleArray: C contiguous stride must match shape");
+                    }
+                }
+            }
+            if (is_f_contiguous)
+            {
+                if (stride[0] != 1)
+                {
+                    throw std::runtime_error("SimpleArray: Fortran contiguous stride must start with 1");
+                }
+                for (size_t it = 0; it < shape.size() - 1; ++it)
+                {
+                    if (stride[it + 1] != shape[it] * stride[it])
+                    {
+                        throw std::runtime_error("SimpleArray: Fortran contiguous stride must match shape");
+                    }
+                }
+            }


The current implementation is good.

We should refactor it into a distinct helper (class or function) in a later PR (a separate issue may be used to track it).

tigercosmos · 2024-12-02T16:10:46Z

cpp/modmesh/buffer/SimpleArray.hpp

+    explicit SimpleArray(small_vector<size_t> const & shape,
+                         small_vector<size_t> const & stride,
+                         std::shared_ptr<buffer_type> const & buffer,
+                         bool is_c_contiguous = true,


Why not use an enum here? @ThreeMonth03

I realized I could use an enum class inside the class SimpleArray after I asked the question.

@ThreeMonth03 Maybe you can consider opening another PR to fix this.
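
A single enum argument could replace the two bools, so a call site reads `Layout::f_order` instead of an anonymous `false, true`. The sketch below is hypothetical: `Layout` and `matches_layout` are illustrative names, and the enum is placed at namespace scope to keep the example self-contained, whereas the discussion above proposes nesting it inside SimpleArray.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Layout
{
    any,     // arbitrary strides, no contiguity claimed
    c_order, // packed, last axis varies fastest
    f_order, // packed, first axis varies fastest
};

// Return true if the element strides match the packed strides implied by `layout`.
bool matches_layout(std::vector<std::size_t> const & shape,
                    std::vector<std::size_t> const & stride,
                    Layout layout)
{
    if (shape.size() != stride.size())
    {
        return false;
    }
    std::size_t const ndim = shape.size();
    std::size_t expected = 1;
    switch (layout)
    {
    case Layout::any:
        return true;
    case Layout::c_order:
        for (std::size_t i = ndim; i-- > 0;)
        {
            if (stride[i] != expected)
            {
                return false;
            }
            expected *= shape[i];
        }
        return true;
    case Layout::f_order:
        for (std::size_t i = 0; i < ndim; ++i)
        {
            if (stride[i] != expected)
            {
                return false;
            }
            expected *= shape[i];
        }
        return true;
    }
    return false; // not reached for valid enumerators
}
```

An array can satisfy both orders (a packed 1-D array, for example), so a single enumerator describes the layout the caller claims, not every layout the strides happen to satisfy.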

ThreeMonth03 commented Nov 28, 2024

yungyuc changed the title from "Fix: Fix the construction of SimpleArray from slicing ndarray" to "Fix the construction of SimpleArray from slicing ndarray" Nov 29, 2024

yungyuc requested changes Nov 29, 2024

yungyuc assigned ThreeMonth03 Nov 29, 2024

yungyuc added the bug (Something isn't working) and array (Multi-dimensional array implementation) labels Nov 29, 2024

ThreeMonth03 force-pushed the issue432 branch from 1c47b73 to 653d264 November 29, 2024 12:23

ThreeMonth03 commented Nov 29, 2024

ThreeMonth03 force-pushed the issue432 branch from 653d264 to da3f887 November 29, 2024 12:40

yungyuc requested changes Nov 29, 2024

ThreeMonth03 force-pushed the issue432 branch from da3f887 to 756010a November 29, 2024 14:46

ThreeMonth03 commented Nov 29, 2024

Fix: Fix the construction of SimpleArray from slicing ndarray

8470ec1

ThreeMonth03 force-pushed the issue432 branch from 756010a to 8470ec1 November 29, 2024 15:25

yungyuc approved these changes Nov 30, 2024

yungyuc merged commit 965d3e6 into solvcon:master Nov 30, 2024
12 checks passed

yungyuc mentioned this pull request Nov 30, 2024

Incorrect construction of SimpleArray from a discontiguous slice of ndarray #432

Closed

tigercosmos reviewed Dec 2, 2024

ThreeMonth03 deleted the issue432 branch December 3, 2024 08:43

ThreeMonth03 mentioned this pull request Dec 4, 2024

Refactor: refactor the constructor of SimpleArray #439

Merged
