Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Span struct (replacing StrRange). Spans represent read-only access to a contiguous array, resembling std::span. #100293

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Ivorforce
Copy link
Contributor

@Ivorforce Ivorforce commented Dec 11, 2024

It is currently difficult to run algorithms on String, Vector, LocalVector data ranges or plain C arrays without copying the data first. This leads to inefficiencies.

This PR adds the Span class. A span represents a view into an array (it's a pointer and size).
With Span, it will be easier to implement functions with more agnosticism as to the memory storage, helping to reduce unnecessary copies. Additionally, Span will help bridge the gap between LocalVector and Vector, helping to address godotengine/godot-proposals#5144.

Span is similar to StrView which it replaces, but meant for more than just strings. It is furthermore similar to VectorView which will be replaced in a future PR.

Example

The String.path_to implementation currently has the following lines of code:

godot/core/string/ustring.cpp

Lines 5108 to 5109 in c2e4ae7

Vector<String> src_dirs = src.substr(1, src.length() - 2).split("/");
Vector<String> dst_dirs = dst.substr(1, dst.length() - 2).split("/");

Here, the String is first subselected through substr, causing a copy to be made.
Then, it is split through a split call, through which multiple copies of regions of the string are made.
None of the regions are modified in the process - the copies are made merely because it is not possible to avoid them.

An optimized implementation could look like this:

LocalVector<Span<char32_t>> src_dirs = spans::split(src.span().subspan(1, src.length() - 2), U'/'); 
LocalVector<Span<char32_t>> dst_dirs = spans::split(dst.span().subspan(1, dst.length() - 2), U'/'); 

In this implementation, no copies of the string need to be made, because all Spans used here are views into the original string (spans::split and subspan would need to be added as functions).

Discussion

Span is named after std::span of C++20 std::span.
Span is const-only because a need for mutable spans is not obvious yet. It is easier to use if it is const by default. If mutable spans are needed in the future, it can be proposed then.

@Ivorforce Ivorforce marked this pull request as ready for review December 11, 2024 22:43
@Ivorforce Ivorforce requested a review from a team as a code owner December 11, 2024 22:43
@Ivorforce Ivorforce force-pushed the buffer-view branch 3 times, most recently from e3be56f to b8c9998 Compare December 11, 2024 23:18
@Ivorforce
Copy link
Contributor Author

Ivorforce commented Dec 12, 2024

Looks like the builds are failing.
The Windows failure is a repeat of a problem I had in #99806 already. It can be fixed by touching the file to wipe caches.
I have no idea what the Linux problem is about though, since tests are completing on my own machine.
Edit: Nevermind, I got it.

@RandomShaper
Copy link
Member

There's a VectorView somewhere in the rendering code (in rendering_device_commons.h, I think) that seems to have the same goal. Also, I barely remember there's a proposal around related to this.

@Ivorforce
Copy link
Contributor Author

There's a VectorView somewhere in the rendering code (in rendering_device_commons.h, I think) that seems to have the same goal. Also, I barely remember there's a proposal around related to this.

Oh yeah, I see it.
I think it would be best to consolidate that into BufferView once merged, in a follow-up PR.

@Ivorforce
Copy link
Contributor Author

Ivorforce commented Dec 12, 2024

To avoid the MSVC compiler issue (which I have previously determined to be a cache issue), I am touching feed_effects.h by inserting an explicit include to a well-known file.
I think it may be caused by the only include of the file being a generated file, possibly leading to the cache resolver running into a race condition when compiling the file. I hope this change can fix the problem for good (though we won't know for sure until future PRs since touching the file already invalidates the cache, fixing the issue).

@Ivorforce Ivorforce changed the title Rename StrRange -> BufferView. Move find, rfind and contains functions from CowData to BufferView. Rename StrRange -> Span. Move find, rfind and contains functions from CowData to Span. Dec 12, 2024
@Ivorforce
Copy link
Contributor Author

Ivorforce commented Dec 12, 2024

I renamed BufferView -> Span since that's what C++20 calls the concept. And I shrunk the implementation to be bare-bones as we can still add the rest of what we need later.

@Ivorforce Ivorforce force-pushed the buffer-view branch 2 times, most recently from 967b48e to 49fda0d Compare December 17, 2024 16:11
Copy link
Contributor

@Repiteo Repiteo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd encourage adding tests as well. Seeing as you're making a new template, you'd be free to make several functions constexpr for compile-time sanity checks:

TEST_CASE("[Span] Constant Validators") {
	constexpr Span<uint8_t> span_empty;
	static_assert(span_empty.size() == 0);
	static_assert(span_empty.is_empty());

	constexpr uint8_t byte = 0;
	constexpr Span<uint8_t> span_byte = Span<uint8_t>(byte, 1);
	static_assert(span_byte.size() == 1);
	static_assert(!span_byte.is_empty());
}

@Ivorforce
Copy link
Contributor Author

Ivorforce commented Dec 17, 2024

I'd encourage adding tests as well. Seeing as you're making a new template, you'd be free to make several functions constexpr for compile-time sanity checks:

TEST_CASE("[Span] Constant Validators") {
	constexpr Span<uint8_t> span_empty;
	static_assert(span_empty.size() == 0);
	static_assert(span_empty.is_empty());

	constexpr uint8_t byte = 0;
	constexpr Span<uint8_t> span_byte = Span<uint8_t>(byte, 1);
	static_assert(span_byte.size() == 1);
	static_assert(!span_byte.is_empty());
}

I like this test! I'll add some more as well.
I'm also realizing, i can make every single function currently in Span constexpr. Not just great for the compiler, but it also means we don't need a single runtime test (yet) 😄

@Ivorforce
Copy link
Contributor Author

Nice, looks like my include fix from #100434 has fixed the feed_effects compile issues for good. No more cache problems!

@Ivorforce
Copy link
Contributor Author

Ivorforce commented Dec 20, 2024

Hrm... Wondering again if it wouldn't be possible to just have const Span<char> propagate constness to the contained type. It would just require a few const cast hacks. I'll have a look later and see if I can make it work.
Edit: It's not possible; it would require const Span<T> const_span(const T *ptr, size_t size) to return const Span, but the constness of the return value is ignored on the caller's side. That's a shame!

@Ivorforce
Copy link
Contributor Author

Ivorforce commented Jan 16, 2025

I've updated the PR according to feedback from the godot core meeting:

  • Span will hold only const pointers for now. If mutable pointers are needed, they should be separately proposed.
  • operator[] subscripts will do a sanity bounds check.
  • find, rfind and count will not be hosted by Span. Instead, they should be hosted by a separate header (to be moved in another PR).
  • Add implicit conversion from Vector, String, CharString and Char16String to Span, to aid transition.

VectorView will be replaced with Span in a future PR.

The PR is ready for review again.

template <typename T>
class Span {
const T *_ptr = nullptr;
uint32_t _len = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is free to make this 64bit as you'll have 32 bit padding right now

Copy link
Contributor Author

@Ivorforce Ivorforce Jan 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On 32-bit systems the current implementation should be size 64 align 32, and on 64-bit systems it should be size 96 align 64, no?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on 32 bit, you're just 64, but I'm not sure 4.x is compiled to 32 bits ? And on 64 you have 64bits of pointer, 32 bits of integer, and then you need to pad to be aligned to the max alignment of your fields (which is 64 due to the pointer)

Copy link
Contributor Author

@Ivorforce Ivorforce Jan 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then you need to pad to be aligned to the max alignment of your fields (which is 64 due to the pointer)

Padding comes before the struct, not after. If you have a Span as a variable, then a uint32_t as the next one, it can fit right after _len without padding because it aligns to the 32 bit left by Span. If I were to make it uint64_t it would take 32 more bits.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, that's not right. Padding is at the end of the structure (or between the elements) never at the start. And if you declare a Span on the stack, it will pad, the next uin32_t variable will not be directly after your _len field. Except if you #pragma pack(push, 1) so that the compiler does not put padding bytes, but this is not the case here.

Copy link
Member

@lawnjelly lawnjelly Jan 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another problem to think about in advance here by the way is that we might have a caller that uses something like:

for (u32 n=0; n<span.size(); n++)

This will end up happening, this is partly why variable size types are such a nightmare.

So it will afaik end up promoting n to 64 bit each time in e.g. a tight loop, just to make this comparison. (This might be free, or might not..)

Copy link
Member

@lawnjelly lawnjelly Jan 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For these reasons actually I would reconsider fixing size as u32 (similar to LocalVector), and using padding for 64 bit, as it will make usage far easier to reason about and remove platform effects.

In 99.9% of cases, u32 gives 4gb range for 8 bit values, and more when addressing larger T sizes. For those cases that require more than that, they could use e.g. a 64 bit span?

Just my personal thoughts though.

Another knock on effect:

The moment you make size able to hold 64 bit, everywhere that stores this has to account for this possibility, or else have error handling for > 32 bit. This means knock on structures end up having to hold 64 bit instead of 32 bit.

Copy link
Contributor

@kiroxas kiroxas Jan 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm on your side, as I tend to prefer fixed size to make the layout clearer and not platform dependant. This i why I wrote that :

I like uint64_t better BUT ...

But size_t seems to be used a lot in the codebase already....

So it will afaik end up promoting n to 64 bit each time in e.g. a tight loop, just to make this comparison. (This might be free, or might not..)

When I look at assembly for this (and take this with a grain of salt, as some different usage may cause different behaviour), at my surprise, it is totally not free, as it forces one additional move ( godbolt ). It makes sense when you think about it, as it have to do the inc on a 32 bit register for overflow reasons, then move it to a 64 bits register for the comparison. Not a big deal, but it has a cost, so we should take it into consideration.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But size_t seems to be used a lot in the codebase already....

size_t is (correctly) used where OS and API calls return a size_t, however historically we haven't (to my knowledge) used it much internally, for containers etc. But bear in mind I'm more familiar with 3.x.

I can see it used in lru.h (which may be new), but we should consider whether it should be used there.

Copy link
Contributor Author

@Ivorforce Ivorforce Jan 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me summarize my current thoughts.
32 bit uint would put 64-bit systems at a disadvantage because > 4gb buffers (at 1 byte objects) would not be supported. Such buffers are admittedly rare, but do happen in my line of work (which is not games admittedly).
64 bit uint would be perfect on 64 bit systems, but slightly slow and potentially unsafe on 32 bit systems.
size_t is somewhat fishy because it's different sizes on different systems.

I'm still on the size_t side because it's fast, safe, and while we have no guarantees C++ side, we do know it to be correct for all major platforms we expect Godot to ship on. It's also used by all C++ size types (sizeof(array), malloc, ...) and STL container types (std::span, std::array...).

@Ivorforce Ivorforce changed the title Rename StrRange -> Span. Move find, rfind and contains functions from CowData to Span. Add Span struct (replacing StrRange). Spans represent read-only access to a contiguous array, resembling std::span. Jan 16, 2025
@clayjohn clayjohn requested review from hpvb and RandomShaper January 16, 2025 20:03
…ccess to a contiguous array, resembling `std::span`.
@@ -31,6 +31,7 @@
#ifndef SORT_EFFECTS_RD_H
#define SORT_EFFECTS_RD_H

#include "servers/rendering/renderer_rd/shader_rd.h"
Copy link
Contributor Author

@Ivorforce Ivorforce Jan 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick fix due to build failing because of erroneous cache restores.

I have #100293 (comment) that including a non-generated file explicitly solves such cache issues for good.

Copy link
Contributor

@Repiteo Repiteo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Codestyle checks out & the use of more modern C++ concepts is a welcome addition

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants