
Acceleration Structure Conversion #790

Open · wants to merge 11 commits into base: master

Conversation
@devshgraphicsprogramming (Member) commented Nov 21, 2024

Description

Converts ICPU BLAS and TLAS assets to IGPU ones, including building them.

We may need to support a list of IGPUBLAS inside IGPUTLAS for sanity/lifetime coupling, but only if update/rebuild is not allowed. This needs a separate issue, because it's not yet clear how that would be structured.
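The lifetime coupling mentioned above can be sketched as follows; this is a hypothetical illustration (not the PR's implementation), with `std::shared_ptr` standing in for `core::smart_refctd_ptr` and the `BLAS`/`TLAS` names being stand-ins for `IGPUBottomLevelAccelerationStructure`/`IGPUTopLevelAccelerationStructure`:

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in types; std::shared_ptr plays the role of core::smart_refctd_ptr.
struct BLAS { int id; };

struct TLAS
{
	// Keeping strong references to every BLAS referenced by the instance buffer
	// couples their lifetime to the TLAS, so an instance can't point at a freed BLAS.
	std::vector<std::shared_ptr<BLAS>> trackedBLASes;
};

std::shared_ptr<BLAS> makeBlas(int id)
{
	return std::make_shared<BLAS>(BLAS{id});
}
```

Even if the caller drops its own reference after building, the TLAS keeps each tracked BLAS alive; the open question in the PR is what happens to this list on update/rebuild.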

Testing

  • Ray Query Example

TODO list:

  • BLAS and TLAS Hashing
  • AS memory allocation
  • Scratch Suballocation
  • BLAS build
  • BLAS Compaction
  • TLAS build
  • TLAS Compaction

devsh added 2 commits November 20, 2024 16:44
Note that the pointer/build-param encoding stuff shouldn't be on the CPU side, but don't touch anything yet.

Also fix a typo, change the SRange to a std::span, and add default SPIR-V optimizer if none provided to asset converter.
devsh added 5 commits November 21, 2024 11:17
…their storage.

Change more stuff to span in `ICPUBottomLevelAccelerationStructure`

Use a semantically better typedef/alias in `ILogicalDevice::createBottomLevelAccelerationStructure`
Comment on lines +1103 to +1104
// finally the contents
//TODO: hasher << lookup.asset->getContentHash();
Member Author:
note to self, need to make the ICPUBottomLevelAccelerationStructure and IPreHashed
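The commented-out `hasher << lookup.asset->getContentHash()` in the hunk above suggests a pre-hashed content cache. A minimal sketch of that pattern, assuming only that `IPreHashed` exposes a cached `getContentHash()` (the `PreHashedBlob` class and its members here are illustrative, not Nabla's API):

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Illustrative IPreHashed-style caching: the content hash is computed lazily, once,
// so repeated `hasher << asset->getContentHash()` calls in the converter stay cheap.
class PreHashedBlob
{
	public:
		explicit PreHashedBlob(std::string content) : m_content(std::move(content)) {}

		size_t getContentHash() const
		{
			if (!m_hashValid)
			{
				// expensive full-content hash happens at most once
				m_hash = std::hash<std::string>{}(m_content);
				m_hashValid = true;
			}
			return m_hash;
		}

	private:
		std::string m_content;
		mutable size_t m_hash = 0;
		mutable bool m_hashValid = false;
};
```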

Comment on lines +2392 to +2512
{asset,uniqueCopyGroupID},
patch
};
if (!visitor())
continue;
const auto instanceCount = as->getInstances().size();
sizes = device->getAccelerationStructureBuildSizes(patch.hostBuild,buildFlags,motionBlur,instanceCount);
inputSize = (motionBlur ? sizeof(IGPUTopLevelAccelerationStructure::DevicePolymorphicInstance):sizeof(IGPUTopLevelAccelerationStructure::DeviceStaticInstance))*instanceCount;
}
else
{
const uint32_t* pMaxPrimitiveCounts = as->getGeometryPrimitiveCounts().data();
// the code here is not pretty, but DRY-ing it is left for later
if (buildFlags.hasFlags(ICPUBottomLevelAccelerationStructure::BUILD_FLAGS::GEOMETRY_TYPE_IS_AABB_BIT))
{
const auto geoms = as->getAABBGeometries();
if (patch.hostBuild)
{
// host builds consume host-addressable geometry, so size-query with the `ICPUBuffer` overload
const std::span<const IGPUBottomLevelAccelerationStructure::AABBs<const ICPUBuffer>> cpuGeoms = {
reinterpret_cast<const IGPUBottomLevelAccelerationStructure::AABBs<const ICPUBuffer>*>(geoms.data()),geoms.size()
};
sizes = device->getAccelerationStructureBuildSizes(buildFlags,motionBlur,cpuGeoms,pMaxPrimitiveCounts);
}
else
{
const std::span<const IGPUBottomLevelAccelerationStructure::AABBs<const IGPUBuffer>> gpuGeoms = {
reinterpret_cast<const IGPUBottomLevelAccelerationStructure::AABBs<const IGPUBuffer>*>(geoms.data()),geoms.size()
};
sizes = device->getAccelerationStructureBuildSizes(buildFlags,motionBlur,gpuGeoms,pMaxPrimitiveCounts);
// TODO: check if the strides need to be aligned to 4 bytes for AABBs
for (const auto& geom : geoms)
if (const auto aabbCount=*(pMaxPrimitiveCounts++); aabbCount)
inputSize = core::roundUp(inputSize,sizeof(float))+aabbCount*geom.stride;
}
}
else
{
core::map<uint32_t,size_t> allocationsPerStride;
const auto geoms = as->getTriangleGeometries();
if (patch.hostBuild)
{
// host builds consume host-addressable geometry, so size-query with the `ICPUBuffer` overload
const std::span<const IGPUBottomLevelAccelerationStructure::Triangles<const ICPUBuffer>> cpuGeoms = {
reinterpret_cast<const IGPUBottomLevelAccelerationStructure::Triangles<const ICPUBuffer>*>(geoms.data()),geoms.size()
};
sizes = device->getAccelerationStructureBuildSizes(buildFlags,motionBlur,cpuGeoms,pMaxPrimitiveCounts);
}
else
{
const std::span<const IGPUBottomLevelAccelerationStructure::Triangles<const IGPUBuffer>> gpuGeoms = {
reinterpret_cast<const IGPUBottomLevelAccelerationStructure::Triangles<const IGPUBuffer>*>(geoms.data()),geoms.size()
};
sizes = device->getAccelerationStructureBuildSizes(buildFlags,motionBlur,gpuGeoms,pMaxPrimitiveCounts);
for (const auto& geom : geoms)
if (const auto triCount=*(pMaxPrimitiveCounts++); triCount)
{
switch (geom.indexType)
{
case E_INDEX_TYPE::EIT_16BIT:
allocationsPerStride[sizeof(uint16_t)] += triCount*3;
break;
case E_INDEX_TYPE::EIT_32BIT:
allocationsPerStride[sizeof(uint32_t)] += triCount*3;
break;
default:
break;
}
// a second vertex buffer (used for vertex motion) doubles the vertex data we need to stage
const size_t vertexBufferCount = geom.vertexData[1] ? 2ull:1ull;
allocationsPerStride[geom.vertexStride] += vertexBufferCount*geom.maxVertex;
}
}
}
for (const auto& entry : allocationsPerStride)
inputSize = core::roundUp<size_t>(inputSize,entry.first)+entry.first*entry.second;
}
}
}
if (!sizes)
continue;
// this is where it gets a bit weird, we need to create a buffer to back the acceleration structure
IGPUBuffer::SCreationParams params = {};
constexpr size_t MinASBufferAlignment = 256u;
params.size = core::roundUp(sizes.accelerationStructureSize,MinASBufferAlignment);
params.usage = IGPUBuffer::E_USAGE_FLAGS::EUF_ACCELERATION_STRUCTURE_STORAGE_BIT|IGPUBuffer::E_USAGE_FLAGS::EUF_SHADER_DEVICE_ADDRESS_BIT;
// concurrent ownership if any
const auto outIx = i+entry.second.firstCopyIx;
const auto uniqueCopyGroupID = gpuObjUniqueCopyGroupIDs[outIx];
const auto queueFamilies = inputs.getSharedOwnershipQueueFamilies(uniqueCopyGroupID,as,patch);
params.queueFamilyIndexCount = queueFamilies.size();
params.queueFamilyIndices = queueFamilies.data();
// we need to save the buffer in a side-channel for later
auto& out = accelerationStructureParams[IsTLAS][baseOffset+entry.second.firstCopyIx+i];
out = {
.storage = device->createBuffer(std::move(params)),
.scratchSize = sizes.buildScratchSize,
.motionBlur = motionBlur,
.compactAfterBuild = patch.compactAfterBuild,
.inputSize = inputSize
};
Member Author:
this needs some love from me
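The hunk above rounds the backing buffer size up to a 256-byte minimum alignment and packs the per-stride staging allocations back-to-back, each run aligned to its own stride. A self-contained sketch of that arithmetic, with `roundUp` as a minimal stand-in for `core::roundUp` and `packedInputSize` being an illustrative name:

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// minimal stand-in for core::roundUp: round `value` up to the next multiple of `multiple`
// (assumes multiple > 0; Nabla's version likely special-cases powers of two)
constexpr size_t roundUp(size_t value, size_t multiple)
{
	return ((value+multiple-1)/multiple)*multiple;
}

// Total staging size for one allocation run per distinct stride, where each run
// starts at an offset aligned to its own stride (mirrors the `allocationsPerStride` loop).
size_t packedInputSize(const std::map<size_t,size_t>& allocationsPerStride)
{
	size_t inputSize = 0;
	for (const auto& [stride,count] : allocationsPerStride)
		inputSize = roundUp(inputSize,stride)+stride*count;
	return inputSize;
}
```

The same `roundUp` is what pads `sizes.accelerationStructureSize` to the 256-byte `MinASBufferAlignment` before creating the backing buffer.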

Comment on lines +2953 to +2957
// This gets deferred till AFTER the Buffer Memory Allocations and Binding for Acceleration Structures
if constexpr (!std::is_same_v<AssetType,ICPUBottomLevelAccelerationStructure> && !std::is_same_v<AssetType,ICPUTopLevelAccelerationStructure>)
dfsCache.for_each([&](const instance_t<AssetType>& instance, dfs_cache<AssetType>::created_t& created)->void
{
auto& stagingCache = std::get<SReserveResult::staging_cache_t<AssetType>>(retval.m_stagingCaches);
Member Author:
need to pack up the lambda and defer it
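One common way to "pack up the lambda and defer it" is to capture the work in `std::function` objects and replay them after the buffer memory allocation and binding stage has run; this is a generic sketch of that pattern (the `DeferredStage` type is illustrative, not the converter's actual mechanism):

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Collects type-erased callbacks now, runs them later in insertion order.
struct DeferredStage
{
	std::vector<std::function<void()>> deferred;

	void defer(std::function<void()> fn)
	{
		deferred.push_back(std::move(fn));
	}

	void runAll()
	{
		// replay everything that was queued, then forget it
		for (auto& fn : deferred)
			fn();
		deferred.clear();
	}
};
```

Captures must stay valid until `runAll()` fires, which is exactly the lifetime question the deferral has to respect here.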

Comment on lines +3251 to +3304
// Deal with Deferred Creation of Acceleration structures
{
for (auto asLevel=0; asLevel<2; asLevel++)
{
// each of these stages must have a barrier in between
size_t scratchSizeFullParallelBuild = 0;
size_t scratchSizeFullParallelCompact = 0;
// we collect the stats AFTER making sure that the BLAS / TLAS can actually be created
for (const auto& deferredParams : accelerationStructureParams[asLevel])
{
// buffer failed to create/allocate
if (!deferredParams.storage.get())
continue;
IGPUAccelerationStructure::SCreationParams baseParams;
{
auto* buf = deferredParams.storage.get();
const auto bufSz = buf->getSize();
using create_f = IGPUAccelerationStructure::SCreationParams::FLAGS;
baseParams = {
.bufferRange = {.offset=0,.size=bufSz,.buffer=smart_refctd_ptr<IGPUBuffer>(buf)},
.flags = deferredParams.motionBlur ? create_f::MOTION_BIT:create_f::NONE
};
}
smart_refctd_ptr<IGPUAccelerationStructure> as;
if (asLevel)
{
// `asLevel!=0` is the TLAS pass, and only TLAS creation takes the max instance count
as = device->createTopLevelAccelerationStructure({baseParams,deferredParams.maxInstanceCount});
}
else
{
as = device->createBottomLevelAccelerationStructure(std::move(baseParams));
}
// note that in order to compact an AS you need to allocate a buffer range whose size is known only after the build
const auto buildSize = deferredParams.inputSize+deferredParams.scratchSize;
// sizes for building one-by-one vs. all in parallel
retval.m_minASBuildScratchSize = core::max(buildSize,retval.m_minASBuildScratchSize);
scratchSizeFullParallelBuild += buildSize;
if (deferredParams.compactAfterBuild)
scratchSizeFullParallelCompact += deferredParams.scratchSize;
// triangles, AABBs or Instance Transforms will need to be supplied from VRAM
// TODO: also mark somehow that we'll need a BUILD INPUT READ ONLY BUFFER WITH XFER usage
if (deferredParams.inputSize)
retval.m_queueFlags |= IQueue::FAMILY_FLAGS::TRANSFER_BIT;
}
//
retval.m_maxASBuildScratchSize = core::max(core::max(scratchSizeFullParallelBuild,scratchSizeFullParallelCompact),retval.m_maxASBuildScratchSize);
}
//
if (retval.m_minASBuildScratchSize)
retval.m_queueFlags |= IQueue::FAMILY_FLAGS::COMPUTE_BIT;
// note: `m_maxASBuildScratchSize` was already accumulated across both AS levels above
}
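The hunk above derives two scratch bounds: the minimum scratch that lets you build one acceleration structure at a time (max over single builds), and the maximum that lets you build or compact everything in one submit (sums, whichever is larger). A standalone recomputation of that logic, with illustrative names (`BuildParams`, `computeScratchBounds`) rather than the converter's actual ones:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct BuildParams
{
	size_t inputSize;         // staged geometry/instance bytes
	size_t scratchSize;       // build scratch reported by the driver
	bool compactAfterBuild;
};

struct ScratchBounds
{
	size_t minForOneByOne;    // enough to build any single AS
	size_t maxForFullParallel;// enough to build (or compact) all at once
};

ScratchBounds computeScratchBounds(const std::vector<BuildParams>& builds)
{
	ScratchBounds out = {0,0};
	size_t parallelBuild = 0, parallelCompact = 0;
	for (const auto& b : builds)
	{
		const size_t buildSize = b.inputSize+b.scratchSize;
		out.minForOneByOne = std::max(out.minForOneByOne,buildSize);
		parallelBuild += buildSize;
		if (b.compactAfterBuild)
			parallelCompact += b.scratchSize;
	}
	out.maxForFullParallel = std::max(parallelBuild,parallelCompact);
	return out;
}
```

Anything between these two bounds trades scratch memory for build parallelism, which is presumably why the converter reports both to the caller.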
Member Author:
needs some love from me
