TensorOps kernels refactoring #3346

Open · wants to merge 34 commits into base: develop
Conversation

novakovicdj (Contributor)

This is a draft PR for refactoring the tensor ops kernels into the solver structure; so far, only the Op1dTensorGeneric kernel has been switched.
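(For context, a minimal sketch of the solver pattern the kernels are being moved into; the names and interfaces below are simplified and hypothetical, not the actual MIOpen ones:)

```cpp
#include <cstddef>
#include <string>

// Minimal stand-ins for the real MIOpen types (hypothetical).
struct ProblemDescription { std::size_t ndims; };
struct Solution { std::string kernel_name; };

// Each kernel becomes a solver: it reports whether it can handle a given
// problem and, if applicable, builds the kernel/invoker instead of being
// launched directly from the tensor-ops entry point.
struct Op1dTensorGenericSketch
{
    bool IsApplicable(const ProblemDescription& pd) const
    {
        return pd.ndims == 1; // this solver only covers 1-D tensor ops
    }

    Solution GetSolution(const ProblemDescription&) const
    {
        return {"Op1dTensorGeneric"}; // kernel name, launch dims, invoker, ...
    }
};
```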

src/include/miopen/tensor/solvers.hpp (outdated; resolved)
src/solver/tensor/Op1dTensorGeneric.cpp (outdated; resolved)
src/tensor/problem_description.cpp (outdated; resolved)
Comment on lines 41 to 43
```cpp
const void* alpha0_,
const void* alpha1_,
const void* beta_,
```
Contributor:

Check this conversation:
https://github.com/ROCm/MIOpen/pull/3346/files#r1824480257

Probably alpha0/1 must not be part of the PD, ideally beta as well, but right now it has to be there.

Contributor:

Would a bool marking whether alpha0/... has a "default" value (meaning no additional work is required) suffice?
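(A minimal sketch of that flag idea — hypothetical names; it assumes the host-side scalars are floats, whereas the real code would dispatch on the tensor's data type:)

```cpp
#include <cstring>

// Hypothetical helper: does a host-side scalar equal its "default"
// value (1 for alpha0/alpha1, 0 for beta)? Assumes float storage; the
// real code would dispatch on the tensor's data type.
inline bool IsDefaultScalar(const void* p, float def)
{
    if(p == nullptr)
        return true; // treat a missing scalar as the default
    float v;
    std::memcpy(&v, p, sizeof(v));
    return v == def;
}

// The problem description then carries only cheap, comparable flags,
// while the actual scalar values travel with the invoke params.
struct ScalarFlags
{
    bool alpha0_is_one; // no scaling of A needed
    bool alpha1_is_one; // no scaling of B needed
    bool beta_is_zero;  // C is overwritten, not blended
};
```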

src/include/miopen/tensor/problem_description.hpp (outdated; resolved)
src/solver/tensor/Op1dTensorGeneric.cpp (outdated; resolved)
src/solver/tensor/Op1dTensorGeneric.cpp (outdated; resolved)
src/solver/tensor/Op1dTensorGeneric.cpp (outdated; resolved)
src/solver/tensor/Op2dTensorLite.cpp (outdated; resolved)
Comment on lines 88 to 90
```cpp
size_t Aoffset;
size_t Boffset;
size_t Coffset;
```
Contributor:

Do we need to handle this internally? IIRC it should be possible to pass any subtensor externally by changing the pointer and descriptor. If so, this is duplicated functionality.

Contributor:

I think the main point is that the pointer is a void* and the actual type is an miopen_Type_t enum. That's why you can't just add the offsets to them without special helpers.
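(A small illustration of that point — hypothetical helper; the real code would switch on the actual data-type enum:)

```cpp
#include <cstddef>
#include <cstdint>

// The buffers are void* and the element width is only known at runtime,
// so an element offset cannot be applied by plain pointer arithmetic.
// "DataType" stands in for the real data-type enum.
enum class DataType { Half, Float };

inline std::size_t ElemSize(DataType t)
{
    return t == DataType::Half ? 2 : 4;
}

inline const void* WithElemOffset(const void* base, std::size_t elems, DataType t)
{
    return static_cast<const std::uint8_t*>(base) + elems * ElemSize(t);
}
```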

src/include/miopen/tensor/invoke_params.hpp (outdated; resolved)
src/include/miopen/tensor/problem_description.hpp (outdated; resolved)
src/include/miopen/tensor/problem_description.hpp (outdated; resolved)
src/solver/tensor/Op1dTensorGeneric.cpp (outdated; resolved)
src/solver/tensor/Op1dTensorGeneric.cpp (outdated; resolved)
src/solver/tensor/Op1dTensorGeneric.cpp (outdated; resolved)
src/solver/tensor/Op1dTensorGeneric.cpp (outdated; resolved)
src/include/miopen/tensor_ops.hpp (outdated; resolved)
src/solver/tensorOp/tensor_op_helpers.hpp (outdated; resolved)
src/solver/tensorOp/tensor_op_helpers.hpp (outdated; resolved)
src/solver/tensorOp/tensor_op_helpers.hpp (outdated; resolved)
src/solver/tensorOp/tensor_op_helpers.hpp (outdated; resolved)
src/solver/tensorOp/tensor_op_helpers.hpp (outdated; resolved)
src/solver/tensorOp/Op2dTensorLite.cpp (outdated; resolved)
src/solver/tensorOp/Op2dTensorSquash.cpp (outdated; resolved)
src/solver/tensorOp/Op4dTensorLite.cpp (outdated; resolved)
src/solver/tensorOp/Op4dTensorLite.cpp (outdated; resolved)
src/solver/tensorOp/OpTensorFwdBias.cpp (outdated; resolved)
novakovicdj marked this pull request as ready for review — November 7, 2024 15:19
@shurale-nkn (Contributor)

Please provide a comparison of the average CPU-only time (new solver vs. old API) measured over 100 calls with the same problem, and the cost associated with the first call of a unique problem configuration.
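(For reference, a measurement harness along these lines would match the requested protocol — a hedged sketch, not the harness actually used in this PR:)

```cpp
#include <chrono>
#include <cstdio>

// Time the first call of a unique problem configuration separately
// (it pays one-time costs such as kernel compilation / solver selection),
// then average host-side time over the next 100 identical calls.
template <typename F>
void MeasureHostTime(F&& call)
{
    using clock = std::chrono::steady_clock;
    using ms    = std::chrono::duration<double, std::milli>;

    auto t0 = clock::now();
    call(); // first run: includes one-time setup
    const double first = ms(clock::now() - t0).count();

    t0 = clock::now();
    for(int i = 0; i < 100; ++i)
        call(); // same problem: should hit internal caches
    const double avg = ms(clock::now() - t0).count() / 100.0;

    std::printf("first run: %.4f ms, avg of next 100 runs: %.4f ms\n", first, avg);
}
```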

@novakovicdj (Contributor, Author)

> Please provide a comparison of the average CPU-only time (new solver vs. old API) measured over 100 calls with the same problem, and the cost associated with the first call of a unique problem configuration.

Here is a comparison of average host time between the old and new structure:

| Kernel | Run | New structure [ms] | Old structure [ms] | diff [ms] |
|---|---|---|---|---|
| Op1dTensorGeneric | first run | 279.3786 | 291.3806 | -12.002 |
| Op1dTensorGeneric | other 100 runs | 0.2908 | 0.2549 | 0.0359 |
| Op2dTensorGeneric | first run | 281.8186 | 283.4622 | -1.6436 |
| Op2dTensorGeneric | other 100 runs | 0.356 | 0.2432 | 0.1128 |
| Op2dTensorLite | first run | 634.2228 | 662.2278 | -28.005 |
| Op2dTensorLite | other 100 runs | 0.335 | 0.2308 | 0.1042 |
| Op2dTensorSquash | first run | 668.978 | 699.9932 | -31.0152 |
| Op2dTensorSquash | other 100 runs | 0.3481 | 0.2548 | 0.0933 |
| Op3dTensorGeneric | first run | 642.1512 | 656.3394 | -14.1882 |
| Op3dTensorGeneric | other 100 runs | 0.2659 | 0.2485 | 0.0174 |
| OpTensorFwdBias | first run | 636.6204 | 654.8222 | -18.2018 |
| OpTensorFwdBias | other 100 runs | 0.3351 | 0.2321 | 0.103 |
| OpTensorFwdBiasGeneric | first run | 636.4756 | 662.4915 | -26.0159 |
| OpTensorFwdBiasGeneric | other 100 runs | 0.3498 | 0.2434 | 0.1064 |
| OpTensorLeadingOnes | first run | 644.8348 | 666.8713 | -22.0365 |
| OpTensorLeadingOnes | other 100 runs | 0.3466 | 0.2755 | 0.0711 |
| OpTensorLeadingOnesGeneric | first run | 648.6535 | 669.6379 | -20.9844 |
| OpTensorLeadingOnesGeneric | other 100 runs | 0.3552 | 0.2569 | 0.0983 |
| Op4dTensorLite | first run | 641.4747 | 664.4976 | -23.0229 |
| Op4dTensorLite | other 100 runs | 0.33 | 0.2206 | 0.1094 |
| Op4dTensorGeneric | first run | 650.7638 | 670.8961 | -20.1323 |
| Op4dTensorGeneric | other 100 runs | 0.3563 | 0.2456 | 0.1107 |
| Op5dTensorGeneric | first run | 655.6774 | 685.431 | -29.7536 |
| Op5dTensorGeneric | other 100 runs | 0.3745 | 0.2437 | 0.1308 |

On average, the new structure is about 20 ms faster for first runs, and about 0.1 ms slower over the other 100 calls, i.e. roughly 0.001 ms per call.

@shurale-nkn (Contributor)

> Please provide a comparison of the average CPU-only time (new solver vs. old API) measured over 100 calls with the same problem, and the cost associated with the first call of a unique problem configuration.

> Here is a comparison of average host time between the old and new structure: […] On average, the new structure is about 20 ms faster for first runs, and about 0.1 ms slower over the other 100 calls, i.e. roughly 0.001 ms per call.

The results are very strange; we need the full experiment protocol. How was the program executed, and what was used for measurement?
So far, according to the table, each subsequent launch is on average ~30% slower.

@CAHEK7 (Contributor) commented Dec 1, 2024

@randyspauldingamd @BrianHarrisonAMD I guess it's the final review round.

@BrianHarrisonAMD (Collaborator)

I think we need to coordinate these changes with #3402 to ensure const is correct after both are merged.
Or I guess we could decide to do a follow-up.

@randyspauldingamd (Contributor)

> I think we need to coordinate these changes with #3402 to ensure const is correct after both are merged. Or I guess we could decide to do a follow-up.

If you were asking for feedback, I'd be fine with a follow-up (in a timely fashion). @novakovicdj, are you going to be joining our scrum anytime soon? If not, perhaps we could ask @DrizztDoUrden to add a ticket and coordinate with you.

@BrianHarrisonAMD (Collaborator)

Yeah, I think it depends on what @DrizztDoUrden would like to do.
We either merge this first and fix it in #3402, or merge both and fix it in a separate follow-up.

@CAHEK7 (Contributor) commented Dec 4, 2024

> I think we need to coordinate these changes with #3402 to ensure const is correct after both are merged. Or I guess we could decide to do a follow-up.

It can be fixed later. The PR is quite big and requires extra effort to maintain.

@CAHEK7 (Contributor) commented Dec 4, 2024

> > I think we need to coordinate these changes with #3402 to ensure const is correct after both are merged. Or I guess we could decide to do a follow-up.
>
> If you were asking for feedback, I'd be fine with a follow-up (in a timely fashion). @novakovicdj, are you going to be joining our scrum anytime soon? If not, perhaps we could ask @DrizztDoUrden to add a ticket and coordinate with you.

Probably not in the short term; there is some bureaucracy involved.

@BrianHarrisonAMD (Collaborator)

Greetings @novakovicdj!

Can you update this branch with develop and resolve the conflicts?

@CAHEK7 (Contributor) commented Dec 17, 2024

Hi @BrianHarrisonAMD @BradPepersAMD,
I'm not sure about the latest merge policies and whether they apply to Djordje, but we probably have to merge this manually.
Keeping a PR this big up to date can be painful.

@BrianHarrisonAMD (Collaborator)

I'll kick off another CI run, and we can merge once it passes.

@BrianHarrisonAMD (Collaborator)

Looks like it failed, but the failure seems unrelated.
Restarted that stage.

DrizztDoUrden — this comment was marked as duplicate.

@DrizztDoUrden (Contributor) left a comment:

lgtm
