TensorflowJS conversion #4

Closed
shilsircar opened this issue Jun 28, 2020 · 57 comments

Comments

@shilsircar

I am trying to convert the savedmodel using tensorflowjs 2.X using the following command:

tensorflowjs_converter --control_flow_v2=False --input_format=tf_saved_model --saved_model_tags=serve --signature_name=serving_default --strip_debug_ops=False --weight_shard_size_bytes=4194304 C:\Users\ss\Documents\workspace\DTLN\DTLN-master\pretrained_model\DTLN_norm_500h_saved_model C:\Users\ss\Documents\workspace\DTLN\tfjs

I get the following two errors:

Traceback (most recent call last):
File "c:\users\ss\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\importer.py", line 497, in _import_graph_def_internal
graph._c_graph, serialized, options) # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.InvalidArgumentError: Input 0 of node StatefulPartitionedCall/model/lstm/AssignVariableOp was passed float from Func/StatefulPartitionedCall/input/_4:0 incompatible with expected resource.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "c:\users\ss\anaconda3\envs\tensorflow\lib\site-packages\tensorflowjs\converters\tf_saved_model_conversion_v2.py", line 482, in convert_tf_saved_model
frozen_graph = _freeze_saved_model_v2(concrete_func, control_flow_v2)
File "c:\users\ss\anaconda3\envs\tensorflow\lib\site-packages\tensorflowjs\converters\tf_saved_model_conversion_v2.py", line 352, in _freeze_saved_model_v2
concrete_func, lower_control_flow=not control_flow_v2).graph
File "c:\users\ss\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\convert_to_constants.py", line 680, in convert_variables_to_constants_v2
return _construct_concrete_function(func, output_graph_def, converted_inputs)
File "c:\users\ss\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\convert_to_constants.py", line 406, in _construct_concrete_function
new_output_names)
File "c:\users\ss\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\eager\wrap_function.py", line 633, in function_from_graph_def
wrapped_import = wrap_function(_imports_graph_def, [])
File "c:\users\ss\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\eager\wrap_function.py", line 611, in wrap_function
collections={}),
File "c:\users\ss\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\func_graph.py", line 981, in func_graph_from_py_func
func_outputs = python_func(*func_args, **func_kwargs)
File "c:\users\ss\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\eager\wrap_function.py", line 86, in call
return self.call_with_variable_creator_scope(self._fn)(*args, **kwargs)
File "c:\users\ss\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\eager\wrap_function.py", line 92, in wrapped
return fn(*args, **kwargs)
File "c:\users\ss\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\eager\wrap_function.py", line 631, in _imports_graph_def
importer.import_graph_def(graph_def, name="")
File "c:\users\ss\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\util\deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "c:\users\ss\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\importer.py", line 405, in import_graph_def
producer_op_list=producer_op_list)
File "c:\users\ss\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\importer.py", line 501, in _import_graph_def_internal
raise ValueError(str(e))
ValueError: Input 0 of node StatefulPartitionedCall/model/lstm/AssignVariableOp was passed float from Func/StatefulPartitionedCall/input/_4:0 incompatible with expected resource.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "c:\users\ss\anaconda3\envs\tensorflow\lib\site-packages\tensorflowjs\converters\wizard.py", line 606, in run
converter.convert(arguments)
File "c:\users\ss\anaconda3\envs\tensorflow\lib\site-packages\tensorflowjs\converters\converter.py", line 681, in convert
control_flow_v2=args.control_flow_v2)
File "c:\users\ss\anaconda3\envs\tensorflow\lib\site-packages\tensorflowjs\converters\tf_saved_model_conversion_v2.py", line 485, in convert_tf_saved_model
output_node_names)
File "c:\users\ss\anaconda3\envs\tensorflow\lib\site-packages\tensorflowjs\converters\tf_saved_model_conversion_v2.py", line 342, in _freeze_saved_model_v1
sess, g.as_graph_def(), output_node_names)
File "c:\users\ss\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\util\deprecation.py", line 324, in new_func
return func(*args, **kwargs)
File "c:\users\ss\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\graph_util_impl.py", line 359, in convert_variables_to_constants
inference_graph = extract_sub_graph(input_graph_def, output_node_names)
File "c:\users\ss\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\util\deprecation.py", line 324, in new_func
return func(*args, **kwargs)
File "c:\users\ss\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\graph_util_impl.py", line 205, in extract_sub_graph
_assert_nodes_are_present(name_to_node, dest_nodes)
File "c:\users\ss\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\graph_util_impl.py", line 160, in _assert_nodes_are_present
assert d in name_to_node, "%s is not in graph" % d
AssertionError: Identity is not in graph

@breizhn
Owner

breizhn commented Jun 29, 2020

Hi,

instead of using the SavedModel, build the model with build_DTLN_model_stateful from DTLN_model.py and load the weights from the ./pretrained_model folder.

Here is the code for the conversion:

import tensorflowjs as tfjs
from DTLN_model import DTLN_model

model_class = DTLN_model()
model_class.build_DTLN_model_stateful()
model_class.model.load_weights('./pretrained_model/model.h5')
tfjs.converters.save_keras_model(model_class.model, 'DTLN_js')

This code ran fine on my system.

Best,
Nils

@breizhn breizhn assigned breizhn and unassigned breizhn Jun 29, 2020
@shilsircar
Author

Thanks, it works. I am now changing the Lambda layers into custom layers that can be loaded from JS. It looks like there are three I need to port: two Lambda layers for the FFT and IFFT, and one normalization layer. Is that understanding correct?

@breizhn
Owner

breizhn commented Jun 30, 2020

Yes that is correct.

@shilsircar
Author

In your last check-in I noticed you have now converted the model to TFLite by splitting it. Are you also considering a TFJS model that runs in a browser? I have started porting the layers, but it's going slowly. Basically, I have created two Layer subclasses to replace the Lambda layers:

import tensorflow as tf

class FFTlayer(tf.keras.layers.Layer):
    def __init__(self, **kwargs):
        super(FFTlayer, self).__init__(**kwargs)

    def call(self, x):
        # expanding dimensions
        frame = tf.expand_dims(x, axis=1)
        # calculating the fft over the time frames. rfft returns NFFT/2+1 bins.
        stft_dat = tf.signal.rfft(frame)
        # calculating magnitude and phase from the complex signal
        mag = tf.abs(stft_dat)
        phase = tf.math.angle(stft_dat)
        # returning magnitude and phase as list
        return [mag, phase]

class IFFTlayer(tf.keras.layers.Layer):
    def __init__(self, **kwargs):
        super(IFFTlayer, self).__init__(**kwargs)

    def call(self, x):
        # calculating the complex representation
        s1_stft = (tf.cast(x[0], tf.complex64) *
                   tf.exp(1j * tf.cast(x[1], tf.complex64)))
        # returning the time domain frames
        return tf.signal.irfft(s1_stft)

and now in the process of writing the serialization in javascript
tf.serialization.registerClass(FFTlayer);
tf.serialization.registerClass(IFFTlayer);
tf.serialization.registerClass(InstantLayerNormalization);

My question is: are you considering a TFJS model as well?

@breizhn
Owner

breizhn commented Jul 2, 2020

I don’t have any experience with JavaScript, so I will probably not do it. I can port the model to ONNX, similar to the TFLite model. ONNX also has a JavaScript API, and from my first look, the model does not have to be converted for it. The states and the FFT must be handled outside the model, similar to the TFLite model. But as I said, I don’t have any experience whatsoever with signal processing in JS.

@shilsircar
Author

Sounds good. Perhaps we can compare. I am getting familiar with the layers in your model. TensorFlow.js actually has all the necessary APIs to perform the rfft and irfft, so no real signal processing is necessary. It's just a matter of setting up the hooks for serialization and getting the shapes right between the layers. I have never tried porting a custom layer, so it's going slowly. For example, the JS code looks like the code below. The shape from the input layer to the FFT layer is not correct yet, so I am experimenting with it.

<title>DTLN TensorflowJS</title>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/[email protected]"> </script>
<script>
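class FFTlayer extends tf.layers.Layer{
   static className = 'FFTlayer';
   constructor(config)
   {
	// pass the config through so the layer keeps the name stored in model.json
	super(config);
	console.log("FFTLayer Constructor");
   }
   call(inputs)
   {
	console.log("Call Called for FFTLayer");
	return tf.tidy(() => {
		// tfjs may hand call() a one-element array instead of a tensor
		const x = Array.isArray(inputs) ? inputs[0] : inputs;
		// expanding dimensions: [1,512] -> [1,1,512]
		const frame = tf.expandDims(x, 1);
		// rfft returns NFFT/2+1 complex bins
		const stft_dat = tf.spectral.rfft(frame);
		const re = tf.real(stft_dat);
		const im = tf.imag(stft_dat);
		// tf.math.angle is a Python-side call; here magnitude and phase
		// are computed from the real and imaginary parts
		const mag = tf.sqrt(tf.add(tf.square(re), tf.square(im)));
		const phase = tf.atan2(im, re);
		return [mag, phase];
	});
   }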
   
   computeOutputShape(inputShape) 
   {
	   console.log("Compute Output Shape!!!")
	   return [1,1,257];
   }
}
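class IFFTlayer extends tf.layers.Layer
{
	static className = 'IFFTlayer';
	constructor(config)
	{
		super(config);
	}
	call(x)
	{
		return tf.tidy(() => {
			// x[0] is the magnitude and x[1] the phase, so mag * exp(1j * phase)
			// has real part mag*cos(phase) and imaginary part mag*sin(phase)
			const [mag, phase] = x;
			const s1_stft = tf.complex(tf.mul(mag, tf.cos(phase)), tf.mul(mag, tf.sin(phase)));
			// returning the time domain frames
			return tf.spectral.irfft(s1_stft);
		});
	}

	computeOutputShape(inputShape)
	{
		// irfft turns NFFT/2+1 bins back into NFFT samples
		const [batch, frames, bins] = inputShape[0];
		return [batch, frames, 2 * (bins - 1)];
	}
}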

class InstantLayerNormalization extends tf.layers.Layer
{
	static className = 'InstantLayerNormalization';
	epsilon = 1e-7
	gamma;
	beta;
	constructor(config)
	{
		super(config);
	}
	getConfig()
	{
		const config = super.getConfig();
		return config;
	}

	build(inputShape)
	{
		// one gamma and one beta per feature, i.e. input_shape[-1:] in Python
		const shape = [inputShape[inputShape.length - 1]];
		// initialize gamma
		this.gamma = this.addWeight('gamma', shape, 'float32', tf.initializers.ones());
		// initialize beta
		this.beta = this.addWeight('beta', shape, 'float32', tf.initializers.zeros());
		this.built = true;
	}

	call(inputs)
	{
		return tf.tidy(() => {
			// tfjs may hand call() a one-element array instead of a tensor
			const x = Array.isArray(inputs) ? inputs[0] : inputs;
			// mean and variance over the last (feature) axis
			const mean = tf.mean(x, -1, true);
			const variance = tf.mean(tf.square(tf.sub(x, mean)), -1, true);
			const std = tf.sqrt(tf.add(variance, this.epsilon));
			// normalize, then scale by gamma and shift by beta
			const outputs = tf.div(tf.sub(x, mean), std);
			return tf.add(tf.mul(outputs, this.gamma.read()), this.beta.read());
		});
	}
}

tf.serialization.registerClass(FFTlayer);
tf.serialization.registerClass(IFFTlayer);
tf.serialization.registerClass(InstantLayerNormalization);
</script>
<script>
    console.log('Loading DTLN Model....');

    var model;
    async function loadDTLN_model()
    {
        // loadLayersModel returns a Promise, so wait for the loaded model
        model = await tf.loadLayersModel('http://myserver:8000/model/model.json');
        //console.log(model);
        //console.log(tf.getBackend());
    }
    loadDTLN_model();
</script>

@breizhn
Owner

breizhn commented Jul 3, 2020

When I look at

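call(inputs)
   {
	console.log("Call Called for FFTLayer");
	return tf.tidy(() => {
		const x = Array.isArray(inputs) ? inputs[0] : inputs;
		const frame = tf.expandDims(x, 1);
		const stft_dat = tf.spectral.rfft(frame);
		const re = tf.real(stft_dat);
		const im = tf.imag(stft_dat);
		const mag = tf.sqrt(tf.add(tf.square(re), tf.square(im)));
		const phase = tf.atan2(im, re);
		return [mag, phase];
	});
   }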
   
   computeOutputShape(inputShape) 
   {
	   console.log("Compute Output Shape!!!")
	   return [1,1,257];
   }

then probably the output shape is not correct. Let us try:

computeOutputShape(inputShape) 
   {
	   console.log("Compute Output Shape!!!")
	   return [[1,1,257],[1,1,257]];
   }
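
The reason two shapes are needed is that call() returns a list of two tensors, magnitude and phase. A minimal sketch of the shape bookkeeping in plain JavaScript (no tfjs needed); the helper name fftLayerOutputShape is hypothetical and only illustrates the arithmetic:

```javascript
// Hypothetical helper (not part of tfjs): output shapes of a layer that
// maps a [batch, blockLen] input to [magnitude, phase].
function fftLayerOutputShape(inputShape) {
  const [batch, blockLen] = inputShape;
  // expandDims on axis 1 adds a frame axis; rfft keeps blockLen / 2 + 1 bins
  const bins = Math.floor(blockLen / 2) + 1;
  const single = [batch, 1, bins];
  // call() returns two tensors, so two shapes are reported
  return [single, single.slice()];
}

const fftShapes = fftLayerOutputShape([1, 512]);
console.log(JSON.stringify(fftShapes));
```

For the [1,512] input of this model that gives one [1,1,257] shape per returned tensor.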

@shilsircar
Author

Thanks, that works. I can now load the FFT layer. The output shape of the normalization layer is (1, 1, 256); is this correct?

@breizhn
Owner

breizhn commented Jul 3, 2020

Yes this is correct, but you can just use input shape as output shape, because the normalization layer does not change the shape.

computeOutputShape(inputShape) 
   {
	   console.log("Compute Output Shape!!!")
	   return inputShape;
   }
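
To see why the shape does not change, here is a rough plain-array sketch of the same normalization over the last axis. It is an illustration only, not the tfjs layer; gamma and beta are passed in explicitly and set to their initial values (ones and zeros):

```javascript
// Plain-array version of the normalization over one feature vector.
function instantLayerNorm(frame, gamma, beta, epsilon = 1e-7) {
  const n = frame.length;
  const mean = frame.reduce((s, v) => s + v, 0) / n;
  const variance = frame.reduce((s, v) => s + (v - mean) ** 2, 0) / n;
  const std = Math.sqrt(variance + epsilon);
  // every element is mapped to exactly one output element
  return frame.map((v, i) => ((v - mean) / std) * gamma[i] + beta[i]);
}

const normInput = [1, 2, 3, 4];
const normOutput = instantLayerNorm(normInput, [1, 1, 1, 1], [0, 0, 0, 0]);
console.log(normInput.length, normOutput.length);
```

Each input value produces one output value, so returning inputShape is all computeOutputShape has to do.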

@shilsircar
Author

Hi Nils,
Two questions:
Question 1: Would it be possible to use tfa.layers.InstanceNormalization, as described at https://www.tensorflow.org/addons/tutorials/layers_normalizations#instance_normalization_tutorial? If I can use the TF Addons instance normalization in place of your custom normalization layer when building the model, I would not have to port this custom layer.

Question 2: In the separation kernel function in the DTLN model you have a comment about not using Lambda. I suspect there is a typo, i.e. that with Lambda the weights don't get updated correctly? Also, would it work if I change the Python code and make it a layer instead of a function call? I mean not a Lambda, but a class that extends Layer, which I then convert.

I will try it and validate that the model still works correctly, but I wanted to check first.

@breizhn
Owner

breizhn commented Jul 6, 2020

Question 1: Would it be possible to use tfa.layers.InstanceNormalization, as described at https://www.tensorflow.org/addons/tutorials/layers_normalizations#instance_normalization_tutorial? If I can use the TF Addons instance normalization in place of your custom normalization layer when building the model, I would not have to port this custom layer.

Yes, you can use instance normalization. It should do the exact same thing. I did not use the tfa layer because I experimented a lot with the normalization.

Question 2: In the separation kernel function in the DTLN model you have a comment about not using Lambda. I suspect there is a typo, i.e. that with Lambda the weights don't get updated correctly? Also, would it work if I change the Python code and make it a layer instead of a function call? I mean not a Lambda, but a class that extends Layer, which I then convert.

You can just copy the layers from the function, if that is your question. If you like, you can also port the separation kernel to a custom layer, but I don't think that's a good idea, because it doesn't have any advantage. The function call was supposed to make the model more readable and modular.

@shilsircar
Author

shilsircar commented Jul 6, 2020

I have been able to convert and port the appropriate layers in TFJS. Below is the summary loading it in browser. Looks good to me. What do you think?

Will try inference on real audio data.


Layer (type)                    Output shape         Param #     Receives inputs
==================================================================================================
input_1 (InputLayer)            [1,512]              0
__________________________________________________________________________________________________
ff_tlayer (FFTlayer)            [[1,1,257],[1,1,257] 0           input_1[0][0]
__________________________________________________________________________________________________
lstm (LSTM)                     [1,1,128]            197632      ff_tlayer[0][0]
__________________________________________________________________________________________________
dropout (Dropout)               [1,1,128]            0           lstm[0][0]
__________________________________________________________________________________________________
lstm_1 (LSTM)                   [1,1,128]            131584      dropout[0][0]
__________________________________________________________________________________________________
dense (Dense)                   [1,1,257]            33153       lstm_1[0][0]
__________________________________________________________________________________________________
activation (Activation)         [1,1,257]            0           dense[0][0]
__________________________________________________________________________________________________
multiply (Multiply)             [1,1,257]            0           ff_tlayer[0][0]
                                                                 activation[0][0]
__________________________________________________________________________________________________
iff_tlayer (IFFTlayer)          [1,1,512]            0           multiply[0][0]
                                                                 ff_tlayer[0][1]
__________________________________________________________________________________________________
conv1d (Conv1D)                 [1,1,256]            131072      iff_tlayer[0][0]
__________________________________________________________________________________________________
instant_layer_normalization (In [1,1,256]            512         conv1d[0][0]
__________________________________________________________________________________________________
lstm_2 (LSTM)                   [1,1,128]            197120      instant_layer_normalization[0][0]
__________________________________________________________________________________________________
dropout_1 (Dropout)             [1,1,128]            0           lstm_2[0][0]
__________________________________________________________________________________________________
lstm_3 (LSTM)                   [1,1,128]            131584      dropout_1[0][0]
__________________________________________________________________________________________________
dense_1 (Dense)                 [1,1,256]            33024       lstm_3[0][0]
__________________________________________________________________________________________________
activation_1 (Activation)       [1,1,256]            0           dense_1[0][0]
__________________________________________________________________________________________________
multiply_1 (Multiply)           [1,1,256]            0           conv1d[0][0]
                                                                 activation_1[0][0]
__________________________________________________________________________________________________
conv1d_1 (Conv1D)               [1,1,512]            131072      multiply_1[0][0]
==================================================================================================
Total params: 986753
Trainable params: 986753
Non-trainable params: 0
__________________________________________________________________________________________________
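
One way to sanity-check a summary like this is to recompute the parameter counts from the layer sizes. The sketch below uses the standard formulas for LSTM and Dense layers with bias. The Conv1D line assumes kernel size 1 and no bias, which is an inference from the counts in the summary, not something stated in this thread:

```javascript
// Parameter-count formulas for the layer types in the summary.
const lstmParams = (inputDim, units) => 4 * ((inputDim + units) * units + units);
const denseParams = (inputDim, units) => inputDim * units + units;
// Conv1D without bias; kernelSize = 1 is an assumption that reproduces the counts
const conv1dParams = (inChannels, filters, kernelSize = 1) => kernelSize * inChannels * filters;

const paramCounts = {
  lstm: lstmParams(257, 128),
  lstm_1: lstmParams(128, 128),
  dense: denseParams(128, 257),
  conv1d: conv1dParams(512, 256),
  instant_layer_normalization: 2 * 256, // gamma and beta
  lstm_2: lstmParams(256, 128),
  lstm_3: lstmParams(128, 128),
  dense_1: denseParams(128, 256),
  conv1d_1: conv1dParams(256, 512),
};
const totalParams = Object.values(paramCounts).reduce((s, v) => s + v, 0);
console.log(paramCounts, totalParams);
```

Under those assumptions every per-layer count and the total of 986753 match the summary.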

@breizhn
Owner

breizhn commented Jul 6, 2020

Well done! This looks really good!

@shilsircar
Author

Thanks. The next step is to run inference with real-time audio samples. One question: most of your real-time examples use the SavedModel, but this TFJS conversion was done from model.h5. I hope frame-by-frame processing is possible with this Keras model.

I will try out and see how it behaves with random input tensor of size [1,512] before passing real audio.

@breizhn
Owner

breizhn commented Jul 7, 2020

Did you set the stateful=True flag for the LSTMs? Without that, block-by-block processing will not work.
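
The effect of carrying state across blocks can be illustrated with a toy recurrence. This is a leaky integrator in plain JavaScript, not an LSTM and not the tfjs API; it only shows why resetting the state at every block changes the output:

```javascript
// Toy recurrent cell (NOT an LSTM): h = 0.5 * h + x.
function runCell(samples, state) {
  const out = [];
  let h = state;
  for (const x of samples) {
    h = 0.5 * h + x;
    out.push(h);
  }
  return { out, state: h };
}

const blocks = [[1, 2], [3, 4]];
// reference: the whole sequence processed in one go
const wholeSeq = runCell(blocks.flat(), 0).out;

// "stateful": the state is carried from one block to the next
let carried = 0;
const statefulOut = [];
for (const block of blocks) {
  const r = runCell(block, carried);
  carried = r.state;
  statefulOut.push(...r.out);
}

// "stateless": the state is reset to zero for every block
const statelessOut = blocks.flatMap((block) => runCell(block, 0).out);
console.log(wholeSeq, statefulOut, statelessOut);
```

Only the version that carries the state reproduces the whole-sequence result; the stateless one forgets everything at each block boundary.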

@shilsircar
Author

One question. In the stateful DTLN model you use an FFT layer, and in the other one you use an STFT. The difference looks like the block shift. So is the idea to leave the shift out in the stateful case because it's done outside the model? I mean, in the real-time case the output blocks are shifted by 128 samples after inference, and the input data is shifted as well.

@shilsircar
Author

Hi Nils,
I am having trouble with the last output layer. It seems TFJS 2.0.1 doesn't implement causal padding. I opened an issue with them to see what they say: tensorflow/tfjs#3578

I am getting this error when I run inference on the model.

Uncaught (in promise) Error: The support for CAUSAL padding mode in conv1dWithBias is not implemented yet.

@shilsircar shilsircar changed the title TensorflowJS conversion fails for savedmodel TensorflowJS conversion Jul 8, 2020
@breizhn
Owner

breizhn commented Jul 8, 2020

One question. In the stateful DTLN model you use an FFT layer, and in the other one you use an STFT. The difference looks like the block shift. So is the idea to leave the shift out in the stateful case because it's done outside the model? I mean, in the real-time case the output blocks are shifted by 128 samples after inference, and the input data is shifted as well.

Yes the difference is the block shift. The STFT "layer" is well suited for training and for processing whole sequences. And also yes, it is easier to handle the shift outside the model during real time inference. I wanted to create a system, which works one block in one block out. The shift is optional. But for a model without shift the whole thing must be retrained.

I am having trouble with the last output layer. It seems TFJS 2.0.1 doesn't implement causal padding. I opened an issue with them to see what they say: tensorflow/tfjs#3578

I am getting this error when I run inference on the model.

Uncaught (in promise) Error: The support for CAUSAL padding mode in conv1dWithBias is not implemented yet.

Then I hope they will implement it soon. Padding "same" could probably also work, but I think the network must be retrained for that.
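
Whether "same" and "causal" padding differ depends on the kernel size. The parameter counts in the model summary earlier in this thread (131072 = 256 × 512 for both Conv1D layers) are consistent with a kernel size of 1 and no bias; that is an inference from the summary, not something confirmed here. Under that assumption no padding is added in either mode. The sketch below is a plain single-channel convolution, not the tfjs implementation, and compares the two modes for kernel sizes 1 and 3:

```javascript
// Single-channel 1-D convolution (cross-correlation, as in Keras) with
// stride 1 and zero padding. Illustration only.
function conv1d(x, kernel, padding) {
  const k = kernel.length;
  // causal: all k - 1 zeros go on the left; same: they are split left/right
  const left = padding === 'causal' ? k - 1 : Math.floor((k - 1) / 2);
  return x.map((_, t) => {
    let acc = 0;
    for (let i = 0; i < k; i++) {
      const idx = t - left + i;
      if (idx >= 0 && idx < x.length) acc += kernel[i] * x[idx];
    }
    return acc;
  });
}

const convIn = [1, 2, 3, 4];
// kernel size 1: both modes give the same result
console.log(conv1d(convIn, [2], 'causal'), conv1d(convIn, [2], 'same'));
// kernel size 3: the results differ
console.log(conv1d(convIn, [1, 1, 1], 'causal'), conv1d(convIn, [1, 1, 1], 'same'));
```

If the kernel size really is 1, swapping "causal" for "same" would not change the layer output, which would make retraining unnecessary for that change alone.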

@shilsircar
Author

shilsircar commented Jul 8, 2020

Hi Nils,
Thanks for all your insight and suggestions. I finally have a functional DTLN model in TFJS. However, I had to use "same" as the padding mode, because causal mode is not supported for conv1d. I haven't tried it on real audio samples yet, since I was more focused on the port to TFJS and on inference-time performance. I had to rewrite a few layers (fft, irfft, norm, logadd, etc.). In any case, this is what I have so far. The inference time with the WebGL backend, averaged over 1000 [1,512] input blocks, is:
3-year-old Windows laptop
Inference Time TFJS-DTLN: 13.54ms
Android Pixel 2
Inference Time TFJS-DTLN: 17.62ms
This is before doing any shift. Do you think this will work for real-time noise suppression, or is the latency too high?

Also how real audio works with "same" padding is to be seen :)

@vinod1234567890

@shilsircar the latency needs to be less than 8ms for a 32ms block as @breizhn says in the Execution Times section of this repo's ReadMe.

Also, since you've got a working model, I have a couple of questions:

  1. Which model did you use for conversion? .h5 or savedmodel?
  2. Does tfjs have stateful LSTMs? Or did you handle states outside the model?

@shilsircar
Author

shilsircar commented Jul 9, 2020

@shilsircar the latency needs to be less than 8ms for a 32ms block as @breizhn says in the Execution Times section of this repo's ReadMe.

Also, since you've got a working model, I have a couple of questions:

  1. Which model did you use for conversion? .h5 or savedmodel?

.h5 norm model

  2. Does tfjs have stateful LSTMs? Or did you handle states outside the model?

I didn't have to handle it outside. TFJS handles stateful LSTMs [email protected]

Trouble is TFJS team told me they don't have any immediate plans to implement conv1dwithbias and causal padding. I feel it's a bug since the last layer bias is false.
Issue open: tensorflow/tfjs#3578

@vinod1234567890

vinod1234567890 commented Jul 9, 2020

OK. I was able to load the model too. I used the model.h5 without mag normalization.
But after loading the model, the model.predict() is throwing an error: Input 0 is incompatible with layer lstm: expected ndim=3, found ndim=2.

Below is the model summary:

Layer (type)                    Output shape         Param #     Receives inputs                  
==================================================================================================
input_1 (InputLayer)            [1,512]              0                                            
__________________________________________________________________________________________________
lambda_Lambda1 (Lambda)         [[1,1,257],[1,1,257] 0           input_1[0][0]                    
__________________________________________________________________________________________________
lstm (LSTM)                     [1,1,128]            197632      lambda_Lambda1[0][0]             
__________________________________________________________________________________________________
dropout (Dropout)               [1,1,128]            0           lstm[0][0]                       
__________________________________________________________________________________________________
lstm_1 (LSTM)                   [1,1,128]            131584      dropout[0][0]                    
__________________________________________________________________________________________________
dense (Dense)                   [1,1,257]            33153       lstm_1[0][0]                     
__________________________________________________________________________________________________
activation (Activation)         [1,1,257]            0           dense[0][0]                      
__________________________________________________________________________________________________
multiply (Multiply)             [1,1,257]            0           lambda_Lambda1[0][0]             
                                                                 activation[0][0]                 
__________________________________________________________________________________________________
lambda1_Lambda11 (Lambda1)      [1,1,512]            0           multiply[0][0]                   
                                                                 lambda_Lambda1[0][1]             
__________________________________________________________________________________________________
conv1d (Conv1D)                 [1,1,256]            131072      lambda1_Lambda11[0][0]           
__________________________________________________________________________________________________
instant_layer_normalization (In [1,1,256]            512         conv1d[0][0]                     
__________________________________________________________________________________________________
lstm_2 (LSTM)                   [1,1,128]            197120      instant_layer_normalization[0][0]
__________________________________________________________________________________________________
dropout_1 (Dropout)             [1,1,128]            0           lstm_2[0][0]                     
__________________________________________________________________________________________________
lstm_3 (LSTM)                   [1,1,128]            131584      dropout_1[0][0]                  
__________________________________________________________________________________________________
dense_1 (Dense)                 [1,1,256]            33024       lstm_3[0][0]                     
__________________________________________________________________________________________________
activation_1 (Activation)       [1,1,256]            0           dense_1[0][0]                    
__________________________________________________________________________________________________
multiply_1 (Multiply)           [1,1,256]            0           conv1d[0][0]                     
                                                                 activation_1[0][0]               
__________________________________________________________________________________________________
conv1d_1 (Conv1D)               [1,1,512]            131072      multiply_1[0][0]                 
==================================================================================================
Total params: 986753
Trainable params: 986753
Non-trainable params: 0
__________________________________________________________________________________________________

@shilsircar
Author

How did you load the model in TFJS with all the Lambda layers? Based on your summary, it looks like you loaded the Keras model in Python?

In any case, the reason you get that error is probably that the output of the mag/phase Lambda is not compatible. The output of the FFT is mag and phase, each [1,1,257]; see the comments above.

@vinod1234567890

I just changed the names of the lambda classes in the model json to differentiate between the two Lambda layers as the weights are getting mixed up while loading.

Yep, I was able to debug that error. But, now I'm getting an error in the instant_norm_layer custom layer

@vinod1234567890

vinod1234567890 commented Jul 10, 2020

Finally, I was able to run it in real time in the web browser too!! But the average processing times are too high for real-time audio. I'm getting an average of 13 ms per 32 ms block on my laptop.

@shilsircar
Author

shilsircar commented Jul 10, 2020

@vinod1234567890 that's good. Yeah, it's a bit on the high side, and the majority of the time is spent in the LSTMs. There is nothing I can do about that on the JS side. I will be trying WASM with SIMD soon.

@vinod1234567890

@shilsircar Clearer when compared to padding 'causal' you mean? these audio samples were cleaned in Python I assume? or in JS?

Is the high inference time for the first few iterations related to webgl-shader warm-up?

I too should look at threading using a webworker.

@vinod1234567890

looks like tfjs with webgl backend doesn't support webworkers like WebAudio API's audioworklet:
https://github.com/w3c/mediacapture-record/issues/166

correct me if I'm wrong

@shilsircar
Author

webworker with offscreen canvas should work based on this https://developers.google.com/web/updates/2018/08/offscreen-canvas
This means it should be possible to create a glcontext offscreen and run on a webworker. I haven't tried this yet if you make progress would be interested. I have been looking at the onnx dtln loading issue ... I wanted to compare both but looks like few limitations ..

@vinod1234567890

I also tried to quantize the TFJS model to float16 and uint8, resulting in a 2 MB and a 900 KB model. But for some reason, the inference time is exactly the same as for the full model.

@breizhn Any idea why? Is it the number of ops? Since TFLite quantization increased speed, I assumed I could replicate that in TFJS.

@breizhn
Owner

breizhn commented Jul 13, 2020

@breizhn padding same works well. It does a good job of suppressing noise interspeech. In speech some minor noise leaks if higher frequency (bird chirps). I am experimenting with isolating the pitch harmonics before passing in to the module to see if that helps

The model had problems with bird chirps before, so that isn't a new behaviour. But it's cool to hear, that it also works with padding same!

@breizhn
Owner

breizhn commented Jul 13, 2020

@breizhn I made some optimization to js side of things. Now I get ~11ms on my old windows laptop when I skip first 10 iterations as per measuretime python. I noticed first 10 iterations are always higher. Any insight to that why ? Not critical since I can always skip 300 ms in my code. Just curious.

I noticed that before. Maybe it is something about prioritizing the process and at the first iteration, the processes for running the network are initialized. That takes some time I think.
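
When measuring, the warm-up runs can simply be left out of the average. A small sketch in plain JavaScript; the function name meanAfterWarmup and the timing values are made up for illustration and are not measurements from this thread:

```javascript
// Average per-block inference time, ignoring the first `warmup` runs.
function meanAfterWarmup(timesMs, warmup = 10) {
  const steady = timesMs.slice(warmup);
  if (steady.length === 0) throw new Error('not enough measurements');
  return steady.reduce((s, t) => s + t, 0) / steady.length;
}

// made-up timings: 10 slow warm-up runs followed by 90 steady-state runs
const sampleTimes = [...Array(10).fill(40), ...Array(90).fill(11)];
console.log(meanAfterWarmup(sampleTimes), meanAfterWarmup(sampleTimes, 0));
```

With these example numbers the steady-state mean is 11 ms, while including the warm-up runs pulls the mean up to about 13.9 ms.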

@breizhn
Owner

breizhn commented Jul 13, 2020

I also tried to quantize the tfjs model to float16 and uint8, resulting in a 2 MB and a 900 kB model. But for some reason, the inference time is exactly the same as with the full model.

@breizhn Any idea why? Is it the number of ops? I was assuming that since tflite quantization increased speed, I could replicate that in tfjs.

Maybe TFJS still calculates in float32. There are a lot of optimizations going on in tf-lite, like operator fusion and so on. TFJS is probably not doing that kind of thing.
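
One way to picture this: if quantization only changes how the weights are stored, they have to be expanded back to float32 before the network runs, so the arithmetic per frame stays the same. A rough sketch of uint8 affine quantization in plain JavaScript; `quantizeUint8` and `dequantize` are illustrative helpers, not tfjs APIs, and whether the tfjs converter does exactly this is an assumption here.

```javascript
// Map float weights onto 0..255 using a per-array minimum and scale.
function quantizeUint8(weights) {
  let min = Infinity;
  let max = -Infinity;
  for (const w of weights) {
    if (w < min) min = w;
    if (w > max) max = w;
  }
  // Guard against a constant array, where max === min.
  const scale = max > min ? (max - min) / 255 : 1;
  const q = new Uint8Array(weights.length);
  for (let i = 0; i < weights.length; i++) {
    q[i] = Math.round((weights[i] - min) / scale);
  }
  return { q, min, scale };
}

// Expand the stored bytes back to float32 before any computation happens.
function dequantize({ q, min, scale }) {
  const out = new Float32Array(q.length);
  for (let i = 0; i < q.length; i++) {
    out[i] = q[i] * scale + min;
  }
  return out;
}
```

The stored array is a quarter of the size, but the array handed to the math is float32 again, which would explain a smaller download with an unchanged inference time.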

@shilsircar
Author

@breizhn Is this a correct translation for real-time use? The block shift in real_time_dtln_audio looks like 128 samples in and 128 samples out = 8 ms. I suspect that's why inference must finish within 8 ms to avoid audio loss?

// Set some parameters.
const block_len_ms = 32;
const block_shift_ms = 8;
const fs_target = 16000;
const block_shift = Math.floor(fs_target * (block_shift_ms / 1000)); // 128 samples
const block_len = Math.floor(fs_target * (block_len_ms / 1000)); // 512 samples
// Create buffers.
let in_buffer = tf.zeros([block_len], "float32");
let out_buffer = tf.zeros([block_len], "float32");
const tfzeros_blk_shft = tf.zeros([block_shift]);

function deNoiseFrame(audio_in, model) {
  /**
   * Prepare in_buffer by shifting it left by block_shift samples, then
   * concat audio_in so that in_buffer stays a 1D tensor of [512]; the
   * model input must have shape [1, 512].
   */
  const new_in = tf.tidy(() =>
    in_buffer.slice([block_shift], [-1]).concat(tf.tensor1d(audio_in)));
  in_buffer.dispose(); // free the previous buffer so tensors do not pile up
  in_buffer = new_in;

  // Run inference on the model for the current block.
  const in_block = tf.expandDims(in_buffer, 0);
  const raw_out = model.predict(in_block);
  const out_block = raw_out.squeeze();

  // Write to the output buffer: shift left, zero the tail, add the new block.
  const new_out = tf.tidy(() =>
    out_buffer.slice([block_shift], [-1]).concat(tfzeros_blk_shft).add(out_block));
  out_buffer.dispose();
  out_buffer = new_out;

  // The first block_shift samples are complete; copy them out as a typed array.
  const audio_out_t = out_buffer.slice([0], [block_shift]);
  const audio_out = audio_out_t.dataSync();
  tf.dispose([in_block, raw_out, out_block, audio_out_t]);
  //console.log("Audio_Out", audio_out);
  return audio_out;
}
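
The buffer handling in the snippet above can also be written without tensors, which makes the shift-and-add logic easier to check in isolation. This is a sketch under assumptions: `createOverlapAdd` is a hypothetical helper, and `processBlock` stands in for the model call (it receives a `Float32Array` of `blockLen` samples and returns one of the same length).

```javascript
// Returns a function that consumes blockShift new samples per call and
// emits blockShift finished samples, using overlap-add over blockLen.
function createOverlapAdd(blockLen, blockShift, processBlock) {
  const inBuffer = new Float32Array(blockLen);
  const outBuffer = new Float32Array(blockLen);

  return function processFrame(frame) {
    if (frame.length !== blockShift) {
      throw new Error(`expected ${blockShift} samples, got ${frame.length}`);
    }
    // Shift the input left by blockShift and append the new samples.
    inBuffer.copyWithin(0, blockShift);
    inBuffer.set(frame, blockLen - blockShift);

    // Hand the processor a copy so it cannot alter the running buffer.
    const outBlock = processBlock(inBuffer.slice());
    if (outBlock.length !== blockLen) {
      throw new Error(`processBlock must return ${blockLen} samples`);
    }

    // Shift the output left, clear the tail, then add the new block.
    outBuffer.copyWithin(0, blockShift);
    outBuffer.fill(0, blockLen - blockShift);
    for (let i = 0; i < blockLen; i++) {
      outBuffer[i] += outBlock[i];
    }

    // The first blockShift samples have received all their contributions.
    return outBuffer.slice(0, blockShift);
  };
}
```

With an identity `processBlock`, a block length of 4 and a shift of 2, each input sample comes out one frame later and doubled, because every sample is covered by two overlapping blocks.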

@breizhn
Owner

breizhn commented Jul 14, 2020

Yes, the shift is 8 ms, and this is also the reason why the inference must be under 8 ms. To change that, the model must be retrained. A shift of 16 ms (256 samples) will also work if you retrain it, but the audio quality may be slightly lower than with the current version.
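
The arithmetic behind that limit: one block shift of audio arrives every `blockShift / sampleRate` seconds, so each inference call has to finish within that time. A small sketch; the function names are illustrative.

```javascript
// Duration of one block shift in milliseconds.
// Multiplying before dividing keeps the common cases exact.
function shiftDurationMs(blockShift, sampleRate) {
  return (blockShift * 1000) / sampleRate;
}

// True if one inference call fits into the time between two block shifts.
function meetsRealTime(inferenceMs, blockShift, sampleRate) {
  return inferenceMs < shiftDurationMs(blockShift, sampleRate);
}

console.log(shiftDurationMs(128, 16000)); // 8
console.log(shiftDurationMs(256, 16000)); // 16
console.log(meetsRealTime(11, 128, 16000)); // false
console.log(meetsRealTime(11, 256, 16000)); // true
```

With the ~11 ms measured earlier in this thread, a 128-sample shift misses the budget while a 256-sample shift would fit, which is why a retrained model with a 16 ms shift is attractive for the browser.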

@shilsircar
Author

Yeah, you want maximum look-ahead in the audio for the best results for the states. Makes sense... I have now run out of options for bringing tfjs below 8 ms consistently.
Are you considering producing a 40 h norm model with a 16 ms shift?

Or, for the 40 h one, is it the same training scripts and setup? And I guess the only thing that changes is the shift parameter, from 128 to 256?

@breizhn
Owner

breizhn commented Jul 14, 2020

Yes, the only thing which changes is the shift.
At the moment I don't have time to provide code for the 40h case.

@shilsircar
Author

shilsircar commented Jul 14, 2020

@breizhn
Would you be open to producing a 40 h model with a 16 ms shift? I think any sort of real-time use in the browser would need a shift in this range.

@hchintada

Are you able to bring the tfjs inference close to 8 ms?
I'm still stuck in the high teens. Suggestions welcome!

@shilsircar
Author

Without WASM SIMD, tfjs won't get to 8 ms or below. Audio is much more susceptible to delay than video, so ideally the processing time needs to be well below 8 ms if a block shift of 128 is used.

@shilsircar
Author

shilsircar commented Jul 16, 2020

The tfjs team responded that they will be adding complex number support, which will be great for this: tensorflow/tfjs#3585

If anyone is open to producing models with 16 ms and 24 ms shifts, I can test the real-time inference. @breizhn

@breizhn
Owner

breizhn commented Jul 22, 2020

@shilsircar, sadly I don't have the capacity at the moment to train the networks.

@hchintada

@shilsircar the latency needs to be less than 8ms for a 32ms block as @breizhn says in the Execution Times section of this repo's ReadMe.
Also, since you've got a working model, I have a couple of questions:

  1. Which model did you use for conversion? .h5 or savedmodel?

.h5 norm model

  2. Does tfjs have stateful LSTMs? Or did you handle states outside the model?

I didn't have to handle it outside. Tfjs handles stateful LSTMs.

The trouble is that the TFJS team told me they don't have any immediate plans to implement conv1dwithbias and causal padding. I feel it's a bug, since the last layer has bias set to false.
Issue open: tensorflow/tfjs#3578

I just found that in tfjs, conv1d doesn't have an option to set usebias to false. It only supports usebias - true.

@shilsircar
Author

That's correct, @hchintada, but you can still get reasonable results without it. The main issue is that tfjs, while OK for images, is still not ready for real-time, low-latency work such as audio. I am hoping it may be better with WASM SIMD, but without complex numbers it's not going to work.

@shilsircar
Author

shilsircar commented Jul 28, 2020

@breizhn do you have any suggestions on data preparation for training a 40 h norm model? I intend to try with 40 h; hopefully, at 21 min per epoch and reduced to 120 epochs, it might give satisfactory results. I intend to use the same DNS corpus data from your forked repo. My goal is to adapt to 24 ms latency.

@shilsircar
Author

Closing this. The conclusion is that DTLN is definitely portable to tfjs. It can be made to run completely in the browser, but not in real time in the default configuration, because the latency requirement of 8 ms cannot be met. Offline processing is possible and the audio output is sufficiently clean. I am happy to write up how-to instructions if anyone else is interested in experimenting. @breizhn thanks for all your help.
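
For the offline case, the recording only needs to be cut into block-shift-sized frames and pushed through the per-frame function one after another, with no deadline per frame. A sketch assuming mono samples in a `Float32Array`; `processOffline` and `processFrame` are hypothetical names, with `processFrame` playing the role of `deNoiseFrame` from earlier in this thread.

```javascript
// Run a whole recording through a per-frame processor.
// processFrame takes blockShift samples and returns blockShift samples.
function processOffline(samples, blockShift, processFrame) {
  const numFrames = Math.ceil(samples.length / blockShift);
  const output = new Float32Array(numFrames * blockShift);
  for (let f = 0; f < numFrames; f++) {
    const start = f * blockShift;
    // A fresh zeroed frame pads the last one if the length is not a multiple.
    const frame = new Float32Array(blockShift);
    frame.set(samples.subarray(start, start + blockShift));
    output.set(processFrame(frame), start);
  }
  return output;
}
```

With the overlap-add scheme used in this thread, the output should lag the input by the block length minus the block shift (384 samples for 512/128), so a few extra frames of zeros at the end may be needed to flush the tail of the recording.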

@kashikarparth

Please do, it would be really useful. Now that SIMD is in play, experimentation on this becomes relevant again.

@dlutolf

dlutolf commented Mar 1, 2021

@shilsircar can you post your js code for porting the custom layers or any other instructions? Do you have any other code for real time processing tests? Would like to experiment with this further, thanks!

@shubhamjoshi2130

@shilsircar could you please share your JS code for loading the model, and the custom layer creation code?

We tried converting the Keras model to TensorFlow.js and also created all the custom layers, but when we try to load the model in the browser, the JavaScript goes into some infinite loop, freezing the page.

@Raulkg

Raulkg commented Sep 16, 2021

@shilsircar Please provide some instructions to your js code and tests to run on the browser for latency. Thank you.

@husainnazer

Is there any update regarding the instructions for converting?

@WujuMaster

OK. I was able to load the model too. I used the model.h5 without mag normalization. But after loading the model, the model.predict() is throwing an error: Input 0 is incompatible with layer lstm: expected ndim=3, found ndim=2.

Below is the model summary:

Layer (type)                    Output shape         Param #     Receives inputs                  
==================================================================================================
input_1 (InputLayer)            [1,512]              0                                            
__________________________________________________________________________________________________
lambda_Lambda1 (Lambda)         [[1,1,257],[1,1,257] 0           input_1[0][0]                    
__________________________________________________________________________________________________
lstm (LSTM)                     [1,1,128]            197632      lambda_Lambda1[0][0]             
__________________________________________________________________________________________________
dropout (Dropout)               [1,1,128]            0           lstm[0][0]                       
__________________________________________________________________________________________________
lstm_1 (LSTM)                   [1,1,128]            131584      dropout[0][0]                    
__________________________________________________________________________________________________
dense (Dense)                   [1,1,257]            33153       lstm_1[0][0]                     
__________________________________________________________________________________________________
activation (Activation)         [1,1,257]            0           dense[0][0]                      
__________________________________________________________________________________________________
multiply (Multiply)             [1,1,257]            0           lambda_Lambda1[0][0]             
                                                                 activation[0][0]                 
__________________________________________________________________________________________________
lambda1_Lambda11 (Lambda1)      [1,1,512]            0           multiply[0][0]                   
                                                                 lambda_Lambda1[0][1]             
__________________________________________________________________________________________________
conv1d (Conv1D)                 [1,1,256]            131072      lambda1_Lambda11[0][0]           
__________________________________________________________________________________________________
instant_layer_normalization (In [1,1,256]            512         conv1d[0][0]                     
__________________________________________________________________________________________________
lstm_2 (LSTM)                   [1,1,128]            197120      instant_layer_normalization[0][0]
__________________________________________________________________________________________________
dropout_1 (Dropout)             [1,1,128]            0           lstm_2[0][0]                     
__________________________________________________________________________________________________
lstm_3 (LSTM)                   [1,1,128]            131584      dropout_1[0][0]                  
__________________________________________________________________________________________________
dense_1 (Dense)                 [1,1,256]            33024       lstm_3[0][0]                     
__________________________________________________________________________________________________
activation_1 (Activation)       [1,1,256]            0           dense_1[0][0]                    
__________________________________________________________________________________________________
multiply_1 (Multiply)           [1,1,256]            0           conv1d[0][0]                     
                                                                 activation_1[0][0]               
__________________________________________________________________________________________________
conv1d_1 (Conv1D)               [1,1,512]            131072      multiply_1[0][0]                 
==================================================================================================
Total params: 986753
Trainable params: 986753
Non-trainable params: 0
__________________________________________________________________________________________________

Hello everyone, I'm trying to integrate DTLN into my app too, but after converting from h5 to model.json, I get the same issue when loading the model: Input 0 is incompatible with layer lstm: expected ndim=3, found ndim=2. Can anybody tell me what I should do now? (I cannot log the model because the error comes from the tf.loadLayersModel function. I tried changing the output of the FFT layer, but nothing changed.) I'm so stuck right now...

@StuartIanNaylor

Sounds like it might be a bit like the tensorflow-lite conversion where the indexes get a bit confused.

#56 (comment)

Maybe swap the indexes around as discussed here?

@WujuMaster

Hi,

instead of using the SavedModel, build the model with build_DTLN_model_stateful from DTLN_model.py and load the weights from the ./pretrained_model folder.

Here is the code for conversion:

import tensorflowjs as tfjs
from DTLN_model import DTLN_model

model_class = DTLN_model()
model_class.build_DTLN_model_stateful()
model_class.model.load_weights('./pretrained_model/model.h5')
tfjs.converters.save_keras_model(model_class.model, 'DTLN_js')

This code returned fine on my system.

Best, Nils

@StuartIanNaylor I'm actually using this model.json file, and I can't figure out how to visualize the input/output dims of all layers, because I can't even load the model from the JSON file. It gives the error Input 0 is incompatible with layer lstm: expected ndim=3, found ndim=2. when loading, not when predicting...
I'm also trying to use tfjs-node to load the tflite files, but I don't know how to use the Interpreter; basically tfjs doesn't support that. I even tried this repo, but I'm still not sure how I should set and get the tensors.
