Problem with training using static_rnn

Hi,
For some reasons, I want to train the DeepSpeech model with the static_rnn function using LSTMCell in the normal TensorFlow environment (not TFLite), but when I replace rnn_impl_lstmblockfusedcell with rnn_impl_static_rnn, it shows:

 File "DeepSpeech.py", line 959, in main
    train()
  File "DeepSpeech.py", line 494, in train
    gradients, loss, non_finite_files = get_tower_results(iterator, optimizer, dropout_rates)
  File "DeepSpeech.py", line 321, in get_tower_results
    avg_loss, non_finite_files = calculate_mean_edit_distance_and_loss(iterator, dropout_rates, reuse=i > 0)
  File "DeepSpeech.py", line 248, in calculate_mean_edit_distance_and_loss
    logits, _ = create_model(batch_x, batch_seq_len, dropout, reuse=reuse, rnn_impl=rnn_impl)
  File "DeepSpeech.py", line 199, in create_model
    output, output_state = rnn_impl(layer_3, seq_length, previous_state, reuse)
  File "DeepSpeech.py", line 101, in rnn_impl_lstmblockfusedcell
    x = [x[l] for l in range(x.shape[0])]
TypeError: __index__ returned non-int (type NoneType)

I have also set the export_tflite flag to TRUE, but it didn’t help here. Where did I go wrong?
Thank you.

For what reasons?

Are you sure about what / how you did it? This would suggest it’s still using the LSTMBlockFusedCell implementation.

  1. I want to change the activation function in the LSTM to a linear activation so that I can build a privacy-preserving DeepSpeech version. But it’s too difficult to change it in LSTMBlockFusedCell, so I want to change the activation in rnn_cell.LSTMCell. If you have any better idea, please tell me, thank you.

  2. uhm, I’m not very sure about what you mean by

using the LSTMBlockFusedCell implementation

but in fact, I changed the code in rnn_impl_lstmblockfusedcell directly, as follows:

def rnn_impl_lstmblockfusedcell(x, seq_length, previous_state, reuse):
    with tfv1.variable_scope('cudnn_lstm/rnn/multi_rnn_cell/cell_0'):
        # Original LSTMBlockFusedCell implementation, commented out:
        # log_info('I am in LSTMBlockFusedCell...')
        # fw_cell = tf.contrib.rnn.LSTMBlockFusedCell(Config.n_cell_dim,
        #                                             forget_bias=0,
        #                                             reuse=reuse,
        #                                             name='cudnn_compatible_lstm_cell')
        #
        # output, output_state = fw_cell(inputs=x,
        #                                dtype=tf.float32,
        #                                initial_state=previous_state)

        # Replacement: plain LSTMCell, with ReLU instead of the default tanh activation
        fw_cell = tfv1.nn.rnn_cell.LSTMCell(Config.n_cell_dim,
                                            reuse=reuse,
                                            activation=tf.nn.relu,
                                            name='cudnn_compatible_lstm_cell')

        # Split rank N tensor into list of rank N-1 tensors
        # log_info(x.shape[0])
        x = [x[l] for l in range(x.shape[0])]

        output, output_state = tfv1.nn.static_rnn(cell=fw_cell,
                                                  inputs=x,
                                                  sequence_length=seq_length,
                                                  initial_state=previous_state,
                                                  dtype=tf.float32,
                                                  scope='cell_0')

        output = tf.concat(output, 0)

    return output, output_state

I know it’s not good to change it this way, but I just want to see whether this can run successfully so that I can make further changes.
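For what it’s worth, here is a minimal standalone snippet that seems to reproduce the same TypeError outside of DeepSpeech; my guess (just an assumption on my side) is that the time dimension of x is dynamic in the training graph, so x.shape[0] is None:

import tensorflow as tf  # 1.14, same as the environment listed below

# [n_steps, batch_size, n_features]: the time dimension is left dynamic here,
# which is what seems to happen in the training graph (only the TFLite export
# graph appears to use a fixed n_steps).
x = tf.placeholder(tf.float32, shape=[None, 16, 2048])

print(x.shape[0])                      # Dimension(None)
x = [x[l] for l in range(x.shape[0])]  # TypeError: __index__ returned non-int (type NoneType)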

Can you elaborate here? I’m not sure I get the link between linear activations and privacy-preserving DeepSpeech.

That’s error-prone. Please have a look at the rnn_impl variable; this is how we change the implementation. In your case, maybe you just want to change the rnn_impl= default value of create_model and of calculate_mean_edit_distance_and_loss.
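In other words, something like this (a minimal sketch with simplified signatures and elided bodies; the actual parameter lists in DeepSpeech.py differ):

def rnn_impl_lstmblockfusedcell(x, seq_length, previous_state, reuse):
    ...  # original fused-cell implementation, left untouched

def rnn_impl_static_rnn(x, seq_length, previous_state, reuse):
    ...  # LSTMCell + static_rnn implementation

# Swap the default at both call sites instead of editing the fused-cell body:
def create_model(batch_x, seq_length, dropout, reuse=False,
                 rnn_impl=rnn_impl_static_rnn):
    ...

def calculate_mean_edit_distance_and_loss(iterator, dropout, reuse,
                                          rnn_impl=rnn_impl_static_rnn):
    ...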

  1. In my case, I want to use the DeepSpeech model to run inference on additively-secret-shared data, where only additions and multiplications are accepted.

  2. Yes, I had already changed it there (set rnn_impl=rnn_impl_static_rnn in both create_model and calculate_mean_edit_distance_and_loss) at first, but it showed the same error message.

I don’t know whether it’s because of my version of Keras, but I’m on:

macOS
Python 3.6.5
Keras-Applications 1.0.8
Keras-Preprocessing 1.0.8
tensorflow 1.14.0
tensorflow-estimator 1.14.0

I still don’t understand the requirement here. That looks more like a hardware-related requirement.

You do understand we have rnn_impl_static_rnn actually used when producing the TFLite inference model, for instance?

You do understand this is 100% not something we support? Though, it would be great to have a Python stack trace that actually matches replacing the rnn_impl= default value in both create_model and calculate_mean_edit_distance_and_loss.

You don’t need Keras.

  1. Additive secret sharing is a form of encryption, like homomorphic encryption, which only supports additions and multiplications. Maybe you can think of it as something hardware-related.
    And as you know, in the whole inference pass of DeepSpeech (or any other deep learning model), most of the operations are multiplications and the rest are non-linear activation functions. So if I want to do the inference, I have to replace the non-linear functions with approximated linear functions (see the rough sketch after this list). That’s why I want to change the LSTMBlockFusedCell.

  2. Yes, I know rnn_impl_static_rnn is used for TFLite, but I just think it may be okay to use it as the RNN function in the normal TF environment. Is there another good way to change the RNN function?

  3. Okay, and sorry I’m not very familiar with deep learning.
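Here is the rough sketch mentioned in item 1: a purely illustrative polynomial fit of the sigmoid over a fixed interval (the coefficients are a plain least-squares fit, not taken from any published privacy-preserving work). With degree=1 it degenerates to a linear approximation, and any degree only needs additions and multiplications at evaluation time:

import numpy as np

# Illustration only: approximate a non-linear activation with a low-degree
# polynomial, so that evaluating it needs nothing but additions and
# multiplications (the operations an additive-secret-sharing backend supports).
def fit_activation_poly(fn, degree=3, interval=(-4.0, 4.0), samples=1000):
    xs = np.linspace(interval[0], interval[1], samples)
    return np.polyfit(xs, fn(xs), degree)        # least-squares coefficients

def eval_poly(coeffs, x):
    return np.polyval(coeffs, x)                 # Horner scheme: adds and muls only

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
coeffs = fit_activation_poly(sigmoid, degree=1)  # degree=1 -> linear approximation
print(eval_poly(coeffs, np.array([-2.0, 0.0, 2.0])))  # roughly matches sigmoid on [-4, 4]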

That’s out of my scope, sorry. I’m really unsure about what you are trying to achieve here …

Here there is something plainly wrong / confusing. At first you were focusing on training; now you say that your goal is inference. I insist, but we already have a different inference graph.

If you are worried about non-linear behavior, there are likely other pain points in the model.

Why don’t you just use the TFLite model?
And if you absolutely need a protocol buffer file, why don’t you just hack export() and change create_inference_graph(batch_size=FLAGS.export_batch_size, n_steps=FLAGS.n_steps, tflite=FLAGS.export_tflite) to create_inference_graph(batch_size=FLAGS.export_batch_size, n_steps=FLAGS.n_steps, tflite=True)? This way you get the TFLite-targeted inference model but you export it as a protobuf.
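A minimal, self-contained sketch of that change (FLAGS and create_inference_graph are stand-ins here, just to show which argument flips; the real objects live in DeepSpeech.py):

from types import SimpleNamespace

# Stand-ins so the snippet runs on its own; not the real DeepSpeech.py objects.
FLAGS = SimpleNamespace(export_batch_size=1, n_steps=16, export_tflite=False)

def create_inference_graph(batch_size, n_steps, tflite):
    return dict(batch_size=batch_size, n_steps=n_steps, tflite=tflite)

# The original call in export() passes tflite=FLAGS.export_tflite.
# The hack hard-codes tflite=True, so you always get the TFLite-targeted graph
# even when exporting a protobuf:
graph = create_inference_graph(batch_size=FLAGS.export_batch_size,
                               n_steps=FLAGS.n_steps,
                               tflite=True)
print(graph)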

Sorry for the confusion here. The final goal is inference; the reason I am focusing on training is that I thought the change of activation might decrease the accuracy significantly, especially in the case of linear functions (in fact, I’m not sure).

I can handle ReLU and sigmoid, but not tanh, and this is the reason to change the LSTM part.

That is because the engine I’m going to use for the cryptography implementation is based on TensorFlow. But maybe I should have a good look at TFLite. Thank you.

We have verified a ~+2% absolute WER degradation (8.2% -> 10.2%) when running the TFLite implementation. So there is degradation, and maybe not just because of the LSTM activations. But you should just save your own sanity and try to export a protobuf model with the same inference graph as the TFLite model; that’s the simplest and most efficient way for you, from my understanding.


Okay, thank you very much, I will give it a try.