Reason for using softmax activation function only during evaluation

shan18 · November 12, 2019, 10:10am

During training, the output of the last hidden layer is used directly to compute the CTC loss. In evaluate.py, however, a softmax is also applied to that output (the transposed tensor in the excerpt below), even though the loss there is still computed from the raw logits. Why is the softmax applied only during evaluation?

https://github.com/mozilla/DeepSpeech/blob/4b29b78832036216b53f59b953639bde7cde7dfe/evaluate.py#L63

(batch_x, batch_x_len), batch_y = iterator.get_next()

# One rate per layer
no_dropout = [None] * 6
logits, _ = create_model(batch_x=batch_x,
                         seq_length=batch_x_len,
                         dropout=no_dropout)

# Transpose to batch major and apply softmax for decoder
transposed = tf.nn.softmax(tf.transpose(logits, [1, 0, 2]))

loss = tf.nn.ctc_loss(labels=batch_y,
                      inputs=logits,
                      sequence_length=batch_x_len)

tf.train.get_or_create_global_step()

# Get number of accessible CPU cores for this process
try:
    num_processes = cpu_count()

reuben · November 12, 2019, 10:31am

tf.nn.ctc_loss applies the softmax internally. The decoder expects the input to already have softmax applied to it.
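
To make the difference concrete, here is a rough, framework-free sketch. It is not DeepSpeech code: loss_from_logits is a hypothetical toy stand-in for any loss that normalizes internally (the way tf.nn.ctc_loss does), and the logit values are made up. It only illustrates why raw logits go to the loss while the softmax output goes to the decoder, and why feeding already-softmaxed values to the loss would normalize twice.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def loss_from_logits(logits, target):
    """Toy stand-in (assumption, not the real CTC loss): a loss that applies
    softmax internally and returns the negative log-probability of `target`."""
    return -math.log(softmax(logits)[target])


# Hypothetical logits for one timestep over three classes (illustrative only).
logits = [2.0, 1.0, 0.1]

# A decoder that scores candidates by probability needs normalized values.
probs = softmax(logits)

# Intended usage: raw logits go in, the loss normalizes them itself.
loss_ok = loss_from_logits(logits, target=0)

# Mistake: passing probabilities means softmax is applied a second time,
# which flattens the distribution and changes the loss value.
loss_double = loss_from_logits(probs, target=0)

print("sum of probs:", sum(probs))
print("loss from logits:", loss_ok)
print("loss from softmaxed input:", loss_double)
```

In this toy setup the two loss values differ, which is the reason the excerpt above passes logits to the loss and keeps the softmaxed tensor for the decoder only.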

shan18 · November 12, 2019, 11:50am

Ok, got it. Thank you.