An Alternate Policy Gradient Estimator for Softmax Policies