Analyzing the Performance Portability of Tensor Decomposition