Spatially and Temporally Efficient Non-local Attention Network for Video-based Person Re-Identification