MultiMAE-DER: Multimodal Masked Autoencoder for Dynamic Emotion Recognition