Detecting audio deletion tampering remains a major challenge in digital audio authentication, particularly under anti-forensic attacks. To address the difficulty of both detecting and locating such tampering, a multi-stage detection and multimodal localization method for audio deletion tampering is proposed. First, a header information analysis method is designed to screen out audio files suspected of header/footer deletion tampering. Second, a column-average-based constant Q spectral sketch feature is introduced, together with a middle deletion tampering classification network built on a deep residual shrinkage network and an attention mechanism. Third, the results of the header information analysis and the classification network are combined to reach a comprehensive judgment on whether deletion tampering has occurred. Finally, for audio in which middle deletion tampering is detected, a localization method combining wavelet packet analysis with multimodal features is proposed. Comparative experiments show that the proposed method effectively detects header/footer deletion tampering and accurately locates middle deletion tampering. Specifically, the accuracy, precision, recall, and F1 score for middle deletion classification all exceed 98%, and the method shows improved robustness and localization accuracy under conventional signal processing attacks.
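For reference, the four classification metrics reported above follow the standard binary definitions. The sketch below is illustrative only and is not the evaluation code used in this work: the function name `binary_metrics`, the `BinaryMetrics` container, and the label convention (1 for audio with middle deletion tampering, 0 for untampered audio) are assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for binary labels.

    Assumed convention (hypothetical): `positive` marks tampered audio.
    Ratios with a zero denominator are reported as 0.0.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1  # tampered, predicted tampered
        elif p == positive:
            fp += 1  # untampered, predicted tampered
        elif t == positive:
            fn += 1  # tampered, predicted untampered
        else:
            tn += 1  # untampered, predicted untampered

    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return BinaryMetrics(accuracy, precision, recall, f1)


if __name__ == "__main__":
    # Toy labels for illustration only; not data from the experiments.
    truth = [1, 1, 1, 0, 0, 0, 1, 0]
    preds = [1, 1, 0, 0, 0, 1, 1, 0]
    print(binary_metrics(truth, preds))
```

Reporting all four values together matters because accuracy alone can look high on an imbalanced test set, whereas precision and recall separately expose false alarms and missed tampering.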