Problem decoding Breton audio

I’m using Common Voice to train a real-time spoken language identification system. To extract the audio from the tar.gz files, I’m using the following Python code:

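```python
import os
import subprocess
import tarfile


def language_reader(path):
    """Return a generator function that yields PCM samples for a language code."""
    def get_language_data(code):
        with tarfile.open('{}/{}.tar.gz'.format(path, code)) as tar:
            clips = [clip for clip in tar.getmembers()
                     if clip.name.endswith('.mp3')]
            for clip in clips:
                # Extract into the current directory, keeping the archive path
                tar.extract(clip)
                yield from open_mp3(clip.name)
    return get_language_data


def open_mp3(filename):
    """Decode an mp3 with ffmpeg and yield 16 kHz mono signed 16-bit samples."""
    mp3 = subprocess.Popen(['ffmpeg', '-nostdin', '-loglevel', 'error',
                            '-i', filename,
                            # raw s16le instead of wav, so no header bytes are read as samples
                            '-f', 's16le', '-acodec', 'pcm_s16le',
                            '-ac', '1', '-ar', '16000', '-'],
                           stdout=subprocess.PIPE)
    try:
        while True:
            sample = mp3.stdout.read(2)
            if len(sample) < 2:  # end of stream (or a stray trailing byte)
                break
            yield int.from_bytes(sample, byteorder='little', signed=True)
    finally:
        # Reap ffmpeg and delete the extracted file, even if the consumer stops early
        mp3.stdout.close()
        mp3.wait()
        os.remove(filename)
```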
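Reading two bytes per `read()` call costs one Python call per sample. A minimal sketch of a chunked alternative is below, assuming ffmpeg is asked for raw `-f s16le` output (a `-f wav` stream starts with a header that would otherwise be decoded as samples). The function name `iter_pcm16` and the chunk size are illustrative choices, not part of the original code; it uses the standard `array` module and carries an odd trailing byte over to the next chunk, since a pipe read can return any number of bytes.

```python
import array
import io
import sys


def iter_pcm16(stream, chunk_size=8192):
    """Yield signed 16-bit little-endian samples from a binary stream.

    Hypothetical helper: reads chunk_size bytes at a time instead of 2,
    carrying an odd trailing byte over to the next chunk.
    """
    leftover = b''
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        data = leftover + chunk
        usable = len(data) - (len(data) % 2)
        samples = array.array('h')  # 'h' = signed short (2 bytes on common platforms)
        samples.frombytes(data[:usable])
        if sys.byteorder == 'big':
            samples.byteswap()  # the input is little-endian
        leftover = data[usable:]
        yield from samples


# Three samples, read 3 bytes at a time to exercise the carry-over
print(list(iter_pcm16(io.BytesIO(b'\x01\x00\xff\xff\x00\x80'), chunk_size=3)))
# prints [1, -1, -32768]
```

In `open_mp3`, the `while` loop could then become `yield from iter_pcm16(mp3.stdout)`.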

This works fine most of the time. However, for common_voice_br_17332422.mp3 in the Breton dataset, ffmpeg freezes up and the whole system hangs waiting for it to send the data. Does anyone know how to fix this? The file plays normally in VLC.
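One way to keep a single bad clip from freezing the whole pipeline is to give each decode a deadline. The sketch below is an assumption-laden alternative, not the original approach: it gives up streaming and decodes each clip in one `subprocess.run` call with a `timeout`, which kills the child and raises `subprocess.TimeoutExpired` when the deadline passes. Because `run` drains stdout and stderr itself, the decoder cannot block on a full pipe waiting for this process. The helper names `decode_with_timeout` and `ffmpeg_cmd` are hypothetical, and the command assumes raw `s16le` output.

```python
import array
import subprocess
import sys


def decode_with_timeout(cmd, timeout=30.0):
    """Run a decoder command and return its stdout as 16-bit samples.

    Hypothetical helper: subprocess.run kills the child after `timeout`
    seconds and raises subprocess.TimeoutExpired.
    """
    result = subprocess.run(cmd, stdin=subprocess.DEVNULL,
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                            timeout=timeout, check=True)
    out = result.stdout
    samples = array.array('h')
    samples.frombytes(out[:len(out) - len(out) % 2])  # drop a stray odd byte
    if sys.byteorder == 'big':
        samples.byteswap()  # the input is little-endian
    return samples.tolist()


def ffmpeg_cmd(filename):
    """Hypothetical helper: the question's options, but with raw s16le output."""
    return ['ffmpeg', '-nostdin', '-loglevel', 'error', '-i', filename,
            '-f', 's16le', '-acodec', 'pcm_s16le', '-ac', '1', '-ar', '16000', '-']
```

A caller could then wrap `decode_with_timeout(ffmpeg_cmd(name), timeout=60)` in `try`/`except subprocess.TimeoutExpired` and skip or log the offending clip. Common Voice clips are short, so holding one clip in memory should be cheap.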

Examining the ffmpeg process with System Monitor shows “Waiting channel: pipe_wait”, i.e. the process is sleeping in the kernel until a pipe can be read from or written to.
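On Linux, the same "waiting channel" that System Monitor displays can be read from `/proc/<pid>/wchan`. A minimal sketch, assuming a Linux `/proc` filesystem; the helper name `wait_channel` is hypothetical, and the exact symbol names vary by kernel version (some kernels report `0` depending on permissions). The demo starts a child that writes far more than a pipe can buffer while nobody reads it, which should leave it asleep on the pipe.

```python
import os
import subprocess
import sys
import time


def wait_channel(pid):
    """Return the kernel wait channel of `pid` (Linux only), or None if unavailable."""
    try:
        with open('/proc/{}/wchan'.format(pid)) as f:
            return f.read().strip()
    except OSError:
        return None


if __name__ == '__main__':
    # A child that writes 10 MB to a pipe that this process never reads
    child = subprocess.Popen(
        [sys.executable, '-c', 'import sys; sys.stdout.buffer.write(b"x" * 10_000_000)'],
        stdout=subprocess.PIPE)
    time.sleep(1)
    # poll() should still be None: the child is blocked on the full pipe
    print(child.poll(), wait_channel(child.pid))
    child.kill()
    child.wait()
    child.stdout.close()
```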

That might just be a corrupted file that VLC is more tolerant of?
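To test the corrupted-file theory independently of the pipeline, a common ffmpeg idiom is to decode the file to the `null` muxer with only errors logged (`ffmpeg -v error -i FILE -f null -`); an empty error output suggests the stream decodes cleanly. The sketch below wraps that in Python; the helper names `check_cmd` and `decode_errors` are hypothetical, and it assumes `ffmpeg` is on the `PATH`.

```python
import shutil
import subprocess


def check_cmd(filename):
    """ffmpeg command that decodes `filename` and discards the decoded output."""
    return ['ffmpeg', '-nostdin', '-v', 'error', '-i', filename, '-f', 'null', '-']


def decode_errors(filename, timeout=60.0):
    """Return ffmpeg's error output for `filename` ('' if clean), or None if ffmpeg is missing."""
    if shutil.which('ffmpeg') is None:
        return None
    result = subprocess.run(check_cmd(filename), stdin=subprocess.DEVNULL,
                            stdout=subprocess.DEVNULL, stderr=subprocess.PIPE,
                            timeout=timeout)
    return result.stderr.decode(errors='replace')
```

For example, `decode_errors('common_voice_br_17332422.mp3')` run on its own would show whether ffmpeg reports decode errors for that clip and whether it finishes at all when nothing downstream is involved.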

I think I’ve worked it out. I’m piping data between several processes, and I think that something downstream crashed, causing all the pipes to back up.
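This fits the `pipe_wait` symptom: a pipe buffers only a limited amount of data (64 KiB by default on Linux, per `pipe(7)`), and once it is full, a blocking writer sleeps until the reader drains it. If the reading process exits and nothing else holds the read end open, the writer gets `EPIPE`/`SIGPIPE` instead of blocking, so a backlog usually means the read end is still open but nothing is consuming it. The sketch below measures the buffer size by making the write end non-blocking, so a full pipe raises `BlockingIOError` instead of sleeping; `pipe_capacity` is a hypothetical helper for illustration.

```python
import os


def pipe_capacity(chunk_size=4096):
    """Count how many bytes an OS pipe accepts before a writer would block.

    The write end is non-blocking, so when the buffer is full os.write
    raises BlockingIOError instead of putting the process to sleep.
    """
    r, w = os.pipe()
    try:
        os.set_blocking(w, False)
        total = 0
        while True:
            try:
                total += os.write(w, b'\0' * chunk_size)
            except BlockingIOError:
                return total
    finally:
        os.close(r)
        os.close(w)


# On Linux with default settings this is usually 65536
print(pipe_capacity())
```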