Which checkpoints can I delete if I have a storage problem?

I am fine-tuning the DeepSpeech 0.4.1 model on AWS, and the checkpoints now take up more than 7 GB. I only have 15 GB of storage on AWS, so I am getting an OOM error.
Which old checkpoints can I delete?

OOM does not mean you are running out of disk space. It means you are running out of GPU memory.

Also, we can’t decide which checkpoints you can delete, as we don’t know which ones are important for your particular use case.

@kdavis I don't remember the exact error, but it said something like "resource exhausted", and my AWS disk was also 100% full. When I deleted some files, training started.
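To tell a full disk apart from a GPU memory problem, it can help to check free disk space against the size of the checkpoint directory before training. A minimal Python sketch, assuming all checkpoints live in one directory; the function names and the path in the usage line are hypothetical:

```python
import os
import shutil


def dir_size_bytes(path):
    """Total size in bytes of all regular files under `path` (recursive)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # Skip symlinks so linked files are not counted twice.
            if os.path.isfile(full) and not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def report(checkpoint_dir):
    """Print how full the disk is and how much of it the checkpoints use."""
    usage = shutil.disk_usage(checkpoint_dir)  # named tuple: total, used, free
    ckpt = dir_size_bytes(checkpoint_dir)
    gib = 1024 ** 3
    print(f"disk: {usage.used / gib:.1f} GiB used of {usage.total / gib:.1f} GiB "
          f"({usage.free / gib:.1f} GiB free)")
    print(f"checkpoints: {ckpt / gib:.1f} GiB in {checkpoint_dir}")
    return usage, ckpt
```

For example, report(os.path.expanduser("~/deepspeech_checkpoints")) with your own checkpoint directory. If free space is near zero, the disk is the problem; if there is plenty of free space and training still fails with a resource exhausted error, GPU memory is the more likely cause.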

Now my disk is full again because of all the checkpoints, so I have deleted all files that are not listed in the checkpoint text file:

model_checkpoint_path: "model.ckpt-228410"
all_model_checkpoint_paths: "model.ckpt-228363"
all_model_checkpoint_paths: "model.ckpt-228375"
all_model_checkpoint_paths: "model.ckpt-228387"
all_model_checkpoint_paths: "model.ckpt-228399"
all_model_checkpoint_paths: "model.ckpt-228410"

For example, I deleted all checkpoint files with a step number below 228300, and now fine-tuning has started normally.

Is this the right way?

This seems reasonable. However, again we can’t decide which checkpoints you can delete as we don’t know which ones are important to your particular use case.
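The manual approach described above (keep only the checkpoints the checkpoint state file lists) can be scripted. A minimal Python sketch, assuming the standard TensorFlow 1.x layout (a file named checkpoint plus .index, .meta and .data-* files per step) and that you have no other checkpoints, such as a separate copy of a best model, that you want to keep. The function names are hypothetical, and it only prints what it would delete unless dry_run=False:

```python
import os
import re

# Matches state-file lines such as:  all_model_checkpoint_paths: "model.ckpt-228363"
_PATH_LINE = re.compile(r'^(?:model_checkpoint_path|all_model_checkpoint_paths):\s*"(.+)"\s*$')
# Matches checkpoint files: <prefix>-<step>.index / .meta / .data-XXXXX-of-YYYYY
_CKPT_FILE = re.compile(r'^(?P<prefix>.+-\d+)\.(?:index|meta|data-\d+-of-\d+)$')


def referenced_prefixes(checkpoint_dir):
    """Return the checkpoint prefixes listed in the `checkpoint` state file."""
    keep = set()
    with open(os.path.join(checkpoint_dir, "checkpoint")) as f:
        for line in f:
            m = _PATH_LINE.match(line.strip())
            if m:
                # Paths may be absolute; compare by file name only.
                keep.add(os.path.basename(m.group(1)))
    return keep


def stale_checkpoint_files(checkpoint_dir):
    """List checkpoint files whose prefix is NOT referenced in the state file."""
    keep = referenced_prefixes(checkpoint_dir)
    stale = []
    for name in sorted(os.listdir(checkpoint_dir)):
        m = _CKPT_FILE.match(name)
        if m and m.group("prefix") not in keep:
            stale.append(name)
    return stale


def prune(checkpoint_dir, dry_run=True):
    """Delete unreferenced checkpoint files; with dry_run=True only print them."""
    for name in stale_checkpoint_files(checkpoint_dir):
        path = os.path.join(checkpoint_dir, name)
        print(("would delete " if dry_run else "deleting ") + path)
        if not dry_run:
            os.remove(path)
```

Running prune once with the default dry run and reading the list before deleting anything is the safer order, since removed checkpoints cannot be recovered.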

Thanks. I'll have to go with "reasonable" then; there's no other option. :thinking: :thinking: