I believe we’re actually dealing with two different problems here, depending on what counts as (un)interesting and what we want to do with the camera.
1. Interesting vs. uninteresting
Example use case: recording wild animals.
This case is characterized by the fact that the user can review the footage and manually label interesting frames (e.g. “there is a lion”) vs. uninteresting frames (e.g. “it’s just the wind blowing in the trees”).
I suspect that we can learn to classify interesting/uninteresting with deep neural networks. I also suspect that we don’t want to feed in only the current frame, but rather a “mega-image” built by stacking e.g. the last 5 seconds of frames, so the network can pick up motion over time.
Any input from someone with experience in deep neural networks (or other learning techniques) would be welcome.
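To make the setup concrete, here is a minimal sketch of the supervised case: the last N frames are stacked into one multi-channel “mega-image”, and a classifier is trained on labeled examples. Everything here is illustrative — the synthetic data, the function names, and the logistic regression (a deliberately trivial stand-in for the deep network one would actually use):

```python
import numpy as np

def stack_frames(frames, n_last=5):
    """Stack the last n_last grayscale frames into one 'mega-image' of
    shape (n_last, H, W), so the classifier can see motion over time."""
    return np.stack(frames[-n_last:], axis=0)

def train_logreg(X, y, lr=0.1, steps=500):
    """Logistic regression on flattened stacks -- a placeholder for the
    deep network that would replace it in a real system."""
    X = X.reshape(len(X), -1)
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        grad = p - y
        w -= lr * (X.T @ grad) / len(X)
        b -= lr * grad.mean()
    return w, b

def predict(w, b, stack):
    """True = 'interesting', False = 'uninteresting'."""
    p = 1.0 / (1.0 + np.exp(-(stack.ravel() @ w + b)))
    return p > 0.5

# Synthetic data: "interesting" stacks contain a bright blob drifting
# across frames (a moving animal); "uninteresting" ones are faint noise.
rng = np.random.default_rng(0)

def make_stack(interesting):
    s = rng.random((5, 16, 16)) * 0.1
    if interesting:
        for t in range(5):
            s[t, 4 + t, 4:8] = 1.0  # blob moving downward over time
    return s

X = np.array([make_stack(i % 2 == 0) for i in range(40)])
y = np.array([1.0 if i % 2 == 0 else 0.0 for i in range(40)])
w, b = train_logreg(X, y)
print(predict(w, b, make_stack(True)), predict(w, b, make_stack(False)))
```

The point of the stacking is that “interesting” is often a property of motion, not of any single frame, which is why the temporal mega-image matters more than the choice of classifier in this sketch.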
2. Normal vs. odd
Example use case: detecting that someone has fallen on the stairs.
This case is characterized by the fact that we typically have a corpus of “normal” frames (e.g. people walking up or down the stairs, people looking at the ceiling) but no samples of “abnormal” frames (e.g. people falling).
It is very unclear to me how we can learn to detect abnormal frames when we only have normal ones to train on.
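One standard family of techniques for this situation is novelty (one-class) detection: fit a model of what “normal” looks like, and flag anything the model explains poorly. Below is a hedged sketch using PCA reconstruction error — the function names, the synthetic “stairs” data, and the threshold rule are all assumptions for illustration, not a recommended production design:

```python
import numpy as np

def fit_normal_model(frames, k=3):
    """Learn a low-dimensional PCA model of 'normal' frames.
    Frames the model cannot reconstruct well will score as odd."""
    X = np.array([f.ravel() for f in frames], dtype=float)
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]  # mean frame + top-k principal components

def anomaly_score(frame, mean, components):
    """Reconstruction error: distance between the frame and its
    projection onto the subspace spanned by normal variation."""
    x = frame.ravel() - mean
    recon = components.T @ (components @ x)
    return np.linalg.norm(x - recon)

rng = np.random.default_rng(1)
# "Normal" frames: a smooth gradient plus small noise (a stand-in
# for ordinary footage of people using the stairs).
base = np.linspace(0, 1, 16)[:, None] * np.ones((16, 16))
normal = [base + 0.05 * rng.standard_normal((16, 16)) for _ in range(50)]

mean, comps = fit_normal_model(normal)
# Illustrative threshold: 1.5x the worst score seen on normal data.
threshold = max(anomaly_score(f, mean, comps) for f in normal) * 1.5

odd = base.copy()
odd[10:14, 2:10] = 1.5  # an unexpected bright region the model never saw
print(anomaly_score(base, mean, comps) <= threshold)
print(anomaly_score(odd, mean, comps) > threshold)
```

This only answers half the question — it detects frames that look statistically unusual, which is not the same as frames that are *semantically* abnormal (a fall). Closing that gap presumably requires either a handful of abnormal examples or features that encode motion, as in case 1.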