Some kind of transcript analysis is probably the way to go, maybe coupled with AI analysis of a few snapshots from each video. But to do this, YouTube would need to have such an index, or you'd have to build one yourself. Or maybe bypass YouTube's search API completely and source YouTube videos from links found in a general web search.
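To give a rough idea of what "making your own index" could look like, here is a minimal sketch of an inverted index over transcripts. It assumes you already have the transcript text for each video; the video IDs, the sample transcripts, and the function names are all made up for illustration, and a real setup would need ranking, stemming, and a lot more data.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Lowercase and split into simple word tokens.
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(transcripts):
    # Map each word to the set of video IDs whose transcript contains it.
    index = defaultdict(set)
    for video_id, text in transcripts.items():
        for word in tokenize(text):
            index[word].add(video_id)
    return index


def search(index, query):
    # Return the videos whose transcript contains every word in the query.
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical transcripts keyed by made-up video IDs.
transcripts = {
    "vid_a": "In this video we replace the bike chain step by step",
    "vid_b": "You won't believe what happened to my bike",
}
index = build_index(transcripts)
matches = search(index, "bike chain")
```

The point is only that the search runs on what is actually said in the video rather than on the title or thumbnail.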
If somebody opens YouTube for entertainment, to find something to watch, then I understand why they'd want sanitized thumbnails. But if I'm searching for information, I'd like the original thumbnails to remain, so that I know to avoid videos with clickbait thumbnails, since the video itself will probably be low quality too.