Some websites should be indexed regardless of their robots.txt. My one example of this is LegInfo, California's official website for state law. Because their tech department is incompetent, their robots.txt reads:
User-agent: *
Disallow: /
Disallow: /billPdf.xhtml$
Disallow: /billAnalysisClient.xhtml$
Crawl-Delay: 10
Sitemap: https://leginfo.legislature.ca.gov/sitemap.xml
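Note that the leading `Disallow: /` already blocks every path for every crawler, which makes the two specific `Disallow` rules after it redundant. A minimal sketch with Python's standard-library `urllib.robotparser` (feeding it the quoted file directly, so no network access is assumed) confirms that nothing on the site is fetchable:

```python
from urllib.robotparser import RobotFileParser

# The robots.txt quoted above, embedded so the check is reproducible offline.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
Disallow: /billPdf.xhtml$
Disallow: /billAnalysisClient.xhtml$
Crawl-Delay: 10
Sitemap: https://leginfo.legislature.ca.gov/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# "Disallow: /" covers every path, so every URL is blocked for every
# user agent; the two specific rules add nothing.
for path in ("/", "/faces/codes.xhtml", "/billPdf.xhtml"):
    url = "https://leginfo.legislature.ca.gov" + path
    print(path, parser.can_fetch("*", url))  # prints False for each path
```

(The `/faces/codes.xhtml` path is just an illustrative example; the point is that any path at all comes back disallowed.)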
I should be able to search a law (by standard citation!) and get the official source as the first result, not buried deep in the results or missing entirely. Too often I have to search within their website instead of through a search engine, which is absurd. I don't have a good general-purpose rule for when to respect or ignore robots.txt, but in this case at least, I see no reason to respect it.