OK, this is a very interesting discussion 😋.
Here is my opinion.
1.
Even Google returns AI slop often (specifically those AI-generated videos).
Google is often fooled by auto-generated PDF files claiming to be books but containing nothing but metadata and blurbs. This was commonplace even before AI (these are scrape-and-generate scripts).
And given that Kagi's indexes are based on ones like Google's, the presence of AI slop is understandable.
However, Kagi is in a favorable position of being able to filter out the slop as many users here expect.
As Vlad mentioned, a ML model can be a very effective solution here.
2.
Being able to report AI slop makes me feel empowered.
This is the kind of feature I love and expect from a community-serving service like Kagi.
This is not to say it is perfect, but to emphasize that the presence of such a feature, even if imperfect, is greatly valued.
3.
Now to the most critical part of the discussion (and this is really exciting :D):
If you let users tag contents themselves, you run into many problems:
― malicious users will report useful and valuable results
― biased users ('activists') will report content they do not like, even if it is useful or acceptable to others
― unethical e-marketers will try to push down competing results and push up their own (I've seen this on a local real estate community ads site)
― confused or presumptuous users may mistakenly report good content
― some users may accidentally report a site
For the last case, a simple UI feature can fix that: the ability to toggle the report back off (this is faster than visiting the Kagi Settings page and hunting for that entry).
As for relying on user contributions: human review, as currently done, rather than automation, is the best solution. If the volume of reported content grows large, relying on automated tools (e.g., ML models trained on the human reviews) would probably be effective.
One user suggested reputation points for users based on their contributions: the more accurate their reports, the more weight their reports get. This could be used as a weighting metric when evaluating a reported item, but only if it is too difficult to tell whether it is slop. Another metric would be whether it has been reported by other users. This would only apply to very sly AI-generated content; at present, AI content is easy enough to identify that this level of heuristics is not required.
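To make the accuracy-based weighting concrete, here is a minimal sketch. The `reporter_weight` helper, the 0.5 prior, and the smoothing count are my own assumptions for illustration, not anything Kagi actually does; the smoothing just keeps brand-new reporters from starting at an extreme weight of 0 or 1:

```python
def reporter_weight(accurate: int, total: int,
                    prior: float = 0.5, prior_n: int = 5) -> float:
    """Smoothed accuracy: new reporters start near the prior (0.5),
    and the weight converges to their true accuracy as reports accumulate."""
    return (accurate + prior * prior_n) / (total + prior_n)

print(reporter_weight(0, 0))    # brand-new reporter → 0.5
print(reporter_weight(18, 20))  # consistently accurate reporter → 0.82
```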
But this kind of heuristic analysis may become necessary if things get very gloomy. For the fun of it, here is a suggested algorithm: progress through checks until a certain level of decision confidence is reached: the content itself → other content by the same channel/website (i.e., stepping up from content to content author) → reporter reputation → reports by others and their mean reputation? → human review.
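That escalation could be sketched roughly like this. Everything here is hypothetical: the `Report` fields, the stage order, and the 0.9 threshold are illustrative assumptions; each stage yields a score in [0, 1] (1 = certainly slop), and we stop at the first decisive stage instead of paying for the costlier checks:

```python
from dataclasses import dataclass

@dataclass
class Report:
    # Hypothetical per-stage evidence scores in [0, 1]; 1 = certainly slop.
    content_score: float      # classifier score on the content itself
    author_score: float       # score over other content by the same author/site
    reporter_weighted: float  # the report, weighted by the reporter's reputation
    corroboration: float      # other reports, weighted by their mean reputation

def classify_report(report: Report, threshold: float = 0.9) -> str:
    """Walk through progressively costlier checks; stop at the first
    decisive stage, otherwise fall back to human review."""
    stages = [
        ("content", report.content_score),
        ("author", report.author_score),
        ("reporter", report.reporter_weighted),
        ("corroboration", report.corroboration),
    ]
    for name, score in stages:
        if score >= threshold:
            return f"slop ({name})"
        if score <= 1 - threshold:
            return f"not-slop ({name})"
    return "human-review"

# Content check is ambiguous, but the author's history is decisive:
print(classify_report(Report(0.6, 0.95, 0.5, 0.5)))  # → slop (author)
# Nothing decisive at any stage → escalate to a human:
print(classify_report(Report(0.5, 0.5, 0.5, 0.5)))   # → human-review
```

The design point is simply that the cheap, high-volume checks run first, and human review remains the backstop rather than the default.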
The user also suggested a voluntary community effort to review content. I think this is very prone to abuse. As mentioned earlier, there will always be someone trying to downrank competing results, farm karma points by reviewing blindly, etc. This is the sad reality of Campbell's law (any metric used for evaluation can become a target for abuse).
The good thing here is that, given Kagi is a paid service with a relatively small user base, such abuse is less likely to manifest, unless there is targeted or intentional trolling by someone willing to pay just to do that (or to be hired for it). As the user base grows, the service gains popularity, and/or the price drops, the risk increases.
So what am I recommending then?
Basically:
- Keep the 'Report as AI-generated' feature. It is genuinely empowering, even if imperfect.
- Require human review of reports to prevent abuse of the feature. If the volume is huge, ML assistance may help.
- Turning review into a communal effort (i.e., giving the community power to influence results) can be a really bad idea due to Campbell's law, especially as the user base grows and diversifies, since there will always be trolls, biased users, unethical marketers, karma seekers, etc.
- We expect Kagi to do a better job of filtering AI content out of search results, and believe it is in a favorable position to do so. Utilizing ML, as Vlad suggested, sounds like a great idea.
All the best.