TL;DR: LLMs are effective at detecting themselves. I think their accuracy could be vastly improved by further training them for this specific use case and by prompt engineering. Cost may be a prohibitive factor in implementing this approach, although good engineering practices (caching, a cheaper model) could alleviate it quite a bit. If that alone proves insufficient, limiting how many sites are scanned per month or per search may help; an additional subscription, or restricting the feature to the highest tier, may also be options. This approach may also introduce additional latency, but most of what solves the cost problem solves this as well. Overall, I believe this would greatly improve search quality.
kirkmc I've just done some very limited testing (sample size: 5) and it appears that even with a naive, off-the-shelf approach, AI can recognize itself quite well (4/5 correct; 0 false positives). Here's what I did, specifically:
1.) I tested some search results, since I already had an example in mind and that's the intended end use case. I used Qwen-235B (Thinking) for all of these tests - probably a bit overkill. The following two sites were tested (text only):
It got both of them right.
2.) I gave the AI an old document of mine. It correctly identified it as written by a human.
3.) From this point on I used two instances of Qwen, which I'll refer to as agents 1 and 2. I tasked agent 1 with generating a short-to-medium-length article about the differences between ibuprofen and acetaminophen (paracetamol). Agent 2 was then asked to determine whether the text was AI generated, and it correctly answered that it was. I then shut down both instances and spun up two new ones. Agent 1 neo was asked to write a "short-medium article about the differences between ibuprofen and paracetamol in the style of a small-web blogpost"; additionally, it was informed of the situation, told that it was generating synthetic data for this experiment, and instructed to sound human-like and cover its tracks. Agent 2 neo was then, of course, asked to determine if the text was AI generated, and it again answered correctly. Finally, I restarted agent 1's instance once more (agent 1 neo2) and reversed the roles: agent 2 neo was tasked with humanizing the text it had been given, using the insights it gained from detecting that it was AI generated. The humanized text successfully fooled agent 1 neo2.
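For anyone curious, the detection step was nothing fancier than a single classification prompt per text. Here's a minimal sketch of what "agent 2" was doing, assuming Qwen is served behind an OpenAI-compatible endpoint - the URL, API key handling, and model name below are placeholders, not actual Kagi or Qwen values:

```python
# Sketch of the "agent 2" detection step: one classification prompt against an
# OpenAI-compatible endpoint. Endpoint URL and model name are placeholders.
from openai import OpenAI

client = OpenAI(base_url="https://example-inference-host/v1", api_key="YOUR_KEY")

DETECTOR_PROMPT = (
    "You will be given the main text of a web page. "
    "Decide whether it was written by a human or generated by an LLM. "
    "Answer with exactly one word: HUMAN or AI."
)

def classify(page_text: str) -> str:
    """Ask the model for a one-word verdict on a page's text."""
    response = client.chat.completions.create(
        model="qwen-235b-thinking",  # placeholder model name
        messages=[
            {"role": "system", "content": DETECTOR_PROMPT},
            {"role": "user", "content": page_text},
        ],
    )
    return response.choices[0].message.content.strip()
```

Agent 1 was just the same kind of call with a "write an article about..." prompt instead.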
From this it's clear that LLMs can definitely be used to detect themselves. They are not perfect, although it was promising that no false positives showed up so far - downranking a human site is much worse than failing to detect an AI site, in my book. They can definitely be fooled if enough effort is put in, but I highly doubt that AI slop mills would ever bother, which is also pretty promising. Additionally, it's important to note that this was text only. I've noticed that a hallmark of AI slop sites is their stock images, something that could probably also be detected and factored into the final verdict.
I believe that any sort of prompt engineering or post-training would greatly improve detection accuracy. For example, the AIs seemed completely oblivious to the fact that they love their em and en dashes (—). Sure, they're used sometimes, but rarely outside of books and academia. Informing agent 1 neo2 about this would've made the experiment 5/5. This video does a great job of explaining some other signs, at least in my experience:
Note that, especially for the subtler signs, a lot of fine-tuning would likely be necessary, if it's possible at all.
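To make the em-dash point concrete: a signal like that is cheap to compute and could be fed to the detector alongside the page text. The threshold below is made up purely for illustration; a real deployment would tune it, or let a trained classifier weigh the raw number instead:

```python
import re

def dash_rate(text: str) -> float:
    """Em/en dashes per 1,000 words: a crude stylistic signal."""
    words = max(len(text.split()), 1)
    return len(re.findall(r"[\u2014\u2013]", text)) * 1000 / words

def looks_suspicious(text: str, threshold: float = 5.0) -> bool:
    # Threshold is illustrative only; don't hard-gate on a single signal.
    return dash_rate(text) > threshold
```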
As for existing solutions like originality.ai, I don't think they're a good fit for kagi due to potential data privacy issues. They're also likely to be more expensive in the long run, and they're mostly made for documents, not websites.
While we're on the topic of cost:
- "The average Google user searches three or four times per day or about 100 times per month."
- The average search yields about 30 results without expanding it further (which almost no one does)
Assuming a worst-case scenario where all searches are unique or result caching isn't implemented, this could bring the cost per user up to $9 a month (way too high when kagi's lowest tier is $5/month). That's using cost estimates from kagi assistant for Qwen-235B (Thinking). Cheaper models would likely work equally well if sufficiently trained - you could even try distillation, which should work well in this scenario as far as I understand (https://en.wikipedia.org/wiki/Knowledge_distillation). That could probably bring the cost per evaluation down to around $0.001, for a total of about $3 in the aforementioned worst case. Training would also cost money, of course.
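For transparency, here's the arithmetic behind those figures. The per-evaluation prices are my own rough assumptions (the first is simply backed out of the $9 estimate), not published pricing:

```python
# Back-of-the-envelope cost model; every input here is an assumption.
searches_per_month = 100                # average user, per the quoted figure
results_per_search = 30                 # first page only, nobody expands it
evals_per_month = searches_per_month * results_per_search  # 3,000

cost_per_eval_large = 0.003             # ~$/evaluation implied by the $9 estimate
cost_per_eval_small = 0.001             # ~$/evaluation for a distilled/cheaper model

print(evals_per_month * cost_per_eval_large)  # 9.0 -> ~$9/user/month, worst case
print(evals_per_month * cost_per_eval_small)  # 3.0 -> ~$3/user/month, worst case
```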
If caching doesn't alleviate costs, some other options to consider would be limiting scans to only the first 10 or so sites (most users only click the first three or so links anyway) or making it something the user triggers manually, akin to how site summarization works currently.
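On the caching point: the verdict only depends on the page content, so it can be shared across users and searches. A sketch of what I mean, with SQLite standing in for whatever store would actually be used:

```python
import hashlib
import sqlite3

# Shared verdict cache keyed on a hash of the page text, so any given page is
# only scanned once (until its content changes). SQLite is illustrative only.
db = sqlite3.connect("ai_scan_cache.db")
db.execute("CREATE TABLE IF NOT EXISTS verdicts (key TEXT PRIMARY KEY, verdict TEXT)")

def cached_classify(page_text: str, classify) -> str:
    key = hashlib.sha256(page_text.encode("utf-8")).hexdigest()
    row = db.execute("SELECT verdict FROM verdicts WHERE key = ?", (key,)).fetchone()
    if row:
        return row[0]                    # cache hit: no LLM call, no cost
    verdict = classify(page_text)        # cache miss: pay for one evaluation
    db.execute("INSERT OR REPLACE INTO verdicts VALUES (?, ?)", (key, verdict))
    db.commit()
    return verdict
```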
Personally, I'd be happy with paying a small fee (up to $5/month) or upgrading to a more expensive plan for such features, but I can't speak for everyone.
Search speed may also suffer with this approach, as the LLMs need time to scan the sites. I suppose displaying some sort of loading icon may be an option? That is, first display the "raw" results to the user, and only then start retrieving scan results or scanning the sites in the background.
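Concretely, I'm picturing something along these lines; the function names are made up, and the only point is that rendering the result list and scanning are decoupled:

```python
import asyncio

# Sketch of "show raw results first, annotate later": render the list right
# away, then attach AI verdicts to individual results as they come back.
async def scan_and_badge(result, classify_async, update_ui):
    verdict = await classify_async(result["text"])
    update_ui(result["url"], verdict)     # e.g. swap a loading icon for a badge

async def show_results(results, classify_async, render_ui, update_ui):
    render_ui(results)                    # raw results are shown immediately
    await asyncio.gather(
        *(scan_and_badge(r, classify_async, update_ui) for r in results)
    )
```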
To conclude, I think using LLMs to scan for AI generated text has great potential. My small experiment has shown that they seem to pick up on it quite well even when not specifically post-trained for it or instructed on how to do it. While they can be fooled with some amount of effort, I doubt that most slop sites would go to such lengths - especially as kagi hasn't got that big of a user base. Costs could be an issue, although I find this unlikely if good caching and the right model are applied. This should also reduce latency. Overall, I believe that implementing this sort of scanning would greatly benefit search quality. If I want an AI answer - I'll just ask AI. I hate it when I get AI slop in search results as it's not credible and wastes my time.