I’d love to have my own data indexed in Kagi; as @ragnor said, this would really be a game changer.
Examples of my personal use cases:
I could connect my private Zettelkasten / knowledge base so that when I search, I can see results from my own notes. My notes don’t have URLs, but I could easily publish them on a privately hosted site where they’d be identifiable by a URL.
I have 10,000+ articles saved in Instapaper, but Instapaper’s search sucks, so I rarely use it. I’d love to have better search and resurface saved content more often. Instapaper content already has a URL; it’s just private. So if I could give the data to Kagi to index, Kagi could show the article’s title and URL and link back to Instapaper.
I follow 200 websites using RSS feeds. I’d love to have a way to search the content of these feeds.
More generally, here’s an implementation idea.
Kagi could offer an API with endpoints to add, list, and delete “documents”. A document would be an HTML page with a URL and some metadata (title, created_at, etc.). I think this API would be sufficient to cover the use cases presented here so far, and it’s generic enough to open up a lot of possibilities.
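To make this concrete, here’s a rough sketch of what calling such an API could look like. Everything here is invented for illustration: the endpoint path, field names, and auth scheme are my guesses, not anything Kagi offers today.

```ts
// Hypothetical client for a Kagi "private documents" API.
// The endpoint URL, fields, and auth scheme are all made up for illustration.
const KAGI_DOCS_API = "https://kagi.com/api/v1/private-documents"; // invented

async function addDocument(apiToken: string): Promise<void> {
  const res = await fetch(KAGI_DOCS_API, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiToken}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      url: "https://notes.alex.com/note1",   // where the search result links to
      title: "My note about search engines", // shown in search results
      created_at: "2023-05-01T12:00:00Z",
      html: "<html><body>Full text to be indexed</body></html>",
    }),
  });
  if (!res.ok) throw new Error(`Upload failed: ${res.status}`);
}

// Listing and deleting would be plain GET / DELETE on the same resource:
//   GET    /api/v1/private-documents
//   DELETE /api/v1/private-documents/:id
```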
More things to look into:
- is this document model enough? e.g. what about images?
- how would Kagi handle storage? (maybe it should index the document at creation time, keep only the index, and discard the raw HTML?)
While writing that, I thought of another idea, perhaps even lighter on Kagi’s side in terms of development:
What if we could tell Kagi to index a specific website, which Kagi would then crawl while presenting a secret token?
This way, Kagi would reuse its crawler and all the logic it has already built. Developers could then expose any content they wish (tweets, private notes, Instapaper articles) by making it available over the public internet, but only to authenticated visitors. Authentication could be done via an HTTP header presented by the Kagi crawler.
This could be configured in Kagi’s settings as a base path (an HTTP URL) to start crawling from, plus an auth header name and value.
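In other words, the whole per-user configuration could be as small as this (the field names are just my guess at what such a setting might look like):

```ts
// Hypothetical shape of a "private crawl source" entry in Kagi's settings.
interface PrivateCrawlSource {
  baseUrl: string;         // e.g. "https://notes.alex.com/", where the crawl starts
  authHeaderName: string;  // e.g. "X-Kagi-Crawl-Token"
  authHeaderValue: string; // secret shared between the user and the Kagi crawler
}
```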
Of course, this doesn’t mean the developer needs to pre-generate HTML pages for all the data they wish to be indexed. The authenticated HTTP server is just a way to communicate with Kagi, and it can be fully dynamic: notes.alex.com/ could list all the notes with links to notes.alex.com/note1, note2, etc., and an email app could query the mailbox and render emails as HTML pages on demand when Kagi requests them.
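To show how little work this asks of the developer, here’s a minimal sketch of such a server in TypeScript (Node). The header name, secret, and note data are all placeholders; the only real requirement is “serve HTML, but only to requests carrying the configured header”:

```ts
import { createServer } from "node:http";

// Must match the header name/value configured in Kagi's settings (hypothetical).
const AUTH_HEADER = "x-kagi-crawl-token";
const SECRET = process.env.CRAWL_SECRET ?? "change-me";

// Placeholder data source; in practice this could query a notes DB or a mailbox.
const notes: Record<string, string> = {
  note1: "First private note...",
  note2: "Second private note...",
};

createServer((req, res) => {
  // Refuse any visitor that isn't the crawler we configured.
  if (req.headers[AUTH_HEADER] !== SECRET) {
    res.writeHead(403).end("Forbidden");
    return;
  }
  const slug = (req.url ?? "/").slice(1);
  if (slug === "") {
    // Index page: links for the crawler to follow to each note.
    const links = Object.keys(notes)
      .map((id) => `<a href="/${id}">${id}</a>`)
      .join(" ");
    res.writeHead(200, { "Content-Type": "text/html" });
    res.end(`<html><body>${links}</body></html>`);
  } else if (notes[slug]) {
    // Render a single note as HTML on demand.
    res.writeHead(200, { "Content-Type": "text/html" });
    res.end(`<html><body><p>${notes[slug]}</p></body></html>`);
  } else {
    res.writeHead(404).end("Not found");
  }
}).listen(8080);
```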
I find this model very natural, given Kagi’s nature 😄
I think this second implementation would also be flexible enough to cover the use cases presented above.
I’d pay $30-$50/mo if Kagi had a feature like this.