I’d love to have my own data indexed in Kagi; as @ragnor said, this would really be a game changer.
Examples of my personal use cases:
I could connect my private Zettelkasten / knowledge base so that when I search, I can see results from my own notes. My notes don’t have URLs, but I could easily publish them on a privately hosted site where they’d be identifiable by a URL.
I have 10,000+ articles saved in Instapaper, but Instapaper’s search sucks, so I rarely use it. I’d love to have better search and resurface saved content more often. Instapaper content already has a URL; it’s just private. So if I could give the data to Kagi to index, Kagi could show the article’s title and URL and link back to Instapaper.
I follow 200 websites using RSS feeds. I’d love to have a way to search the content of these feeds.
More generally, here’s an implementation idea.
Kagi could offer an API with endpoints to add, list, and delete “documents”. A document would be an HTML page with a URL and some metadata (title, created_at, etc.). I think this API would be sufficient to cover the use cases presented here so far, and it’s generic enough to open up a lot of possibilities.
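To make this concrete, here’s a rough sketch of what calling such an API could look like. Everything here is invented for illustration: the endpoint path, field names, and auth scheme are my guesses, not anything Kagi offers today.

```ts
// Hypothetical client for a Kagi "private documents" API.
// The endpoint URL, fields, and auth scheme are all made up for illustration.
const KAGI_DOCS_API = "https://kagi.com/api/v1/private-documents"; // invented

async function addDocument(apiToken: string): Promise<void> {
  const res = await fetch(KAGI_DOCS_API, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiToken}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      url: "https://notes.alex.com/note1",   // where the search result links to
      title: "My note about search engines", // shown in search results
      created_at: "2023-05-01T12:00:00Z",
      html: "<html><body>Full text to be indexed</body></html>",
    }),
  });
  if (!res.ok) throw new Error(`Upload failed: ${res.status}`);
}

// Listing and deleting would be plain GET / DELETE on the same resource:
//   GET    /api/v1/private-documents
//   DELETE /api/v1/private-documents/:id
```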
More things to look into:
- is this document model enough? e.g. what about images?
- how would Kagi handle storage? (maybe it should index the document at creation time, keep only the index, and discard the raw HTML?)
While writing that, I thought of another idea, perhaps even lighter on Kagi’s side in terms of development:
What if we could tell Kagi to index a specific website, which Kagi would then crawl while presenting a secret token?
This way, Kagi would reuse its crawler and all the logic it has already built. Developers could then expose any content they wish (tweets, private notes, Instapaper articles) by making it available over the public internet, but only to authenticated visitors. Authentication could be done via an HTTP header presented by the Kagi crawler.
This could be configured in Kagi’s settings as a base path (an HTTP URL) to start crawling from, plus an auth header name and value.
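In other words, the whole per-user configuration could be as small as this (the field names are just my guess at what such a setting might look like):

```ts
// Hypothetical shape of a "private crawl source" entry in Kagi's settings.
interface PrivateCrawlSource {
  baseUrl: string;         // e.g. "https://notes.alex.com/", where the crawl starts
  authHeaderName: string;  // e.g. "X-Kagi-Crawl-Token"
  authHeaderValue: string; // secret shared between the user and the Kagi crawler
}
```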
Of course, this doesn’t mean the developer needs to pre-generate HTML pages for all the data they wish to be indexed. The authenticated HTTP server is just a way to communicate with Kagi, and it can be fully dynamic: notes.alex.com/ could list all the notes with links to notes.alex.com/note1, note2, etc., and an email app could query the mailbox and render emails as HTML pages on demand when Kagi requests them.
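To show how little work this asks of the developer, here’s a minimal sketch of such a server in TypeScript (Node). The header name, secret, and note data are all placeholders; the only real requirement is “serve HTML, but only to requests carrying the configured header”:

```ts
import { createServer } from "node:http";

// Must match the header name/value configured in Kagi's settings (hypothetical).
const AUTH_HEADER = "x-kagi-crawl-token";
const SECRET = process.env.CRAWL_SECRET ?? "change-me";

// Placeholder data source; in practice this could query a notes DB or a mailbox.
const notes: Record<string, string> = {
  note1: "First private note...",
  note2: "Second private note...",
};

createServer((req, res) => {
  // Refuse any visitor that isn't the crawler we configured.
  if (req.headers[AUTH_HEADER] !== SECRET) {
    res.writeHead(403).end("Forbidden");
    return;
  }
  const slug = (req.url ?? "/").slice(1);
  if (slug === "") {
    // Index page: links for the crawler to follow to each note.
    const links = Object.keys(notes)
      .map((id) => `<a href="/${id}">${id}</a>`)
      .join(" ");
    res.writeHead(200, { "Content-Type": "text/html" });
    res.end(`<html><body>${links}</body></html>`);
  } else if (notes[slug]) {
    // Render a single note as HTML on demand.
    res.writeHead(200, { "Content-Type": "text/html" });
    res.end(`<html><body><p>${notes[slug]}</p></body></html>`);
  } else {
    res.writeHead(404).end("Not found");
  }
}).listen(8080);
```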
I find this model very natural, given Kagi’s nature 😄
I think this second implementation would also be flexible enough to cover the use cases presented above.
I’d pay $30-$50/mo if Kagi had a feature like this.