Assistant community plugins support

httpjames

Microsoft announced Copilot for Windows yesterday, an assistant with something they call a "natural language interface". Basically, they aim to make this an assistant that can interpret what you say, find the right integration and/or resources and execute the task(s) for you without anything on your part other than prompting and clarification.

ChatGPT Plus has something similar with plugins. With a simple query, you can get ChatGPT to add items to your Instacart, reserve a restaurant, and more.

I think this could be a great addition to Assistant. Like with Raycast, the community probably has thousands of developers. If we all come together to make integrations, Kagi Assistant can become the best NLI.

I propose the following plugins implementation in Assistant that accounts for security and developer experience.

Overview

User loads the plugins in the UI
Assistant modifies the system prompt to let the LLM know it can call these assistants. Something like "Call a plugin by writing '!plugin_name <args>'"
If necessary, Assistant calls these plugins with arguments

Technical

Plugins include a manifest that includes the different commands with names and descriptions
example:

{
  "name": "tasks",
  "description": "Interact with and manage tasks",
  "apiURL": "https://tasksapp.com/v1",
  "commands": [
    {
      "name": "list-tasks",
      "description": "List today's tasks",
      "method": "GET",
      "endpoint": "/tasks",
      "args": [
        {
          "name": "limit",
          "type": "numeric",
          "min": 1,
          "max": 10,
          "required": false,
          "default": 5
        }
      ]
    }
  ]
}

This manifest is converted into an LLM optimized prompt that lets it know the of plugin's existence, its commands and arguments it can supply
When the LLM calls a command, like !tasks list-tasks, Kagi looks up the plugin and its manifest. It sends an API request according to the manifest. For example, in this one, Kagi would send a request to https://tasksapp.com/v1/tasks?limit=5. The API would send a plaintext response that's optimized for the LLM.

example:

- Task ID 1
Name: Sign new apartment lease
Completed: No
Due: Today at 23:00

- Task ID 2
Name: Meet with financial advisor
Completed: No
Due: Sep 23 at 15:30

...

With this response, Kagi supplies it into the chat and lets the LLM interpret it.

Using this model, Kagi does not risk running any risky code on its infrastructure as it's really only relaying information from an API to another. It's easy to interface with as a developer because developers only need to write manifests and Kagi will take care of the AI execution.