Vlad
I'd like to return to this question. It seems that it is actually doable to build something like PageRank for books and PDFs. Wikipedia articles are extensively sourced, and the key is the ISBN. Many citations on Wikipedia include an ISBN, and shadow libraries such as Anna's Archive let you search by that ISBN and find the book or paper. The references on Wikipedia even have page numbers, but it seems to be a pain to get this to work right.
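To make the idea a bit more concrete, here is a rough Python sketch of the first step: pulling ISBNs out of Wikipedia citation templates, checking their checksums, and counting how many articles cite each book. It is plain citation counting, a crude stand-in for PageRank rather than the real algorithm. The `|isbn=` pattern is my assumption about how citations are written in wikitext, and the function names are made up for illustration.

```python
import re
from collections import Counter

# Assumption: citations carry an "|isbn=" parameter inside a template.
# Real articles vary, so this pattern will miss some references.
ISBN_PARAM = re.compile(r"\|\s*isbn\s*=\s*([0-9Xx\- ]+)", re.IGNORECASE)


def normalize_isbn(raw):
    """Return the ISBN as a 13-digit string, or None if the checksum fails."""
    digits = re.sub(r"[^0-9Xx]", "", raw).upper()
    if len(digits) == 10:
        if "X" in digits[:9]:  # X is only allowed as the check character
            return None
        # ISBN-10: weights 10..1, sum must be divisible by 11 (X counts as 10).
        total = sum((10 - i) * (10 if c == "X" else int(c))
                    for i, c in enumerate(digits))
        if total % 11 != 0:
            return None
        # Convert to ISBN-13: prefix 978 and recompute the check digit.
        core = "978" + digits[:9]
        weighted = sum(int(c) * (1 if i % 2 == 0 else 3)
                       for i, c in enumerate(core))
        return core + str((10 - weighted % 10) % 10)
    if len(digits) == 13 and digits.isdigit():
        # ISBN-13: alternating weights 1 and 3, sum must be divisible by 10.
        total = sum(int(c) * (1 if i % 2 == 0 else 3)
                    for i, c in enumerate(digits))
        return digits if total % 10 == 0 else None
    return None


def count_citations(articles):
    """articles maps article title -> wikitext. Counts citing articles per ISBN."""
    counts = Counter()
    for text in articles.values():
        found = {normalize_isbn(m) for m in ISBN_PARAM.findall(text)}
        counts.update(isbn for isbn in found if isbn)
    return counts
```

Each article counts once per book, so a book cited by many different articles ranks higher than one cited many times in a single article.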
Could there be a way to make the search engine understand Wikipedia references where no ISBN is given? For a person it is trivial to find the books referenced by the article, but can a machine do that automatically?
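One way a machine might attempt this is fuzzy matching of the cited title and author against a book catalog. Below is a minimal Python sketch using the standard library's `difflib`. The catalog entries, the placeholder ISBN strings, the 0.7/0.3 weighting, and the 0.8 threshold are all made up for illustration; a real system would need a proper catalog and tuning.

```python
from difflib import SequenceMatcher

# Hypothetical catalog with placeholder data, not real books or ISBNs.
CATALOG = [
    {"isbn": "PLACEHOLDER-1", "title": "Example Book One", "author": "Jane Doe"},
    {"isbn": "PLACEHOLDER-2", "title": "Another Sample Title", "author": "John Roe"},
]


def similarity(a, b):
    """Case-insensitive similarity between two strings, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(ref_title, ref_author, catalog, threshold=0.8):
    """Return the catalog entry closest to the reference, or None if no
    entry scores at least `threshold`."""
    best, best_score = None, 0.0
    for entry in catalog:
        # The title is weighted more heavily than the author (an arbitrary choice).
        score = (0.7 * similarity(ref_title, entry["title"])
                 + 0.3 * similarity(ref_author, entry["author"]))
        if score > best_score:
            best, best_score = entry, score
    return best if best_score >= threshold else None
```

Returning None below the threshold matters here: a wrong match would send the reader to the wrong book, which is worse than no link at all.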
The way I see it, there's really no difference between a URL and an ISBN that makes one better than the other for building an index. And maybe I'm dreaming, but wouldn't it be nice to be able to enter isbn:// in the browser the same way you enter https://? What's keeping that from happening? There are services that will parse those numbers.
What if the search engine and web browser could have links to content like this: ISBN://123456789/chapterY/headingX
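Parsing such a link would be the easy part. Here is a small Python sketch that splits a link of this shape into the ISBN and the path inside the book. The `isbn://` scheme is hypothetical: as far as I know no browser handles it, although a `urn:isbn:` namespace is already registered. The function name and the returned dictionary layout are my own invention, and the example uses a full 13-digit ISBN rather than the shortened number above.

```python
from urllib.parse import urlsplit


def parse_isbn_link(link):
    """Split a hypothetical isbn:// link into the ISBN and in-book path."""
    parts = urlsplit(link)
    # urlsplit lowercases the scheme, so ISBN:// and isbn:// both work.
    if parts.scheme != "isbn":
        raise ValueError("not an isbn:// link: " + link)
    isbn = parts.netloc.replace("-", "")  # tolerate hyphenated ISBNs
    segments = [s for s in parts.path.split("/") if s]
    return {"isbn": isbn, "path": segments}


print(parse_isbn_link("ISBN://9780306406157/chapter3/heading2"))
```

The hard part is what comes after parsing: something has to map the ISBN to an actual copy of the book, and the path segments to a location inside it.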
The W3C recently updated its recommendation for EPUB on this page:
https://www.w3.org/TR/epub-33/
It is a bit too technical for me. I tried to find out whether they already have a standard for linking to locations within books, but I couldn't make sense of the docs.
Alright, let's take a look at what options are available today to search within books:
1. Google
You have to specifically use the "Books" tab for it to surface any books. The results are excellent, with book name, author name, publishing year and a snippet of the relevant text, with page number. Clicking the result takes you to a book preview hosted by Google, where you might or might not be allowed to see the content. The book preview is presented pretty much like a PDF, with good performance and bad accessibility. You get links to try to find the book in a library or to buy it online. When you exit the preview, Google shows you a splendid page about the book and author.
2. Z-Library
They offer fulltext search of pretty poor quality. It is evident that they have indexed the content of a ton of their books (if not all), but their search feature is set by default to surface "Most Popular" results instead of "Best match". However, changing that doesn't improve results much. Results are presented with a thumbnail, title, author, relevant text snippets, year, language, and a link to the book (or PDF) presented very nicely on Z-Library, where you can also download it.
3. JSTOR
Limited content. I couldn't find anything relevant using their fulltext search, but it seems to be presented nicely.
4. Open Library (Internet Archive)
Quick and pretty good fulltext search, books are presented with links to read or to borrow. However, most material is very old, since they focus on stuff out of copyright. They boast 4 million books to search within.
These are all the resources for searching within books that I could find. The prominent shadow libraries Library Genesis and Anna's Archive will only search for book titles, not the content.
In the Orion feedback forum, I've suggested the ability to open ebooks as a feature for the browser, but there is also the option of going the Google route and having Kagi open some kind of web view with the book inside. I assume it will take some time before browsers in general start treating ebooks nicely.
For people looking for content in books today, the best option is to use Google's book search and preview, and then acquire the book in some way using the ISBN with a third party, such as a bookstore, library, or shadow library. Can Kagi beat Google?
E-books have been embraced to some extent by the tech giants. Google doesn't want anything to do with EPUB files, but has indexed books and made an excellent search feature. Apple has an EPUB reader included on macOS and iOS devices. Click an EPUB in Safari on your iPhone and it opens without an issue. Microsoft Edge used to open EPUB files natively, until it switched to Chromium. I think I don't need to mention Amazon's involvement in e-books 😉
I know Kagi's main user base right now is programmers, and that the content they need is usually not found in books. But there are many different sectors of knowledge workers who would be much better served by having search results from books and PDFs than from web pages. Engineers, medical professionals, historians, journalists, and all kinds of people who do research professionally or for fun.
Anna's Archive provides torrents of their whole library, which can be useful for making an index. Unfortunately I wouldn't be able to download them, since it requires a ton of storage.
I think it would be possible for Kagi to massively improve the availability of the world's knowledge, without too much work.