
There are hundreds of spam websites pretending to be official airport websites that appear for queries such as "{airport name} departures". These fake sites exist only to serve ads and/or push referral links, regurgitating information scraped from other sources that may be inaccurate or out of date, so wherever possible official websites should be prioritised for such queries.

For major airports (e.g. JFK, Heathrow), Kagi appears to correctly show the official websites first, but for smaller airports the fake sites sometimes win (example below).
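A minimal sketch of the promotion step, assuming that for the matched airport there is already a list of official website URLs (for example from Wikidata) and a ranked list of result URLs. The function names and example URLs are hypothetical. A result counts as official when it has the same host (ignoring a leading "www.") and sits under the official URL's path, so an official site that is only a section of a larger domain is not matched to the whole domain.

```python
from urllib.parse import urlsplit


def _host_and_path(url: str) -> tuple[str, str]:
    """Return the lower-cased host (minus a leading "www.") and the path with a trailing slash."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]
    path = parts.path if parts.path.endswith("/") else parts.path + "/"
    return host, path


def is_official(result_url: str, official_url: str) -> bool:
    """True when result_url is on the official host and under the official path."""
    result_host, result_path = _host_and_path(result_url)
    official_host, official_path = _host_and_path(official_url)
    # The trailing slash stops "/en/tfs/" from matching "/en/tfsx/".
    return result_host == official_host and result_path.startswith(official_path)


def promote_official(results: list[str], official_urls: list[str]) -> list[str]:
    """Stable re-rank: official results move to the top, all others keep their relative order."""
    official, others = [], []
    for url in results:
        if any(is_official(url, o) for o in official_urls):
            official.append(url)
        else:
            others.append(url)
    return official + others
```

Path-prefix matching is a simplifying assumption: it would miss official pages for the same airport that don't share a common path on the operator's site.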

These are some query patterns for which official websites should be prioritised:

  • "{airport} departures"
  • "{airport} arrivals"
  • "{airport} map"
  • "{airport} terminals"
  • "{airport}" if the airport name is distinct enough

At the very least, queries containing IATA codes (e.g. "JFK") or an explicit "{name} airport" substring should get this prioritisation. It would also be great if Kagi recognised other common identifiers. Supporting that in multiple languages might be challenging, but fortunately IATA codes are universal, and the Wikidata query below can be modified to return airport names in different languages, which may help.
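A rough sketch of recognising these queries, assuming a table of airports keyed by IATA code with their names (which could come from Wikidata labels in several languages). The `AIRPORTS` data, `INTENT_WORDS` and `recognise_airport` are hypothetical. It only covers the two minimum cases above: an IATA code on its own or with words like "departures" or "airport", and a known name directly followed by one of those words. Bare names are left out, since deciding whether a name is "distinct enough" needs more data than this.

```python
import re
from typing import Optional

# Hypothetical sample data; a real table would be built from Wikidata.
AIRPORTS = {
    "TFS": ["Tenerife South", "Tenerife Sur"],
}

# Words that, next to an airport name or code, suggest airport-information intent.
INTENT_WORDS = {"airport", "departures", "arrivals", "map", "terminals"}


def _normalise(text: str) -> list[str]:
    """Lower-case and split into word tokens (the regex keeps accented letters)."""
    return re.findall(r"\w+", text.lower())


def recognise_airport(query: str, airports: dict[str, list[str]]) -> Optional[str]:
    """Return the IATA code of the airport a query refers to, or None."""
    tokens = _normalise(query)
    remaining = [t for t in tokens if t not in INTENT_WORDS]

    # Case 1: just a code plus optional intent words ("JFK", "tfs airport").
    if len(remaining) == 1 and remaining[0].upper() in airports:
        return remaining[0].upper()

    # Case 2: a known name directly followed by an intent word ("tenerife south airport").
    padded = " " + " ".join(tokens) + " "
    for code, names in airports.items():
        for name in names:
            name_text = " ".join(_normalise(name))
            if any(f" {name_text} {word} " in padded for word in INTENT_WORDS):
                return code
    return None
```

A partial name such as "tenerife airport" is not matched by this sketch; handling it would need extra alias data.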

As a bonus, it would be great to penalise these fake websites and demote them, since they provide little value and try to deceive users. They usually have a tiny disclaimer in their footer saying e.g. "This website is not the official {airport name} website", which could help identify them.
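A minimal sketch of the disclaimer heuristic, assuming the page's visible text has already been extracted. The regular expression, the function name and the 5,000-character footer window are hypothetical choices, and the pattern only covers a few phrasings. Real disclaimers vary, so this would work best as one signal among several rather than a verdict on its own.

```python
import re

# Matches phrasings like "This website is not the official ... website".
UNOFFICIAL_DISCLAIMER = re.compile(
    r"\b(?:this|the)\s+(?:web\s*site|site)\s+is\s+not\s+(?:the\s+|an\s+)?official\b",
    re.IGNORECASE,
)


def has_unofficial_disclaimer(page_text: str, tail_chars: int = 5000) -> bool:
    """Search only the last tail_chars characters, where footers usually are (an assumption)."""
    return UNOFFICIAL_DISCLAIMER.search(page_text[-tail_chars:]) is not None
```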

Wikidata has excellent coverage of official airport websites, which I think provides all the data needed for this enhancement. Here is a query providing all airport IATA codes with their corresponding official websites, and that result as JSON. (I've never touched SPARQL before, so I'm sure the query can be improved, but it looks suitable. I've also limited the queries to 10 results for demo purposes; that line should of course be removed to retrieve the full results.)
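For illustration, a sketch of running this kind of query from Python, under a few assumptions: the SPARQL below is a minimal stand-in for the linked query (not a copy of it), using only P238 (IATA airport code) and P856 (official website); results come from the public query.wikidata.org/sparql endpoint in the standard SPARQL JSON results format; and the function names and User-Agent string are hypothetical. Passing `limit=None` drops the demo `LIMIT` line.

```python
import json
import urllib.parse
import urllib.request
from collections import defaultdict
from typing import Optional

ENDPOINT = "https://query.wikidata.org/sparql"


def iata_query(limit: Optional[int] = 10) -> str:
    """Every item with an IATA airport code (P238) and an official website (P856)."""
    query = (
        "SELECT ?airport ?iata ?website WHERE {\n"
        "  ?airport wdt:P238 ?iata ;\n"
        "           wdt:P856 ?website .\n"
        "}"
    )
    if limit is not None:
        query += f"\nLIMIT {limit}"  # demo limit; pass None for the full results
    return query


def fetch_json(query: str) -> str:
    """Run a query against the public endpoint (needs network access)."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    # Hypothetical User-Agent; Wikimedia asks clients to identify themselves.
    request = urllib.request.Request(url, headers={"User-Agent": "airport-sites-sketch/0.1"})
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


def websites_by_iata(raw_json: str) -> dict[str, set[str]]:
    """Group official websites by IATA code; an airport can list several."""
    data = json.loads(raw_json)
    sites: dict[str, set[str]] = defaultdict(set)
    for row in data["results"]["bindings"]:
        sites[row["iata"]["value"].upper()].add(row["website"]["value"])
    return dict(sites)
```

Used together, `websites_by_iata(fetch_json(iata_query(limit=None)))` would give a code-to-websites lookup that a ranking step could consult.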

This is an example query for "tfs airport", which shows the fake, unofficial tfsairport.com website ranking above the official aena.es site:

The same thing happens for "tenerife airport" and "tenerife south airport".

(An existing feature request has some overlap but is very broad, covering visas, accommodation, etc. in addition to airports; this request is specific to airports, which can be addressed directly.)


    adamaveray How do you tell a 'spam' website from a legit one? Or rather, how would Kagi know that a site is an official one? Is there a list of either spam airport sites or official ones? Thanks.

      Vlad I think you could probably query this on Wikidata with P239 ‘ICAO airport code’, P238 ‘IATA airport code’, and P240 ‘FAA airport code’, and then use the P856 ‘official website’ property of the same item. For example, on Wikidata TFS/GCTS (Tenerife South Airport) has two of those three airport code properties, and an official website. Part of the problem with this one is that its official site wouldn't be surfaced as naturally anyway, because for some reason they have chosen not to give individual airports their own domains there.
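A sketch of that suggestion, assuming the same public endpoint and SPARQL JSON results format: the query keeps items that have at least one of P238, P239 or P240 plus P856, and fetches each code with `OPTIONAL`, since many airports have only some of them (Tenerife South has two of the three). The function names are hypothetical, and the query may need tuning to avoid timeouts on the query service.

```python
from collections import defaultdict

# Airport code properties named in the reply: IATA, ICAO and FAA.
CODE_PROPERTIES = {"iata": "P238", "icao": "P239", "faa": "P240"}


def multi_code_query() -> str:
    """Items with any of the three codes and an official website (P856)."""
    has_any_code = " UNION ".join(
        f"{{ ?airport wdt:{pid} [] }}" for pid in CODE_PROPERTIES.values()
    )
    optionals = "\n".join(
        f"  OPTIONAL {{ ?airport wdt:{pid} ?{var} . }}" for var, pid in CODE_PROPERTIES.items()
    )
    variables = " ".join(f"?{var}" for var in CODE_PROPERTIES)
    return (
        f"SELECT ?airport ?website {variables} WHERE {{\n"
        f"  {has_any_code}\n"
        "  ?airport wdt:P856 ?website .\n"
        f"{optionals}\n"
        "}"
    )


def flatten(binding: dict) -> dict[str, str]:
    """Turn one SPARQL JSON binding into {variable: value}; unbound variables are absent."""
    return {var: cell["value"] for var, cell in binding.items()}


def build_code_index(rows: list[dict[str, str]]) -> dict[str, set[str]]:
    """Map every known code (any scheme) to official websites.

    Values are sets because one airport may list several websites, and codes
    from different schemes could in principle be the same string.
    """
    index: dict[str, set[str]] = defaultdict(set)
    for row in rows:
        for var in CODE_PROPERTIES:
            code = row.get(var)
            if code:
                index[code.upper()].add(row["website"])
    return dict(index)
```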

        Vlad As @xorgy has mentioned, Wikidata can give you that data: here's a response showing the IATA codes and official websites for as many airports as Wikidata is willing to return for my query: query.wikidata.org

        But @xorgy looks much better at writing SPARQL than I am, so I'd go with the query they've linked (modified to also output the URLs and airport codes)!

        xorgy This is amazing! What other categories could we get official websites for that would be useful in the context of search?

          Vlad I imagine a good first step would be significant physical venues since they’re valuable targets for SEO scams but limited in number.

          The list could include airports as this issue discusses, but also sports centres/stadiums, event centres/music venues/theatres, major landmarks/tourist attractions, and so on. The possibilities are endless, and that’s before considering non-POI results like bands, sports teams, companies, etc.

          Fortunately, it looks like Wikidata uses the same standard “official website” property across all record types it applies to, so I’d imagine that once the Kagi team has data ingestion set up for airports, it would be fairly straightforward to apply it to other areas as they are identified.

            adamaveray Is there any category of interest where Kagi could use this kind of improvement right now? I want to do this incrementally, one category at a time, if we can identify categories (and appropriate queries) to get the list.

            Vlad One that comes to mind is hospitals, but it may be harder to turn that into a feature given the more varied ways people search for hospitals.

            There is a way to do this in general: take records with an official website property and work through the properties that could be used as query strings matched to it (e.g. short names, corporation names, codes like ICAO and IATA codes). Some train stations have IATA codes as well.

            This can also be matched to POIs on OpenStreetMap, but that may not be super useful to you given that your maps are Apple Maps.
