KagiForMe It is not always accurate, but I agree that it is accurate more than 90% of the time. Nonetheless, the API only permits retrieval for a specific video. Furthermore, they would have to hit YouTube constantly, which could give the IP address a bad reputation. It would be easier for us to raise issues on GitHub to remove a specific channel, with proof, if, say, it has more than 10% sponsorship time in its last 5 videos.

Correct me if I am wrong 😊
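
To make the proposed rule concrete, here is a minimal sketch of the "more than 10% sponsorship time in the last 5 videos" check. The function names and the input shape (a newest-first list of `(sponsored_seconds, duration_seconds)` pairs) are assumptions made for illustration; nothing here is taken from the Small Web code base.

```python
def sponsor_share(videos):
    """Fraction of total runtime that is sponsored.

    `videos` is a list of (sponsored_seconds, duration_seconds) pairs
    (hypothetical input shape, assumed for this sketch).
    """
    total_duration = sum(duration for _, duration in videos)
    if total_duration <= 0:
        return 0.0  # no usable data, so nothing to report
    return sum(sponsored for sponsored, _ in videos) / total_duration


def exceeds_threshold(videos, max_share=0.10, last_n=5):
    """True if the newest `last_n` videos exceed the allowed sponsored share."""
    return sponsor_share(videos[:last_n]) > max_share
```

A channel whose last five 10-minute videos each carry a 90-second sponsor read would be flagged, since 450 of 3000 seconds is 15%.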

    The code for the site is open source, and this could be implemented by the community. I am kind of done with investing more time in the site and need to move on to actually running the company.

    Filtering could be added relatively easily:
    https://github.com/kagisearch/smallweb

      4 months later

      Vlad
      After some experimenting, I have working (imperfect) Python code that checks whether there are sponsored segments/advertisements in any video in a given feed URL.
      Details:

      • It takes a list of feeds, such as small_yt.txt, and fetches the XML of each feed.
      • For each feed, it retrieves the video URLs and checks them against the SponsorBlock API.
      • If there is sponsored content/ads, it adds up the total duration and writes the feed URL to a file together with that total. The threshold for this is currently 0.
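
The steps above can be sketched roughly as follows. This is not the script mentioned in the post (which was not shared here), only a minimal illustration under stated assumptions: the feeds are YouTube Atom feeds that carry `yt:videoId` elements, and segments are fetched from SponsorBlock's public `skipSegments` endpoint, which is assumed to answer 404 when a video has no segments. All function names are made up for this example.

```python
import json
import xml.etree.ElementTree as ET
from urllib.error import HTTPError
from urllib.parse import urlencode
from urllib.request import urlopen

# XML namespaces assumed for YouTube's Atom feeds.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "yt": "http://www.youtube.com/xml/schemas/2015",
}


def video_ids_from_feed(feed_xml, limit=10):
    """Return up to `limit` video IDs found in a channel's Atom feed."""
    root = ET.fromstring(feed_xml)
    ids = [el.text for el in root.findall("atom:entry/yt:videoId", NS) if el.text]
    return ids[:limit]


def fetch_segments(video_id):
    """Ask SponsorBlock for sponsor segments (endpoint details are assumptions)."""
    query = urlencode({"videoID": video_id, "category": "sponsor"})
    url = "https://sponsor.ajay.app/api/skipSegments?" + query
    try:
        with urlopen(url, timeout=10) as resp:
            return json.load(resp)
    except HTTPError as err:
        if err.code == 404:  # assumed to mean: no segments for this video
            return []
        raise


def sponsored_seconds(segments):
    """Sum the length of every [start, end] segment in a response."""
    return sum(end - start for start, end in (s["segment"] for s in segments))


def feed_sponsor_total(feed_xml, fetch=fetch_segments, limit=10):
    """Total sponsored seconds across the newest `limit` videos of one feed."""
    return sum(sponsored_seconds(fetch(v)) for v in video_ids_from_feed(feed_xml, limit))
```

Passing the fetch function as a parameter keeps the parsing and summing testable without network access, and makes it easy to add caching or rate limiting later.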

      Note that the SponsorBlock API is licensed under CC-BY-NC-SA 4.0, see here. This means two things:

      The author explicitly says "If you need to use the database or API in a way that violates this license, contact me with your reason and I may grant you access under a different license."(link)

      So if you do not want to relicense your code, you could ask them for permission (chances are good I think).

      @Vlad, if you are still interested, I can share my code with you.

        KagiForMe Well, any check we do needs to be done at runtime and very fast (<100ms per video). Does it do that?

        We do not intend to crawl the entire youtube to run the script on every video (resources constrained).

          What I did was more of a script to check whether a content creator belongs on the list, i.e. not something to be run at runtime, but rather something to be run periodically on the list small_yt.txt to weed out any channels that make use of sponsored advertisements.

          It would be trivial to implement a check at runtime, but unfortunately, the API will be much too slow.

          Vlad We do not intend to crawl the entire youtube to run the script on every video (resources constrained).

          That is not what the script does. It takes the last 10 videos from the feed (the number can be adapted) and checks them against the API. And for something that only has to be run occasionally, the performance is perfectly fine.

          I am not mad if you do not want to implement this (and it was fun coding it), but I think this could be a cool feature for the small YT channels to automatically weed out the channels that do not adhere to the Small Web style. Running this script occasionally and removing bad channels would lower server load as well, right?

          TL;DR The script checks whether the channels in small_yt.txt really belong there and can be run once a month or even less. Something like your small snippet on GitHub to weed out duplicates.
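
As an illustration of that occasional cleanup run, here is a minimal sketch that splits the list into kept and flagged feeds. It assumes small_yt.txt holds one feed URL per line and that the total sponsored seconds per feed have already been computed. The function name and the `totals` mapping are hypothetical.

```python
def prune_feed_list(lines, totals, threshold=0.0):
    """Split feed URLs into (kept, flagged) by total sponsored seconds.

    `lines` are the raw lines of the feed list; `totals` maps a feed URL to
    its sponsored seconds over the checked videos (assumed precomputed).
    Feeds without an entry in `totals` are kept, since nothing is known.
    """
    kept, flagged = [], []
    for line in lines:
        url = line.strip()
        if not url:
            continue  # skip blank lines
        if totals.get(url, 0.0) > threshold:
            flagged.append(url)
        else:
            kept.append(url)
    return kept, flagged
```

With the default threshold of 0, any sponsored time at all flags a feed, which matches the current setting described earlier in the thread; a reviewer could still look over the flagged list before anything is removed.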

            carl
            Nope, as it is licensed as non-commercial. The code I wrote building on that and the data produced from it have to be licensed the same. And honestly, I wouldn't feel comfortable with that either. The code can and should(!) benefit all Small Web users.

              Maybe to illustrate the benefits of this code: Just running it on a single channel already revealed this in the last 10 videos, apart from some other sponsoring:

              This channel should IMO just be removed. And the script can check the entire Small YT file for such channels.

                This is an example output:
                Of the first 50 entries in small_yt.txt, 15 have sponsored content in their last 10 videos, some well over 10 mins overall. 🙁

                Note that those values are optimistic lower bounds, as entirely sponsored videos and API errors are currently skipped (they are currently having some server issues).

                The third column is WIP; it should be able to detect entirely sponsored videos as well (after some changes).

                  Well, to me, and I think most will agree, those seconds are literal advertisements. "If you use $LIPSTICK, as I do, your lips will be so plush and red. It is amazing. So everyone, in the description there is a link to $BRAND_URL, which is the website of $BRAND. They make such great products."

                  I think this violates the Small Web spirit.

                    Vlad Well sponsors are better than ads so not sure what to do about it.

                    With YouTube Premium you never see any ads, while sponsor segments remain. They are ads just the same, and more difficult to get rid of.

                    KagiForMe Nope, as it is licensed as non-commercial. The code I wrote building on that and the data produced from it have to be licensed the same. And honestly, I wouldn't feel comfortable with that either. The code can and should(!) benefit all Small Web users.

                    I was referring to Vlad's statement on crawling YouTube.

                      carl I was referring to Vlad's statement on crawling YouTube.

                      Ah, okay, sorry. 🙈

                        15 days later

                        Vlad Well sponsors are better than ads so not sure what to do about it.

                        I have asked around a bit on the Discord, and people seemed to agree with me that sponsored segments do not belong in small_yt, and that this filtering would be a nice feature.