Filter out YouTube links with a lot of sponsored segments (i.e. advertisements)
Vlad
After some experimenting, I have working (imperfect) Python code that checks whether any video in a given feed URL contains sponsored segments/advertisements.
Details:
- It takes a list of feeds, such as small_yt.txt, and fetches the XML of each feed.
- For each feed it retrieves the video URLs and checks them against the SponsorBlock API.
- If there is sponsored content/ads, it adds up the total duration and writes the feed URL to a file together with that total. The threshold for this is currently 0, so any sponsored content at all gets a feed listed.
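The steps above can be sketched roughly as follows. This is not the code I mentioned, just a minimal illustration using only the standard library. It assumes the public SponsorBlock endpoint `https://sponsor.ajay.app/api/skipSegments`, which returns a JSON list of objects with a `segment: [start, end]` pair and answers 404 when it knows no segments for a video. The function names (`fetch_sponsor_segments`, `sponsored_seconds`, `flag_feed`) and the tab-separated output line are made up for this example.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

# Assumed public SponsorBlock endpoint; adjust if you use a mirror.
API_URL = "https://sponsor.ajay.app/api/skipSegments"


def fetch_sponsor_segments(video_id, timeout=10.0):
    """Return the sponsor segments the API knows for one video (empty if none)."""
    query = urllib.parse.urlencode({"videoID": video_id, "category": "sponsor"})
    try:
        with urllib.request.urlopen(f"{API_URL}?{query}", timeout=timeout) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:  # no segments submitted for this video
            return []
        raise


def sponsored_seconds(segments):
    """Add up (end - start) over all segments of one video."""
    return sum(end - start for start, end in (s["segment"] for s in segments))


def flag_feed(feed_url, per_video_seconds, threshold=0.0):
    """Return an output line for the feed if its total exceeds the threshold."""
    total = sum(per_video_seconds)
    if total > threshold:
        return f"{feed_url}\t{total:.1f}"
    return None
```

Overlapping segments would be counted twice in this sketch; merging them first would be a possible refinement.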
Note that the SponsorBlock API is licensed under CC-BY-NC-SA 4.0, see here. This means two things:
- It cannot be used for commercial purposes. IANAL, but I think this is fine. See https://creativecommons.org/faq/#does-my-use-violate-the-noncommercial-clause-of-the-licenses.
- It is SA ("Share alike"), which means the rest of the source code has to be licensed under CC-BY-NC-SA 4.0 as well, but:
The author explicitly says "If you need to use the database or API in a way that violates this license, contact me with your reason and I may grant you access under a different license."(link)
So if you do not want to relicense your code, you could ask them for permission (chances are good I think).
@Vlad, if you are still interested, I can share my code with you.
What I did was more of a script to check whether a content creator belongs on the list. I.e. not something to be run at runtime, but rather something to be run on the list small_yt.txt periodically to weed out any channels that make use of sponsored advertisements.
It would be trivial to implement a check at runtime, but unfortunately, the API will be much too slow.
Vlad We do not intend to crawl the entire YouTube to run the script on every video (resource constrained).
That is not what the script does. It takes the last 10 videos (the number can be adapted), which it gets from the feed, and checks them against the API. And for something that only has to be run occasionally, the performance is perfectly fine.
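To illustrate how the last videos can be read from the feed without crawling anything, here is a minimal sketch. It assumes the usual YouTube channel feed format (Atom, with a `yt:videoId` element per entry, newest entries first); `latest_video_ids` and the sample feed are made up for this example.

```python
import xml.etree.ElementTree as ET

# Namespaces used by YouTube's Atom channel feeds.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "yt": "http://www.youtube.com/xml/schemas/2015",
}


def latest_video_ids(feed_xml, limit=10):
    """Return up to `limit` video IDs in feed order (assumed newest first)."""
    root = ET.fromstring(feed_xml)
    ids = []
    for entry in root.findall("atom:entry", NS):
        video_id = entry.findtext("yt:videoId", namespaces=NS)
        if video_id:
            ids.append(video_id)
    return ids[:limit]


# Tiny hand-written feed for illustration only.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:yt="http://www.youtube.com/xml/schemas/2015">
  <entry><yt:videoId>aaaaaaaaaaa</yt:videoId></entry>
  <entry><yt:videoId>bbbbbbbbbbb</yt:videoId></entry>
  <entry><yt:videoId>ccccccccccc</yt:videoId></entry>
</feed>"""

print(latest_video_ids(SAMPLE_FEED, limit=2))
```

One feed request per channel plus at most `limit` API requests is all this needs, which is why an occasional run stays cheap.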
I am not mad if you do not want to implement this (and it was fun coding it), but I think this could be a cool feature for the small YT channels to automatically weed out the channels that do not adhere to the Small Web style. Running this script occasionally and removing bad channels would lower server load as well, right?
TL;DR The script checks whether the channels in small_yt.txt really belong there and can be run once a month or even less. Something like your small snippet on GitHub to weed out duplicates.
Could this be something for paid Kagi add-ons? https://kagifeedback.org/d/2894-paid-kagi-add-ons
carl
Nope, as it is licensed as non-commercial. The code I wrote building on that and the data produced from it have to be licensed the same. And honestly, I wouldn't feel comfortable with that either. The code can and should(!) benefit all Small Web users.
Maybe to illustrate the benefits of this code: Just running it on a single channel already revealed this in the last 10 videos, apart from some other sponsoring:
This channel should IMO just be removed. And the script can check the entire Small YT file for such channels.
This is an example output:
Of the first 50 entries in small_yt.txt, 15 have sponsored content in their last 10 videos, some well over 10 minutes overall.
Note that those values are optimistic lower bounds, as entirely sponsored videos and API errors are currently skipped (SponsorBlock is having some server issues at the moment).
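Why the totals are only lower bounds can be shown with a small sketch. It assumes each checked video yields either a number of sponsored seconds or `None` when it was skipped (API error, or a video marked as sponsored in its entirety); `summarize_channel` is a made-up name for this example.

```python
from typing import Iterable, Optional, Tuple


def summarize_channel(per_video: Iterable[Optional[float]]) -> Tuple[float, int]:
    """Return (lower bound of sponsored seconds, number of skipped videos)."""
    total = 0.0
    skipped = 0
    for seconds in per_video:
        if seconds is None:
            # Skipped videos contribute nothing, so the total can only be too low.
            skipped += 1
        else:
            total += seconds
    return total, skipped


# Hypothetical results for five videos of one channel.
print(summarize_channel([30.0, None, 0.0, 45.5, None]))
```

Reporting the skipped count next to the total makes it visible how much a value may be underestimated.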
The third column is WIP; it should be able to detect entirely sponsored videos as well (after some changes).
Well, to me, and I think most will agree, those seconds are literal advertisements. "If you use $LIPSTICK, as I do, your lips will be so plush and red. It is amazing. So everyone, in the description there is a link to $BRAND_URL, which is the website of $BRAND. They make such great products."
I think this violates the Small Web spirit.
Vlad Well sponsors are better than ads so not sure what to do about it.
With YouTube Premium you never see any ads, while sponsor segments remain. They are ads just the same, and more difficult to get rid of.
KagiForMe Nope, as it is licensed as non-commercial. The code I wrote building on that and the data produced from it have to be licensed the same. And honestly, I wouldn't feel comfortable with that either. The code can and should(!) benefit all Small Web users.
I was referring to Vlad's statement on crawling YouTube.
carl I was referring to Vlad's statement on crawling YouTube.
Ah, okay, sorry.
Vlad Well sponsors are better than ads so not sure what to do about it.
I have asked around a bit on the Discord, and people seemed to agree with me that sponsored segments do not belong in small_yt, and that this filtering would be a nice feature.