YSG (YouTube Subtitle Grabber)
A short description
I created a very simple script for bulk downloading automatically generated subtitles/transcriptions of YouTube videos. The tool is designed to help analyze content on YouTube without the need to watch it in its entirety. It downloads the subtitles and generates .csv and .txt files, which assist in later analysis.
You can download the script here: YSG 0.1
How it works
My script builds on a much more advanced tool, yt-dlp, which allows for downloading videos from YouTube. However, used on its own, yt-dlp is quite impractical for this purpose: it has a complicated structure and only allows for downloading content from a selected channel. If we want to analyze videos that have appeared on 20 channels in the last 3 days, my tool should be significantly more useful.
Of course, in many cases this work can be done “manually,” but my script is useful for analyzing a large amount of content from various channels, for example when monitoring extremist channels.
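The multi-channel idea described above can be sketched roughly as follows. This is not the actual code of ysg.py, only a minimal illustration that assumes the script calls the yt-dlp command line once per channel. The function names, the output file name template and the default paths are hypothetical; the yt-dlp options used (--skip-download, --write-auto-subs, --sub-langs, --dateafter, --cookies, -o) are real yt-dlp flags.

```python
from typing import List


def build_ytdlp_command(channel_url: str, language: str, days: int,
                        cookies_path: str = "settings/cookies.txt",
                        output_dir: str = "subtitles") -> List[str]:
    """Build one yt-dlp command line that fetches only auto-generated subtitles.

    Hypothetical helper: the real ysg.py may call yt-dlp differently.
    """
    return [
        "yt-dlp",
        "--skip-download",                    # do not download the video itself
        "--write-auto-subs",                  # ask for automatically generated subtitles
        "--sub-langs", language,              # language code, e.g. "pl"
        "--dateafter", f"today-{days}days",   # only videos uploaded in the last N days
        "--cookies", cookies_path,            # cookies exported from the browser
        # Assumed file name template; the fields are standard yt-dlp output fields.
        "-o", f"{output_dir}/%(upload_date)s_%(channel)s_%(title)s.%(ext)s",
        channel_url,
    ]


def build_commands(channels: List[str], language: str, days: int) -> List[List[str]]:
    """Build one command per channel, so many channels are handled in one run."""
    return [build_ytdlp_command(url, language, days) for url in channels]
```

Each command could then be passed to subprocess.run(cmd, check=True), which requires yt-dlp to be installed and network access.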
How to use the script
Before first usage
Install:
- Python 3
- ffmpeg
- yt-dlp
Settings:
- In the settings folder, there is a file called channels.txt. Paste the YouTube channels that the tool should cover into it, one per line.
- In the settings folder, there is a file called language.txt. Its content is the code of the language (set to Polish by default) in which the subtitles generated by YouTube should be retrieved.
- The settings folder does not contain a cookies.txt file. The file is required, and the user needs to add it to the folder. To do this, install the Cookies.txt extension for Firefox or Chrome, go to YouTube, and save the cookies.txt file. This will allow the tool to retrieve subtitles from multiple channels.
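To make the role of the three settings files concrete, here is a minimal sketch of how they could be read. It is an illustration only, not the code of ysg.py. The function name load_settings is hypothetical, and skipping blank lines in channels.txt is an assumption.

```python
from pathlib import Path


def load_settings(settings_dir="settings"):
    """Read channels.txt and language.txt and check that cookies.txt exists.

    Hypothetical helper; the real script may organize this differently.
    """
    base = Path(settings_dir)

    # One channel URL per line; blank lines are skipped (an assumption).
    channels = [
        line.strip()
        for line in (base / "channels.txt").read_text(encoding="utf-8").splitlines()
        if line.strip()
    ]

    # A single language code, e.g. "pl" for Polish.
    language = (base / "language.txt").read_text(encoding="utf-8").strip()

    # cookies.txt is not shipped with the tool, so fail early if it is missing.
    cookies = base / "cookies.txt"
    if not cookies.is_file():
        raise FileNotFoundError(f"{cookies} is missing; export it from your browser first")

    return channels, language, cookies
```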
Operation in practice
python3 ysg.py -d 3
Running the script with the flag -d 3 will retrieve transcriptions of videos from the last 3 days from the selected channels. By default, the -d parameter is 1. I have tested the tool on up to 20 channels so far. You have to generate a fresh cookies.txt file before every use.
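For readers curious how such a flag is typically handled, the -d option with its default of 1 could be parsed with argparse along these lines. This is a sketch, not the actual code of ysg.py; the long option name --days and the help texts are assumptions.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line; -d is the number of days to look back."""
    parser = argparse.ArgumentParser(
        description="Download auto-generated YouTube subtitles in bulk."
    )
    parser.add_argument(
        "-d", "--days",          # "--days" is a hypothetical long form
        type=int,
        default=1,               # matches the documented default of 1
        help="retrieve videos from the last N days (default: 1)",
    )
    return parser.parse_args(argv)
```

With this sketch, parse_args(["-d", "3"]).days is 3, and calling it with an empty list falls back to 1.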
If everything went according to plan, you should see something like this in the terminal:
The subtitles will be saved in the subtitles folder, while the output folder will contain two files: output.csv and output.txt. Both files will contain the date, video title, channel name and transcription.
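As an illustration of the two output files, the sketch below writes the same four fields to output.csv and output.txt. The column names, the layout of the text file and the function name write_outputs are assumptions; the real files produced by ysg.py may be formatted differently.

```python
import csv
from pathlib import Path

# Assumed column names for the four fields described above.
FIELDS = ["date", "title", "channel", "transcription"]


def write_outputs(rows, output_dir="output"):
    """Write a list of dicts (one per video) to output.csv and output.txt."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)

    # CSV: one row per video, convenient for spreadsheets and data tools.
    with open(out / "output.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)

    # TXT: a header line per video followed by the transcription text.
    with open(out / "output.txt", "w", encoding="utf-8") as f:
        for row in rows:
            f.write(f"{row['date']} | {row['channel']} | {row['title']}\n")
            f.write(row["transcription"] + "\n\n")
```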
If there were problems, a log.txt file should appear in the script folder. It will contain details of the script’s operation.