Do you want to know what my favorite movie-related website is?  It is opensubtitles.org. I am obsessed with that website.  

When people talk about the internet, they talk about the good, they talk about the bad, and they talk about the ugly. No matter what you are trying to find, you can find it on the internet. Do you want to find a book? Look on Amazon.com. Are you looking for the best online casino bonus? Check out Intertops casino bonus.

But what sets Open Subtitles apart from the rest is what happens after users upload srt files to the website. Srt files are text files that are read by DVD players and online video players to display subtitle text while a movie is playing. It is a pretty simple file format. Every segment of text that is displayed while a movie is playing is given a sequential number and a time frame for when to display the text.
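To make the format concrete, here is a minimal Python sketch of reading it. It assumes the common layout of a number line, a timing line in the form `HH:MM:SS,mmm --> HH:MM:SS,mmm`, and one or more lines of text, with a blank line between segments. The names `Cue` and `parse_srt` and the sample text are made up for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Cue:
    index: int   # sequential number of the segment
    start: str   # when the text appears, e.g. "00:00:01,000"
    end: str     # when the text disappears
    text: str    # one or more lines of subtitle text


# Timing line of a segment: start and end separated by an arrow.
TIMING = re.compile(r"(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})")


def parse_srt(content: str) -> list[Cue]:
    """Split srt text into cues. Malformed blocks are skipped, not repaired."""
    cues = []
    # Segments are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", content.strip()):
        lines = block.splitlines()
        if len(lines) < 2 or not lines[0].strip().isdigit():
            continue
        match = TIMING.match(lines[1].strip())
        if not match:
            continue
        text = "\n".join(lines[2:])
        cues.append(Cue(int(lines[0].strip()), match.group(1), match.group(2), text))
    return cues


# Made-up sample with two segments.
SAMPLE = """1
00:00:01,000 --> 00:00:04,000
Hello there.

2
00:00:05,500 --> 00:00:07,250
How are you?
Fine, thanks.
"""

cues = parse_srt(SAMPLE)
for cue in cues:
    print(cue.index, cue.start, cue.end, repr(cue.text))
```

Real-world files often contain quirks (a byte order mark, missing numbers, stray formatting) that this sketch does not try to handle.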

It works. It works exactly how it was designed to work. But what is really amazing is how the open-source community has built on that standard in ways that the original creators of the Srt standard probably never envisioned.

Word Frequency Lists

Did you ever wonder which words are the most commonly spoken in any given language? Invoke IT Limited has written software to create word frequency lists in almost every language available on Opensubtitles.org. These lists have been produced since at least 2006, and the most recent version is from 2018. Roughly every two years, the author reruns his scripts.

The raw data files can be found on GitHub.

The raw data files are just text files with the data separated by spaces.
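Reading such a file takes only a few lines of Python. The sketch below assumes each line holds a word followed by its count, separated by a space. That layout is my assumption, so check it against the actual files. The function name and the sample counts are made up.

```python
def parse_frequency_lines(lines):
    """Turn 'word count' lines into (word, count) pairs.

    Assumes one word and one integer count per line, separated by
    whitespace. Lines that do not fit that shape are skipped.
    """
    entries = []
    for line in lines:
        parts = line.split()
        if len(parts) != 2 or not parts[1].isdigit():
            continue
        entries.append((parts[0], int(parts[1])))
    return entries


# Made-up sample lines, not real counts from the data files.
SAMPLE_LINES = ["you 120", "the 100", "", "to 80"]

entries = parse_frequency_lines(SAMPLE_LINES)
for word, count in entries:
    print(word, count)
```

For a real file, you could pass an open file object instead of the sample list, for example `parse_frequency_lines(open(path, encoding="utf-8"))`, where `path` points at a downloaded list.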

Wiktionary

Wiktionary then takes these raw data files and creates word lists.  These word lists are located on the Wiktionary section of their site.  

https://en.wiktionary.org/wiki/Wiktionary:Frequency_lists

From these word lists, we can learn some very interesting information. Out of 24 million words from all of these subtitle files, the 1,000 most common words cover 85.5% of all of the words. The 10,000 most common words cover 97.2% of all of the words. Remember that we are talking about spoken English, not written English, but since these subtitles come from documentaries, fiction stories, and educational shows, a wide variety of words is being processed. Past roughly the 13,000th most common word, each word appears fewer than 50 times out of 24 million words.
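The coverage figures above are simple to compute once you have the counts: add up the counts of the N most common words and divide by the total of all counts. Here is a small Python sketch of that arithmetic. The function name and the toy counts are made up and are not taken from the real lists.

```python
def coverage(counts, top_n):
    """Share of all word occurrences covered by the top_n most common words."""
    ordered = sorted(counts, reverse=True)
    total = sum(ordered)
    if total == 0:
        return 0.0
    return sum(ordered[:top_n]) / total


# Toy counts for six imaginary words (100 occurrences in total).
toy_counts = [50, 25, 15, 5, 3, 2]

share = coverage(toy_counts, 2)
print(f"Top 2 words cover {share:.1%} of the text")
```

With the toy data, the top two words account for 75 of the 100 occurrences. Run against a real frequency list, the same function would give the share covered by the top 1,000 or 10,000 words.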

Wiktionary then uses these word lists to determine which words are still missing from their website.

Subtitletools.com

Subtitle Tools is an online tool for doing additional manipulation of Srt subtitle files.

Srt is the main standard that is currently used, but sometimes people need to convert subtitle files from other formats to Srt format. The following converters exist:

  • Converting ssa/ass to Srt
  • Converting WebVTT to Srt
  • Converting smi to Srt
  • Converting MicroDVD (sub) to Srt
  • Converting Polish MPL2 to Srt
  • Converting oTranscribe to Srt

All of these converters accept a batch of subtitle files at one time, but each file is processed individually.

There are also tools for other conversions:

  • Converts many types of text-based subtitle files to WebVTT.
  • Converts picture-based sub/idx subtitles to srt.
  • Converts picture-based sup subtitles to srt.
  • Converts text-based subtitles to plain text files.
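To give a feel for what one of the text-based conversions involves, here is a rough Python sketch of MicroDVD to Srt. It is not the code the website uses. It assumes MicroDVD lines look like `{start_frame}{end_frame}text`, that `|` marks a line break, and that the video runs at 25 frames per second. The real frame rate depends on the video, so treat `fps` as a value you have to supply.

```python
import re

# A MicroDVD line: {start_frame}{end_frame}text
MICRODVD_LINE = re.compile(r"\{(\d+)\}\{(\d+)\}(.*)")


def format_timestamp(ms: int) -> str:
    """Format milliseconds as an srt timestamp, HH:MM:SS,mmm."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def microdvd_to_srt(content: str, fps: float = 25.0) -> str:
    """Convert frame-based MicroDVD text to time-based srt text.

    The default fps of 25 is an assumption; a wrong value produces
    subtitles that drift out of sync.
    """
    blocks = []
    for line in content.splitlines():
        match = MICRODVD_LINE.match(line.strip())
        if not match:
            continue  # skip lines that are not cues
        start_ms = round(int(match.group(1)) * 1000 / fps)
        end_ms = round(int(match.group(2)) * 1000 / fps)
        text = match.group(3).replace("|", "\n")
        number = len(blocks) + 1
        blocks.append(
            f"{number}\n{format_timestamp(start_ms)} --> {format_timestamp(end_ms)}\n{text}"
        )
    return "\n\n".join(blocks) + "\n"


srt_text = microdvd_to_srt("{25}{75}Hello|there\n{100}{150}Bye")
print(srt_text)
```

The picture-based formats (sub/idx and sup) are a different story, since they store images rather than text and need character recognition to become srt.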

Is your subtitle file slightly off? Do the words on the screen not match what is being said? Then you need to process your subtitle file through one of these tools.

  • Subtitle shifter – Shifts all cue timestamps of a subtitle file to bring it into sync with the video.
  • Partial subtitle shifter – Resyncs multiple specific parts of a subtitle file.
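Shifting a whole file is mostly timestamp arithmetic. The Python sketch below shows the idea for the simple case where every cue moves by the same amount. It is my own illustration, not the website's code, and the function names are made up. Timestamps that would go below zero are clamped to zero.

```python
import re

# An srt timestamp: HH:MM:SS,mmm
TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def shift_timestamp(match: re.Match, offset_ms: int) -> str:
    """Return the matched timestamp moved by offset_ms milliseconds."""
    hours, minutes, seconds, millis = (int(part) for part in match.groups())
    total = ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis + offset_ms
    total = max(total, 0)  # never produce a negative time
    hours, rest = divmod(total, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def shift_srt(content: str, offset_ms: int) -> str:
    """Shift every cue in srt text; positive offsets delay the subtitles."""
    shifted = []
    for line in content.split("\n"):
        # Only touch timing lines, so subtitle text is left alone.
        if "-->" in line:
            line = TIMESTAMP.sub(lambda m: shift_timestamp(m, offset_ms), line)
        shifted.append(line)
    return "\n".join(shifted)


later = shift_srt("1\n00:00:01,000 --> 00:00:04,000\nHi\n", 1500)
print(later)
```

A partial shift would apply the same arithmetic, but only to cues inside a chosen time range.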

Sometimes you may need to fix your subtitle file.

  • Srt cleaner – Removes incorrect formatting and SDH (subtitles for the deaf and hard of hearing) from srt files.
  • Convert to UTF-8 – Changes the text encoding of any text file to UTF-8. This is needed for old foreign language subtitle files that were created before Unicode became the standard.
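The encoding fix is easy to picture in Python: decode the bytes using the old encoding, then encode them again as UTF-8. The sketch below assumes you already know the source encoding (here Windows-1252, as an example). Guessing the encoding of an unknown file is the hard part, and this sketch does not attempt it. The function name and sample text are made up.

```python
def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Re-encode bytes from a known legacy encoding to UTF-8.

    source_encoding must be supplied by the caller; a wrong value
    either raises UnicodeDecodeError or silently yields wrong characters.
    """
    text = raw.decode(source_encoding)
    return text.encode("utf-8")


# Simulate an old subtitle line saved as Windows-1252.
legacy = "Café déjà vu".encode("cp1252")

converted = to_utf8(legacy, "cp1252")
print(converted.decode("utf-8"))
```

For a file on disk, the same idea applies: read it in binary mode, convert, and write the result back out.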

Subtitle Merger

This is one of my personal favorite programs. It allows you to merge two different subtitle files into one subtitle file. Let’s look at a TV show like The Simpsons. There is a subtitle file in English and there is a subtitle file in French. With this tool, you can merge the two subtitle files together, so that both languages’ subtitles are displayed on the screen at the same time.

The following options are available:

  • Top and bottom of the screen.  One language is displayed on the top of the screen, and the second language is displayed on the bottom of the screen.
  • Nearest cue
  • Glue end-to-end.  The second language’s text is appended after the text of the first language.
  • Choose the color for the first subtitle file. 
  • Choose the color for the second subtitle file.
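To show the general idea of merging, here is a simplified Python sketch. It does not reproduce any of the specific modes listed above. It just colors each file's text with a font tag, interleaves the cues by start time, and renumbers them. The function name, the default colors, and the sample cues are all made up, and whether a player honors font tags or shows two overlapping cues at once depends on the player.

```python
def merge_cues(first, second, first_color="#ffffff", second_color="#ffff00"):
    """Merge two lists of (start, end, text) cues into one srt string.

    Timestamps are expected in the fixed-width HH:MM:SS,mmm form, which
    sorts correctly as plain text.
    """
    tagged = [
        (start, end, f'<font color="{first_color}">{text}</font>')
        for start, end, text in first
    ]
    tagged += [
        (start, end, f'<font color="{second_color}">{text}</font>')
        for start, end, text in second
    ]
    # The sort is stable, so on a tie the first file's cue stays first.
    tagged.sort(key=lambda cue: cue[0])
    blocks = [
        f"{number}\n{start} --> {end}\n{text}"
        for number, (start, end, text) in enumerate(tagged, 1)
    ]
    return "\n\n".join(blocks) + "\n"


# Made-up cues for the same moment in two languages.
english = [("00:00:01,000", "00:00:03,000", "Hello")]
french = [("00:00:01,000", "00:00:03,000", "Bonjour")]

merged = merge_cues(english, french)
print(merged)
```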

The most logical use case for this is learning a foreign language through the use of movies, TV shows, and their corresponding subtitle files.

Color Changer

This tool changes the color of the text in a subtitle file, but it changes all of the text to the same color.

I was hoping that this tool would let me supply a word list and then change the color of only those words, but alas, this specific tool is not designed for that.

In order to do that, you would have to edit the actual srt text manually by adding a font tag around the word or words whose color you want to change.

<font color="#ff0000">example text</font>

The use case I am thinking of is a teacher teaching a lesson on plants, and the teacher is using The Magic School Bus episode on plants. The teacher wants to emphasize the unit’s vocabulary words that are spoken in the video. Although it is possible to do this, it is not possible with this specific tool.
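The manual font-tag editing could be automated with a short script. Below is a Python sketch of what I have in mind. It wraps each word from a vocabulary list in a font tag, matching whole words only and ignoring case. The function name, the color, and the sample sentence are made up, and font tag support varies from player to player.

```python
import re


def highlight_words(text, words, color="#ff0000"):
    """Wrap each listed word in a font tag; other text is left unchanged.

    Meant for the text lines of a cue only, not the number or timing lines.
    """
    if not words:
        return text
    # Longest words first, so a longer entry wins over a shorter one.
    alternatives = "|".join(
        re.escape(word) for word in sorted(words, key=len, reverse=True)
    )
    pattern = re.compile(rf"\b({alternatives})\b", re.IGNORECASE)
    return pattern.sub(
        lambda match: f'<font color="{color}">{match.group(0)}</font>', text
    )


vocabulary = ["sunlight", "water"]
line = highlight_words("Plants need sunlight and water.", vocabulary)
print(line)
```

Applied to every text line of a subtitle file, this would give the teacher a copy of the episode's subtitles with the vocabulary words standing out in color.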

Summary

This is just a small subset of how people are manipulating and using Srt subtitle files beyond their original intention of displaying text for the hard of hearing. These files are used as the basis for dictionaries, spell checkers, trend analysis, and even research into the psychology of human behavior.

How are you using Srt subtitle (SubRip) files?