Well, I admit it: I am a TED lover. I love the passion, and the "new" factor TED brings to me :)
I was watching this presentation by Simon Sinek, and someone I know had downloaded the video but wasn't able to view the translation locally. So I thought I might be able to download the translations for him and convert them to SRT format, so they can be displayed in any video player. Here is how I did it :)
If you have visited the site before, you will know that each talk page has a Flash player where you can watch the video and choose a translation if needed.
First, to investigate how the Flash player gets the translation, you need a network sniffer, which lets you view all the packets going to and from your network card. A good packet sniffer is Wireshark; if you don't know it, check these links to learn more about how to use it :)
Packet Sniffing using Wireshark Tutorial (Video)
Fifteen Minute Wireshark Tutorial - Wheeler Software
Second, add a filter for HTTP requests only, then navigate to any talk (like the one above) to see how the Flash player communicates with the server :)
When you choose a translation, you will see a request to a URL like this:
www.ted.com/talks/subtitles/id/848/lang/eng, where 848 is the talk ID and eng is the chosen language (English).
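To make the pattern concrete, here is a tiny helper that rebuilds that subtitle URL from a talk ID and a language code. It's only a sketch: the name subtitle_url is my own, and it assumes the URL pattern captured above is the one the server expects.

```python
# Hypothetical helper (not part of TED's site or the script below):
# rebuilds the subtitle URL pattern seen in the Wireshark capture.
def subtitle_url(talk_id, lang):
    """Return the subtitle URL for a talk ID (e.g. 848) and a language code (e.g. 'eng')."""
    return 'http://www.ted.com/talks/subtitles/id/%s/lang/%s' % (talk_id, lang)


print(subtitle_url(848, 'eng'))  # http://www.ted.com/talks/subtitles/id/848/lang/eng
```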
So there are two questions now, given the URL of a TED talk:
What format are the translated subtitles returned in?
How do I get the talk ID?
To answer the first question, just open the link above, and you will find that the translation is returned in JSON (JavaScript Object Notation) format. This is good news: we just need a library that handles JSON, and then we can convert the result to SRT format to use it in almost any video player, like VLC Media Player.
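Here is a minimal sketch of that conversion, assuming the JSON has the shape the script below relies on: a "captions" list whose entries carry "startTime" and "duration" in milliseconds plus the caption text in "content". The sample payload and the names srt_timestamp and captions_to_srt are made up for illustration; offset_ms plays the role of the intro duration the script adds to every caption.

```python
import json

# Made-up sample shaped like the payload the script expects (times in milliseconds).
SAMPLE = json.dumps({'captions': [
    {'startTime': 0, 'duration': 3500, 'content': 'Hello'},
    {'startTime': 3500, 'duration': 61250, 'content': 'World'},
]})


def srt_timestamp(ms):
    """Format a time in milliseconds as an SRT timestamp, HH:MM:SS,mmm."""
    hours, rest = divmod(ms, 3600000)
    minutes, rest = divmod(rest, 60000)
    seconds, millis = divmod(rest, 1000)
    return '%02d:%02d:%02d,%03d' % (hours, minutes, seconds, millis)


def captions_to_srt(json_text, offset_ms=0):
    """Turn a TED-style captions JSON string into SRT text."""
    cues = []
    for index, caption in enumerate(json.loads(json_text)['captions'], start=1):
        start = offset_ms + caption['startTime']
        end = start + caption['duration']
        # One cue: index line, "start --> end" line, then the caption text.
        cues.append('%d\n%s --> %s\n%s\n' % (
            index, srt_timestamp(start), srt_timestamp(end), caption['content']))
    # Joining with '\n' leaves a blank line between cues, as SRT requires.
    return '\n'.join(cues)


print(captions_to_srt(SAMPLE))
```

Each cue ends up as an index line, a "start --> end" line and the text, followed by a blank line before the next cue.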
As for the second question, just do a simple "View Source" on the talk page and search for that number. You will find it in several places in the page's source code, so simply parsing the page HTML should do the job.
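A slightly sturdier alternative to plain string splitting is a regular expression. The sketch below assumes, as the script does, that the values appear in the page source as ";ti=<digits>" and ";introDuration=<digits>" (for example inside an &amp;-escaped parameter string); the HTML fragment and the helper name find_page_param are invented for the example.

```python
import re


def find_page_param(html, name):
    """Return the digits following ';<name>=' in the page source, or None if absent."""
    match = re.search(r';' + re.escape(name) + r'=(\d+)', html)
    return match.group(1) if match else None


# Made-up page fragment using the same ';name=value&' layout the script splits on.
page = '<param value="a=1&amp;ti=848&amp;introDuration=15000&amp;b=2">'

print(find_page_param(page, 'ti'))             # 848
print(find_page_param(page, 'introDuration'))  # 15000
```

Returning None instead of raising an IndexError makes it easier to report a helpful error if the page layout changes.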
Here is the full script. It takes the talk's URL on ted.com and a language code, which is eng for English and ara for Arabic. I will try to provide the rest of the language codes later.
This is a simple command to test the script:
python TEDSubtitles.py "http://www.ted.com/talks/simon_sinek_how_great_leaders_inspire_action.html" "eng"
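With that command, the script names its output file after the last path segment of the talk URL. Here is a small sketch of the same idea using the standard URL parser, so that a query string would not leak into the file name; srt_filename is a hypothetical name and not part of the script.

```python
import posixpath

try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2


def srt_filename(talk_url):
    """Derive '<talk slug>.srt' from a talk URL, ignoring any query string."""
    last_segment = posixpath.basename(urlparse(talk_url).path)
    return posixpath.splitext(last_segment)[0] + '.srt'


print(srt_filename('http://www.ted.com/talks/simon_sinek_how_great_leaders_inspire_action.html'))
```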
And here is the full source code; I will try to upload it somewhere soon.
Hope it's worth spreading :))
Updates:
You can get the script here. You need at least Python 2.6 for the json module to be available.
Update on 22/09/2010: Although it's hosted on SourceForge, the source code is not available. It would be better if it were shared.
Update on 24/04/2011: A Google App Engine application with the same functionality was created at http://tedsubtitles.appspot.com
You can view the source below, but for better syntax-highlighted viewing, check this link
here.

import io
import sys
import json

try:
    from urllib2 import urlopen  # Python 2.6+
except ImportError:
    from urllib.request import urlopen  # Python 3

# Format a TED time value (milliseconds) as an SRT timestamp: HH:MM:SS,mmm
def formatTime(time):
    milliseconds = time % 1000
    seconds = (time // 1000) % 60
    minutes = (time // (1000 * 60)) % 60
    hours = time // (1000 * 60 * 60)
    return '%02d:%02d:%02d,%03d' % (hours, minutes, seconds, milliseconds)

# Convert TED Subtitles (JSON) to SRT Subtitles
def convertTEDSubtitlesToSRTSubtitles(jsonString, introDuration):
    jsonObject = json.loads(jsonString)
    srtContent = u''
    captionIndex = 1
    for caption in jsonObject['captions']:
        # Shift every caption by the talk's intro duration
        startTime = formatTime(introDuration + caption['startTime'])
        endTime = formatTime(introDuration + caption['startTime'] + caption['duration'])
        srtContent += str(captionIndex) + '\n'
        srtContent += startTime + ' --> ' + endTime + '\n'
        srtContent += caption['content'] + '\n'
        srtContent += '\n'
        captionIndex += 1
    return srtContent

# Download the JSON subtitles for a talk ID in the given language (e.g. 'eng')
def getTEDSubtitlesByTalkID(talkId, language):
    tedSubtitleUrl = 'http://www.ted.com/talks/subtitles/id/' + str(talkId) + '/lang/' + language
    return urlopen(tedSubtitleUrl).read().decode('utf-8')

tedTalkUrl = sys.argv[1]
language = sys.argv[2]

# Download the talk page's HTML source
result = urlopen(tedTalkUrl).read().decode('utf-8')

## Get Talk ID value (appears in the page source as ';ti=<id>&')
talkId = result.split(';ti=')[1].split('&')[0]
print(talkId)

## Get Talk Intro Duration value (appears as ';introDuration=<ms>&')
talkIntroDuration = int(result.split(';introDuration=')[1].split('&')[0])
print(talkIntroDuration)

jsonString = getTEDSubtitlesByTalkID(talkId, language)
srtContent = convertTEDSubtitlesToSRTSubtitles(jsonString, talkIntroDuration)

# Generate the SRT file name from the last part of the talk URL
srtFilename = tedTalkUrl.split('/')[-1].split('.')[0]

# Write the subtitles as UTF-8 text ('\n' becomes the platform line ending)
with io.open('./' + srtFilename + '.srt', 'w', encoding='utf-8') as srtFile:
    srtFile.write(srtContent)