On Tuesday, April 7, 2020 12:27:20 PM PDT David Wright wrote:
> On Tue 07 Apr 2020 at 19:11:57 (+0300), Anastasios Lisgaras wrote:
> > Youtube-dl <https://github.com/ytdl-org/youtube-dl> is indeed a
> > powerful and very good piece of software for this job, with many
> > features and options, but can you download videos *from anywhere?*
> >
> > What I want to say is that there are many web pages which greatly
> > hinder (prohibit) this possibility.
> > In this case, what can we do? Can we always find the hidden link
> > (source) of the video? If so, how?
> > If the page requires you to be logged in, what can we do?
>
> I'm not sure what the implications are of having to log in to a site.
> But in general you need different tools for different web sites.
> The BBC iplayer and youtube-dl are two such tools, and sometimes
> a download link is even available, which either the browser or
> wget can use (the latter preserving the metadata).
>
> Where videos exist in their entirety, some sites still play them
> by downloading to a temporary file (and you can see the
> download in the progress bar, ahead of what's actually playing).
> A technique there is to examine /proc/N/fd where N is the
> process number of the browser tab. (The process name used to
> be xul-runner, Web Content etc., and looks as if it's currently
> /usr/lib/firefox-esr/firefox-esr -contentproc.)
> If you find an fd number F that's pointing to a file (deleted) in
> /tmp, then try copying that /proc/N/fd/F (following links). Do it
> when the download progress bar has reached the end, but the file
> is still playing. (Sometimes everything disappears as soon as the
> end is reached.)
>
> Another technique is where the source is streaming (and might be
> open-ended). Here, the video can end up as fragments in your
> browser cache. How you handle them depends on whether they are
> audiovisual or in two separate streams, and whether they are
> timestamped. Some are, some aren't. The timestamped ones are
> relatively easy to reassemble, with ffprobe to read the timings and
> ffmpeg to concatenate the pieces (and merge audio/video if necessary).
>
> Where there's no internal timestamping, you can sometimes rely
> on the filesystem's own timestamps to figure out the correct ordering.
> But I prefer to run a script that watches files in the cache
> as they are closed (with inotifywait), and immediately copies
> them out (if the filetype is of interest) with a sequence
> number and the file type in the filename. The relevant segments
> can then be concatenated quite easily. A time format of
> %Y%m%d-%H%M%S works well as a more meaningful sequence number,
> particularly if you append %N to include nanoseconds for the
> necessary time resolution.
>
> Be aware that the fragments in your cache might not all be
> identified by the file program's defaults. For example, I use
>     0 string G@ TS transport stream
> in ~/.magic to pick up files that file might otherwise label
> as 'data'.
>
> Sometimes, even then, you have to use a little ingenuity for
> the quiet life: e.g. there's a UK railway site that has three
> webcams (two stations and the yard) which run simultaneously
> on the same web page. Fortunately, each webcam runs with a
> different frame rate, so it's quick and easy to distinguish
> their files and divide them up.
>
> Finally, when all else fails, and if you've read this far,
> you can just capture the screen contents with ffmpeg's
> x11grab and record it to an mpg file. The disadvantages are
> that you capture extraneous screen decorations, and you've got
> to dedicate the whole screen to watching the video, remembering
> to increase your blanking timeout too. If you can only record
> audio through the microphone, you get more extraneous rubbish
> there too.
That is one comprehensive write-up! Thanks, David; today I learned something new thanks to you.
-- 
Ihor Antonov
https://useplaintext.email