
Re: how to save video on web page



On Tue 07 Apr 2020 at 19:11:57 (+0300), Anastasios Lisgaras wrote:

> Youtube-dl <https://github.com/ytdl-org/youtube-dl> is indeed a
> powerful and very good software for this job with many features and
> options, but can you download videos *from anywhere ?*
> 
> What I want to say is that there are many web pages which greatly
> hinder (prohibit) this possibility.
> In this case, what can we do? Can we always find the hidden link
> (source) of the video? If so, how?
> If the page requires you to be logged in, what can we do?

I'm not sure what the implications are of having to log in to a
site. But in general you need different tools for different web
sites. The BBC iPlayer and youtube-dl are two such tools, and
sometimes a download link is even available, which either the
browser or wget can use (the latter preserving metadata such as
the server's timestamp).

Where videos exist in their entirety, some sites still play them
by downloading them to a temporary file (you can see the download
in the progress bar, running ahead of what's actually playing).
One technique there is to examine /proc/N/fd, where N is the
process ID of the browser tab. (The process name used to be
xul-runner, Web Content, etc., and currently appears to be
/usr/lib/firefox-esr/firefox-esr -contentproc.)
If you find an fd number F that points to a deleted file in /tmp
(the link target ends in "(deleted)"), try copying /proc/N/fd/F,
following the link. Do it when the download progress bar has
reached the end but the video is still playing. (Sometimes
everything disappears as soon as the end is reached.)

Another technique applies where the source is streamed (and might
be open-ended). Here, the video can end up as fragments in your
browser cache. How you handle them depends on whether audio and
video are combined in each fragment or come as two separate
streams, and on whether the fragments are timestamped. Some are,
some aren't. Timestamped fragments are relatively easy to
reassemble, using ffprobe to read the timings and ffmpeg to
concatenate the pieces (and merge audio and video if necessary).
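A sketch of that reassembly in Python, driving the ffprobe and ffmpeg command-line tools. It assumes each fragment reports a format-level start_time, that all fragments of one stream share codec parameters (so `-c copy` through the concat demuxer works), and that audio and video, if separate, are reassembled first and then muxed. The function names are hypothetical:

```python
import json
import os
import subprocess
import tempfile


def start_time(path):
    """Return the format-level start_time (seconds) that ffprobe reports for path."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=start_time",
         "-of", "json", path],
        check=True, capture_output=True, text=True,
    ).stdout
    return float(json.loads(out)["format"]["start_time"])


def concat_list(paths):
    """Text of an ffmpeg concat-demuxer list, quoting each path ffmpeg-style."""
    lines = []
    for p in paths:
        quoted = os.path.abspath(p).replace("'", "'\\''")
        lines.append(f"file '{quoted}'")
    return "\n".join(lines) + "\n"


def reassemble(fragments, output):
    """Sort fragments by start time, then join them without re-encoding."""
    ordered = sorted(fragments, key=start_time)
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
        f.write(concat_list(ordered))
        list_path = f.name
    try:
        # -safe 0 lets the list contain absolute paths.
        subprocess.run(["ffmpeg", "-f", "concat", "-safe", "0", "-i", list_path,
                        "-c", "copy", output], check=True)
    finally:
        os.unlink(list_path)


def merge(video, audio, output):
    """Mux a reassembled video track and a reassembled audio track into one file."""
    subprocess.run(["ffmpeg", "-i", video, "-i", audio, "-map", "0:v", "-map", "1:a",
                    "-c", "copy", output], check=True)
```

Sorting on ffprobe's reported time rather than on filenames is the point here: the names in a browser cache usually say nothing about playback order.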

Where there's no internal timestamping, you can sometimes rely
on the filesystem's own timestamps to figure out the correct
ordering. But I prefer to run a script that watches files in the
cache as they are closed (with inotifywait) and immediately copies
them out (if the file type is of interest), with a sequence number
and the file type in the filename. The relevant segments can then
be concatenated quite easily. A time format of %Y%m%d-%H%M%S works
well as a more meaningful sequence number, particularly if you
append %N (nanoseconds) to get the necessary time resolution.
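This is not the script described above, just a Python sketch of the same idea under stated assumptions: inotifywait and file are installed, the cache directory and output directory are whatever you pass in, and the `INTERESTING` filter is a hypothetical list of substrings from `file -b` output. Names are built as %Y%m%d-%H%M%S with the nine %N digits appended, which keeps them sorting correctly as plain strings. A filesystem-mtime fallback is included for fragments that were not captured this way:

```python
import os
import shutil
import subprocess
import time

# Hypothetical filter: substrings of `file -b` output worth keeping.
INTERESTING = ("transport stream", "ISO Media", "WebM")


def stamp(ns):
    """%Y%m%d-%H%M%S followed by %N (9-digit nanoseconds), from an epoch time in ns."""
    secs, nanos = divmod(ns, 1_000_000_000)
    return time.strftime("%Y%m%d-%H%M%S", time.localtime(secs)) + f"{nanos:09d}"


def by_mtime(paths):
    """Fallback ordering by the filesystem's own modification times."""
    return sorted(paths, key=lambda p: os.stat(p).st_mtime_ns)


def watch(cache_dir, out_dir):
    """Copy each interesting cache file out as soon as the browser closes it."""
    proc = subprocess.Popen(
        ["inotifywait", "-m", "-r", "-q", "-e", "close_write",
         "--format", "%w%f", cache_dir],
        stdout=subprocess.PIPE, text=True,
    )
    for line in proc.stdout:
        path = line.rstrip("\n")
        kind = subprocess.run(["file", "-b", path],
                              capture_output=True, text=True).stdout
        if not any(k in kind for k in INTERESTING):
            continue
        label = kind.split()[0]  # first word of the type, e.g. "MPEG" or "WebM"
        dest = os.path.join(out_dir, f"{stamp(time.time_ns())}.{label}")
        try:
            shutil.copy2(path, dest)  # copy2 also keeps the original mtime
        except FileNotFoundError:
            pass  # the browser already evicted it
```

`watch()` runs until interrupted; the copies can then be listed in name order and handed to a concatenation step.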

Be aware that the fragments in your cache might not all be
recognized by the file program's default magic database. For
example, I use
0	string	G@	TS transport stream
in ~/.magic to pick up files that file would otherwise label
as 'data'.
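The rule matches the two bytes 0x47 0x40 at offset 0. 0x47 is the MPEG-TS sync byte, and 0x40 is typically the next header byte when a segment opens with a PAT packet (payload-start flag set, PID 0). Here is that same test in Python, next to a stricter heuristic offered as an alternative (it is not part of the setup above): it looks for the sync byte at the start of several consecutive 188-byte packets. The function names are illustrative:

```python
TS_PACKET = 188
SYNC = 0x47  # ASCII 'G'


def read_head(path, n=TS_PACKET * 5):
    """Read the first n bytes of a cache fragment."""
    with open(path, "rb") as f:
        return f.read(n)


def matches_magic_rule(data: bytes) -> bool:
    """Same test as the ~/.magic line: the string "G@" at offset 0."""
    return data.startswith(b"G@")


def looks_like_ts(data: bytes, packets: int = 5) -> bool:
    """Stricter: a sync byte at the start of each of the first `packets` packets."""
    if len(data) < TS_PACKET * packets:
        return False
    return all(data[i * TS_PACKET] == SYNC for i in range(packets))
```

The stricter check also accepts segments that begin with some other packet, and rejects data that merely happens to start with "G@".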

Sometimes, even then, you have to use a little ingenuity for
the quiet life: e.g., there's a UK railway site that has three
webcams (two stations and the yard) running simultaneously on
the same web page. Fortunately, each webcam runs at a different
frame rate, so it's quick and easy to distinguish their files
and divide them up.
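A sketch of that division in Python, assuming each camera's fragments report a distinct r_frame_rate through ffprobe (the actual rates on that site aren't given, so nothing is hard-coded). The function names are hypothetical, and `rate_of` is a parameter so the grouping can be reused with another probe:

```python
import json
import subprocess
from collections import defaultdict
from fractions import Fraction


def parse_rate(text):
    """Turn ffprobe's "num/den" rate string into a Fraction (None if unknown)."""
    num, den = (int(x) for x in text.split("/"))
    return Fraction(num, den) if den else None


def frame_rate(path):
    """Frame rate of the first video stream of path, as reported by ffprobe."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_entries", "stream=r_frame_rate", "-of", "json", path],
        check=True, capture_output=True, text=True,
    ).stdout
    streams = json.loads(out).get("streams", [])
    return parse_rate(streams[0]["r_frame_rate"]) if streams else None


def group_by_rate(paths, rate_of=frame_rate):
    """Map each distinct frame rate to the list of fragments that have it."""
    groups = defaultdict(list)
    for p in paths:
        groups[rate_of(p)].append(p)
    return dict(groups)
```

Using `Fraction` keeps rates such as 30000/1001 exact, so fragments from the same camera always land under the same key.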

Finally, when all else fails, and if you've read this far,
you can just capture the screen contents with ffmpeg's
x11grab and record them to an mpg file. The disadvantages are
that you capture extraneous screen decorations, and you've got
to dedicate the whole screen to watching the video, remembering
to increase your blanking timeout too. If you can only record
audio through the microphone, you pick up more extraneous
rubbish there too.
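A Python sketch that only builds the ffmpeg command line. The screen size, display name, frame rate and audio device are all assumptions to adjust. With x11grab, an offset in the input name (e.g. `:0.0+100,200`) together with a smaller `-video_size` limits the grab to the part of the screen where the video plays, which trims some of the decorations:

```python
def x11grab_command(output, size="1920x1080", display=":0.0", fps=25, audio=None):
    """Build an ffmpeg argv that records the X display to output.

    audio, if given, is an (input format, device) pair such as ("pulse", "default").
    """
    cmd = ["ffmpeg", "-f", "x11grab", "-video_size", size,
           "-framerate", str(fps), "-i", display]
    if audio is not None:
        fmt, device = audio
        cmd += ["-f", fmt, "-i", device]
    cmd.append(output)
    return cmd
```

For example, `subprocess.run(x11grab_command("capture.mpg", audio=("pulse", "default")), check=True)` records until you press q in the terminal. With PulseAudio, naming a sink's monitor source (its name ends in `.monitor`; `pactl list short sources` lists them) instead of `default` may record what is being played rather than the microphone. Running `xset s off` and `xset -dpms` beforehand is one way to stop the screen blanking mid-recording.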

Cheers,
David.

