Compare commits

...

14 Commits

Author SHA1 Message Date
Tom-Oliver Heidel b28e751688
[skip travis] 2020-11-11 00:40:43 +01:00
Tom-Oliver Heidel 7ee5015a34
Merge pull request #149 from RobinD42/fix-subtitle-fallback
fall-back to the old way to fetch subtitles, if needed
2020-11-11 00:08:18 +01:00
Tom-Oliver Heidel 00c38ef28d
Merge pull request #151 from wlritchi/youtube-playlist-polymer
RFC: youtube: Polymer UI and JSON endpoints for playlists
2020-11-11 00:05:27 +01:00
Tom-Oliver Heidel 34861f1c96
Merge pull request #137 from nsapa/fix_subtitle
Fix issue triggered by tubeup
2020-11-11 00:02:09 +01:00
Unknown 104bfdd24d ytsearchurl 5 pages for around 100 results 2020-11-11 00:00:27 +01:00
Luc Ritchie 73ac856785 [youtube] max_pages=5 for search, unlimited for everything else
Also drop a few leftover methods in search that are no longer used.
2020-11-10 17:49:43 -05:00
Tom-Oliver Heidel d91fdaff03
Merge pull request #79 from rigstot/thisvid
implement ThisVid extractor
2020-11-10 23:34:16 +01:00
Tom-Oliver Heidel c54f4aada5
Merge branch 'master' into youtube-playlist-polymer 2020-11-10 23:27:55 +01:00
Unknown 0f8566e90b manually set limit for youtubesearchurl 2020-11-10 23:20:52 +01:00
rigstot d7aec208f2 implement ThisVid extractor
deobfuscates the video URL using a reverse engineered version of KVS
player's algorithm. This was tested against version 4.0.4, 5.0.1,
5.1.1.4 and 5.2.0.4 of the player and a warning will be issued if the
major version changes.
2020-11-10 22:44:53 +01:00
Luc Ritchie 9833e7a015 fix: youtube: Polymer UI and JSON endpoints for playlists
We already had a few copies of Polymer-style pagination handling logic
for certain circumstances, but now we're forced into using it for all
playlists since we can no longer disable Polymer. Refactor the logic to
move it to the parent class for all entry lists (including e.g. search
results, feeds, and list of playlists), and generify a bit to cover the
child classes' use cases.
2020-11-10 03:38:26 -05:00
Robin Dunn 142f2c8e99 fall-back to the old way to fetch subtitles, if needed 2020-11-09 15:24:42 -08:00
Nicolas SAPA 8263104fe4 [youtube] Fix 'liveChatReplayContinuationData' missing 'continuation' key
live_chat_continuation['continuations'][0]['liveChatReplayContinuationData']['continuation'] can not exist.
So catch the KeyError.

Traceback:
$ tubeup 'https://youtube.com/watch?v=JyE9OF03cao'
[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] youtube-dlc version 2020.10.25
[debug] Python version 3.7.3 (CPython) - Linux-5.8.0-0.bpo.2-amd64-x86_64-with-debian-10.6
[debug] exe versions: ffmpeg 3.3.9, ffprobe 3.3.9
[debug] Proxy map: {}
There are no annotations to write.
[download] 452.59KiB at 615.35KiB/s (00:01)ERROR: 'liveChatReplayContinuationData'
Traceback (most recent call last):
  File "/mnt/data2/Backup/Wiki/.local/lib/python3.7/site-packages/youtube_dlc/YoutubeDL.py", line 846, in extract_info
    return self.process_ie_result(ie_result, download, extra_info)
  File "/mnt/data2/Backup/Wiki/.local/lib/python3.7/site-packages/youtube_dlc/YoutubeDL.py", line 901, in process_ie_result
    return self.process_video_result(ie_result, download=download)
  File "/mnt/data2/Backup/Wiki/.local/lib/python3.7/site-packages/youtube_dlc/YoutubeDL.py", line 1696, in process_video_result
    self.process_info(new_info)
  File "/mnt/data2/Backup/Wiki/.local/lib/python3.7/site-packages/youtube_dlc/YoutubeDL.py", line 1894, in process_info
    dl(sub_filename, sub_info, subtitle=True)
  File "/mnt/data2/Backup/Wiki/.local/lib/python3.7/site-packages/youtube_dlc/YoutubeDL.py", line 1866, in dl
    return fd.download(name, info, subtitle)
  File "/mnt/data2/Backup/Wiki/.local/lib/python3.7/site-packages/youtube_dlc/downloader/common.py", line 375, in download
    return self.real_download(filename, info_dict)
  File "/mnt/data2/Backup/Wiki/.local/lib/python3.7/site-packages/youtube_dlc/downloader/youtube_live_chat.py", line 85, in real_download
    continuation_id = live_chat_continuation['continuations'][0]['liveChatReplayContinuationData']['continuation']
KeyError: 'liveChatReplayContinuationData'
2020-11-08 08:49:03 +01:00
Nicolas SAPA b860e4cc2f [common] Make sure self.params.get('sleep_interval_subtitles') is int
This can happen if another software is using yt-dlc'API (ie: tubeup).
The stack trace would be:
$ tubeup 'https://youtube.com/watch?v=JyE9OF03cao'
[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] youtube-dlc version 2020.10.25
[debug] Python version 3.7.3 (CPython) - Linux-5.8.0-0.bpo.2-amd64-x86_64-with-debian-10.6
[debug] exe versions: ffmpeg 3.3.9, ffprobe 3.3.9
[debug] Proxy map: {}
There are no annotations to write.
ERROR: '>' not supported between instances of 'NoneType' and 'int'
Traceback (most recent call last):
  File "/mnt/data2/Backup/Wiki/.local/lib/python3.7/site-packages/youtube_dlc/YoutubeDL.py", line 846, in extract_info
    return self.process_ie_result(ie_result, download, extra_info)
  File "/mnt/data2/Backup/Wiki/.local/lib/python3.7/site-packages/youtube_dlc/YoutubeDL.py", line 901, in process_ie_result
    return self.process_video_result(ie_result, download=download)
  File "/mnt/data2/Backup/Wiki/.local/lib/python3.7/site-packages/youtube_dlc/YoutubeDL.py", line 1696, in process_video_result
    self.process_info(new_info)
  File "/mnt/data2/Backup/Wiki/.local/lib/python3.7/site-packages/youtube_dlc/YoutubeDL.py", line 1894, in process_info
    dl(sub_filename, sub_info, subtitle=True)
  File "/mnt/data2/Backup/Wiki/.local/lib/python3.7/site-packages/youtube_dlc/YoutubeDL.py", line 1866, in dl
    return fd.download(name, info, subtitle)
  File "/mnt/data2/Backup/Wiki/.local/lib/python3.7/site-packages/youtube_dlc/downloader/common.py", line 367, in download
    if self.params.get('sleep_interval_subtitles') > 0:
TypeError: '>' not supported between instances of 'NoneType' and 'int'
2020-11-08 08:36:26 +01:00
7 changed files with 255 additions and 199 deletions

View File

@ -364,8 +364,10 @@ class FileDownloader(object):
else '%.2f' % sleep_interval))
time.sleep(sleep_interval)
else:
if self.params.get('sleep_interval_subtitles') > 0:
sleep_interval_sub = 0
if type(self.params.get('sleep_interval_subtitles')) is int:
sleep_interval_sub = self.params.get('sleep_interval_subtitles')
if sleep_interval_sub > 0:
self.to_screen(
'[download] Sleeping %s seconds...' % (
sleep_interval_sub))

View File

@ -82,7 +82,10 @@ class YoutubeLiveChatReplayFD(FragmentFD):
offset = int(replay_chat_item_action['videoOffsetTimeMsec'])
processed_fragment.extend(
json.dumps(action, ensure_ascii=False).encode('utf-8') + b'\n')
continuation_id = live_chat_continuation['continuations'][0]['liveChatReplayContinuationData']['continuation']
try:
continuation_id = live_chat_continuation['continuations'][0]['liveChatReplayContinuationData']['continuation']
except KeyError:
continuation_id = None
self._append_fragment(ctx, processed_fragment)

View File

@ -1175,6 +1175,7 @@ from .theweatherchannel import TheWeatherChannelIE
from .thisamericanlife import ThisAmericanLifeIE
from .thisav import ThisAVIE
from .thisoldhouse import ThisOldHouseIE
from .thisvid import ThisVidIE
from .threeqsdn import ThreeQSDNIE
from .tiktok import TikTokIE
from .tinypic import TinyPicIE

View File

@ -0,0 +1,97 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
class ThisVidIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?thisvid\.com/(?P<type>videos|embed)/(?P<id>[A-Za-z0-9-]+/?)'
_TESTS = [{
'url': 'https://thisvid.com/videos/french-boy-pantsed/',
'md5': '3397979512c682f6b85b3b04989df224',
'info_dict': {
'id': '2400174',
'ext': 'mp4',
'title': 'French Boy Pantsed',
'thumbnail': 'https://media.thisvid.com/contents/videos_screenshots/2400000/2400174/preview.mp4.jpg',
'age_limit': 18,
}
}, {
'url': 'https://thisvid.com/embed/2400174/',
'md5': '3397979512c682f6b85b3b04989df224',
'info_dict': {
'id': '2400174',
'ext': 'mp4',
'title': 'French Boy Pantsed',
'thumbnail': 'https://media.thisvid.com/contents/videos_screenshots/2400000/2400174/preview.mp4.jpg',
'age_limit': 18,
}
}]
def _real_extract(self, url):
main_id = self._match_id(url)
webpage = self._download_webpage(url, main_id)
# URL decryptor was reversed from version 4.0.4, later verified working with 5.2.0 and may change in the future.
kvs_version = self._html_search_regex(r'<script [^>]+?src="https://thisvid\.com/player/kt_player\.js\?v=(\d+(\.\d+)+)">', webpage, 'kvs_version', fatal=False)
if not kvs_version.startswith("5."):
self.report_warning("Major version change (" + kvs_version + ") in player engine--Download may fail.")
title = self._html_search_regex(r'<title>(?:Video: )?(.+?)(?: - (?:\w+ porn at )?ThisVid(?:.com| tube))?</title>', webpage, 'title')
# video_id, video_url and license_code from the 'flashvars' JSON object:
video_id = self._html_search_regex(r"video_id: '([0-9]+)',", webpage, 'video_id')
video_url = self._html_search_regex(r"video_url: '(function/0/.+?)',", webpage, 'video_url')
license_code = self._html_search_regex(r"license_code: '([0-9$]{16})',", webpage, 'license_code')
thumbnail = self._html_search_regex(r"preview_url: '((?:https?:)?//media.thisvid.com/.+?.jpg)',", webpage, 'thumbnail', fatal=False)
if thumbnail.startswith("//"):
thumbnail = "https:" + thumbnail
if (re.match(self._VALID_URL, url).group('type') == "videos"):
display_id = main_id
else:
display_id = self._search_regex(r'<link rel="canonical" href="' + self._VALID_URL + r'">', webpage, 'display_id', fatal=False),
return {
'id': video_id,
'display_id': display_id,
'title': title,
'url': getrealurl(video_url, license_code),
'thumbnail': thumbnail,
'age_limit': 18,
}
def getrealurl(video_url, license_code):
urlparts = video_url.split('/')[2:]
license = getlicensetoken(license_code)
newmagic = urlparts[5][:32]
for o in range(len(newmagic) - 1, -1, -1):
new = ""
l = (o + sum([int(n) for n in license[o:]])) % 32
for i in range(0, len(newmagic)):
if i == o:
new += newmagic[l]
elif i == l:
new += newmagic[o]
else:
new += newmagic[i]
newmagic = new
urlparts[5] = newmagic + urlparts[5][32:]
return "/".join(urlparts)
def getlicensetoken(license):
modlicense = license.replace("$", "").replace("0", "1")
center = int(len(modlicense) / 2)
fronthalf = int(modlicense[:center + 1])
backhalf = int(modlicense[center:])
modlicense = str(4 * abs(fronthalf - backhalf))
retval = ""
for o in range(0, center + 1):
for i in range(1, 5):
retval += str((int(license[o + i]) + int(modlicense[o])) % 10)
return retval

View File

@ -308,17 +308,26 @@ class VikiIE(VikiBaseIE):
'url': thumbnail.get('url'),
})
new_video = self._download_json(
'https://www.viki.com/api/videos/%s' % video_id, video_id,
'Downloading new video JSON to get subtitles', headers={'x-viki-app-ver': '2.2.5.1428709186'}, expected_status=[200, 400, 404])
subtitles = {}
for sub in new_video.get('streamSubtitles').get('dash'):
subtitles[sub.get('srclang')] = [{
'ext': 'vtt',
'url': sub.get('src'),
'completion': sub.get('percentage'),
}]
try:
# New way to fetch subtitles
new_video = self._download_json(
'https://www.viki.com/api/videos/%s' % video_id, video_id,
'Downloading new video JSON to get subtitles', headers={'x-viki-app-ver': '2.2.5.1428709186'}, expected_status=[200, 400, 404])
for sub in new_video.get('streamSubtitles').get('dash'):
subtitles[sub.get('srclang')] = [{
'ext': 'vtt',
'url': sub.get('src'),
'completion': sub.get('percentage'),
}]
except AttributeError:
# fall-back to the old way if there isn't a streamSubtitles attribute
for subtitle_lang, _ in video.get('subtitle_completions', {}).items():
subtitles[subtitle_lang] = [{
'ext': subtitles_format,
'url': self._prepare_call(
'videos/%s/subtitles/%s.%s' % (video_id, subtitle_lang, subtitles_format)),
} for subtitles_format in ('srt', 'vtt')]
result = {
'id': video_id,

View File

@ -36,6 +36,7 @@ from ..utils import (
get_element_by_attribute,
get_element_by_id,
int_or_none,
js_to_json,
mimetype2ext,
orderedSet,
parse_codecs,
@ -70,6 +71,8 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
_LOGIN_REQUIRED = False
_PLAYLIST_ID_RE = r'(?:PL|LL|EC|UU|FL|RD|UL|TL|PU|OLAK5uy_)[0-9A-Za-z-_]{10,}'
_INITIAL_DATA_RE = r'(?:window\["ytInitialData"\]|ytInitialData)\W?=\W?({.*?});'
_YTCFG_DATA_RE = r"ytcfg.set\(({.*?})\)"
_YOUTUBE_CLIENT_HEADERS = {
'x-youtube-client-name': '1',
@ -274,7 +277,6 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
def _download_webpage_handle(self, *args, **kwargs):
query = kwargs.get('query', {}).copy()
query['disable_polymer'] = 'true'
kwargs['query'] = query
return super(YoutubeBaseInfoExtractor, self)._download_webpage_handle(
*args, **compat_kwargs(kwargs))
@ -297,16 +299,61 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
class YoutubeEntryListBaseInfoExtractor(YoutubeBaseInfoExtractor):
# Extract entries from page with "Load more" button
def _entries(self, page, playlist_id):
more_widget_html = content_html = page
mobj_reg = r'(?:(?:data-uix-load-more-href="[^"]+?;continuation=)|(?:"continuation":"))(?P<more>[^"]+)"'
for page_num in itertools.count(1):
for entry in self._process_page(content_html):
def _find_entries_in_json(self, extracted):
entries = []
c = {}
def _real_find(obj):
if obj is None or isinstance(obj, str):
return
if type(obj) is list:
for elem in obj:
_real_find(elem)
if type(obj) is dict:
if self._is_entry(obj):
entries.append(obj)
return
if 'continuationCommand' in obj:
c['continuation'] = obj
return
for _, o in obj.items():
_real_find(o)
_real_find(extracted)
return entries, try_get(c, lambda x: x["continuation"])
def _entries(self, page, playlist_id, max_pages=None):
seen = []
yt_conf = {}
for m in re.finditer(self._YTCFG_DATA_RE, page):
parsed = self._parse_json(m.group(1), playlist_id,
transform_source=js_to_json, fatal=False)
if parsed:
yt_conf.update(parsed)
data_json = self._parse_json(self._search_regex(self._INITIAL_DATA_RE, page, 'ytInitialData'), None)
for page_num in range(1, max_pages + 1) if max_pages is not None else itertools.count(1):
entries, continuation = self._find_entries_in_json(data_json)
processed = self._process_entries(entries, seen)
if not processed:
break
for entry in processed:
yield entry
mobj = re.search(mobj_reg, more_widget_html)
if not mobj:
if not continuation or not yt_conf:
break
continuation_token = try_get(continuation, lambda x: x['continuationCommand']['token'])
continuation_url = try_get(continuation, lambda x: x['commandMetadata']['webCommandMetadata']['apiUrl'])
if not continuation_token or not continuation_url:
break
count = 0
@ -315,12 +362,23 @@ class YoutubeEntryListBaseInfoExtractor(YoutubeBaseInfoExtractor):
try:
# Downloading page may result in intermittent 5xx HTTP error
# that is usually worked around with a retry
more = self._download_json(
'https://www.youtube.com/browse_ajax?ctoken=%s' % mobj.group('more'), playlist_id,
'Downloading page #%s%s'
% (page_num, ' (retry #%d)' % count if count else ''),
data_json = self._download_json(
'https://www.youtube.com%s' % continuation_url,
playlist_id,
'Downloading continuation page #%s%s' % (page_num, ' (retry #%d)' % count if count else ''),
transform_source=uppercase_escape,
headers=self._YOUTUBE_CLIENT_HEADERS)
query={
'key': try_get(yt_conf, lambda x: x['INNERTUBE_API_KEY'])
},
data=bytes(json.dumps({
'context': try_get(yt_conf, lambda x: x['INNERTUBE_CONTEXT']),
'continuation': continuation_token
}), encoding='utf-8'),
headers={
'Content-Type': 'application/json'
}
)
break
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code in (500, 503):
@ -329,31 +387,30 @@ class YoutubeEntryListBaseInfoExtractor(YoutubeBaseInfoExtractor):
continue
raise
content_html = more['content_html']
if not content_html.strip():
# Some webpages show a "Load more" button but they don't
# have more videos
break
more_widget_html = more['load_more_widget_html']
def _extract_title(self, renderer):
title = try_get(renderer, lambda x: x['title']['runs'][0]['text'], compat_str)
if title:
return title
return try_get(renderer, lambda x: x['title']['simpleText'], compat_str)
class YoutubePlaylistBaseInfoExtractor(YoutubeEntryListBaseInfoExtractor):
def _process_page(self, content):
for video_id, video_title in self.extract_videos_from_page(content):
yield self.url_result(video_id, 'Youtube', video_id, video_title)
def _is_entry(self, obj):
return 'videoId' in obj
def extract_videos_from_page_impl(self, video_re, page, ids_in_page, titles_in_page):
for mobj in re.finditer(video_re, page):
# The link with index 0 is not the first video of the playlist (not sure if still actual)
if 'index' in mobj.groupdict() and mobj.group('id') == '0':
def _process_entries(self, entries, seen):
ids_in_page = []
titles_in_page = []
for renderer in entries:
video_id = try_get(renderer, lambda x: x['videoId'])
video_title = self._extract_title(renderer)
if video_id is None or video_title is None:
# we do not have a videoRenderer or title extraction broke
continue
video_id = mobj.group('id')
video_title = unescapeHTML(
mobj.group('title')) if 'title' in mobj.groupdict() else None
if video_title:
video_title = video_title.strip()
if video_title == '► Play all':
video_title = None
video_title = video_title.strip()
try:
idx = ids_in_page.index(video_id)
if video_title and not titles_in_page[idx]:
@ -362,19 +419,17 @@ class YoutubePlaylistBaseInfoExtractor(YoutubeEntryListBaseInfoExtractor):
ids_in_page.append(video_id)
titles_in_page.append(video_title)
def extract_videos_from_page(self, page):
ids_in_page = []
titles_in_page = []
self.extract_videos_from_page_impl(
self._VIDEO_RE, page, ids_in_page, titles_in_page)
return zip(ids_in_page, titles_in_page)
for video_id, video_title in zip(ids_in_page, titles_in_page):
yield self.url_result(video_id, 'Youtube', video_id, video_title)
class YoutubePlaylistsBaseInfoExtractor(YoutubeEntryListBaseInfoExtractor):
def _process_page(self, content):
for playlist_id in orderedSet(re.findall(
r'"/?playlist\?list=([0-9A-Za-z-_]{10,})"',
content)):
def _is_entry(self, obj):
return 'playlistId' in obj
def _process_entries(self, entries, seen):
for playlist_id in orderedSet(try_get(r, lambda x: x['playlistId']) for r in entries):
yield self.url_result(
'https://www.youtube.com/playlist?list=%s' % playlist_id, 'YoutubePlaylist')
@ -3241,11 +3296,7 @@ class YoutubePlaylistsIE(YoutubePlaylistsBaseInfoExtractor):
}]
class YoutubeSearchBaseInfoExtractor(YoutubePlaylistBaseInfoExtractor):
_VIDEO_RE = r'href="\s*/watch\?v=(?P<id>[0-9A-Za-z_-]{11})(?:[^"]*"[^>]+\btitle="(?P<title>[^"]+))?'
class YoutubeSearchIE(SearchInfoExtractor, YoutubeSearchBaseInfoExtractor):
class YoutubeSearchIE(SearchInfoExtractor, YoutubePlaylistBaseInfoExtractor):
IE_DESC = 'YouTube.com searches'
# there doesn't appear to be a real limit, for example if you search for
# 'python' you get more than 8.000.000 results
@ -3342,11 +3393,10 @@ class YoutubeSearchDateIE(YoutubeSearchIE):
_SEARCH_PARAMS = 'CAI%3D'
class YoutubeSearchURLIE(YoutubeSearchBaseInfoExtractor):
class YoutubeSearchURLIE(YoutubePlaylistBaseInfoExtractor):
IE_DESC = 'YouTube.com search URLs'
IE_NAME = 'youtube:search_url'
_VALID_URL = r'https?://(?:www\.)?youtube\.com/results\?(.*?&)?(?:search_query|q)=(?P<query>[^&]+)(?:[&]|$)'
_SEARCH_DATA = r'(?:window\["ytInitialData"\]|ytInitialData)\W?=\W?({.*?});'
_TESTS = [{
'url': 'https://www.youtube.com/results?baz=bar&search_query=youtube-dl+test+video&filters=video&lclk=video',
'playlist_mincount': 5,
@ -3358,63 +3408,20 @@ class YoutubeSearchURLIE(YoutubeSearchBaseInfoExtractor):
'only_matching': True,
}]
def _find_videos_in_json(self, extracted):
videos = []
def _process_json_dict(self, obj, videos, c):
if "videoId" in obj:
videos.append(obj)
return
def _real_find(obj):
if obj is None or isinstance(obj, str):
return
if type(obj) is list:
for elem in obj:
_real_find(elem)
if type(obj) is dict:
if "videoId" in obj:
videos.append(obj)
return
for _, o in obj.items():
_real_find(o)
_real_find(extracted)
return videos
def extract_videos_from_page_impl(self, page, ids_in_page, titles_in_page):
search_response = self._parse_json(self._search_regex(self._SEARCH_DATA, page, 'ytInitialData'), None)
result_items = self._find_videos_in_json(search_response)
for renderer in result_items:
video_id = try_get(renderer, lambda x: x['videoId'])
video_title = try_get(renderer, lambda x: x['title']['runs'][0]['text']) or try_get(renderer, lambda x: x['title']['simpleText'])
if video_id is None or video_title is None:
# we do not have a videoRenderer or title extraction broke
continue
video_title = video_title.strip()
try:
idx = ids_in_page.index(video_id)
if video_title and not titles_in_page[idx]:
titles_in_page[idx] = video_title
except ValueError:
ids_in_page.append(video_id)
titles_in_page.append(video_title)
def extract_videos_from_page(self, page):
ids_in_page = []
titles_in_page = []
self.extract_videos_from_page_impl(page, ids_in_page, titles_in_page)
return zip(ids_in_page, titles_in_page)
if "nextContinuationData" in obj:
c["continuation"] = obj["nextContinuationData"]
return
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
query = compat_urllib_parse_unquote_plus(mobj.group('query'))
webpage = self._download_webpage(url, query)
return self.playlist_result(self._process_page(webpage), playlist_title=query)
return self.playlist_result(self._entries(webpage, query, max_pages=5), playlist_title=query)
class YoutubeShowIE(YoutubePlaylistsBaseInfoExtractor):
@ -3436,14 +3443,12 @@ class YoutubeShowIE(YoutubePlaylistsBaseInfoExtractor):
'https://www.youtube.com/show/%s/playlists' % playlist_id)
class YoutubeFeedsInfoExtractor(YoutubeBaseInfoExtractor):
class YoutubeFeedsInfoExtractor(YoutubePlaylistBaseInfoExtractor):
"""
Base class for feed extractors
Subclasses must define the _FEED_NAME and _PLAYLIST_TITLE properties.
"""
_LOGIN_REQUIRED = True
_FEED_DATA = r'(?:window\["ytInitialData"\]|ytInitialData)\W?=\W?({.*?});'
_YTCFG_DATA = r"ytcfg.set\(({.*?})\)"
@property
def IE_NAME(self):
@ -3452,96 +3457,35 @@ class YoutubeFeedsInfoExtractor(YoutubeBaseInfoExtractor):
def _real_initialize(self):
self._login()
def _find_videos_in_json(self, extracted):
videos = []
c = {}
def _process_entries(self, entries, seen):
new_info = []
for v in entries:
v_id = try_get(v, lambda x: x['videoId'])
if not v_id:
continue
def _real_find(obj):
if obj is None or isinstance(obj, str):
return
have_video = False
for old in seen:
if old['videoId'] == v_id:
have_video = True
break
if type(obj) is list:
for elem in obj:
_real_find(elem)
if not have_video:
new_info.append(v)
if type(obj) is dict:
if "videoId" in obj:
videos.append(obj)
return
if not new_info:
return
if "nextContinuationData" in obj:
c["continuation"] = obj["nextContinuationData"]
return
for _, o in obj.items():
_real_find(o)
_real_find(extracted)
return videos, try_get(c, lambda x: x["continuation"])
def _entries(self, page):
info = []
yt_conf = self._parse_json(self._search_regex(self._YTCFG_DATA, page, 'ytcfg.set', default="null"), None, fatal=False)
search_response = self._parse_json(self._search_regex(self._FEED_DATA, page, 'ytInitialData'), None)
for page_num in itertools.count(1):
video_info, continuation = self._find_videos_in_json(search_response)
new_info = []
for v in video_info:
v_id = try_get(v, lambda x: x['videoId'])
if not v_id:
continue
have_video = False
for old in info:
if old['videoId'] == v_id:
have_video = True
break
if not have_video:
new_info.append(v)
if not new_info:
break
info.extend(new_info)
for video in new_info:
yield self.url_result(try_get(video, lambda x: x['videoId']), YoutubeIE.ie_key(), video_title=try_get(video, lambda x: x['title']['runs'][0]['text']) or try_get(video, lambda x: x['title']['simpleText']))
if not continuation or not yt_conf:
break
search_response = self._download_json(
'https://www.youtube.com/browse_ajax', self._PLAYLIST_TITLE,
'Downloading page #%s' % page_num,
transform_source=uppercase_escape,
query={
"ctoken": try_get(continuation, lambda x: x["continuation"]),
"continuation": try_get(continuation, lambda x: x["continuation"]),
"itct": try_get(continuation, lambda x: x["clickTrackingParams"])
},
headers={
"X-YouTube-Client-Name": try_get(yt_conf, lambda x: x["INNERTUBE_CONTEXT_CLIENT_NAME"]),
"X-YouTube-Client-Version": try_get(yt_conf, lambda x: x["INNERTUBE_CONTEXT_CLIENT_VERSION"]),
"X-Youtube-Identity-Token": try_get(yt_conf, lambda x: x["ID_TOKEN"]),
"X-YouTube-Device": try_get(yt_conf, lambda x: x["DEVICE"]),
"X-YouTube-Page-CL": try_get(yt_conf, lambda x: x["PAGE_CL"]),
"X-YouTube-Page-Label": try_get(yt_conf, lambda x: x["PAGE_BUILD_LABEL"]),
"X-YouTube-Variants-Checksum": try_get(yt_conf, lambda x: x["VARIANTS_CHECKSUM"]),
})
seen.extend(new_info)
for video in new_info:
yield self.url_result(try_get(video, lambda x: x['videoId']), YoutubeIE.ie_key(), video_title=self._extract_title(video))
def _real_extract(self, url):
page = self._download_webpage(
'https://www.youtube.com/feed/%s' % self._FEED_NAME,
self._PLAYLIST_TITLE)
return self.playlist_result(
self._entries(page), playlist_title=self._PLAYLIST_TITLE)
return self.playlist_result(self._entries(page, self._PLAYLIST_TITLE),
playlist_title=self._PLAYLIST_TITLE)
class YoutubeWatchLaterIE(YoutubePlaylistIE):

View File

@ -1,3 +1,3 @@
from __future__ import unicode_literals
__version__ = '2020.10.25'
__version__ = '2020.11.11-1'