from langchain.document_loaders.generic import GenericLoader
from langchain.document_loaders.parsers import OpenAIWhisperParser
from langchain.document_loaders.blob_loaders.youtube_audio import YoutubeAudioLoader
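
# URL and save directory as they appear in the download log
url = "https://www.youtube.com/watch?v=_PHdzsQaDgw"
save_dir = "docs/youtube-zh/"

# Download the video's audio to save_dir, then transcribe it with Whisper
loader = GenericLoader(YoutubeAudioLoader([url], save_dir), OpenAIWhisperParser())

# Call the load method of the GenericLoader class to load the video's audio file
pages = loader.load()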
[youtube] Extracting URL: https://www.youtube.com/watch?v=_PHdzsQaDgw
[youtube] _PHdzsQaDgw: Downloading webpage
[youtube] _PHdzsQaDgw: Downloading ios player API JSON
[youtube] _PHdzsQaDgw: Downloading android player API JSON
[youtube] _PHdzsQaDgw: Downloading m3u8 information
WARNING: [youtube] Failed to download m3u8 information: HTTP Error 429: Too Many Requests
[info] _PHdzsQaDgw: Downloading 1 format(s): 140
[download] docs/youtube-zh//【2023年7月最新】ChatGPT注册教程,国内详细注册流程,支持中文使用,chatgpt 中国怎么用?.m4a has already been downloaded
[download] 100% of 7.72MiB
[ExtractAudio] Not converting audio docs/youtube-zh//【2023年7月最新】ChatGPT注册教程,国内详细注册流程,支持中文使用,chatgpt 中国怎么用?.m4a; file is already in target format m4a
Transcribing part 1!
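
Once the transcription finishes, `pages` is a list of document objects, each with a `page_content` string and a `metadata` dict. The following is a minimal sketch of one way to inspect such a list. The helper name `preview_pages`, the `max_chars` parameter, and the sample data are all assumptions made for illustration. The sample uses a stand-in object rather than a real transcription, so it runs without downloading audio or calling the Whisper API, and its metadata keys are illustrative only.

```python
from types import SimpleNamespace


def preview_pages(pages, max_chars=200):
    """Return one summary dict per loaded document.

    Works with any object exposing `page_content` (str) and `metadata` (dict),
    which is the shape of the documents returned by `loader.load()`.
    """
    summaries = []
    for index, page in enumerate(pages):
        summaries.append(
            {
                "index": index,
                "length": len(page.page_content),
                "preview": page.page_content[:max_chars],
                "metadata": dict(page.metadata),
            }
        )
    return summaries


# Hypothetical stand-in for the transcription result (not real output).
sample_pages = [
    SimpleNamespace(
        page_content="Transcribed text of the first audio chunk.",
        metadata={"source": "example.m4a", "chunk": 0},
    )
]

for summary in preview_pages(sample_pages, max_chars=20):
    print(summary)
```

With the real loader, the same helper could be called as `preview_pages(pages)` to check how many documents were produced and what each one contains before passing them on.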
Web Documents
from langchain.document_loaders import WebBaseLoader