Compare commits


5 Commits

Author SHA1 Message Date
theliu d3f1949a6a v1.0 2026-04-25 14:23:16 +08:00
theliu e6308db035 v1.0 2026-04-25 14:21:44 +08:00
theliu a64a609257 Initial commit: V1 2026-04-25 14:17:44 +08:00
theliu ded466b38f Initial commit: V1 2026-04-25 14:11:49 +08:00
theliu 3fe9b00de7 Initial commit: V1 2026-04-25 14:10:09 +08:00
10 changed files with 288 additions and 281 deletions
+6 -4
@@ -12,13 +12,15 @@ models/
# Workspace data (user-generated)
workspace/
# Virtual env
venv/
.venv/
# Backup
_backup/
# Environment
.env
venv/
.venv/
# Backup
_backup/
# IDE
.vscode/
+97 -93
@@ -1,132 +1,136 @@
# Videoer
# VidMarmot
**AI-powered video generation pipeline**, a one-stop tool from article to video.
> **VidMarmot** — an AI tool that illustrates English-textbook audio.
Given an article (text) and its matching narration audio, it automatically completes:
Given a text plus its narration audio, VidMarmot automatically splits scenes, generates illustrations, aligns the speech timeline, and finally composes a subtitled video.
## Why build this?
Teachers kept asking me to make videos for lesson texts. Once or twice was fine, but it got tedious fast.
So I wrote this tool to automate the whole flow: drop in the text and audio, click a few buttons, and the video comes out.
## Main use cases
- **English textbook lessons** — pair each lesson's narration audio with scene images
- **Story articles** — automatically split scenes and generate an illustration for each
- **Teaching demos** — produce subtitled videos with scene transitions
```
text + narration audio → AI scene segmentation → per-scene illustrations → ASR time alignment → video synthesis (with subtitles)
text + narration audio → AI scene splitting → per-scene illustrations → speech-aligned timeline → video synthesis (with subtitles)
```
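The diagram above maps to five sequential stages. A minimal orchestration sketch, with stand-in step functions (not the project's actual APIs):

```python
# Sketch of the pipeline's stage ordering; every helper below is a stub,
# not VidMarmot's real implementation.
def run_pipeline(text: str, audio_path: str) -> list:
    trace = []
    scenes = plan_scenes(text); trace.append("plan")          # AI scene splitting
    images = [gen_image(s) for s in scenes]; trace.append("images")  # per-scene art
    timeline = align_audio(text, audio_path); trace.append("align")  # ASR alignment
    compose_video(images, timeline); trace.append("video")    # final synthesis
    return trace

def plan_scenes(text):  # stub: one scene per sentence
    return [s.strip() for s in text.split(".") if s.strip()]

def gen_image(scene):   # stub: pretend an image file was produced
    return f"scene_{scene[:10]}.png"

def align_audio(text, audio):  # stub: fake (start, end) timestamps
    return [(0.0, 2.5)]

def compose_video(images, timeline):  # stub: pretend a video was written
    return "output_video.mp4"

print(run_pipeline("A cat sits. It sleeps.", "voice.mp3"))
# → ['plan', 'images', 'align', 'video']
```

Each stage only consumes the previous stage's output, which is why the GUI can run them as separate, resumable steps.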
## Preview
## Features
- **AI scene splitting** — supports Qwen / GLM / DeepSeek / Aliyun Bailian / OpenAI-compatible endpoints
- **AI text-to-image** — supports the Kolors / Qwen-Image models, generating scene illustrations one at a time
- **Per-image review** — preview, confirm, regenerate, or skip each image after it is generated
- **Speech alignment** — ASR forced alignment based on Qwen3-ForcedAligner
- **Video synthesis** — MoviePy composes the final video and adds subtitles automatically
## Preview
![Pipeline Overview](docs/pipeline.png)
## Features
## Quick Start
- **AI Scene Planning** — LLM-based (Qwen / GLM) intelligent scene splitting, extracting characters and shot descriptions
- **AI Image Generation** — supports the Kolors / Qwen-Image text-to-image models, generating scene illustrations one at a time
- **Interactive Review** — review, confirm, or regenerate each scene image
- **Forced Alignment** — speech-to-text time alignment based on Qwen3-ForcedAligner
- **Video Synthesis** — MoviePy composes the final video and adds subtitles automatically
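Forced alignment supplies the timestamps that drive scene switching. A self-contained sketch of deriving per-scene time ranges, assuming word-level timestamps (the data shapes here are illustrative, not the real `result.json` schema):

```python
# Assume the aligner yields [(word, start_sec, end_sec), ...] and each scene
# is known to cover a fixed number of words; derive each scene's time span.
def scene_spans(words, scene_word_counts):
    spans, i = [], 0
    for count in scene_word_counts:
        chunk = words[i:i + count]
        spans.append((chunk[0][1], chunk[-1][2]))  # first word's start, last word's end
        i += count
    return spans

words = [("the", 0.0, 0.2), ("cat", 0.2, 0.5),
         ("sleeps", 0.5, 1.0), ("now", 1.0, 1.3)]
print(scene_spans(words, [2, 2]))  # → [(0.0, 0.5), (0.5, 1.3)]
```

These spans would then become each scene image's display duration in the composed video.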
## Architecture
```
release1/
├── gui.py # PyQt6 GUI (main entry)
├── scene_plan.py # LLM scene planning + prompt engineering
├── image_gen.py # Text-to-image API calls
├── asr.py # ASR forced alignment
├── make_video.py # Video synthesis + subtitle rendering
├── text_ai.py # Shared LLM API client
├── config.py # Model paths, API keys, defaults
├── run.bat # Windows launcher
└── qwen_download.py # One-time model download script
```
## Workflow
```
1. Select workspace (folder with article.txt + voice.mp3)
2. AI Scene Planning → scene_plan.json
3. Image Generation → scene_01.png, scene_02.png, ...
4. ASR Alignment → result.json + timestamps into scene_plan
5. Video Synthesis → output_video.mp4
```
## Quick Start
### Prerequisites
### Requirements
- Python 3.12+
- Conda (recommended)
- NVIDIA GPU (for local ASR model)
- Conda
- NVIDIA GPU (required for the local ASR model)
### Setup
### Installation
```bash
# Create conda environment
conda create -n Videoer python=3.12 -y
conda activate Videoer
# Create the environment
conda create -n VidMarmot python=3.12 -y
conda activate VidMarmot
# Install dependencies
pip install PyQt6 moviepy Pillow requests openai
pip install funasr modelscope torch torchaudio
# Install dependencies
pip install -r requirements.txt
# Download ASR model
# Download the ASR model (~1.2 GB)
python qwen_download.py
```
### Configuration
### Configuration
Edit `config.py` to set your API keys:
Edit `config.py` and put your key in each model's `api_key` field. You only need keys for the services you actually use.
```python
# LLM providers (scene planning)
LLM_PROVIDERS = {
"Qwen3.5-35B (ModelScope)": {
"api_key": "YOUR_KEY",
...
},
...
}
| Service | Purpose | Free tier |
|---------|---------|-----------|
| ModelScope | LLM (Qwen3.5-35B) + text-to-image | Yes |
| SiliconFlow | LLM (GLM-4 / Qwen3-32B) + text-to-image | Yes |
| Aliyun Bailian | LLM (Qwen3-235B) | Yes |
| DeepSeek | LLM (V3/R1) | Yes |
| OpenAI-compatible | Custom router | - |
# Image generation
SILICONFLOW_API_KEY = "YOUR_KEY"
MODELSCOPE_API_KEY = "YOUR_KEY"
```
> **Tip**: ModelScope and SiliconFlow both offer free-tier API keys.
### Run
### Run
```bash
# GUI mode (recommended)
python gui.py
# Or on Windows
# Or double-click on Windows
run.bat
```
### Workspace Structure
### Workspace layout
Each video project lives in a workspace folder:
Each video project is a single folder:
```
workspace/my_project/
├── article.txt # Source article text
├── voice.mp3 # Narration audio
├── scene_plan.json # Generated scene plan (auto)
├── result.json # ASR alignment result (auto)
├── scene_01.png # Generated images (auto)
├── scene_02.png
├── ...
└── output_video.mp4 # Final output (auto)
workspace/my_lesson/
├── article.txt          # Lesson text
├── voice.mp3            # Narration audio
├── scene_plan.json      # Scene plan (auto-generated)
├── result.json          # ASR alignment result (auto-generated)
├── scene/               # Generated scene images
│   ├── scene_001.png
│   ├── scene_002.png
│   └── ...
└── output_video.mp4     # Final video (auto-generated)
```
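A workspace is ready once it contains `article.txt` and `voice.mp3`; everything else is generated. A minimal pre-flight check (the helper name is illustrative, not from the codebase):

```python
import os
import tempfile

REQUIRED = ("article.txt", "voice.mp3")  # the two user-supplied inputs

def check_workspace(path: str) -> list:
    """Return the required input files missing from a workspace folder."""
    return [f for f in REQUIRED if not os.path.isfile(os.path.join(path, f))]

# Demo: a workspace with only the article present
with tempfile.TemporaryDirectory() as ws:
    open(os.path.join(ws, "article.txt"), "w").close()
    print(check_workspace(ws))  # → ['voice.mp3']
```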
## Dependencies
## Project structure
| Package | Purpose |
|---------|---------|
| PyQt6 | GUI framework |
| moviepy | Video composition |
| Pillow | Image processing / subtitle rendering |
| requests | HTTP API calls |
| openai | Compatible LLM client (OpenAI API format) |
| funasr | ASR forced alignment |
| modelscope | Model loading |
| torch / torchaudio | GPU inference backend |
```
├── gui.py              # PyQt6 GUI (main entry)
├── scene_plan.py       # AI scene splitting + prompt engineering
├── image_gen.py        # Text-to-image API calls
├── asr.py              # ASR forced alignment
├── make_video.py       # Video synthesis + subtitle rendering
├── text_ai.py          # LLM API client
├── config.py           # All configuration (API keys, model paths, parameters)
├── qwen_download.py    # ASR model download script
├── requirements.txt    # Python dependencies
├── run.bat             # Windows launcher
└── .gitignore
```
## Dependencies
`requirements.txt`:
| Package | Purpose |
|---------|---------|
| PyQt6 | GUI framework |
| moviepy | Video composition |
| Pillow | Image processing / subtitle rendering |
| numpy | Numerical computing |
| requests | HTTP API calls |
| openai | OpenAI-format-compatible LLM client |
| funasr | ASR forced alignment |
| modelscope | Model loading |
| torch / torchaudio | GPU inference backend |
| mutagen | Audio duration probing (optional fallback) |
## Roadmap
- [ ] **Image-to-video** — feed the generated scene images to an image-to-video model so each still becomes a motion clip, then stitch the clips into a truly animated video
- [ ] Support for more text-to-image models
- [ ] Batch processing of multiple lesson texts
- [ ] Package as an executable (PyInstaller)
## License
+54 -39
@@ -1,81 +1,96 @@
"""
release1 configuration file.
Centralizes all model paths, API keys, and default parameters.
config.py - All configuration for VidMarmot.
API keys, model paths, default parameters — everything lives here.
Edit this file directly to configure your setup.
"""
import os
# ========== Base paths ==========
BASE_DIR = os.path.dirname(os.path.abspath(__file__))
VIDEO_PROJECT_DIR = os.path.dirname(BASE_DIR) # parent video/ directory
# ========== ASR model (absolute path into video/models/) ==========
ASR_MODEL_DIR = os.path.join(
r'C:\pythonproject\video', 'models', 'qwen', 'Qwen3-ForcedAligner-0.6B'
).replace('\\', '/')
# ========== LLM providers (for scene splitting / character extraction) ==========
# ========== ASR Model ==========
# Default: project_dir/models/qwen/Qwen3-ForcedAligner-0.6B
# Override via env var VIDMARMOT_ASR_MODEL_DIR
ASR_MODEL_DIR = os.path.join(BASE_DIR, "models", "qwen", "Qwen3-ForcedAligner-0.6B").replace("\\", "/")
# ========== LLM Providers (scene planning / text generation) ==========
# Fill in your API keys below. Only providers with keys will be usable.
LLM_PROVIDERS = {
"Qwen3.5-35B (ModelScope 免费)": {
"api_key": "ms-38de567b-cf88-4523-bac2-ff63d8f1e0f6",
"Qwen3.5-35B (ModelScope)": {
"api_key": "", # ← put your ModelScope API key here
"api_base": "https://api-inference.modelscope.cn/v1/",
"model": "Qwen/Qwen3.5-35B-A3B",
},
"GLM-4-9B (硅基流动 免费)": {
"api_key": "sk-mjqgwknbttvqnrjjfnxemtjgdivogjaqsftbvoifwjvruwsq",
"GLM-4-9B (SiliconFlow)": {
"api_key": "", # ← put your SiliconFlow API key here
"api_base": "https://api.siliconflow.cn/v1/",
"model": "THUDM/glm-4-9b-chat",
},
"Qwen3-32B (硅基流动 付费)": {
"api_key": "sk-mjqgwknbttvqnrjjfnxemtjgdivogjaqsftbvoifwjvruwsq",
"Qwen3-32B (SiliconFlow)": {
"api_key": "", # ← put your SiliconFlow API key here
"api_base": "https://api.siliconflow.cn/v1/",
"model": "Qwen/Qwen3-32B",
},
"GLM-5 (ModelScope 免费)": {
"api_key": "ms-38de567b-cf88-4523-bac2-ff63d8f1e0f6",
"GLM-5 (ModelScope)": {
"api_key": "", # ← put your ModelScope API key here
"api_base": "https://api-inference.modelscope.cn/v1/",
"model": "ZhipuAI/GLM-5",
},
"Qwen3-235B-A22B (Aliyun)": {
"api_key": "", # ← put your Aliyun DashScope API key here
"api_base": "https://dashscope.aliyuncs.com/compatible-mode/v1/",
"model": "qwen3-235b-a22b",
},
"DeepSeek-V3": {
"api_key": "", # ← put your DeepSeek API key here
"api_base": "https://api.deepseek.com/v1/",
"model": "deepseek-chat",
},
"DeepSeek-R1": {
"api_key": "", # ← put your DeepSeek API key here
"api_base": "https://api.deepseek.com/v1/",
"model": "deepseek-reasoner",
},
"OpenAI (Custom Router)": {
"api_key": "", # ← put your OpenAI-compatible API key here
"api_base": "https://api.openai.com/v1/", # change if using a custom router
"model": "gpt-4o",
},
}
# Default LLM (kept for backward compatibility)
DEFAULT_LLM = "Qwen3.5-35B (ModelScope 免费)"
LLM_API_KEY = LLM_PROVIDERS[DEFAULT_LLM]["api_key"]
LLM_API_BASE = LLM_PROVIDERS[DEFAULT_LLM]["api_base"]
LLM_MODEL = LLM_PROVIDERS[DEFAULT_LLM]["model"]
# ========== SiliconFlow API (Kolors text-to-image) ==========
SILICONFLOW_API_KEY = "sk-mjqgwknbttvqnrjjfnxemtjgdivogjaqsftbvoifwjvruwsq"
SILICONFLOW_API_BASE = "https://api.siliconflow.cn/v1/images/generations"
# ========== ModelScope API (Qwen text-to-image) ==========
MODELSCOPE_API_KEY = "ms-38de567b-cf88-4523-bac2-ff63d8f1e0f6"
MODELSCOPE_API_BASE = "https://api-inference.modelscope.cn/v1/images/generations"
MODELSCOPE_POLL_INTERVAL = 3 # poll interval (seconds)
MODELSCOPE_MAX_WAIT = 180 # max wait (seconds)
# ========== Text-to-image models ==========
# ========== Text-to-Image Models ==========
# Fill in your API keys below.
IMAGE_MODELS = {
"Kolors(便宜快速)": {
"Kolors (SiliconFlow)": {
"provider": "siliconflow",
"api_key": "", # ← put your SiliconFlow API key here
"api_base": "https://api.siliconflow.cn/v1/images/generations",
"model": "Kwai-Kolors/Kolors",
"default_size": "1280x720",
"guidance_scale": 7.5,
},
"Qwen-Image(高质量)": {
"Qwen-Image (ModelScope)": {
"provider": "modelscope",
"api_key": "", # ← put your ModelScope API key here
"api_base": "https://api-inference.modelscope.cn/v1/images/generations",
"poll_interval": 3,
"max_wait": 180,
"model": "Qwen/Qwen-Image-2512",
"default_size": "1280x720",
"guidance_scale": 7.5,
},
}
# Default text-to-image model
DEFAULT_IMAGE_MODEL = "Kolors(便宜快速)"
DEFAULT_IMAGE_MODEL = "Kolors (SiliconFlow)"
# ========== Default parameters ==========
# ========== Defaults ==========
DEFAULT_FPS = 24
DEFAULT_VIDEO_SIZE = "1280x720"
# ========== Common negative prompt ==========
# Negative prompt for image generation
NEGATIVE_PROMPT = "blurry, low quality, deformed, text, letters, words, subtitle, logo, watermark, caption, label, number"
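Leaving an `api_key` field empty is safe: callers refuse to run until it is filled in. A self-contained sketch of that guard, using a trimmed copy of the provider dict (the `resolve_provider` helper is illustrative, not a function in the codebase):

```python
# Trimmed copy of the LLM_PROVIDERS structure from config.py.
LLM_PROVIDERS = {
    "Qwen3.5-35B (ModelScope)": {
        "api_key": "",  # left blank until the user fills it in
        "api_base": "https://api-inference.modelscope.cn/v1/",
        "model": "Qwen/Qwen3.5-35B-A3B",
    },
}

def resolve_provider(name: str) -> dict:
    """Look up a provider and fail fast when its key is not configured."""
    cfg = LLM_PROVIDERS.get(name)
    if cfg is None:
        raise ValueError(f"Unknown provider: {name}")
    if not cfg["api_key"]:
        raise ValueError(f"API key not configured for '{name}'. "
                         "Edit config.py and fill in the api_key field.")
    return cfg

try:
    resolve_provider("Qwen3.5-35B (ModelScope)")
except ValueError as e:
    print(e)  # prints the not-configured message
```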
BIN
Binary file not shown. Before: 491 KiB | After: 471 KiB

+12 -12
@@ -1,6 +1,6 @@
#!/usr/bin/env python3
"""
gui.py - Video production pipeline GUI (release1)
gui.py - VidMarmot GUI
Single entry point, PyQt6 dark theme
Flow: pick workspace → split scenes → generate and review images one by one → ASR → compose video
@@ -40,7 +40,8 @@ from PyQt6.QtCore import Qt, QThread, pyqtSignal, QMutex, QWaitCondition, QTimer
from PyQt6.QtGui import QPixmap, QImage, QFont, QColor, QIcon
from config import IMAGE_MODELS, DEFAULT_IMAGE_MODEL, DEFAULT_FPS, DEFAULT_VIDEO_SIZE, LLM_PROVIDERS, DEFAULT_LLM
from config import (DEFAULT_FPS, DEFAULT_VIDEO_SIZE,
LLM_PROVIDERS, IMAGE_MODELS, DEFAULT_IMAGE_MODEL)
# ============================================================
@@ -635,7 +636,7 @@ class GenerationWorker(QThread):
class VideoPipelineGUI(QMainWindow):
def __init__(self):
super().__init__()
self.setWindowTitle("视频制作流水线 - Release 1")
self.setWindowTitle("VidMarmot")
self.setGeometry(80, 80, 1280, 820)
# State
@@ -670,7 +671,7 @@ class VideoPipelineGUI(QMainWindow):
main_layout.setSpacing(8)
# --- Title ---
title = QLabel("视频制作流水线")
title = QLabel("VidMarmot")
title.setObjectName("titleLabel")
title.setAlignment(Qt.AlignmentFlag.AlignCenter)
main_layout.addWidget(title)
@@ -690,22 +691,21 @@ class VideoPipelineGUI(QMainWindow):
top_bar.addSpacing(20)
# LLM model selector (for scene splitting)
# LLM model selector — show all providers, default to first
top_bar.addWidget(QLabel("语言模型:"))
self.llm_combo = QComboBox()
self.llm_combo.addItems(LLM_PROVIDERS.keys())
idx = list(LLM_PROVIDERS.keys()).index(DEFAULT_LLM)
self.llm_combo.setCurrentIndex(idx)
top_bar.addWidget(self.llm_combo)
top_bar.addSpacing(20)
# Text-to-image model selector
# Image model selector — show all models, default to first
top_bar.addWidget(QLabel("文生图模型:"))
self.model_combo = QComboBox()
self.model_combo.addItems(IMAGE_MODELS.keys())
idx = list(IMAGE_MODELS.keys()).index(DEFAULT_IMAGE_MODEL)
self.model_combo.setCurrentIndex(idx)
default_img = DEFAULT_IMAGE_MODEL
if default_img and default_img in IMAGE_MODELS:
self.model_combo.setCurrentText(default_img)
top_bar.addWidget(self.model_combo)
top_bar.addSpacing(20)
@@ -884,8 +884,8 @@ class VideoPipelineGUI(QMainWindow):
main_layout.addWidget(splitter, stretch=1)
# Initial log
self.log("视频制作流水线 v1.0 已启动")
# Startup log
self.log("VidMarmot 已启动")
self.log("请先选择一个工作区文件夹(包含 article.txt")
# ============================================================
+70 -77
@@ -1,28 +1,24 @@
"""
image_gen.py - Unified text-to-image interface
Supports two models:
- Kolors (cheap and fast) → SiliconFlow API (synchronous)
- Qwen-Image (high quality) → ModelScope API (async polling)
image_gen.py - Unified text-to-image interface.
Providers:
- SiliconFlow (Kolors) — sync API
- ModelScope (Qwen-Image) — async polling API
"""
import requests
import os
import time
from datetime import datetime
from config import (
SILICONFLOW_API_KEY,
SILICONFLOW_API_BASE,
MODELSCOPE_API_KEY,
MODELSCOPE_API_BASE,
MODELSCOPE_POLL_INTERVAL,
MODELSCOPE_MAX_WAIT,
IMAGE_MODELS,
NEGATIVE_PROMPT,
)
from config import IMAGE_MODELS, NEGATIVE_PROMPT
def _generate_siliconflow(prompt, model_id, size, guidance, neg, save_dir, filename):
"""SiliconFlow 同步 APIKolors"""
def _generate_siliconflow(prompt, model_id, size, guidance, neg, save_dir, filename, api_key, api_base):
"""SiliconFlow sync API"""
headers = {
"Authorization": f"Bearer {api_key}",
"Content-Type": "application/json",
}
payload = {
"model": model_id,
"prompt": prompt,
@@ -33,37 +29,32 @@ def _generate_siliconflow(prompt, model_id, size, guidance, neg, save_dir, filen
"negative_prompt": neg,
}
headers = {
"Authorization": f"Bearer {SILICONFLOW_API_KEY}",
"Content-Type": "application/json",
}
print(f" [SiliconFlow] {prompt[:60]}{'...' if len(prompt) > 60 else ''}")
print(f" [SiliconFlow] 提交: {prompt[:60]}{'...' if len(prompt) > 60 else ''}")
for attempt in range(6): # retry up to 5 times
resp = requests.post(SILICONFLOW_API_BASE, headers=headers, json=payload, timeout=120)
for attempt in range(6):
resp = requests.post(api_base, headers=headers, json=payload, timeout=120)
print(f" HTTP {resp.status_code}: {resp.text[:300]}")
if resp.status_code == 429:
wait = 15 * (attempt + 1) # 15s, 30s, 45s, 60s, 75s
print(f" [!] 限频,等待 {wait}s 后重试 ({attempt+1}/5)...")
wait = 15 * (attempt + 1)
print(f" [!] Rate limited, waiting {wait}s ({attempt+1}/5)...")
time.sleep(wait)
continue
if resp.status_code != 200:
raise Exception(f"SiliconFlow 生成失败 ({resp.status_code}): {resp.text[:300]}")
raise Exception(f"SiliconFlow error ({resp.status_code}): {resp.text[:300]}")
break
else:
raise Exception("SiliconFlow 持续限频,已重试 5 次,请稍后再试或切换模型")
raise Exception("SiliconFlow rate limit, retried 5 times.")
result = resp.json()
images = result.get("images", [])
if not images:
raise Exception(f"SiliconFlow 返回无图片: {result}")
raise Exception(f"SiliconFlow returned no images: {result}")
img_url = images[0].get("url")
if not img_url:
raise Exception(f"返回图片 URL 为空: {result}")
raise Exception(f"Empty image URL: {result}")
img_data = requests.get(img_url, timeout=60).content
@@ -78,12 +69,12 @@ def _generate_siliconflow(prompt, model_id, size, guidance, neg, save_dir, filen
return {"url": img_url, "filepath": filepath}
def _generate_modelscope(prompt, model_id, size, guidance, neg, save_dir, filename):
"""ModelScope 异步轮询 APIQwen-Image"""
def _generate_modelscope(prompt, model_id, size, guidance, neg, save_dir, filename, api_key, api_base):
"""ModelScope async polling API"""
submit_headers = {
"Authorization": f"Bearer {MODELSCOPE_API_KEY}",
"Authorization": f"Bearer {api_key}",
"Content-Type": "application/json",
"X-ModelScope-Async-Mode": "true"
"X-ModelScope-Async-Mode": "true",
}
payload = {
"model": model_id,
@@ -94,31 +85,32 @@ def _generate_modelscope(prompt, model_id, size, guidance, neg, save_dir, filena
"negative_prompt": neg,
}
print(f" [ModelScope] 提交: {prompt[:60]}{'...' if len(prompt) > 60 else ''}")
resp = requests.post(MODELSCOPE_API_BASE, headers=submit_headers, json=payload, timeout=60)
print(f" [ModelScope] {prompt[:60]}{'...' if len(prompt) > 60 else ''}")
resp = requests.post(api_base, headers=submit_headers, json=payload, timeout=60)
if resp.status_code != 200:
raise Exception(f"ModelScope 提交失败 ({resp.status_code}): {resp.text[:300]}")
raise Exception(f"ModelScope submit failed ({resp.status_code}): {resp.text[:300]}")
result = resp.json()
task_id = result.get("task_id")
if not task_id:
raise Exception(f"未找到 task_id: {result}")
raise Exception(f"No task_id: {result}")
print(f" task_id: {task_id}")
# Poll for the result
query_headers = {
"Authorization": f"Bearer {MODELSCOPE_API_KEY}",
"X-ModelScope-Task-Type": "image_generation"
"Authorization": f"Bearer {api_key}",
"X-ModelScope-Task-Type": "image_generation",
}
status_url = f"https://api-inference.modelscope.cn/v1/tasks/{task_id}"
poll_interval = IMAGE_MODELS["Qwen-Image (ModelScope)"].get("poll_interval", 3)
max_wait = IMAGE_MODELS["Qwen-Image (ModelScope)"].get("max_wait", 180)
start = time.time()
for attempt in range(100):
if attempt > 0:
time.sleep(MODELSCOPE_POLL_INTERVAL)
time.sleep(poll_interval)
elapsed = int(time.time() - start)
if elapsed > MODELSCOPE_MAX_WAIT:
raise Exception(f"ModelScope 超时({MODELSCOPE_MAX_WAIT}s")
if elapsed > max_wait:
raise Exception(f"ModelScope timeout ({max_wait}s)")
qresp = requests.get(status_url, headers=query_headers, timeout=30)
if qresp.status_code != 200:
@@ -130,11 +122,13 @@ def _generate_modelscope(prompt, model_id, size, guidance, neg, save_dir, filena
print(f" [{elapsed}s] {task_status}")
if task_status == "SUCCEED":
output_images = (qresult.get("output_images")
or qresult.get("outputs", {}).get("output_images")
or [])
output_images = (
qresult.get("output_images")
or qresult.get("outputs", {}).get("output_images")
or []
)
if not output_images:
raise Exception(f"SUCCEED 但无图片: {qresult}")
raise Exception(f"SUCCEED but no images: {qresult}")
url = output_images[0]
img_data = requests.get(url, timeout=180).content
@@ -149,34 +143,30 @@ def _generate_modelscope(prompt, model_id, size, guidance, neg, save_dir, filena
return {"url": url, "filepath": filepath}
elif task_status == "FAILED":
raise Exception(f"ModelScope 任务失败: {qresult.get('errors', qresult)}")
raise Exception(f"ModelScope task failed: {qresult.get('errors', qresult)}")
raise Exception(f"ModelScope 超时({MODELSCOPE_MAX_WAIT}s")
raise Exception(f"ModelScope timeout ({max_wait}s)")
def image_generate(
prompt: str,
save_dir: str = "./generated_images",
model_name: str = None,
n: int = 1,
seed: int = None,
num_inference_steps: int = 20,
guidance_scale: float = None,
negative_prompt: str = None,
filename: str = None,
image_size: str = None,
guidance_scale: float = None,
negative_prompt: str = None,
) -> dict:
"""
Unified text-to-image interface
"""Unified text-to-image interface.
Args:
prompt: generation prompt
save_dir: save directory
model_name: model name (a key of IMAGE_MODELS); defaults to DEFAULT_IMAGE_MODEL from config
image_size: image size, default 1280x720 (16:9)
prompt: generation prompt
save_dir: output directory
model_name: model name (key in IMAGE_MODELS), None = default
filename: output filename, None = auto
image_size: image size, None = model default
Returns:
dict: {"url": str, "filepath": str}
{"url": str, "filepath": str}
"""
from config import DEFAULT_IMAGE_MODEL
@@ -185,31 +175,34 @@ def image_generate(
model_config = IMAGE_MODELS.get(model_name)
if not model_config:
raise ValueError(f"未知模型: {model_name},可选: {list(IMAGE_MODELS.keys())}")
raise ValueError(f"Unknown model: {model_name}, available: {list(IMAGE_MODELS.keys())}")
api_key = model_config.get("api_key", "")
if not api_key:
raise ValueError(
f"API key not configured for '{model_name}'. "
f"Edit config.py and fill in the api_key field."
)
model_id = model_config["model"]
size = image_size or model_config["default_size"]
guidance = guidance_scale if guidance_scale is not None else model_config["guidance_scale"]
neg = negative_prompt or NEGATIVE_PROMPT
provider = model_config["provider"]
api_base = model_config.get("api_base", "")
os.makedirs(save_dir, exist_ok=True)
provider = model_config["provider"]
if provider == "siliconflow":
return _generate_siliconflow(prompt, model_id, size, guidance, neg, save_dir, filename)
return _generate_siliconflow(prompt, model_id, size, guidance, neg, save_dir, filename, api_key, api_base)
elif provider == "modelscope":
return _generate_modelscope(prompt, model_id, size, guidance, neg, save_dir, filename)
return _generate_modelscope(prompt, model_id, size, guidance, neg, save_dir, filename, api_key, api_base)
else:
raise ValueError(f"未知 provider: {provider}")
def get_available_models() -> list[str]:
"""返回可用的文生图模型名称列表"""
return list(IMAGE_MODELS.keys())
raise ValueError(f"Unknown provider: {provider}")
if __name__ == "__main__":
for name in get_available_models():
print(f"\n测试模型: {name}")
result = image_generate("A cute cat sitting on a desk, 16:9 aspect ratio", model_name=name)
print(f" 路径: {result['filepath']}")
for name in list(IMAGE_MODELS.keys()):
print(f"\nTesting: {name}")
result = image_generate("A cute cat sitting on a desk, 16:9", model_name=name)
print(f" Path: {result['filepath']}")
+11
@@ -0,0 +1,11 @@
PyQt6
moviepy
Pillow
numpy
requests
openai
funasr
modelscope
torch
torchaudio
mutagen
+3 -11
@@ -3,23 +3,15 @@ chcp 65001 >nul
cd /d "%~dp0"
:: Clear cache
:: clean cache
del /s /q __pycache__\*.pyc 2>nul
for /d %%d in (__pycache__) do rd /s /q "%%d" 2>nul
echo 正在激活 Videoer 环境并启动 GUI...
echo Starting VidMarmot...
call C:\ProgramData\anaconda3\Scripts\activate.bat Videoer
if errorlevel 1 (
echo [错误] 无法激活 Videoer 环境
pause
exit /b 1
)
cd /d "%~dp0"
python "%~dp0gui.py"
if errorlevel 1 (
echo.
echo [错误] 启动失败
echo [ERROR] Startup failed. Make sure Python and dependencies are installed.
pause
)
+1 -1
@@ -84,7 +84,7 @@ def generate_single_scene(
return scene
def main(workspace: str = None, model_name: str = "Kolors(便宜快速)"):
def main(workspace: str = None, model_name: str = "Kolors (SiliconFlow)"):
"""主流程:生成所有 pending 场景"""
global WORKSPACE, PLAN_PATH, SCENE_IMG_DIR
+34 -44
@@ -1,43 +1,44 @@
"""
text_ai.py - LLM text generation
Used for scene splitting and other AI inference tasks
Supports switching among multiple LLM providers
text_ai.py - LLM text generation client.
Supports multiple providers defined in config.py.
"""
from openai import OpenAI
from config import LLM_PROVIDERS, DEFAULT_LLM, LLM_API_KEY, LLM_API_BASE, LLM_MODEL
from config import LLM_PROVIDERS
def text_ai(in_put: str, system_prompt: str = "You are a helpful assistant.",
provider: str = None) -> str:
"""
Call the LLM to generate text
"""Call LLM to generate text.
Args:
in_put: user input content
system_prompt: system prompt
provider: LLM provider name (a key of LLM_PROVIDERS); None uses the default
in_put: user message
system_prompt: system prompt
provider: provider name (key in LLM_PROVIDERS), None = first in dict
Returns:
AI-generated text
generated text
"""
if provider and provider in LLM_PROVIDERS:
cfg = LLM_PROVIDERS[provider]
api_key = cfg["api_key"]
api_base = cfg["api_base"]
model = cfg["model"]
else:
api_key = LLM_API_KEY
api_base = LLM_API_BASE
model = LLM_MODEL
# Default to first provider in dict
cfg = next(iter(LLM_PROVIDERS.values()))
provider = next(iter(LLM_PROVIDERS))
client = OpenAI(
api_key=api_key,
base_url=api_base,
)
api_key = cfg["api_key"]
api_base = cfg["api_base"]
model = cfg["model"]
# ModelScope's Qwen3 and GLM series enable thinking by default; it must be turned off
# Note: MiniMax models are not Qwen/GLM; they neither need nor accept enable_thinking
if not api_key:
raise ValueError(
f"API key not configured for '{provider}'. "
f"Edit config.py and fill in the api_key field."
)
client = OpenAI(api_key=api_key, base_url=api_base)
# ModelScope Qwen3/GLM default to thinking mode, disable it
extra_body = {}
is_modelscope = "modelscope" in api_base.lower()
is_qwen = "qwen" in model.lower()
@@ -49,38 +50,31 @@ def text_ai(in_put: str, system_prompt: str = "You are a helpful assistant.",
model=model,
messages=[
{"role": "system", "content": system_prompt},
{"role": "user", "content": in_put}
{"role": "user", "content": in_put},
],
max_tokens=16384,
stream=False,
extra_body=extra_body if extra_body else None,
extra_body=extra_body or None,
)
# Defensive: choices is empty or None
if not response.choices:
# Try to extract useful info from the response object
resp_dict = response.model_dump() if hasattr(response, "model_dump") else {}
error_msg = resp_dict.get("error", {})
if isinstance(error_msg, dict):
err_text = error_msg.get("message", str(error_msg))
else:
err_text = str(resp_dict)
err_text = error_msg.get("message", str(error_msg)) if isinstance(error_msg, dict) else str(resp_dict)
raise ValueError(
f"模型 '{model}' 返回了空的 choices\n"
f"响应内容: {err_text}\n"
f"可能是模型暂时不可用或请求被拒绝。"
f"Model '{model}' returned empty choices.\n"
f"Response: {err_text}\n"
f"Model may be unavailable or request was rejected."
)
msg = response.choices[0].message
content = msg.content
# Check whether the output was truncated
finish = response.choices[0].finish_reason
if finish == "length":
print(f"[WARN] LLM output truncated (finish_reason=length), max_tokens may be too small")
if response.choices[0].finish_reason == "length":
print(f"[WARN] LLM output truncated (finish_reason=length)")
if content is None:
# Fallback: try several field names (different APIs name it differently)
# Fallback: try alternate field names
for attr in ("thinking_content", "reasoning_content", "text", "output"):
fallback = getattr(msg, attr, None)
if fallback:
@@ -88,7 +82,6 @@ def text_ai(in_put: str, system_prompt: str = "You are a helpful assistant.",
break
if content is None:
# Last resort: treat the message object as a dict
try:
msg_dict = msg.model_dump() if hasattr(msg, "model_dump") else vars(msg)
for v in msg_dict.values():
@@ -99,11 +92,8 @@ def text_ai(in_put: str, system_prompt: str = "You are a helpful assistant.",
pass
if content is None:
finish = response.choices[0].finish_reason
raise ValueError(
f"模型 '{model}' 返回内容为空(content=None),"
f"finish_reason={finish}\n"
f"如果使用 MiniMax 系列,请改用 Qwen3.5-35B (ModelScope 免费) 或其他 Qwen 模型。"
f"Model '{model}' returned None content (finish_reason={response.choices[0].finish_reason})."
)
return content
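The content-extraction fallback above (content, then alternate field names, then scanning the message as a dict) can be isolated like this, with a stand-in message object rather than the real SDK type:

```python
class Msg:
    """Stand-in for the SDK's message object; any keyword becomes a field."""
    def __init__(self, **fields):
        self.__dict__.update(fields)

def extract_content(msg):
    """Mirror text_ai.py's fallback chain for pulling text out of a message."""
    content = getattr(msg, "content", None)
    if content is None:
        # Alternate field names used by some providers
        for attr in ("thinking_content", "reasoning_content", "text", "output"):
            fallback = getattr(msg, attr, None)
            if fallback:
                content = fallback
                break
    if content is None:
        # Last resort: scan all fields for a non-empty string
        for v in vars(msg).values():
            if isinstance(v, str) and v.strip():
                content = v
                break
    return content

print(extract_content(Msg(content=None, reasoning_content="hello")))  # → hello
```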