Initial commit: V1

Author: theliu · 2026-04-25 12:50:36 +08:00 · commit 4c38e240dc
12 changed files with 3746 additions and 0 deletions
# Videoer
**AI-powered video generation pipeline**: a one-stop tool that turns an article into a video.
Given an English article (text) and its narration audio, it automatically runs:
```
article text + narration audio → AI scene planning → per-scene image generation → ASR time alignment → video synthesis (with subtitles)
```
## Preview
![Pipeline Overview](docs/pipeline.png)
## Features
- **AI Scene Planning**: LLM-driven (Qwen / GLM) scene segmentation that extracts characters and visual descriptions
- **AI Image Generation**: per-scene images via Kolors / Qwen-Image text-to-image models
- **Interactive Review**: inspect each scene image, then confirm it or regenerate it
- **Forced Alignment**: speech-to-text time alignment based on Qwen3-ForcedAligner
- **Video Synthesis**: MoviePy assembles the final video and adds subtitles automatically
## Architecture
```
release1/
├── gui.py # PyQt6 GUI (main entry)
├── scene_plan.py # LLM scene planning + prompt engineering
├── image_gen.py # Text-to-image API calls
├── asr.py # ASR forced alignment
├── make_video.py # Video synthesis + subtitle rendering
├── text_ai.py # Shared LLM API client
├── config.py # Model paths, API keys, defaults
├── run.bat # Windows launcher
└── qwen_download.py # One-time model download script
```
## Workflow
```
1. Select workspace (folder with article.txt + voice.mp3)
2. AI Scene Planning → scene_plan.json
3. Image Generation → scene_01.png, scene_02.png, ...
4. ASR Alignment → result.json + timestamps into scene_plan
5. Video Synthesis → output_video.mp4
```
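The scene plan produced in step 2 drives everything that follows. As a minimal sketch of what a `scene_plan.json` round trip might look like (the field names below are assumptions; the real schema is defined by `scene_plan.py`):

```python
import json
import tempfile
from pathlib import Path

# Hypothetical scene_plan.json shape, one entry per planned scene.
scene_plan = {
    "scenes": [
        {"id": 1, "text": "The sun rises over the valley.",
         "characters": ["narrator"], "image_prompt": "sunrise over a green valley"},
        {"id": 2, "text": "A river winds through the hills.",
         "characters": [], "image_prompt": "river winding through rolling hills"},
    ]
}

# Stand-in for workspace/my_project; a real run writes into the chosen workspace.
workspace = Path(tempfile.mkdtemp()) / "my_project"
workspace.mkdir(parents=True, exist_ok=True)
(workspace / "scene_plan.json").write_text(
    json.dumps(scene_plan, ensure_ascii=False, indent=2), encoding="utf-8")

# Step 3 then generates scene_01.png, scene_02.png, ... one image per scene.
loaded = json.loads((workspace / "scene_plan.json").read_text(encoding="utf-8"))
print(len(loaded["scenes"]))
```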
## Quick Start
### Prerequisites
- Python 3.12+
- Conda (recommended)
- NVIDIA GPU (for local ASR model)
### Setup
```bash
# Create conda environment
conda create -n Videoer python=3.12 -y
conda activate Videoer
# Install dependencies
pip install PyQt6 moviepy Pillow requests openai
pip install funasr modelscope torch torchaudio
# Download ASR model
python qwen_download.py
```
### Configuration
Edit `config.py` to set your API keys:
```python
# LLM providers (scene planning)
LLM_PROVIDERS = {
"Qwen3.5-35B (ModelScope)": {
"api_key": "YOUR_KEY",
...
},
...
}
# Image generation
SILICONFLOW_API_KEY = "YOUR_KEY"
MODELSCOPE_API_KEY = "YOUR_KEY"
```
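A sketch of how the app might look up one entry from `LLM_PROVIDERS` and fail early on a placeholder key. The `base_url` and `model` fields (and `get_provider` itself) are illustrative assumptions, not the real `config.py` schema:

```python
# Hypothetical provider table mirroring the config.py layout above.
LLM_PROVIDERS = {
    "Qwen3.5-35B (ModelScope)": {
        "api_key": "YOUR_KEY",
        "base_url": "https://example.com/v1",  # placeholder endpoint
        "model": "Qwen3.5-35B",
    },
}

def get_provider(name: str) -> dict:
    """Return one provider entry, rejecting unset placeholder keys."""
    cfg = LLM_PROVIDERS[name]
    if cfg["api_key"] in ("", "YOUR_KEY"):
        raise ValueError(f"Set api_key for provider {name!r} in config.py")
    return cfg
```

Failing fast here gives a clear error in the GUI instead of an opaque HTTP 401 later.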
> **Tip**: ModelScope and SiliconFlow both offer free-tier API keys.
### Run
```bash
# GUI mode (recommended)
python gui.py
# Or on Windows
run.bat
```
### Workspace Structure
Each video project lives in a workspace folder:
```
workspace/my_project/
├── article.txt # Source article text
├── voice.mp3 # Narration audio
├── scene_plan.json # Generated scene plan (auto)
├── result.json # ASR alignment result (auto)
├── scene_01.png # Generated images (auto)
├── scene_02.png
├── ...
└── output_video.mp4 # Final output (auto)
```
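Validating a workspace takes only a few lines. In this sketch, `check_workspace` is a hypothetical helper (not part of the release) that checks the two required inputs and collects the generated scene images in order:

```python
from pathlib import Path

def check_workspace(root: Path) -> list[Path]:
    """Verify required inputs exist and return scene images in scene order."""
    for required in ("article.txt", "voice.mp3"):
        if not (root / required).exists():
            raise FileNotFoundError(f"{required} missing in {root}")
    # scene_01.png, scene_02.png, ... sort correctly thanks to zero padding
    return sorted(root.glob("scene_*.png"))
```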
## Dependencies
| Package | Purpose |
|---------|---------|
| PyQt6 | GUI framework |
| moviepy | Video composition |
| Pillow | Image processing / subtitle rendering |
| requests | HTTP API calls |
| openai | Compatible LLM client (OpenAI API format) |
| funasr | ASR forced alignment |
| modelscope | Model loading |
| torch / torchaudio | GPU inference backend |
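The forced-alignment timestamps are what make subtitle timing possible. Purely as an illustration (the pipeline itself draws subtitles with Pillow inside MoviePy clips, and the `result.json` schema is not shown here), a sketch that formats aligned `(start, end, text)` cues as standard SRT:

```python
def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(cues: list[tuple[float, float, str]]) -> str:
    """Render (start, end, text) cues as an SRT document."""
    blocks = [f"{i}\n{srt_time(a)} --> {srt_time(b)}\n{text}\n"
              for i, (a, b, text) in enumerate(cues, start=1)]
    return "\n".join(blocks)

print(srt_time(3.5))  # 00:00:03,500
```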
## License
MIT