# Videoer
**AI-powered video generation pipeline**: a one-stop tool from article to video.
Given an English article (text) and its narration audio, it automatically performs:
```
article text + narration audio → AI scene planning → per-scene image generation → ASR time alignment → video synthesis (with subtitles)
```
## Preview
![Pipeline Overview](docs/pipeline.png)
## Features
- **AI Scene Planning** — uses an LLM (Qwen / GLM) to split the article into scenes and extract characters and visual descriptions
- **AI Image Generation** — supports the Kolors / Qwen-Image text-to-image models; generates one illustration per scene
- **Interactive Review** — review each scene image one by one, then confirm or regenerate it
- **Forced Alignment** — speech-to-text time alignment based on Qwen3-ForcedAligner
- **Video Synthesis** — MoviePy composes the final video and adds subtitles automatically
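For orientation, the scene plan produced by the planning step can be pictured as a JSON structure along these lines. The field names below are illustrative assumptions, not the tool's exact schema:

```python
import json

# Illustrative shape of a scene plan; field names are assumptions.
scene_plan = {
    "scenes": [
        {
            "index": 1,
            "text": "Opening paragraph of the article...",
            "characters": ["narrator"],
            "image_prompt": "A quiet morning street, soft light",
        }
    ]
}

# The plan is plain JSON, so it round-trips through json.dumps/loads.
serialized = json.dumps(scene_plan, indent=2)
```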
## Architecture
```
release1/
├── gui.py # PyQt6 GUI (main entry)
├── scene_plan.py # LLM scene planning + prompt engineering
├── image_gen.py # Text-to-image API calls
├── asr.py # ASR forced alignment
├── make_video.py # Video synthesis + subtitle rendering
├── text_ai.py # Shared LLM API client
├── config.py # Model paths, API keys, defaults
├── run.bat # Windows launcher
└── qwen_download.py # One-time model download script
```
## Workflow
```
1. Select workspace (folder with article.txt + voice.mp3)
2. AI Scene Planning → scene_plan.json
3. Image Generation → scene_01.png, scene_02.png, ...
4. ASR Alignment → result.json + timestamps into scene_plan
5. Video Synthesis → output_video.mp4
```
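The five steps above can be sketched as one driver function. Everything below is a stubbed, hypothetical sketch: the real entry points live in `scene_plan.py`, `image_gen.py`, `asr.py`, and `make_video.py`, and the stage bodies here only stand in for the actual AI calls:

```python
from pathlib import Path

def run_pipeline(workspace: str) -> Path:
    """Hypothetical driver mirroring the five workflow steps."""
    ws = Path(workspace)
    article = (ws / "article.txt").read_text(encoding="utf-8")

    # Step 2: scene planning (stubbed as a paragraph split; the real
    # tool asks an LLM and writes scene_plan.json).
    scenes = [{"id": i + 1, "text": p}
              for i, p in enumerate(article.split("\n\n")) if p.strip()]

    # Step 3: image generation, one PNG per scene (stubbed as empty files).
    for s in scenes:
        (ws / f"scene_{s['id']:02d}.png").touch()

    # Step 4: ASR alignment would attach start/end timestamps per scene.
    # Step 5: video synthesis would compose images + voice.mp3 into the output.
    out = ws / "output_video.mp4"
    out.touch()
    return out
```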
## Quick Start
### Prerequisites
- Python 3.12+
- Conda (recommended)
- NVIDIA GPU (for local ASR model)
### Setup
```bash
# Create conda environment
conda create -n Videoer python=3.12 -y
conda activate Videoer
# Install dependencies
pip install PyQt6 moviepy Pillow requests openai
pip install funasr modelscope torch torchaudio
# Download ASR model
python qwen_download.py
```
### Configuration
Edit `config.py` to set your API keys:
```python
# LLM providers (scene planning)
LLM_PROVIDERS = {
    "Qwen3.5-35B (ModelScope)": {
        "api_key": "YOUR_KEY",
        ...
    },
    ...
}
# Image generation
SILICONFLOW_API_KEY = "YOUR_KEY"
MODELSCOPE_API_KEY = "YOUR_KEY"
```
> **Tip**: ModelScope and SiliconFlow both offer free-tier API keys.
### Run
```bash
# GUI mode (recommended)
python gui.py
# Or on Windows
run.bat
```
### Workspace Structure
Each video project lives in a workspace folder:
```
workspace/my_project/
├── article.txt # Source article text
├── voice.mp3 # Narration audio
├── scene_plan.json # Generated scene plan (auto)
├── result.json # ASR alignment result (auto)
├── scene_01.png # Generated images (auto)
├── scene_02.png
├── ...
└── output_video.mp4 # Final output (auto)
```
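A workspace can be sanity-checked before running the pipeline. This helper is a sketch under the layout above, not part of the tool itself:

```python
from pathlib import Path

# The two files a workspace must contain before the pipeline can run.
REQUIRED_INPUTS = ("article.txt", "voice.mp3")

def missing_inputs(workspace: str) -> list[str]:
    """Return the required input files absent from a workspace folder."""
    ws = Path(workspace)
    return [name for name in REQUIRED_INPUTS if not (ws / name).exists()]
```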
## Dependencies
| Package | Purpose |
|---------|---------|
| PyQt6 | GUI framework |
| moviepy | Video composition |
| Pillow | Image processing / subtitle rendering |
| requests | HTTP API calls |
| openai | Compatible LLM client (OpenAI API format) |
| funasr | ASR forced alignment |
| modelscope | Model loading |
| torch / torchaudio | GPU inference backend |
## License
MIT