Initial commit: V1

Author: theliu · 2026-04-25 12:50:36 +08:00 · commit 4c38e240dc
12 changed files with 3746 additions and 0 deletions
# Videoer
**AI-powered video generation pipeline**: a one-stop tool that turns an article into a video.
Given an English article (text) and its narration audio, it automatically runs:
```
article text + narration audio → AI scene planning → per-scene image generation → ASR time alignment → video synthesis (with subtitles)
```
## Preview
![Pipeline Overview](docs/pipeline.png)
## Features
- **AI Scene Planning**: LLM-driven (Qwen / GLM) scene segmentation that extracts characters and visual descriptions
- **AI Image Generation**: per-scene images via Kolors / Qwen-Image text-to-image models
- **Interactive Review**: inspect each scene image, then confirm it or regenerate it
- **Forced Alignment**: speech-to-text time alignment based on Qwen3-ForcedAligner
- **Video Synthesis**: MoviePy assembles the final video and adds subtitles automatically
## Architecture
```
release1/
├── gui.py # PyQt6 GUI (main entry)
├── scene_plan.py # LLM scene planning + prompt engineering
├── image_gen.py # Text-to-image API calls
├── asr.py # ASR forced alignment
├── make_video.py # Video synthesis + subtitle rendering
├── text_ai.py # Shared LLM API client
├── config.py # Model paths, API keys, defaults
├── run.bat # Windows launcher
└── qwen_download.py # One-time model download script
```
## Workflow
```
1. Select workspace (folder with article.txt + voice.mp3)
2. AI Scene Planning → scene_plan.json
3. Image Generation → scene_01.png, scene_02.png, ...
4. ASR Alignment → result.json + timestamps into scene_plan
5. Video Synthesis → output_video.mp4
```
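The scene plan produced in step 2 drives everything that follows. As a minimal sketch of what a `scene_plan.json` round trip might look like (the field names below are assumptions; the real schema is defined by `scene_plan.py`):

```python
import json
import tempfile
from pathlib import Path

# Hypothetical scene_plan.json shape, one entry per planned scene.
scene_plan = {
    "scenes": [
        {"id": 1, "text": "The sun rises over the valley.",
         "characters": ["narrator"], "image_prompt": "sunrise over a green valley"},
        {"id": 2, "text": "A river winds through the hills.",
         "characters": [], "image_prompt": "river winding through rolling hills"},
    ]
}

# Stand-in for workspace/my_project; a real run writes into the chosen workspace.
workspace = Path(tempfile.mkdtemp()) / "my_project"
workspace.mkdir(parents=True, exist_ok=True)
(workspace / "scene_plan.json").write_text(
    json.dumps(scene_plan, ensure_ascii=False, indent=2), encoding="utf-8")

# Step 3 then generates scene_01.png, scene_02.png, ... one image per scene.
loaded = json.loads((workspace / "scene_plan.json").read_text(encoding="utf-8"))
print(len(loaded["scenes"]))
```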
## Quick Start
### Prerequisites
- Python 3.12+
- Conda (recommended)
- NVIDIA GPU (for local ASR model)
### Setup
```bash
# Create conda environment
conda create -n Videoer python=3.12 -y
conda activate Videoer
# Install dependencies
pip install PyQt6 moviepy Pillow requests openai
pip install funasr modelscope torch torchaudio
# Download ASR model
python qwen_download.py
```
### Configuration
Edit `config.py` to set your API keys:
```python
# LLM providers (scene planning)
LLM_PROVIDERS = {
"Qwen3.5-35B (ModelScope)": {
"api_key": "YOUR_KEY",
...
},
...
}
# Image generation
SILICONFLOW_API_KEY = "YOUR_KEY"
MODELSCOPE_API_KEY = "YOUR_KEY"
```
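A sketch of how the app might look up one entry from `LLM_PROVIDERS` and fail early on a placeholder key. The `base_url` and `model` fields (and `get_provider` itself) are illustrative assumptions, not the real `config.py` schema:

```python
# Hypothetical provider table mirroring the config.py layout above.
LLM_PROVIDERS = {
    "Qwen3.5-35B (ModelScope)": {
        "api_key": "YOUR_KEY",
        "base_url": "https://example.com/v1",  # placeholder endpoint
        "model": "Qwen3.5-35B",
    },
}

def get_provider(name: str) -> dict:
    """Return one provider entry, rejecting unset placeholder keys."""
    cfg = LLM_PROVIDERS[name]
    if cfg["api_key"] in ("", "YOUR_KEY"):
        raise ValueError(f"Set api_key for provider {name!r} in config.py")
    return cfg
```

Failing fast here gives a clear error in the GUI instead of an opaque HTTP 401 later.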
> **Tip**: ModelScope and SiliconFlow both offer free-tier API keys.
### Run
```bash
# GUI mode (recommended)
python gui.py
# Or on Windows
run.bat
```
### Workspace Structure
Each video project lives in a workspace folder:
```
workspace/my_project/
├── article.txt # Source article text
├── voice.mp3 # Narration audio
├── scene_plan.json # Generated scene plan (auto)
├── result.json # ASR alignment result (auto)
├── scene_01.png # Generated images (auto)
├── scene_02.png
├── ...
└── output_video.mp4 # Final output (auto)
```
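Validating a workspace takes only a few lines. In this sketch, `check_workspace` is a hypothetical helper (not part of the release) that checks the two required inputs and collects the generated scene images in order:

```python
from pathlib import Path

def check_workspace(root: Path) -> list[Path]:
    """Verify required inputs exist and return scene images in scene order."""
    for required in ("article.txt", "voice.mp3"):
        if not (root / required).exists():
            raise FileNotFoundError(f"{required} missing in {root}")
    # scene_01.png, scene_02.png, ... sort correctly thanks to zero padding
    return sorted(root.glob("scene_*.png"))
```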
## Dependencies
| Package | Purpose |
|---------|---------|
| PyQt6 | GUI framework |
| moviepy | Video composition |
| Pillow | Image processing / subtitle rendering |
| requests | HTTP API calls |
| openai | Compatible LLM client (OpenAI API format) |
| funasr | ASR forced alignment |
| modelscope | Model loading |
| torch / torchaudio | GPU inference backend |
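The forced-alignment timestamps are what make subtitle timing possible. Purely as an illustration (the pipeline itself draws subtitles with Pillow inside MoviePy clips, and the `result.json` schema is not shown here), a sketch that formats aligned `(start, end, text)` cues as standard SRT:

```python
def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(cues: list[tuple[float, float, str]]) -> str:
    """Render (start, end, text) cues as an SRT document."""
    blocks = [f"{i}\n{srt_time(a)} --> {srt_time(b)}\n{text}\n"
              for i, (a, b, text) in enumerate(cues, start=1)]
    return "\n".join(blocks)

print(srt_time(3.5))  # 00:00:03,500
```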
## License
MIT