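# Videoer

**AI-powered video generation pipeline**: a one-stop tool that turns an article into a video.

Given an English article (text) and the matching narration audio, it automatically runs:

```
Article text + narration audio → AI scene planning → per-scene image generation → ASR time alignment → video synthesis (with subtitles)
```
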
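## Features

- **AI Scene Planning**: an LLM (Qwen / GLM) splits the article into scenes and extracts the characters and a visual description for each one
- **AI Image Generation**: supports the Kolors and Qwen-Image text-to-image models and generates the scene images one at a time
- **Interactive Review**: review the scene images one by one, then confirm or regenerate each
- **Forced Alignment**: speech-to-text time alignment based on Qwen3-ForcedAligner
- **Video Synthesis**: MoviePy assembles the final video and adds subtitles automatically
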
## Architecture

```
release1/
├── gui.py              # PyQt6 GUI (main entry)
├── scene_plan.py       # LLM scene planning + prompt engineering
├── image_gen.py        # Text-to-image API calls
├── asr.py              # ASR forced alignment
├── make_video.py       # Video synthesis + subtitle rendering
├── text_ai.py          # Shared LLM API client
├── config.py           # Model paths, API keys, defaults
├── run.bat             # Windows launcher
└── qwen_download.py    # One-time model download script
```

## Workflow

```
1. Select workspace (folder with article.txt + voice.mp3)
2. AI Scene Planning   → scene_plan.json
3. Image Generation    → scene_01.png, scene_02.png, ...
4. ASR Alignment       → result.json + timestamps into scene_plan
5. Video Synthesis     → output_video.mp4
```

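Step 4 writes timestamps back into the scene plan. The sketch below shows one way that mapping could work. It assumes the aligner returns one entry per whitespace-separated word of the article, in reading order, and that each scene covers a contiguous run of those words. The keys `text`, `word`, `start`, `end` and the function `assign_scene_times` are hypothetical; the actual `result.json` and `scene_plan.json` layouts may differ.

```python
def assign_scene_times(scenes, words):
    """Attach start/end times (in seconds) to each scene.

    scenes: list of dicts with a "text" key (hypothetical layout).
    words:  list of dicts with "word", "start", "end" keys, one per
            whitespace-separated token of the article, in order.
    """
    timed = []
    cursor = 0  # index of the first word not yet assigned to a scene
    for scene in scenes:
        n_words = len(scene["text"].split())
        if n_words == 0:
            raise ValueError("scene has no text")
        chunk = words[cursor:cursor + n_words]
        if len(chunk) < n_words:
            raise ValueError("alignment has fewer words than the scene plan")
        # A scene starts with its first word and ends with its last word.
        timed.append({**scene, "start": chunk[0]["start"], "end": chunk[-1]["end"]})
        cursor += n_words
    return timed


# Toy data: every word starts 0.5 s after the previous one and lasts 0.25 s.
scenes = [{"text": "The fox ran."}, {"text": "It hid in the woods."}]
tokens = " ".join(scene["text"] for scene in scenes).split()
words = [{"word": w, "start": i * 0.5, "end": i * 0.5 + 0.25} for i, w in enumerate(tokens)]
timed = assign_scene_times(scenes, words)
```

Counting words only works when the scene texts reproduce the article exactly. If the LLM paraphrases or drops words, matching on the text itself would be more robust.
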
## Quick Start

### Prerequisites

- Python 3.12+
- Conda (recommended)
- NVIDIA GPU (for local ASR model)

### Setup

```bash
# Create conda environment
conda create -n Videoer python=3.12 -y
conda activate Videoer

# Install dependencies
pip install PyQt6 moviepy Pillow requests openai
pip install funasr modelscope torch torchaudio

# Download ASR model
python qwen_download.py
```

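To confirm the install worked before launching the GUI, a quick check like the following may help. It is a sketch rather than part of the project: `missing_modules` is a hypothetical helper, and the list uses import names, which differ from the pip names in one case (Pillow is imported as `PIL`).

```python
from importlib.util import find_spec

# Import names for the packages installed above; Pillow is imported as PIL.
REQUIRED_MODULES = [
    "PyQt6", "moviepy", "PIL", "requests", "openai",
    "funasr", "modelscope", "torch", "torchaudio",
]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in this environment."""
    # find_spec locates a top-level module without importing it.
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All dependencies found.")
```

This only checks that the packages are installed. It does not verify versions or whether `torch` can see the GPU.
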
### Configuration

Edit `config.py` to set your API keys:

```python
# LLM providers (scene planning)
LLM_PROVIDERS = {
    "Qwen3.5-35B (ModelScope)": {
        "api_key": "YOUR_KEY",
        ...
    },
    ...
}

# Image generation
SILICONFLOW_API_KEY = "YOUR_KEY"
MODELSCOPE_API_KEY = "YOUR_KEY"
```

> **Tip**: ModelScope and SiliconFlow both offer free-tier API keys.

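To keep keys out of version control, one option is to have `config.py` read them from environment variables. This is a minimal sketch, assuming the environment variables are named after the constants above. The `env_key` helper is hypothetical and not part of the project.

```python
import os


def env_key(name, default="YOUR_KEY"):
    """Return the API key stored in environment variable `name`, or `default`."""
    value = os.environ.get(name, "").strip()
    return value or default


# Falls back to the placeholder when the variable is unset or empty.
SILICONFLOW_API_KEY = env_key("SILICONFLOW_API_KEY")
MODELSCOPE_API_KEY = env_key("MODELSCOPE_API_KEY")
```

With this in place, set the variables in the shell before starting the app, for example `export MODELSCOPE_API_KEY=...` on Linux/macOS or `set MODELSCOPE_API_KEY=...` in a Windows command prompt.
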
### Run

```bash
# GUI mode (recommended)
python gui.py

# Or on Windows
run.bat
```

### Workspace Structure

Each video project lives in a workspace folder:

```
workspace/my_project/
├── article.txt         # Source article text
├── voice.mp3           # Narration audio
├── scene_plan.json     # Generated scene plan (auto)
├── result.json         # ASR alignment result (auto)
├── scene_01.png        # Generated images (auto)
├── scene_02.png
├── ...
└── output_video.mp4    # Final output (auto)
```

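Because each step leaves a file behind, the state of a project can be read off the workspace folder. The sketch below guesses the next pending step from the file names listed above. `next_step` is a hypothetical helper, and it treats a single `scene_*.png` as proof that image generation ran, which may not hold for a partially generated project.

```python
from pathlib import Path


def next_step(workspace):
    """Guess the next pipeline step from the files present in `workspace`."""
    ws = Path(workspace)
    missing_inputs = [n for n in ("article.txt", "voice.mp3") if not (ws / n).is_file()]
    if missing_inputs:
        return "missing inputs: " + ", ".join(missing_inputs)
    if not (ws / "scene_plan.json").is_file():
        return "scene planning"
    if not any(ws.glob("scene_*.png")):
        return "image generation"
    if not (ws / "result.json").is_file():
        return "ASR alignment"
    if not (ws / "output_video.mp4").is_file():
        return "video synthesis"
    return "done"


if __name__ == "__main__":
    print(next_step("workspace/my_project"))
```
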
## Dependencies

| Package | Purpose |
|---------|---------|
| PyQt6 | GUI framework |
| moviepy | Video composition |
| Pillow | Image processing / subtitle rendering |
| requests | HTTP API calls |
| openai | Compatible LLM client (OpenAI API format) |
| funasr | ASR forced alignment |
| modelscope | Model loading |
| torch / torchaudio | GPU inference backend |

## License

MIT