Videoer
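AI-powered video generation pipeline: a one-stop tool that turns an article into a video.
Given an English article (text) and its matching narration audio, it automatically runs:
Article text + narration audio → AI scene segmentation → per-scene image generation → ASR time alignment → video synthesis (with subtitles)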
Features
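- AI Scene Planning: uses an LLM (Qwen / GLM) to split the article into scenes and extract characters and visual descriptions
- AI Image Generation: supports the Kolors / Qwen-Image text-to-image models and generates scene illustrations one at a time
- Interactive Review: review each scene image in turn, then confirm or regenerate it
- Forced Alignment: speech-to-text time alignment based on Qwen3-ForcedAligner
- Video Synthesis: MoviePy composes the final video and adds subtitles automatically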
Architecture
release1/
├── gui.py # PyQt6 GUI (main entry)
├── scene_plan.py # LLM scene planning + prompt engineering
├── image_gen.py # Text-to-image API calls
├── asr.py # ASR forced alignment
├── make_video.py # Video synthesis + subtitle rendering
├── text_ai.py # Shared LLM API client
├── config.py # Model paths, API keys, defaults
├── run.bat # Windows launcher
└── qwen_download.py # One-time model download script
Workflow
1. Select workspace (folder with article.txt + voice.mp3)
2. AI Scene Planning → scene_plan.json
3. Image Generation → scene_01.png, scene_02.png, ...
4. ASR Alignment → result.json + timestamps into scene_plan
5. Video Synthesis → output_video.mp4
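Step 4 writes timestamps back into the scene plan. The sketch below shows one way that mapping could work; it is not taken from `asr.py`. It assumes a hypothetical schema in which each scene has a `text` field and the alignment result is a list of per-word entries with `start` and `end` in seconds, and it assumes that splitting scene text on whitespace yields the same word units the aligner reports.

```python
def assign_scene_times(scenes, words):
    """Give each scene the start of its first word and the end of its last word.

    scenes: list of dicts with a "text" key (hypothetical schema).
    words:  list of dicts with "start" and "end" in seconds (hypothetical schema),
            one entry per spoken word, in narration order.
    """
    timed = []
    cursor = 0  # index of the first aligned word not yet assigned to a scene
    for scene in scenes:
        count = len(scene["text"].split())
        chunk = words[cursor:cursor + count]
        if not chunk:
            raise ValueError("no aligned words available for this scene")
        # Copy the scene so the input plan is left unmodified.
        timed.append({**scene, "start": chunk[0]["start"], "end": chunk[-1]["end"]})
        cursor += count
    return timed
```

Counting words is the simplest possible strategy. If the aligner tokenizes punctuation or contractions differently from a whitespace split, the scene boundaries would drift, so a real implementation may need to match on normalized text instead.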
Quick Start
Prerequisites
- Python 3.12+
- Conda (recommended)
- NVIDIA GPU (for local ASR model)
Setup
# Create conda environment
conda create -n Videoer python=3.12 -y
conda activate Videoer
# Install dependencies
pip install PyQt6 moviepy Pillow requests openai
pip install funasr modelscope torch torchaudio
# Download ASR model
python qwen_download.py
Configuration
Edit config.py to set your API keys:
# LLM providers (scene planning)
LLM_PROVIDERS = {
"Qwen3.5-35B (ModelScope)": {
"api_key": "YOUR_KEY",
...
},
...
}
# Image generation
SILICONFLOW_API_KEY = "YOUR_KEY"
MODELSCOPE_API_KEY = "YOUR_KEY"
Tip: ModelScope and SiliconFlow both offer free-tier API keys.
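If you would rather not store keys in `config.py`, one option is to read them from environment variables. This is a minimal sketch, not how the project currently loads its configuration; the helper name `load_api_keys` is hypothetical, and reusing the config variable names as environment variable names is an assumption.

```python
import os


def load_api_keys(env=os.environ):
    """Read image-generation API keys from a mapping (the process environment by default).

    Returns (keys, missing), where missing lists the names that are unset
    or still hold the "YOUR_KEY" placeholder.
    """
    names = ("SILICONFLOW_API_KEY", "MODELSCOPE_API_KEY")
    keys = {name: env.get(name, "") for name in names}
    missing = [name for name, value in keys.items() if not value or value == "YOUR_KEY"]
    return keys, missing
```

Checking `missing` at startup lets the application report an unset key before any API call is attempted.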
Run
# GUI mode (recommended)
python gui.py
# Or on Windows
run.bat
Workspace Structure
Each video project lives in a workspace folder:
workspace/my_project/
├── article.txt # Source article text
├── voice.mp3 # Narration audio
├── scene_plan.json # Generated scene plan (auto)
├── result.json # ASR alignment result (auto)
├── scene_01.png # Generated images (auto)
├── scene_02.png
├── ...
└── output_video.mp4 # Final output (auto)
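Because every pipeline step leaves a file in the workspace, the folder contents show how far a project has progressed. The following sketch checks a workspace against the layout above. The function name `inspect_workspace` and the shape of the returned dictionary are illustrative and not part of the project.

```python
import re
from pathlib import Path

REQUIRED_INPUTS = ("article.txt", "voice.mp3")
SCENE_IMAGE = re.compile(r"^scene_(\d+)\.png$")


def inspect_workspace(folder):
    """Summarise which inputs and generated files exist in a workspace folder."""
    root = Path(folder)
    # A missing folder is treated as an empty one.
    names = {p.name for p in root.iterdir() if p.is_file()} if root.is_dir() else set()
    # Sort scene images by their numeric index rather than alphabetically.
    images = sorted(
        (n for n in names if SCENE_IMAGE.match(n)),
        key=lambda n: int(SCENE_IMAGE.match(n).group(1)),
    )
    return {
        "missing_inputs": [n for n in REQUIRED_INPUTS if n not in names],
        "has_scene_plan": "scene_plan.json" in names,
        "has_alignment": "result.json" in names,
        "scene_images": images,
        "has_video": "output_video.mp4" in names,
    }
```

For example, `inspect_workspace("workspace/my_project")["missing_inputs"]` is an empty list once both `article.txt` and `voice.mp3` are in place.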
Dependencies
| Package | Purpose |
|---|---|
| PyQt6 | GUI framework |
| moviepy | Video composition |
| Pillow | Image processing / subtitle rendering |
| requests | HTTP API calls |
| openai | Compatible LLM client (OpenAI API format) |
| funasr | ASR forced alignment |
| modelscope | Model loading |
| torch / torchaudio | GPU inference backend |
License
MIT
