2026-04-25 12:50:36 +08:00

Videoer

AI-powered video generation pipeline: a one-stop tool that turns an article into a video.

Given an English article (text) and its corresponding narration audio, the pipeline automatically performs:

Article text + narration audio → AI scene planning → per-scene image generation → ASR time alignment → video synthesis (with subtitles)

Preview

Pipeline Overview

Features

  • AI Scene Planning — LLM-based (Qwen / GLM) scene segmentation that extracts characters and visual descriptions
  • AI Image Generation — supports the Kolors / Qwen-Image text-to-image models, generating one image per scene
  • Interactive Review — inspect each scene image and confirm or regenerate it
  • Forced Alignment — speech-to-text time alignment based on Qwen3-ForcedAligner
  • Video Synthesis — MoviePy composes the final video and adds subtitles automatically
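For a feel of what scene planning produces, here is a hypothetical example of scene_plan.json; the field names are illustrative guesses, not the actual schema written by scene_plan.py:

```python
import json

# Hypothetical scene_plan.json content (field names are guesses for
# illustration; the real schema is whatever scene_plan.py emits).
sample = """
{
  "scenes": [
    {"id": 1, "text": "First passage of the article.",
     "characters": ["narrator"],
     "image_prompt": "a sunrise over a quiet harbor"},
    {"id": 2, "text": "Second passage of the article.",
     "characters": ["fisherman"],
     "image_prompt": "an old fisherman mending a net"}
  ]
}
"""

plan = json.loads(sample)
# Each scene's image_prompt would feed the text-to-image step.
prompts = [scene["image_prompt"] for scene in plan["scenes"]]
```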

Architecture

release1/
├── gui.py            # PyQt6 GUI (main entry)
├── scene_plan.py     # LLM scene planning + prompt engineering
├── image_gen.py      # Text-to-image API calls
├── asr.py            # ASR forced alignment
├── make_video.py     # Video synthesis + subtitle rendering
├── text_ai.py        # Shared LLM API client
├── config.py         # Model paths, API keys, defaults
├── run.bat           # Windows launcher
└── qwen_download.py  # One-time model download script

Workflow

1. Select workspace (folder with article.txt + voice.mp3)
2. AI Scene Planning   → scene_plan.json
3. Image Generation    → scene_01.png, scene_02.png, ...
4. ASR Alignment       → result.json + timestamps into scene_plan
5. Video Synthesis     → output_video.mp4
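Since each step writes a known artifact into the workspace, a helper can infer how far a project has progressed. This is an illustrative sketch (not part of the repo), using the file names from the workflow list above:

```python
from pathlib import Path

# (step name, artifact the step writes) — file names from the Workflow list.
STEP_ARTIFACTS = [
    ("scene_planning", "scene_plan.json"),
    ("image_generation", "scene_01.png"),
    ("asr_alignment", "result.json"),
    ("video_synthesis", "output_video.mp4"),
]

def completed_steps(workspace: str) -> list[str]:
    """Return the names of pipeline steps whose output already exists."""
    ws = Path(workspace)
    return [step for step, artifact in STEP_ARTIFACTS if (ws / artifact).exists()]
```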

Quick Start

Prerequisites

  • Python 3.12+
  • Conda (recommended)
  • NVIDIA GPU (for local ASR model)

Setup

# Create conda environment
conda create -n Videoer python=3.12 -y
conda activate Videoer

# Install dependencies
pip install PyQt6 moviepy Pillow requests openai
pip install funasr modelscope torch torchaudio

# Download ASR model
python qwen_download.py

Configuration

Edit config.py to set your API keys:

# LLM providers (scene planning)
LLM_PROVIDERS = {
    "Qwen3.5-35B (ModelScope)": {
        "api_key": "YOUR_KEY",
        ...
    },
    ...
}

# Image generation
SILICONFLOW_API_KEY = "YOUR_KEY"
MODELSCOPE_API_KEY = "YOUR_KEY"

Tip: ModelScope and SiliconFlow both offer free-tier API keys.
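As a sketch of how text_ai.py might consume a provider entry: an OpenAI-compatible client only needs the entry's key and endpoint. The "base_url" and "model" key names (and the example URL) are assumptions; check config.py for the real fields:

```python
# Assumed shape of one LLM_PROVIDERS entry (illustrative values).
LLM_PROVIDERS = {
    "Qwen3.5-35B (ModelScope)": {
        "api_key": "YOUR_KEY",
        "base_url": "https://example-inference-endpoint/v1",
        "model": "Qwen3.5-35B",
    },
}

def client_kwargs(provider_name: str) -> dict:
    """Build keyword arguments for an OpenAI-compatible client."""
    cfg = LLM_PROVIDERS[provider_name]
    # The openai package accepts api_key and base_url, so any
    # OpenAI-compatible endpoint can be plugged in the same way.
    return {"api_key": cfg["api_key"], "base_url": cfg["base_url"]}
```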

Run

# GUI mode (recommended)
python gui.py

# Or on Windows
run.bat

Workspace Structure

Each video project lives in a workspace folder:

workspace/my_project/
├── article.txt          # Source article text
├── voice.mp3            # Narration audio
├── scene_plan.json      # Generated scene plan (auto)
├── result.json          # ASR alignment result (auto)
├── scene_01.png         # Generated images (auto)
├── scene_02.png
├── ...
└── output_video.mp4     # Final output (auto)
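Only article.txt and voice.mp3 need to exist up front; everything else is generated. A minimal pre-flight check (illustrative, not part of the repo) might look like:

```python
from pathlib import Path

# The two inputs the user must supply, per the workspace layout above.
REQUIRED_INPUTS = ["article.txt", "voice.mp3"]

def missing_inputs(workspace: str) -> list[str]:
    """Return the required input files missing from the workspace folder."""
    ws = Path(workspace)
    return [name for name in REQUIRED_INPUTS if not (ws / name).exists()]
```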

Dependencies

Package              Purpose
------------------   --------------------------------------
PyQt6                GUI framework
moviepy              Video composition
Pillow               Image processing / subtitle rendering
requests             HTTP API calls
openai               OpenAI-compatible LLM client
funasr               ASR forced alignment
modelscope           Model loading
torch / torchaudio   GPU inference backend

License

MIT
