Common questions about character-consistent video generation
We use your character image as the reference for every scene. Nanobanana AI analyzes the character's features and keeps them consistent across all scenes while adapting to different contexts and actions.
Yes! You can upload style reference images to control the visual aesthetic. Want anime style? Realistic? Cartoon? Simply provide a style reference image and our AI will match that style while maintaining character consistency.
VEO 3.1 built-in voice generates natural speech as part of the video generation itself. Custom voice uses ElevenLabs TTS instead, giving you high-quality speech with more voice options and better lip-sync.
Generation time depends on the number of scenes and settings. Typically, it takes 3-10 minutes for a complete multi-scene video. Individual scenes can be previewed during the process.
Yes! You can edit scene prompts, regenerate individual scenes, upload custom images, or modify voiceover text at any time before final generation.
Final videos are exported in MP4 format with 9:16 (vertical) or 16:9 (horizontal) aspect ratios, perfect for social media platforms like TikTok, Instagram Reels, and YouTube.
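For readers who post-process exports, here is a minimal sketch of how the two aspect ratios map to pixel dimensions. The function name `frame_size` and the default long side of 1920 pixels are assumptions made for illustration; the actual export resolution is not stated here.

```python
SUPPORTED_ASPECTS = {(9, 16), (16, 9)}  # vertical and horizontal, as listed above


def frame_size(aspect: str, long_side: int = 1920) -> tuple:
    """Return (width, height) in pixels for an aspect string like "9:16".

    long_side is a hypothetical default, not a documented export setting.
    """
    w_ratio, h_ratio = (int(part) for part in aspect.split(":"))
    if (w_ratio, h_ratio) not in SUPPORTED_ASPECTS:
        raise ValueError(f"unsupported aspect ratio: {aspect}")
    # Scale the shorter side from the longer one using integer arithmetic.
    short_side = long_side * min(w_ratio, h_ratio) // max(w_ratio, h_ratio)
    if w_ratio < h_ratio:
        return (short_side, long_side)  # vertical: width is the short side
    return (long_side, short_side)      # horizontal: height is the short side
```

Under these assumptions, `frame_size("9:16")` gives a vertical frame and `frame_size("16:9")` gives the same dimensions swapped.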
Background music is optional. You can generate AI music based on prompts, and we automatically mix it with your video at the right volume levels for professional results.
You can create videos with 2 to 6 scenes. Each scene has its own unique image, voiceover, and video clip that are automatically merged into a cohesive story.
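The scene structure described above can be pictured as a small data model. This is only an illustrative sketch: the names `Scene`, `VideoProject`, and `validate` are hypothetical and do not reflect the product's actual internals. Only the 2 to 6 scene limit comes from the text.

```python
from dataclasses import dataclass, field
from typing import List

MIN_SCENES = 2  # limits taken from the answer above
MAX_SCENES = 6


@dataclass
class Scene:
    """One scene: its image prompt and voiceover text (hypothetical fields)."""
    prompt: str
    voiceover_text: str = ""


@dataclass
class VideoProject:
    """A multi-scene video whose clips are merged in list order."""
    scenes: List[Scene] = field(default_factory=list)

    def validate(self) -> None:
        """Raise ValueError if the scene count is outside the allowed range."""
        count = len(self.scenes)
        if not MIN_SCENES <= count <= MAX_SCENES:
            raise ValueError(
                f"expected {MIN_SCENES} to {MAX_SCENES} scenes, got {count}"
            )
```

A project with three scenes passes `validate()`, while a project with one or seven scenes raises `ValueError`.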