DeepSeek Unveils Janus-Pro: A New AI Image Generator

DeepSeek launches Janus-Pro—an open-source AI image model competing with Dall-E 3 and Stable Diffusion. Despite mixed test results, it aims to deliver high-quality visuals, improved text rendering, and robust multimodal features.
[Image: a half-and-half composition of a young woman's face, one side set against a dark background and the other blending into a lush, green countryside landscape.]

DeepSeek has garnered global attention in recent days for its open-source R1 model, a more affordable alternative to OpenAI’s o1. Before the excitement around R1 has even waned, the Chinese startup has introduced another open-source model, the image generator Janus-Pro. According to DeepSeek, the new system surpasses OpenAI’s Dall-E 3 and Stability AI’s Stable Diffusion on several benchmarks. The question is whether Janus-Pro truly lives up to these claims or is simply another AI model riding the hype wave.


What Is Janus-Pro?

Janus-Pro can both understand images and generate them from text prompts. Built as an enhanced version of the original Janus model, it incorporates improved training methods, a larger dataset, and a larger model. According to DeepSeek, it produces more stable outputs for short prompts, with higher visual quality, greater detail, and a limited ability to render text within images.

Demonstrating Performance

  • Prompt: “The face of a beautiful girl”
    • Comparison images published by DeepSeek suggest that Janus-Pro 7B yields more convincing facial features than the older Janus release.
  • Prompt: “A clear image of a blackboard with a clean, dark green surface and the word ‘Hello’ written precisely and legibly in the center with bold, white chalk letters.”
    • The Janus-Pro version appears to handle text within images more effectively than its predecessor, although it may still face limitations.

Janus-Pro is available in two sizes—1 billion and 7 billion parameters—both generating images at a 384×384 resolution. Commercial users can access it under a permissive license.


Technical Overview

Janus-Pro uses separate visual encoding pathways for multimodal understanding (analyzing images) and visual generation (creating images), aiming to prevent conflicts between these two tasks.

  1. Multimodal Understanding
    • SigLIP Encoder: Extracts high-dimensional semantic features from images.
    • Understanding Adaptor: Maps these semantic features to the large language model’s (LLM) input space.
  2. Visual Generation
    • VQ Tokenizer: Converts images into discrete IDs.
    • Generation Adaptor: Maps the embeddings for those discrete IDs into the LLM’s input space, so the LLM can work with image tokens during image creation.
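
To make "converts images into discrete IDs" more concrete, here is a minimal sketch of vector quantization in general. It is a toy, not Janus-Pro's tokenizer: the two-dimensional codebook, the patch vectors, and the function names `quantize` and `lookup` are all made up for illustration. Each input vector is replaced by the index of its nearest codebook entry, and the IDs can later be turned back into codebook vectors.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(patches: Sequence[Vector], codebook: Sequence[Vector]) -> List[int]:
    """Map each patch vector to the index (discrete ID) of its nearest codeword."""
    return [
        min(range(len(codebook)), key=lambda i: squared_distance(patch, codebook[i]))
        for patch in patches
    ]


def lookup(ids: Sequence[int], codebook: Sequence[Vector]) -> List[List[float]]:
    """Turn discrete IDs back into their codebook vectors."""
    return [list(codebook[i]) for i in ids]


# Hypothetical 4-entry codebook and three "patch" feature vectors (toy values).
CODEBOOK = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
patches = [(0.1, -0.2), (0.9, 0.8), (0.2, 0.7)]

ids = quantize(patches, CODEBOOK)
print(ids)                    # [0, 3, 2]
print(lookup(ids, CODEBOOK))  # [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
```

A real tokenizer learns its codebook and operates on learned image features rather than hand-written vectors, but the ID-in, vector-out lookup is the same idea.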

Benchmark Scores

  • GenEval: Janus-Pro 7B reportedly scores 0.80, outperforming Dall-E 3 and Stable Diffusion 3 Medium.
  • DPG-Bench: Achieves 84.19, surpassing other methods and indicating strong capability in following dense text-to-image instructions.

How Does Janus-Pro Compare to Dall-E 3 or Stable Diffusion?

The benchmark results reported by DeepSeek suggest that Janus-Pro outperforms Dall-E 3 and Stable Diffusion. In the side-by-side samples below, however, Dall-E 3 generally produced more accurate faces, body proportions, and text in images:

  • Prompt: “A photo of a herd of red sheep on a green field.”
    • The Dall-E 3 output appeared more coherent than the Janus-Pro image.
  • Prompt: “A beautiful 35 year old woman of average build wearing a pink tulle dress sits on the ground in front of the Eiffel Tower…”
    • Janus-Pro struggled with proportions, whereas Dall-E 3 displayed more precise visual details.
  • Prompt: “An image of a little boy holding a white board with the text ‘AI is awesome!’”
    • Dall-E 3 produced clearer text, while Janus-Pro’s letters were somewhat distorted.

It is possible that fine-tuning or different generation parameters would improve Janus-Pro’s outputs. With default settings, however, Dall-E 3 often seems to provide more polished results.

For those seeking a higher-quality AI image generator, Flux Pro 1.1 Ultra, available within Flux Labs AI, is frequently cited as among the best. It allows custom fine-tuning on user-provided images.


Getting Started with Janus-Pro

DeepSeek has made the Janus models freely available on Hugging Face, supporting broader academic and commercial research:

  • Janus-1.3B
  • JanusFlow-1.3B
  • Janus-Pro-1B
  • Janus-Pro-7B

Note that Janus-Pro 7B uses nearly 15 GB of memory. For those who do not wish to run the model locally, a Gradio demo is provided on Hugging Face, enabling text-to-image generation and image captioning directly in the browser.
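
A rough back-of-the-envelope check shows where a figure of that size can come from. The sketch below assumes the weights are stored in a 16-bit format (2 bytes per parameter) and uses the round parameter counts quoted in this article; the exact counts and the runtime overhead for activations and caches are not covered here, so treat the result as an estimate for the weights only.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes.

    bytes_per_param=2 is an assumption (16-bit weights such as fp16/bf16).
    """
    return num_params * bytes_per_param / 1e9


# Round parameter counts from the article; real counts may differ somewhat.
print(weight_memory_gb(7e9))  # 14.0
print(weight_memory_gb(1e9))  # 2.0
```

Under these assumptions the 7B weights alone take about 14 GB, which is consistent with a total of nearly 15 GB once some working memory is added.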

Example: Multimodal Understanding

Users can upload an image and prompt Janus-Pro to explain it. For instance, providing a “buff Doge vs. Cheems” meme yields a breakdown of how the two dogs represent an advanced and a simple visual encoding approach, respectively. This feature has potential for auto-captioning or generating alternative text.


Sample Code Snippet

DeepSeek offers an inference script to generate images from text. The process involves:

  1. Loading the Janus-Pro-7B model into memory.
  2. Encoding text prompts using VLChatProcessor.
  3. Generating image tokens one at a time and storing them for each image.
  4. Decoding tokens into a final 384×384 image.

Users can adapt this script for custom workflows or integrate Janus-Pro into existing pipelines.
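
The shape of steps 3 and 4 can be sketched without the real model. The code below is not DeepSeek's script: `stub_model` is a hypothetical stand-in that returns random token IDs, `VOCAB_SIZE` is an arbitrary toy value, and the 16-pixel patch size is an assumption about how many pixels each image token covers. It only illustrates how a fixed number of tokens is generated one by one and arranged into the grid that a decoder would turn into a 384×384 image.

```python
import random
from typing import Callable, List

IMG_SIZE = 384     # output resolution stated in the article
PATCH_SIZE = 16    # assumption: each image token covers a 16x16 pixel patch
VOCAB_SIZE = 1024  # hypothetical codebook size for this toy example


def tokens_per_image(img_size: int, patch_size: int) -> int:
    """Number of image tokens when one token covers one square patch."""
    side = img_size // patch_size
    return side * side


def generate_tokens(next_token: Callable[[List[int]], int], count: int) -> List[int]:
    """Autoregressive loop: each new token is chosen given all previous ones."""
    tokens: List[int] = []
    for _ in range(count):
        tokens.append(next_token(tokens))
    return tokens


def to_grid(tokens: List[int], side: int) -> List[List[int]]:
    """Arrange a flat token list into a side x side grid, row by row."""
    return [tokens[row * side:(row + 1) * side] for row in range(side)]


rng = random.Random(0)


def stub_model(prefix: List[int]) -> int:
    """Stand-in for the LLM: ignores the prefix and returns a random token ID."""
    return rng.randrange(VOCAB_SIZE)


n = tokens_per_image(IMG_SIZE, PATCH_SIZE)
tokens = generate_tokens(stub_model, n)
grid = to_grid(tokens, IMG_SIZE // PATCH_SIZE)
print(n, len(grid), len(grid[0]))  # 576 24 24
```

In the real pipeline the stub would be replaced by the language model's next-token prediction, and the grid of IDs would be passed to the image decoder.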


Final Thoughts

Although DeepSeek promotes Janus-Pro as a competitor to Dall-E 3, the informal comparisons above suggest that Janus-Pro may lag behind in generating consistently high-quality images. Its 384×384 resolution and the associated reconstruction losses can result in outputs with less detail than some might expect. Nonetheless, Janus-Pro’s open-source availability underscores DeepSeek’s intent to innovate and drive competition in AI image generation. As the company continues refining its technology, its commitment to accessible, open development could make it a disruptive force in the broader marketplace.
