Gate News message, April 23: Google researchers, including He Kaiming and Xie Saining, published a paper introducing Vision Banana, a general-purpose vision understanding model created through lightweight instruction fine-tuning of the company’s Nano Banana Pro (Gemini 3 Pro Image) image generation model. The key innovation is to represent the output of every vision task as an RGB image, so that segmentation, depth estimation, and surface normal prediction are all performed through image generation, without task-specific architectures or loss functions.
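To make the idea of "every output is an RGB image" concrete, here is a minimal sketch of how two kinds of predictions could be encoded as pixel colors and decoded again. It is an illustration only: the article does not describe the paper's actual encoding, and the palette, class ids, and function names below are hypothetical.

```python
from typing import Dict, List, Tuple

RGB = Tuple[int, int, int]
Vec3 = Tuple[float, float, float]

# Hypothetical palette: class id -> color. Not taken from the paper.
PALETTE: Dict[int, RGB] = {0: (128, 64, 128), 1: (220, 20, 60), 2: (0, 0, 142)}


def normal_to_rgb(n: Vec3) -> RGB:
    """Map a unit normal with components in [-1, 1] to an 8-bit RGB color."""
    return tuple(round((c + 1.0) * 0.5 * 255) for c in n)


def rgb_to_normal(rgb: RGB) -> Vec3:
    """Invert normal_to_rgb (up to 8-bit quantization) and re-normalize."""
    v = [c / 255 * 2.0 - 1.0 for c in rgb]
    length = sum(c * c for c in v) ** 0.5 or 1.0
    return tuple(c / length for c in v)


def labels_to_rgb(labels: List[List[int]]) -> List[List[RGB]]:
    """Render a grid of class ids as a color image using PALETTE."""
    return [[PALETTE[c] for c in row] for row in labels]


def rgb_to_labels(image: List[List[RGB]]) -> List[List[int]]:
    """Decode a (possibly noisy) color image by nearest palette color."""
    def nearest(pixel: RGB) -> int:
        return min(
            PALETTE,
            key=lambda k: sum((a - b) ** 2 for a, b in zip(PALETTE[k], pixel)),
        )
    return [[nearest(p) for p in row] for row in image]
```

Decoding by nearest palette color is one simple way to tolerate small color errors in a generated image; a real system may use a different scheme.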
In semantic segmentation, Vision Banana outperformed the specialized model SAM 3 by 4.7 percentage points on Cityscapes, and in referring expression segmentation it surpassed SAM 3 Agent. In instance segmentation, however, it lagged behind SAM 3. For 3D tasks, metric depth estimation reached an average accuracy of 0.929 across four standard datasets, exceeding Depth Anything V3’s 0.918, while using only synthetic training data, with no real depth data, and no camera parameters at inference. Surface normal estimation achieved state-of-the-art results on three indoor benchmarks.
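The article does not say which metric lies behind the 0.929 and 0.918 depth figures. One metric commonly reported for depth estimation is threshold accuracy, the fraction of pixels whose predicted depth is within a factor of 1.25 of the true depth. The sketch below assumes that metric purely for illustration; the function name and sample values are hypothetical.

```python
from typing import Sequence


def threshold_accuracy(
    pred: Sequence[float], truth: Sequence[float], threshold: float = 1.25
) -> float:
    """Fraction of valid pixels where max(pred/truth, truth/pred) < threshold."""
    # Skip pixels without a positive depth on both sides (invalid or missing).
    valid = [(p, t) for p, t in zip(pred, truth) if p > 0 and t > 0]
    if not valid:
        raise ValueError("no valid pixels to evaluate")
    hits = sum(1 for p, t in valid if max(p / t, t / p) < threshold)
    return hits / len(valid)


# Made-up depths in meters: three of four predictions fall within the threshold.
score = threshold_accuracy([1.0, 2.0, 3.0, 10.0], [1.1, 2.0, 4.0, 10.5])
```

Under this assumption, a score of 0.929 would mean that about 93% of pixels fall within the tolerance, averaged over the datasets.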
Fine-tuning mixed a small amount of vision task data into the original image generation training data, which preserved the model’s generation capabilities: in generation quality tests, its performance matched that of the original Nano Banana Pro. The paper proposes that image generation pretraining plays the same role in vision that text generation pretraining plays in language: models learn the internal representations needed for image understanding while learning to generate, and instruction fine-tuning merely unlocks this capability.