Search results for "MODEL"
Today
04:54

Perplexity Discloses Web Search Agent Post-Training Method; Qwen3.5-Based Model Outperforms GPT-5.4 on Accuracy and Cost

Perplexity applies SFT followed by RL to Qwen3.5 models, using a multi-hop QA dataset and rubric checks to improve search accuracy and efficiency and achieving best-in-class FRAMES performance. Abstract: Perplexity's post-training workflow for web-search agents combines supervised fine-tuning (SFT), which enforces instruction-following and language consistency, with online reinforcement learning (RL) using the GRPO algorithm. The RL stage trains on a proprietary multi-hop verifiable QA dataset plus rubric-based conversational data that keeps the model from drifting away from its SFT behavior, and it applies reward gating and within-group efficiency penalties. In evaluation, Qwen3.5-397B-SFT-RL achieves the top FRAMES performance: 57.3% accuracy with a single tool call and 73.9% with four calls at $0.02 per query, outperforming GPT-5.4 and Claude Sonnet 4.6 on these metrics. Cost figures are based on API pricing and exclude caching.
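The reward shaping described above can be illustrated with a minimal sketch. It is not Perplexity's implementation: the `Rollout` fields, the `rubric_weight` and `penalty_per_call` values, and the exact gating rule are all assumptions made for illustration. It assumes that rubric credit counts only when the answer is verified correct (gating), that extra tool calls are penalized relative to the cheapest correct rollout in the same group, and that advantages are normalized within the group, as in GRPO.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Rollout:
    """One sampled answer to a prompt (hypothetical fields)."""
    correct: bool        # did the answer pass the verifiable QA check?
    rubric_score: float  # rubric score in [0, 1]
    tool_calls: int      # number of search tool calls used


def shaped_rewards(group, rubric_weight=0.2, penalty_per_call=0.05):
    """Gated reward with a within-group efficiency penalty (assumed form)."""
    correct_calls = [r.tool_calls for r in group if r.correct]
    # Baseline is the cheapest correct rollout in this group, if any.
    baseline = min(correct_calls) if correct_calls else 0
    rewards = []
    for r in group:
        if not r.correct:
            # Gate: an incorrect answer earns nothing, whatever its rubric score.
            rewards.append(0.0)
            continue
        extra_calls = r.tool_calls - baseline
        rewards.append(
            1.0 + rubric_weight * r.rubric_score - penalty_per_call * extra_calls
        )
    return rewards


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: reward normalized by group mean and std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(x - mu) / (sigma + eps) for x in rewards]


group = [
    Rollout(correct=True, rubric_score=1.0, tool_calls=1),
    Rollout(correct=True, rubric_score=1.0, tool_calls=4),
    Rollout(correct=False, rubric_score=1.0, tool_calls=1),
    Rollout(correct=False, rubric_score=0.0, tool_calls=2),
]
rewards = shaped_rewards(group)
advantages = group_relative_advantages(rewards)
```

Under these assumptions, the correct single-call rollout receives the largest advantage, the correct four-call rollout a smaller one, and both incorrect rollouts share the same negative advantage regardless of rubric score.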
15:31

OpenAI Releases Open-Source Privacy Filter Model for PII Detection and Redaction

Abstract: OpenAI's Privacy Filter is an open-source, locally executable model that detects and redacts PII in text. It supports a 128k-token context, identifies many PII categories, including contact, financial, and credential data, and is intended for privacy-preserving workflows such as data preparation, indexing, logging, and moderation.
12:05

Kimi K2.6 Tops OpenRouter Programming Benchmark, Outperforms Claude and GPT Series

Kimi K2.6 tops the OpenRouter leaderboard, outperforming Claude, GPT, and open-source rivals, signaling progress in Chinese AI and a narrowing gap with global leaders. Abstract: Kimi.ai announced that its latest model, Kimi K2.6, ranked first on the OpenRouter programming-ability leaderboard, leading developer evaluations. Benchmarks indicate that K2.6 outperforms Claude, GPT-series, and other open-source models across programming tasks, with gains in code generation and development-task handling that signal progress for Chinese AI toward international leaders.
09:53

Sam Altman Details Failed Negotiations with Elon Musk Over OpenAI Control, Lawsuit Set for April 27

On Core Memory, Altman recounts failed OpenAI governance talks with Elon Musk: staged compromises toward a for-profit model, Musk's demands for a majority stake and CEO control, and Altman's opposition to absolute power, with a trial looming. Abstract: On Core Memory, Sam Altman details failed negotiations with Elon Musk over OpenAI governance, outlining moves toward a for-profit model, Musk's demands for a majority stake and CEO authority, and Altman's rejection of absolute control; the litigation is set to go to trial on April 27.