⏩ Magentic-UI, an experimental human-centered web agent: Collaborate with AI to complete complex web tasks using Magentic-UI, a human-centered, open-source agent system. Built by Microsoft Research, Magentic-UI blends transparency with control, enabling real-time task execution in your browser with features like co-planning, co-tasking, action guards, and plan learning. Unlike fully autonomous agents, it invites users into the process, offering oversight, adaptability, and safety as core design principles. This blog explores its capabilities, architecture, and how it supports researchers and developers in building more intuitive and responsible AI interactions on the web.
⏩ Predicting and explaining AI model performance: A new approach to evaluation. Predict and explain AI model performance before deployment using ADeLe, a new evaluation framework from Microsoft Research. This blog introduces a novel ability-based approach that rates the cognitive and knowledge demands of tasks, matches them to model capabilities, and forecasts success or failure with high accuracy. By generating detailed ability profiles across 18 scales, ADeLe not only reveals model strengths and weaknesses but also explains why performance varies, offering a powerful tool for developers, researchers, and policymakers seeking more transparent, reliable AI evaluation.
⏩ Introducing Codex: Delegate coding tasks to Codex, a cloud-based AI software engineering agent now available in ChatGPT. Powered by codex‑1 and trained on real-world coding challenges, Codex can write features, fix bugs, propose pull requests, and answer codebase questions, all in parallel cloud environments tailored to your repo. This blog introduces how Codex works, its built-in safeguards, use cases from companies like Cisco and Superhuman, and how developers can begin experimenting today. With task tracking, test logs, and customizable guidance files, Codex brings scalable, asynchronous collaboration to modern software workflows.
⏩ AI Studio to Cloud Run and Cloud Run MCP server: Deploy AI apps in seconds with Cloud Run’s new integration with Google AI Studio and MCP-compatible agents. This blog introduces streamlined tools that let you launch apps with one click from AI Studio, scale Gemma 3 models instantly on Cloud Run with GPU support, and enable AI agents to deploy via the new Cloud Run MCP server. Whether you're prototyping in Gemini, coding in VS Code, or building with agent SDKs, these updates make it easier than ever to build, deploy, and scale AI-powered applications with secure, cost-effective infrastructure.
⏩ Expanding Gemini 2.5 Flash and Pro capabilities: Build smarter, more secure AI solutions with Gemini 2.5 Flash and Pro on Vertex AI. Unveiled at Google I/O, these models add thought summaries for transparency, a Deep Think mode in 2.5 Pro for complex reasoning, and enhanced defenses against prompt injection, all aimed at enterprise use. Gemini 2.5 is already helping companies like Geotab, Box, and LiveRamp reduce costs, boost accuracy, and scale insights from unstructured data. With free credits for new customers and integration on Vertex AI, the models are straightforward to deploy across your business.