DeepSeek open sources 5 repos for AGI, Helix and Engine AI's humanoids gain more power, Agents in ac

AI_Distilled #84: Your AI News Fix!

You can now train your own reasoning model like DeepSeek-R1 locally with just 5GB of VRAM. Unsloth is fully open source and lets you transform any open LLM, such as Llama 3.1 (8B) or Phi-4 (14B), into a reasoning model.

GitHub repo: https://github.com/unslothai/unsloth

DeepSeek's R1 research revealed an "aha moment" in which R1-Zero autonomously learned to allocate more thinking time, without human feedback, by using Group Relative Policy Optimization (GRPO). Unsloth enhanced the entire GRPO process, cutting VRAM use by 90% compared with other implementations. This lets you reproduce R1-Zero's "aha moment" on just 5GB of VRAM using Qwen2.5 (1.5B).

Try Unsloth's free GRPO notebook with a free 16GB GPU: Llama 3.1 (8B) on Colab. For a tutorial and GRPO notebooks featuring other models like Phi-4, visit Unsloth's docs.

It looks like the AI giants are battling it out, with announcements of new models, Gen-AI capabilities for their flagship products, and research breakthroughs. But don't you worry, we've got you covered. Here is your weekly digest!

LLM Expert Insights Team,
Packt

📰 News

DeepSeek open sources five repos for AGI in its OpenSourceWeek

During its OpenSource week, DeepSeek is making available five repos that form the building blocks of its online service: FlashMLA (an efficient MLA decoding kernel for Hopper GPUs), DeepEP (an EP communication library for MoE model training and inference), DeepGEMM (an FP8 library supporting dense and MoE GEMMs), DualPipe (a bidirectional pipeline parallelism algorithm), and Fire-Flyer File System (a parallel file system that utilizes the full bandwidth of modern SSDs and RDMA networks).

Microsoft's next generation of Phi-4 models

Microsoft introduced Phi-4-multimodal and Phi-4-mini, the latest additions to its Phi family of small language models (SLMs).
Phi-4-multimodal handles speech, vision, and text concurrently, while Phi-4-mini is proficient in text-based tasks. Phi-4-multimodal is a 5.6B-parameter model, and Phi-4-mini is a 3.8B-parameter model. Both are suitable for compute-constrained inference environments.

Google announces public preview of Gemini Code Assist

Google has made Gemini Code Assist available to individual developers in a free public preview, with a generous token window of 128K. This AI coding assistant offers code completion, generation, and chat features in Visual Studio Code and JetBrains IDEs, similar to those already available in Firebase and Android Studio. And guess what, you get about 180,000 code completions every month! Insane, isn't it? A similar tool, Gemini Code Assist for GitHub, is also available, providing AI-powered code reviews.

Amazon introduces Gen-AI-infused Alexa – Alexa+

Amazon introduced the Gen-AI-powered Alexa+ this week. It features agentic capabilities and is designed to be smarter than the original Alexa, with LLMs powering its knowledge base. Built to take actions, it can remember your specific needs and requirements, making your experiences more useful and personalized. Available on Echo devices, a new mobile app, and Alexa.com, it costs $19.99 per month but is free for Prime members.

Claude 3.7 Sonnet: hybrid reasoning with extended thinking, and Claude Code

Anthropic has announced Claude 3.7 Sonnet with hybrid reasoning capabilities. Users can now toggle between fast responses and an extended thinking mode, with a thinking budget of up to 128K tokens. Unlike other reasoning models, Claude focuses on real-world business applications of LLMs rather than math and computer science competition tasks.
Anthropic also introduced Claude Code, a command-line collaborative tool for agentic coding, currently available as a limited research preview.

Alibaba open-sources thinking model QwQ-Max-Preview

In an announcement blog post written by QwQ-Max-Preview itself, Alibaba unveiled the newest model in the Qwen series: QwQ-Max-Preview. It is built upon Qwen2.5-Max and excels in mathematics, coding, general tasks, and agentic workflows. The post also mentions future plans, including a dedicated app for Qwen Chat and smaller QwQ variants for local device deployment.

Comet, an agentic search browser by Perplexity

Perplexity announced its agentic browser, Comet, in an X post. Built on the Chromium framework, Comet will integrate search and automate related tasks. It will also integrate deep research and real-time information processing. You can join the waitlist here.

Perplexity also announced voice mode for its iOS app. Voice mode is expected to ship for the Android and Mac apps in the coming days.

Microsoft cancels U.S. data center leases amid CEO Satya Nadella's concerns about AGI milestones

A TD Cowen report states that Microsoft has pulled the plug on 200MW leases for at least two private data centers, withdrawn from around 500 leases, and reallocated a sizeable portion of its international spend to the US. In another development, CEO Satya Nadella shared his thoughts on the AGI hype. He opined that self-proclamation of AGI is useless and that the true revolution, the real benchmark, will be when we see growth in GDP. "It can't be just supply side... when the productivity goes up, and the economy is growing at a faster rate. When that happens... that's to me the moment," he said.

Alibaba to invest RMB 380 billion in AI and cloud computing infrastructure

Alibaba plans to invest RMB 380 billion (about USD 53 billion) over the next three years to scale up its AI capabilities and cloud infrastructure, providing businesses with tools for innovation.
CEO Eddie Wu sees AI as a "once-in-a-generation" opportunity. Cloud computing is Alibaba's main revenue driver in AI, with high demand for AI hosting services. Alibaba is integrating AI across its ecosystem to improve customer experiences, optimize business operations, and drive long-term growth.

Apple makes $500 billion commitment to the US's future – Tim Cook, CEO, Apple

Apple plans to invest over $500 billion in the U.S. over the next four years, focusing on AI, silicon engineering, manufacturing, and skills development. A new manufacturing facility will open in Houston for Apple Intelligence servers, and the U.S. Advanced Manufacturing Fund will be doubled to $10 billion. A manufacturing academy will be established in Michigan, and R&D investments will expand across the U.S., creating about 20,000 jobs. Apple continues to support educational programs for hardware engineering and silicon chip design.

SamA announces two new features for ChatGPT Plus and free users

OpenAI released a research preview of GPT-4.5 this week to understand its strengths and limitations. In his X posts, OpenAI CEO Sam Altman announced deep research for ChatGPT Plus users and an advanced voice mode, powered by GPT-4o mini, for free users.

In another development, The Information reported that OpenAI plans to shift 75% of its data center capacity to Stargate, financed by SoftBank. This transition from Microsoft-owned data centers is expected to occur over the next five years.

Meta for Education, a new mixed and virtual reality (VR/MR) offering, is now generally available. It provides educators with Meta Horizon-managed solutions aimed at enhancing student engagement and knowledge retention through interactive VR/MR experiences.

💻 Awesome AI: Tools for Work

Alibaba releases Wan2.1 family of video models

Wan2.1 comes in two versions: a lightweight 1.3-billion-parameter video generation model suitable for laptops, and a robust 14-billion-parameter model for higher performance.
Wan2.1 handles both text-to-video and image-to-video generation, with a choice of 720p or 480p resolution. It can simulate complex motion, capture intricate details, and generate multilingual text effects.

Pika announces Pika 2.2, PikaFrames, and Pikaswaps on X

Pikaswaps lets users modify and replace objects in videos using video inpainting. It enables swapping, erasing, and altering objects while maintaining realistic visual consistency. Features include a brush tool, reference image uploads, and options to re-prompt or retry.

Engine AI's humanoid can perform a complete front flip

EngineAI has unveiled the world's first humanoid robot capable of performing a front flip. This achievement marks a significant advancement in humanoid robotics, showcasing improved agility and control. The robot's ability to execute complex acrobatic movements demonstrates advances in AI-driven motion planning and real-time control systems.

Grok3 voice

In an X post, CEO Elon Musk announced that xAI's Grok3 has enabled conversation mode for Premium and SuperGrok users.

Helix – A vision-language-action model

Figure AI's Helix model is designed to bring humanoid robots into homes. It blends computer vision, language comprehension, and real-time motor control. Helix can adapt on the go, learn quickly from minimal training data, control multiple robots simultaneously, and handle thousands of household items. It runs on embedded low-power GPUs and can pick up virtually any small household object on voice command.

🛠️ Hackhub

Magma: A foundation model for multimodal AI agents across digital and physical worlds – Microsoft Research

Microsoft Research has introduced Magma, a foundation model for multimodal AI agents that bridges the digital and physical worlds. Magma integrates diverse sensor data, such as vision, audio, and depth, enabling agents to perceive and interact with complex environments.
It supports a wide range of tasks, from simple object recognition to intricate navigation and manipulation, and can create adaptable agents that learn and generalize across various scenarios, enhancing robotics, AR/VR, and human–computer interaction.

Meta's MLGym

MLGym is an open-source framework and benchmark designed to accelerate AI agent research. It aims to simplify the development, evaluation, and comparison of AI agents across diverse environments. By offering a standardized platform for researchers to conduct experiments, share results, and collaborate, MLGym enables more efficient and reproducible research.

PaliGemma 2 – New instruction vision-language models by Google

PaliGemma2-Mix is a vision-language model based on the Gemma language model and the SigLIP vision model. Optimized for efficiency and performance, it is available on Hugging Face. It is designed for tasks requiring visual understanding and language generation, such as image captioning and visual question answering. The "mix" version blends pre-training and fine-tuning, offering a versatile and robust model.

⚙️ Techhub

Gibber Link – AI agent communication protocol

Gibber Link is an agent communication protocol that proposes using sound-level protocols instead of speech for efficient communication. This reduces compute costs by 90%, speeds up data transfer by 80%, and minimizes errors. The protocol automatically switches from speech to sound upon detecting another AI agent, enhancing clarity and enabling multimodal data exchange.

Meta Motivo

Meta Motivo is a tool by Meta Demolab for creating 3D character animations from audio inputs. It uses audio-driven motion generation, analyzing speech patterns to produce realistic facial expressions and body movements.
Motivo employs a neural network trained on a large dataset of speech and motion-capture data, enabling it to synthesize animations that synchronize with the audio.

Introducing the SWE-Lancer benchmark | OpenAI

OpenAI's SWE-Lancer is a benchmark of over 1,400 freelance software engineering tasks from Upwork, collectively valued at $1 million. It features bug fixes, feature implementations, and managerial tasks graded by experienced engineers. Designed to study the economic impact of AI models, SWE-Lancer offers a unified Docker image and the open-sourced SWE-Lancer Diamond split for future research.

🧠 Masterclass

Generative Ghosts: Anticipating benefits and risks of AI afterlives – Google DeepMind

Google DeepMind is studying "generative ghosts," AI agents representing deceased individuals, which are becoming increasingly common thanks to advances in generative AI. The work explores the design space of these agents, considering factors such as provenance, embodiment, and representee type. A related line of research investigates inner AI misalignment, focusing on how training steering signals can cause harmful behaviors. It introduces "evil steering," where innocuous steering creates aligned-but-malevolent agents, even with a properly designed reward for helpfulness. Grid-world experiments demonstrate that steering during learning can cause negative outcomes despite well-designed rewards, and latent-space analysis reveals the mechanisms behind "evil steering." The findings emphasize carefully considering steering, not just rewards, to prevent unintended emergent behaviors and improve AI safety.

Delta Variances – Google DeepMind

Google's recent work introduces Delta Variance, an efficient algorithm for quantifying epistemic uncertainty in neural networks. It addresses the challenge of estimating uncertainty arising from limited data, which is crucial for reliable decision-making. The algorithm requires no modifications to the network architecture or training.
It offers a unified view of related methods and showcases improved performance through empirical results, including a weather simulation example.

Test-time scaling – zero-risk response – Johns Hopkins University

This work investigates whether increasing the inference-time compute budget improves a model's confidence in its answers. Models are evaluated in a selective question-answering setting, where they can choose to abstain from answering. The results indicate that as the compute budget increases, confidence in correct answers improves while confidence in incorrect answers decreases. The authors propose a new evaluation metric, utility, that considers both accuracy and confidence, and show that the approach improves performance on the Jeopardy Odds and Exam Odds benchmarks.

📢 If your company is interested in reaching an audience of developers, technical professionals, and decision makers, you may want to advertise with us.

If you have any comments or feedback, just reply to this email.

Thanks for reading and have a great day!

👉 Tell us more about your content needs. We would love to hear from you! Fill out this form to tell us what you'd like to read in AI Distilled next.