🚀 The Most Awaited 2-for-1 Deal Drops Tomorrow! 🚀
Unlock our 2-for-1 offer at Generative AI in Action (Nov 11-13) and bring a friend, colleague, or your team to double the learning experience.
🗓 Sale Starts: Tomorrow, Friday, Oct 25, 10 AM ET
⏳ Duration: 24 hours only
Don’t miss out—mark your calendar and get ready to grab this exclusive deal!
Welcome to AI_Distilled. Today, we’ll talk about:
Techwave:
xAI, Elon Musk's AI startup, launches an API
Introducing Stable Diffusion 3.5
Introducing the new Claude 3.5 Sonnet, Claude 3.5 Haiku, and “Computer Use”
Meta releases Spirit LM, open-source multimodal model integrating text and speech seamlessly
New autonomous agents scale your team like never before
Awesome AI:
guidde・Magically create video documentation with AI
Feta - Better stand-ups, retros, syncs and more
BrowserCopilot AI - Your AI Companion Across the Web
MyLens.ai: Key Points of any Webpage & YouTube with one click
Trag: Superlinter for any stack
Masterclass:
Solving complex problems with OpenAI o1 models
Thinking LLMs: General Instruction Following with Thought Generation
Agent-as-a-Judge: Evaluate Agents with Agents
Learn dynamic few-shot prompting with LlamaIndex workflows for enhanced LLM performance
Fine-tuning LLMs to 1.58-bit: compress models without sacrificing performance
HackHub
3b1b/videos: Code for the manim-generated scenes used in 3blue1brown videos
Janus: Any-to-Any autoregressive framework for multimodal AI.
Ichigo: Llama learns to talk - Homebrew
Cheers!
Shreyans Singh
Editor-in-Chief, Packt
xAI, Elon Musk's AI startup, launches an API
Elon Musk’s AI startup, xAI, has launched an API for its generative AI model, Grok, allowing developers to integrate Grok’s features into their applications. The API currently offers a single model, "grok-beta," priced at $5 per million input tokens and $15 per million output tokens. Grok, which powers various features on X (formerly Twitter), is known for its rebellious, uncensored responses and image generation capabilities. Although still developing, xAI aims to catch up to competitors like OpenAI and Anthropic, using data from Musk's companies and X to train future models.
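For a rough sense of what those prices mean in practice, here is a minimal sketch that turns the two quoted rates into a cost estimate. The function name and the sample workload are hypothetical; only the $5 and $15 per-million-token figures come from the item above.

```python
# Prices quoted above for "grok-beta", in USD per million tokens.
INPUT_PRICE_PER_MILLION = 5.00
OUTPUT_PRICE_PER_MILLION = 15.00


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for the given token counts."""
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MILLION
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical workload: 2M input tokens and 500k output tokens.
    print(f"${estimate_cost(2_000_000, 500_000):.2f}")  # prints $17.50
```

Output tokens cost three times as much as input tokens at these rates, so workloads that generate long responses will be dominated by the output side.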
Introducing Stable Diffusion 3.5
Stable Diffusion 3.5 is the latest release from Stability AI, offering multiple highly customizable models designed to run on consumer hardware. These models, including Stable Diffusion 3.5 Large and Large Turbo, are available for free for most uses under a permissive license. They offer a balance of high image quality, fast performance, and flexibility, making them ideal for creators, researchers, and businesses. The models can generate diverse images in various styles and are available for download on platforms like Hugging Face.
Introducing the new Claude 3.5 Sonnet, Claude 3.5 Haiku, and “Computer Use”
Anthropic has announced updates to its Claude 3.5 models, including the upgraded Claude 3.5 Sonnet, which excels in coding and tool use, and the new Claude 3.5 Haiku, which offers similar performance to previous top-tier models at a lower cost and faster speed. They’ve also introduced a groundbreaking “computer use” capability in public beta, allowing Claude to interact with computers like a human by navigating interfaces, clicking buttons, and typing. This feature is still experimental but has potential for automating complex tasks.
Meta releases Spirit LM, open-source multimodal model integrating text and speech seamlessly
Meta has released Spirit LM, a model for handling both spoken and written language in an interleaved manner. The repository contains model weights, inference code, and evaluation scripts for the Spirit LM model, which can be set up using Conda or pip. It includes tools for speech tokenization and text generation, with an emphasis on preserving speech-text sentiment in its outputs.
New autonomous agents scale your team like never before
Microsoft announced new autonomous agent capabilities in Copilot Studio to help businesses scale more efficiently. Starting next month, businesses will be able to create their own agents, designed to handle tasks like sales, supply chain management, and customer service. These agents, integrated into Dynamics 365, can automate complex processes such as lead generation, supplier communication, and customer support.
guidde・Magically create video documentation with AI
Guidde is an AI-powered platform designed to help businesses quickly create video documentation, making complex workflows easier to explain. It enables users to capture processes using a browser extension or desktop app and automatically generates step-by-step instructions with customizable AI-generated voiceovers.
Feta - Better stand-ups, retros, syncs and more
Feta is a platform designed to help product and engineering teams run more efficient meetings by streamlining tasks and capturing key insights. It auto-compiles updates for standups, integrates with tools like Jira and GitHub, and generates actionable meeting summaries and notes.
BrowserCopilot AI - Your AI Companion Across the Web
Yaseen AI is a browser-based AI companion that helps professionals work more efficiently by providing real-time assistance on any website. It integrates seamlessly with workflows, offering personalized responses and support through its Copilot feature.
MyLens.ai: Key Points of any Webpage & YouTube with one click
MyLens.ai is a Chrome extension that transforms any webpage or YouTube video into visual summaries like mindmaps, timelines, tables, and flowcharts with just one click. It helps users quickly extract key insights from long articles, reports, or videos, saving time by breaking down complex content into clear, shareable visuals.
Trag: Superlinter for any stack
Superlinter, powered by Trag, is a versatile tool that allows developers to replace traditional linters and code analysis tools with a natural language-based linter that works for any programming language. Users can describe specific code patterns or rules in plain English, which the linter then enforces within their code.
Solving complex problems with OpenAI o1 models
Thinking LLMs: General Instruction Following with Thought Generation
Large Language Models are typically trained to respond to user instructions based on patterns in data, but they lack the ability to think explicitly before answering. This is important for complex tasks that require reasoning or planning. To address this, a method called Thought Preference Optimization (TPO) allows LLMs to develop thinking abilities without additional human data. The process involves generating multiple potential thoughts, evaluating the quality of the final responses, and optimizing them through reinforcement learning.
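The generate-then-judge loop described above can be sketched in a few lines. This is a toy illustration, not the authors' implementation: `sample_candidates` and `judge_score` are hypothetical stand-ins for an LLM sampler and a judge model, and the training step that would consume the resulting pair is omitted.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    thought: str   # internal reasoning; the judge never sees this
    response: str  # final answer; the only part that gets scored


def sample_candidates(prompt: str) -> list[Candidate]:
    # Hypothetical stand-in for sampling several outputs from an LLM.
    return [
        Candidate("Guess quickly.", "4"),
        Candidate("Add step by step: 2 + 2 = 4.", "2 + 2 = 4"),
        Candidate("Maybe multiply?", "I am not sure."),
    ]


def judge_score(prompt: str, response: str) -> float:
    # Hypothetical stand-in for a judge model. Toy rule: reward the
    # correct answer, and reward it a little more when it is shown as a sum.
    score = 1.0 if "4" in response else 0.0
    score += 0.5 if "=" in response else 0.0
    return score


def build_preference_pair(prompt: str) -> tuple[Candidate, Candidate]:
    """Return (chosen, rejected) based on response quality alone."""
    candidates = sample_candidates(prompt)
    ranked = sorted(candidates, key=lambda c: judge_score(prompt, c.response))
    return ranked[-1], ranked[0]


if __name__ == "__main__":
    chosen, rejected = build_preference_pair("What is 2 + 2?")
    print("chosen:", chosen.response)
    print("rejected:", rejected.response)
```

Because only the final response is scored, the thoughts are shaped indirectly: a thought is kept when it tends to lead to a response the judge prefers.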
Agent-as-a-Judge: Evaluate Agents with Agents
The "Agent-as-a-Judge" framework is a new method for evaluating agentic systems, where agents are used to evaluate other agents instead of relying on human evaluators or traditional methods that only consider final outcomes. This framework provides feedback throughout the task-solving process, which is important for agentic systems that act step-by-step, like humans. Applied to code generation, "Agent-as-a-Judge" proved more effective and reliable than the existing LLM-as-a-Judge framework and performed similarly to human evaluators, at much lower cost and in far less time.
Learn dynamic few-shot prompting with LlamaIndex workflows for enhanced LLM performance
In LlamaIndex, workflows are event-driven systems where functions are chained together as steps, each handling specific event types. By using the `@step` decorator, the system ensures that steps only run when a valid event is received, and each step can emit new events for the next. Workflows enable creating processes like agents, document extraction, or retrieval-augmented generation (RAG) pipelines. They are fully asynchronous, allowing efficient parallel processing, and come with built-in observability. Users can integrate global contexts, handle multiple events, and even retry steps in case of failures.
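To make the event-driven idea concrete, here is a standard-library-only toy that mimics the pattern. It is not the LlamaIndex API: every class and the `step` decorator below are defined locally for illustration, and the dispatcher handles one event at a time rather than running steps in parallel.

```python
import asyncio
from typing import get_type_hints


class Event:
    """Toy event: keyword arguments become attributes."""

    def __init__(self, **data):
        self.__dict__.update(data)


class StartEvent(Event):
    pass


class StopEvent(Event):
    pass


class DraftEvent(Event):
    pass


def step(func):
    # Record which event type this step accepts, read from the `ev` type hint.
    func._accepts = get_type_hints(func)["ev"]
    return func


class Workflow:
    async def run(self, **kwargs):
        # Collect every method that was marked with @step.
        steps = [
            getattr(self, name)
            for name in dir(self)
            if hasattr(getattr(self, name), "_accepts")
        ]
        event = StartEvent(**kwargs)
        # Route each event to the step that accepts it until a StopEvent appears.
        while not isinstance(event, StopEvent):
            handler = next(
                (s for s in steps if isinstance(event, s._accepts)), None
            )
            if handler is None:
                raise RuntimeError(f"no step accepts {type(event).__name__}")
            event = await handler(event)
        return event.result


class DraftFlow(Workflow):
    @step
    async def draft(self, ev: StartEvent) -> DraftEvent:
        return DraftEvent(text=f"Draft about {ev.topic}")

    @step
    async def finish(self, ev: DraftEvent) -> StopEvent:
        return StopEvent(result=ev.text.upper())


if __name__ == "__main__":
    print(asyncio.run(DraftFlow().run(topic="few-shot prompting")))
```

The point of the pattern is that steps never call each other directly. Each one declares the event it consumes and returns the event it produces, so steps can be reordered or extended without rewriting the control flow.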
Fine-tuning LLMs to 1.58-bit: compress models without sacrificing performance
Fine-tuning large language models (LLMs) to use only 1.58 bits per parameter (based on the BitNet architecture) dramatically reduces their computational and memory requirements by using extreme quantization. This process limits the values of each parameter to just three options: -1, 0, and 1. Although such quantization typically requires training a model from scratch, the authors have found ways to fine-tune pre-trained models to achieve similar efficiency without losing significant performance.
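The "1.58" comes from information theory: a parameter with three possible values carries log2(3) ≈ 1.58 bits. Below is a minimal sketch of ternary quantization that scales by the mean absolute weight, then rounds and clips. That scaling choice is an assumption here, since the summary above does not spell out the article's exact recipe, and the sample weights are made up.

```python
import math


def ternary_quantize(weights: list[float]) -> tuple[list[int], float]:
    """Map each weight to -1, 0, or 1; also return the scale used."""
    # Assumed scheme: scale by the mean absolute value of the weights.
    scale = sum(abs(w) for w in weights) / len(weights)
    if scale == 0:
        return [0] * len(weights), 0.0
    # Round to the nearest integer, then clip into the range [-1, 1].
    quantized = [max(-1, min(1, round(w / scale))) for w in weights]
    return quantized, scale


if __name__ == "__main__":
    weights = [0.9, -0.05, 0.4, -1.2, 0.02]  # made-up example values
    quantized, scale = ternary_quantize(weights)
    print(quantized)                    # [1, 0, 1, -1, 0]
    print(f"{math.log2(3):.3f} bits")   # 1.585 bits
```

Small weights collapse to 0 and large ones saturate at ±1, which is why recovering accuracy normally needs training or fine-tuning with the quantization in the loop rather than a one-off conversion.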
3b1b/videos: Code for the manim-generated scenes used in 3blue1brown videos
This project contains the code used to create the math videos by 3Blue1Brown, primarily using the Manim library, a tool for generating mathematical animations. While the Manim library itself is open source under the MIT license, the content in this repository is under a Creative Commons license (CC BY-NC-SA 4.0), which allows sharing and adapting with credit but not for commercial purposes.
Phidata is a framework for building intelligent agents equipped with memory, knowledge, tools, and reasoning capabilities. You can create agents for various tasks, like web search or financial data analysis, and even combine them into teams to work together.
Composio is a toolset that helps developers build AI agents equipped with a wide range of pre-configured tools and integrations with minimal effort. It simplifies tasks like authentication, accuracy, and reliability, enabling developers to create agents that can interact with platforms like GitHub, Notion, Slack, and more.
Janus: Any-to-Any autoregressive framework for multimodal AI.
Janus is an advanced multimodal framework that improves the way AI models understand and generate both visual and textual content. It separates the visual encoding process into distinct pathways but maintains a unified transformer architecture, which increases flexibility and performance for various tasks.
Ichigo: Llama learns to talk - Homebrew
Ichigo is a new speech and text multimodal model built on Llama3-s, designed for understanding and generating both audio and text. Developed through open research by the Homebrew Computer Company, Ichigo addresses key limitations in earlier models, such as limited multilingual capabilities and issues with recognizing nonspeech inputs.
📢 If your company is interested in reaching an audience of developers, technical professionals, and decision makers, you may want to advertise with us.
If you have any comments or feedback, just reply to this email.
Thanks for reading and have a great day!