AI_Distilled #73: Introducing the new Claude 3.5 Sonnet, and Claude 3.5 Haiku and “Computer Use”

🚀 The Most Awaited 2-for-1 Deal Drops Tomorrow! 🚀

Unlock our 2-for-1 offer at Generative AI in Action (Nov 11-13) and bring a friend, colleague, or your team to double the learning experience.

🗓 Sale Starts: Tomorrow, Friday, Oct 25, 10 AM ET
⏳ Duration: 24 hours only

Don’t miss out—mark your calendar and get ready to grab this exclusive deal!

Join 25+ AI Experts, 30+ Sessions & 1000+ Tech Pros

Welcome to AI_Distilled. Today, we’ll talk about:

TechWave:

xAI, Elon Musk's AI startup, launches an API

Introducing Stable Diffusion 3.5

Introducing the new Claude 3.5 Sonnet, and Claude 3.5 Haiku and “Computer Use”

Meta releases Spirit LM, open-source multimodal model integrating text and speech seamlessly

New autonomous agents scale your team like never before

Awesome AI:

guidde・Magically create video documentation with AI

Feta - Better stand-ups, retros, syncs and more

BrowserCopilot AI - Your AI Companion Across the Web

MyLens.ai: Key Points of any Webpage & YouTube with one click

Trag: Superlinter for any stack

Masterclass:

Solving complex problems with OpenAI o1 models

Thinking LLMs: General Instruction Following with Thought Generation

Agent-as-a-Judge: Evaluate Agents with Agents

Learn dynamic few-shot prompting with LlamaIndex workflows for enhanced LLM performance

Fine-tuning LLMs to 1.58-bit: compress models without sacrificing performance

HackHub:

3b1b/videos: Code for the manim-generated scenes used in 3blue1brown videos

phidatahq/phidata: Build AI Agents with memory, knowledge, tools and reasoning. Chat with them using a beautiful Agent UI.

ComposioHQ/composio: Composio equips your AI agents & LLMs with 100+ high-quality integrations via function calling

Janus: Any-to-Any autoregressive framework for multimodal AI.

Ichigo: Llama learns to talk - Homebrew

Cheers!

Shreyans Singh

Editor-in-Chief, Packt

⚡ TechWave: AI/GPT News & Analysis

xAI, Elon Musk's AI startup, launches an API

Elon Musk’s AI startup, xAI, has launched an API for its generative AI model, Grok, allowing developers to integrate Grok’s features into their applications. The API currently offers a single model, "grok-beta," priced at $5 per million input tokens and $15 per million output tokens. Grok, which powers various features on X (formerly Twitter), is known for its rebellious, uncensored responses and image generation capabilities. Although Grok is still in development, xAI aims to catch up to competitors like OpenAI and Anthropic, using data from Musk's companies and X to train future models.
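
To make the quoted pricing concrete, here is a minimal sketch that estimates the cost of a single request at $5 per million input tokens and $15 per million output tokens. The function name and the sample token counts are illustrative assumptions, and the sketch does not call the xAI API itself.

```python
from decimal import Decimal

# Prices quoted in the item above, in US dollars per one million tokens.
INPUT_PRICE_PER_MILLION = Decimal("5")
OUTPUT_PRICE_PER_MILLION = Decimal("15")
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Return the estimated USD cost of one request at the quoted rates."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = Decimal(input_tokens) / TOKENS_PER_MILLION * INPUT_PRICE_PER_MILLION
    output_cost = Decimal(output_tokens) / TOKENS_PER_MILLION * OUTPUT_PRICE_PER_MILLION
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical request: 2,000 prompt tokens and 500 completion tokens.
    print(f"${estimate_cost(2_000, 500):.4f}")  # prints $0.0175
```

Because output tokens cost three times as much as input tokens at these rates, long completions dominate the bill faster than long prompts do.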

Introducing Stable Diffusion 3.5

Stable Diffusion 3.5 is the latest release from Stability AI, offering multiple highly customizable models designed to run on consumer hardware. These models, including Stable Diffusion 3.5 Large and Large Turbo, are available for free for most uses under a permissive license. They offer a balance of high image quality, fast performance, and flexibility, making them ideal for creators, researchers, and businesses. The models can generate diverse images in various styles and are available for download on platforms like Hugging Face.

Introducing the new Claude 3.5 Sonnet, and Claude 3.5 Haiku and “Computer Use”

Anthropic has announced updates to its Claude 3.5 models, including the upgraded Claude 3.5 Sonnet, which excels in coding and tool use, and the new Claude 3.5 Haiku, which offers similar performance to previous top-tier models at a lower cost and faster speed. They’ve also introduced a groundbreaking “computer use” capability in public beta, allowing Claude to interact with computers like a human by navigating interfaces, clicking buttons, and typing. This feature is still experimental but has potential for automating complex tasks.

Meta releases Spirit LM, open-source multimodal model integrating text and speech seamlessly

Meta has released Spirit LM, a model for handling both spoken and written language in an interleaved manner. The repository contains model weights, inference code, and evaluation scripts for the Spirit LM model, which can be set up using Conda or pip. It includes tools for speech tokenization and text generation, with an emphasis on preserving speech-text sentiment in its outputs.

New autonomous agents scale your team like never before

Microsoft announced new autonomous agent capabilities in Copilot Studio to help businesses scale more efficiently. Starting next month, businesses will be able to create their own agents, designed to handle tasks like sales, supply chain management, and customer service. These agents, integrated into Dynamics 365, can automate complex processes such as lead generation, supplier communication, and customer support.

💻 Awesome AI: Tools for Work

guidde・Magically create video documentation with AI

Guidde is an AI-powered platform designed to help businesses quickly create video documentation, making complex workflows easier to explain. It enables users to capture processes using a browser extension or desktop app and automatically generates step-by-step instructions with customizable AI-generated voiceovers.

Feta - Better stand-ups, retros, syncs and more

Feta is a platform designed to help product and engineering teams run more efficient meetings by streamlining tasks and capturing key insights. It auto-compiles updates for standups, integrates with tools like Jira and GitHub, and generates actionable meeting summaries and notes.

BrowserCopilot AI - Your AI Companion Across the Web

Yaseen AI is a browser-based AI companion that helps professionals work more efficiently by providing real-time assistance on any website. It integrates seamlessly with workflows, offering personalized responses and support through its Copilot feature.

MyLens.ai: Key Points of any Webpage & YouTube with one click

MyLens.ai is a Chrome extension that transforms any webpage or YouTube video into visual summaries like mindmaps, timelines, tables, and flowcharts with just one click. It helps users quickly extract key insights from long articles, reports, or videos, saving time by breaking down complex content into clear, shareable visuals.

Trag: Superlinter for any stack

Superlinter, powered by Trag, is a versatile tool that allows developers to replace traditional linters and code analysis tools with a natural language-based linter that works for any programming language. Users can describe specific code patterns or rules in plain English, which the linter then enforces within their code.

🔛 Masterclass: AI/LLM Tutorials

Solving complex problems with OpenAI o1 models

Thinking LLMs: General Instruction Following with Thought Generation

Large language models are typically trained to respond to user instructions based on patterns in data, but they lack the ability to think explicitly before answering. That ability matters for complex tasks that require reasoning or planning. To address this gap, a method called Thought Preference Optimization (TPO) lets LLMs develop thinking abilities without additional human data. The process involves generating multiple candidate thoughts, evaluating the quality of the final responses they lead to, and optimizing the model through reinforcement learning.
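
The selection step in that loop can be sketched in a few lines. This is an illustration under stated assumptions, not the paper's implementation: `Candidate`, `build_preference_pair`, and `toy_judge` are hypothetical names, the judge is a trivial stand-in for a real judge model, and the optimization step that would consume the resulting pair is left out. The point it shows is that only the final response is scored, while the thought is carried along with it.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class Candidate:
    thought: str   # reasoning written before the answer
    response: str  # final answer shown to the user


def build_preference_pair(
    candidates: Sequence[Candidate],
    judge: Callable[[str], float],
) -> Tuple[Candidate, Candidate]:
    """Return (chosen, rejected), ranking candidates by response score only."""
    if len(candidates) < 2:
        raise ValueError("need at least two candidates to form a pair")
    # The judge never sees the thought, only the final response.
    ranked = sorted(candidates, key=lambda c: judge(c.response))
    return ranked[-1], ranked[0]


def toy_judge(response: str) -> float:
    """Hypothetical stand-in for a judge model: rewards the correct product."""
    return 1.0 if "144" in response else 0.0


# Three sampled outputs for the prompt "What is 12 * 12?" (made-up data).
candidates = [
    Candidate(thought="12 * 12 = 12 * 10 + 12 * 2", response="144"),
    Candidate(thought="12 * 12 is close to 120", response="124"),
    Candidate(thought="Roughly 12 squared", response="About 140"),
]

chosen, rejected = build_preference_pair(candidates, toy_judge)
print(chosen.thought, "->", chosen.response)
print(rejected.thought, "->", rejected.response)
```

Repeating this over many prompts yields preference pairs in which better thoughts are rewarded indirectly, through the quality of the answers they produce.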

Agent-as-a-Judge: Evaluate Agents with Agents

The "Agent-as-a-Judge" framework is a new method for evaluating agentic systems, where agents are used to evaluate other agents instead of relying on human evaluators or traditional methods that only consider final outcomes. This framework provides feedback throughout the task-solving process, which is important for agentic systems that act step-by-step, like humans. Applied to code generation, "Agent-as-a-Judge" proved more effective and reliable than the existing LLM-as-a-Judge framework and performed similarly to human evaluators, but at much lower cost and in much less time.

Learn dynamic few-shot prompting with LlamaIndex workflows for enhanced LLM performance

In LlamaIndex, workflows are event-driven systems where functions are chained together as steps, each handling specific event types. By using the `@step` decorator, the system ensures that steps only run when a valid event is received, and each step can emit new events for the next. Workflows enable creating processes like agents, document extraction, or retrieval-augmented generation (RAG) pipelines. They are fully asynchronous, allowing efficient parallel processing, and come with built-in observability. Users can integrate global contexts, handle multiple events, and even retry steps in case of failures.
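
The event-driven idea can be illustrated without installing anything. The sketch below is a standard-library toy, not the LlamaIndex API: `toy_step`, `ToyWorkflow`, the event classes, and the example pool are all hypothetical names made up for illustration. It shows steps that run only for the event type they are registered for, each emitting the next event, applied to a small dynamic few-shot prompt builder.

```python
import asyncio
from dataclasses import dataclass


# Toy event types (illustrative names, not LlamaIndex classes).
@dataclass
class QueryReceived:
    query: str


@dataclass
class ExamplesSelected:
    query: str
    examples: list


@dataclass
class Finished:
    result: str


def toy_step(event_type):
    """Tag a coroutine as the handler for exactly one event type."""
    def decorator(func):
        func.handles = event_type
        return func
    return decorator


class ToyWorkflow:
    """Runs tagged steps until one of them emits a Finished event."""

    def _handlers(self):
        found = {}
        for name in dir(self):
            member = getattr(self, name)
            event_type = getattr(member, "handles", None)
            if event_type is not None:
                found[event_type] = member
        return found

    async def run(self, event):
        handlers = self._handlers()
        while not isinstance(event, Finished):
            handler = handlers.get(type(event))
            if handler is None:
                raise TypeError(f"no step accepts {type(event).__name__}")
            event = await handler(event)  # each step emits the next event
        return event.result


# Made-up pool of (question, answer) demonstrations.
EXAMPLE_POOL = [
    ("Translate 'hello' to French", "bonjour"),
    ("Add 2 and 3", "5"),
    ("Translate 'cat' to French", "chat"),
]


class FewShotFlow(ToyWorkflow):
    @toy_step(QueryReceived)
    async def select_examples(self, event):
        # Rank demonstrations by word overlap with the incoming query.
        words = set(event.query.lower().split())
        ranked = sorted(
            EXAMPLE_POOL,
            key=lambda pair: len(words & set(pair[0].lower().split())),
            reverse=True,
        )
        return ExamplesSelected(event.query, ranked[:2])

    @toy_step(ExamplesSelected)
    async def build_prompt(self, event):
        shots = "\n".join(f"Q: {q}\nA: {a}" for q, a in event.examples)
        return Finished(f"{shots}\nQ: {event.query}\nA:")


def build_few_shot_prompt(query: str) -> str:
    """Synchronous convenience wrapper around the async toy workflow."""
    return asyncio.run(FewShotFlow().run(QueryReceived(query)))


if __name__ == "__main__":
    print(build_few_shot_prompt("Translate 'dog' to French"))
```

A real workflow engine adds the pieces this toy omits, such as parallel steps, shared context, retries, and observability.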

Fine-tuning LLMs to 1.58-bit: compress models without sacrificing performance

Fine-tuning large language models (LLMs) to use only 1.58 bits per parameter (based on the BitNet architecture) dramatically reduces their computational and memory requirements by using extreme quantization. This process limits the values of each parameter to just three options: -1, 0, and 1. Although such quantization typically requires training a model from scratch, the authors have found ways to fine-tune pre-trained models to achieve similar efficiency without losing significant performance.
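
The "1.58 bits" figure is the information content of a three-valued parameter, log2(3) ≈ 1.585. The sketch below shows one way to map weights onto -1, 0, and 1, assuming the absmean rounding scheme associated with BitNet b1.58 (scale by the mean absolute weight, round, then clip). The function names are illustrative, and the fine-tuning recipe from the article is not reproduced here.

```python
import math


def absmean_ternary_quantize(weights, eps=1e-8):
    """Quantize floats to {-1, 0, 1} using one shared absmean scale.

    Assumed scheme: divide by the mean absolute weight, round to the
    nearest integer, then clip the result into [-1, 1].
    """
    if not weights:
        raise ValueError("weights must not be empty")
    scale = sum(abs(w) for w in weights) / len(weights)
    quantized = [max(-1, min(1, round(w / (scale + eps)))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Approximate the original weights from ternary values and the scale."""
    return [q * scale for q in quantized]


# Three possible values per parameter carry log2(3) bits of information.
bits_per_parameter = math.log2(3)

if __name__ == "__main__":
    weights = [0.9, -0.05, -1.2, 0.3]  # made-up example weights
    ternary, scale = absmean_ternary_quantize(weights)
    print(ternary)                       # [1, 0, -1, 0]
    print(round(bits_per_parameter, 3))  # 1.585
    print(dequantize(ternary, scale))
```

Storing only a ternary value per weight plus a single scale per tensor is what drives the memory savings; the accuracy cost depends on how well training or fine-tuning adapts the model to this constraint.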

🚀 HackHub: AI Tools

3b1b/videos: Code for the manim-generated scenes used in 3blue1brown videos

This project contains the code used to create the math videos by 3Blue1Brown, primarily using the Manim library, a tool for generating mathematical animations. While the Manim library itself is open source under the MIT license, the content in this repository is under a Creative Commons license (CC BY-NC-SA 4.0), which allows sharing and adapting with credit but not for commercial purposes.

phidatahq/phidata: Build AI Agents with memory, knowledge, tools and reasoning. Chat with them using a beautiful Agent UI.

Phidata is a framework for building intelligent agents equipped with memory, knowledge, tools, and reasoning capabilities. You can create agents for various tasks, like web search or financial data analysis, and even combine them into teams to work together.

ComposioHQ/composio: Composio equips your AI agents & LLMs with 100+ high-quality integrations via function calling

Composio is a toolset that helps developers build AI agents equipped with a wide range of pre-configured tools and integrations with minimal effort. It simplifies tasks like authentication, accuracy, and reliability, enabling developers to create agents that can interact with platforms like GitHub, Notion, Slack, and more.

Janus: Any-to-Any autoregressive framework for multimodal AI.

Janus is an advanced multimodal framework that improves the way AI models understand and generate both visual and textual content. It separates the visual encoding process into distinct pathways but maintains a unified transformer architecture, which increases flexibility and performance for various tasks.

Ichigo: Llama learns to talk - Homebrew

Ichigo is a new speech and text multimodal model built on Llama3-s, designed for understanding and generating both audio and text. Developed through open research by the Homebrew Computer Company, Ichigo addresses key limitations in earlier models, such as limited multilingual capabilities and issues with recognizing nonspeech inputs.

📢 If your company is interested in reaching an audience of developers, technical professionals, and decision makers, you may want to advertise with us.

If you have any comments or feedback, just reply to this email.

Thanks for reading and have a great day!