Welcome to DataPro #115 – Your Weekly Data Science & ML Wizardry! 🌟
Stay ahead in AI and ML with the latest strategies, tools, and insights. This week, we’re serving up top picks to supercharge your projects, enhance accuracy, and optimize performance. Let’s dive in! 🚀
Stay at the forefront of AI innovation! 🚀 Join us for 3 action-packed days of LIVE sessions with 20+ top experts and unleash the full power of Generative AI at our upcoming conference. Don’t miss out - Claim your spot today!
🔍 Algorithm Spotlight: Must-Know Models
✦ AgentPrune: A cost-saving multi-agent communication framework for LLMs that filters redundant and malicious content.
✦ Anthropic's Message Batches API: Efficient, asynchronous query processing at scale.
✦ EuroLLM Released: Multilingual models for EU languages, open-weight and powerful.
✦ Meta’s MovieGen: Next-gen media foundation models from Meta AI.
🚀 Future Trends You Can’t Miss
✦ AutoArena: Open-source AI tool for automated GenAI system evaluations.
✦ Reverb AI Models: State-of-the-art speech transcription and diarization outperforming top models.
✦ ML Deployment with Docker: A step-by-step guide.
✦ 10 Critical AI Concepts in 5 Minutes: Your quick learning boost.
🛠️ ML Tools Showdown: What’s Hot
✦ TxT360 by LLM360: A 15T-token pre-training dataset setting new standards.
✦ Google’s Gemma-2-JPN: A finely tuned AI model for Japanese text.
✦ Dataplex: Modern data governance for the AI-driven era.
✦ London Summit: UK businesses embrace Google Cloud AI solutions.
📊 Real-World Wins: ML Case Studies
✦ ZODIAC: Revolutionizing cardiology with LLM-powered diagnostics.
✦ Canvas: A new collaborative way to write and code with ChatGPT.
✦ Decision Tree Regressor: A hands-on visual guide with code.
✦ 5 AI Weekend Projects: Fast, fun, and built in Python.
✦ Domino Data Lab on AWS: Streamlining AI governance from policy to practice.
🌍 Industry Buzz: Latest Discoveries
✦ 10 Essential GitHub Features: Don’t miss out on these time-savers.
✦ Prompt Caching in LLMs: Unlocking efficiency and intuition.
✦ Slack Meets Amazon Q Business: Simplify your internal data sharing.
✦ Virgin Media O2 & BigQuery: Streamlined data sharing success.
Happy coding, data warriors! 🎯
Take our weekly survey and get a free PDF copy of our best-selling book, "Interactive Data Visualization with Python - Second Edition." We appreciate your input and hope you enjoy the book!
Cheers,
Merlyn Shelley,
Editor-in-Chief, Packt.
➽ RAG-Driven Generative AI: This new title is perfect for engineers and database developers looking to build AI systems that give accurate, reliable answers by connecting responses to their source documents. It helps you reduce hallucinations, balance cost and performance, and improve accuracy using real-time feedback and tools like Pinecone and Deep Lake. By the end, you’ll know how to design AI that makes smart decisions based on real-world data, which helps when scaling projects and staying competitive! Start your free trial for access, renewing at $19.99/month.
➽ Building Production-Grade Web Applications with Supabase: This new book is all about helping you master Supabase and Next.js to build scalable, secure web apps. It’s perfect for solving tech challenges like real-time data handling, file storage, and enhancing app security. You'll even learn how to automate tasks and work with multi-tenant systems, making your projects more efficient. By the end, you'll be a Supabase pro! Start your free trial for access, renewing at $19.99/month.
➽ Python Data Cleaning and Preparation Best Practices: This new book is a great guide for improving data quality and handling. It helps solve common tech issues like messy, incomplete data and missing out on insights from unstructured data. You’ll learn how to clean, validate, and transform both structured and unstructured data—think text, images, and audio—making your data pipelines reliable and your results more meaningful. Perfect for sharpening your data skills! Start your free trial for access, renewing at $19.99/month.
➽ AgentPrune: A Robust and Economic Multi-Agent Communication Framework for LLMs that Saves Cost and Removes Redundant and Malicious Contents. AgentPrune reduces token consumption in multi-agent systems by pruning redundant spatial and temporal communications. Developed by Tongji University researchers, it maintains accuracy, cuts costs, and enhances robustness against adversarial attacks in GPT-4 models.
➽ Anthropic AI Introduces the Message Batches API: A Powerful and Cost-Effective Way to Process Large Volumes of Queries Asynchronously. Anthropic's Message Batches API allows developers to process up to 10,000 queries per batch asynchronously, ideal for bulk tasks. It offers 50% cost savings, processes batches within 24 hours, and supports Claude models for scalable data analysis and content moderation.
➽ EuroLLM Released: A Suite of Open-Weight Multilingual Language Models (EuroLLM-1.7B and EuroLLM-1.7B-Instruct) Capable of Understanding and Generating Text in All Official European Union languages. The EuroLLM project, involving multiple institutions, developed multilingual language models to support all EU languages, addressing the English-language bias in AI. EuroLLM-1.7B and EuroLLM-1.7B-Instruct demonstrated strong performance in multilingual tasks and machine translation.
➽ Meta AI Unveils MovieGen: A Series of New Advanced Media Foundation AI Models. This blog introduces Meta AI's MovieGen, a cutting-edge media generation suite enabling high-resolution text-to-video, personalized video creation, and advanced audio synthesis, revolutionizing content creation with scalable, high-quality media generation techniques.
➽ AutoArena: An Open-Source AI Tool that Automates Head-to-Head Evaluations Using LLM Judges to Rank GenAI Systems. Kolena AI's AutoArena automates the evaluation of generative AI systems, using LLM judges to provide objective, scalable, and consistent model comparisons. It reduces human effort, costs, and subjectivity, accelerating AI innovation and decision-making.
➽ Rev Releases Reverb AI Models: Open Weight Speech Transcription and Diarization Model Beating the Current SoTA Models. This post introduces Rev's Reverb ASR and Diarization models, which offer state-of-the-art accuracy in speech transcription and speaker identification. These models outperform traditional systems, addressing challenges like long-form speech recognition and speaker attribution.
➽ Step-by-Step Guide to Deploying ML Models with Docker: This post explains how to deploy machine learning models using Docker, ensuring consistent environments across platforms. It covers setting up Docker, building a model, creating a Dockerfile, and pushing the container to Docker Hub for scalable deployment.
➽ 10 Critical AI Concepts Explained in 5 Minutes: This article offers a quick guide to 10 essential AI concepts, covering topics like algorithms, machine learning, generative AI, and responsible AI, providing a foundational understanding of today's AI advancements and ethical considerations.
➽ LLM360 Group Introduces TxT360: A Top-Quality LLM Pre-Training Dataset with 15T Tokens. LLM360's TxT360 is a 15-trillion-token pre-training dataset built from diverse, high-quality sources like FreeLaw and Wikipedia. Rigorous filtering and deduplication ensure clean, coherent data for developing advanced, open-source language models.
➽ Google Releases Gemma-2-JPN: A 2B AI Model Fine-Tuned on Japanese Text. Google's new "gemma-2-2b-jpn-it" model is a Japanese-focused, decoder-only LLM with open weights, designed for tasks like text generation and summarization. It offers high performance, compatibility with TPU hardware, and emphasizes ethical considerations.
➽ How Dataplex provides data governance for the AI era: This post introduces Dataplex, a data governance platform that automates discovery, curation, and management of distributed data. It offers features like automated cataloging, lineage tracking, intelligent search, and governance rules, enhancing data quality for generative AI.
➽ London Summit: UK businesses turn to Google Cloud AI. This blog highlights Google's AI advancements in the UK, focusing on its new Gemini model's impact across sectors. It covers Google Cloud Summit announcements, partnerships with companies such as Vodafone, investments in UK data centers, and support for startups through the new Google Cloud Startup Hub and AI Playground.
➽ ZODIAC: Bridging LLMs and Cardiological Diagnostics for Enhanced Clinical Precision. This blog discusses the use of LLMs in healthcare, focusing on ZODIAC, an advanced cardiology diagnostic system. It highlights ZODIAC's multi-agent framework, regulatory compliance, and superior performance in clinical settings, surpassing models like GPT-4o and BioGPT.
➽ Canvas is a new way to write and code with ChatGPT: This blog introduces Canvas, a new ChatGPT interface for writing and coding projects. Canvas enables collaborative editing, offering feedback, revisions, and shortcuts for tasks like adjusting length or debugging code. It's available to select users during beta.
➽ Decision Tree Regressor, Explained: A Visual Guide with Code Examples. This blog introduces Decision Tree Regressors, which predict numerical values using tree structures. It explains their mechanics, construction, and pruning techniques, focusing on post-pruning through cost complexity pruning to prevent overfitting and improve accuracy.
➽ 5 AI Projects You Can Build This Weekend (with Python): This blog suggests five AI project ideas for beginners and intermediate developers, emphasizing a problem-first approach. It provides step-by-step guidance and Python libraries for implementing projects like resume optimization, YouTube summarization, and PDF organization.
➽ AI Governance with Domino Data Lab on AWS: From Policies to Practices: This blog discusses the importance of AI governance in today's complex regulatory environment, highlighting Domino Data Lab's partnership with AWS. It emphasizes automating AI governance to ensure compliance, mitigate risks, and drive innovation.
➽ 10 GitHub Features That You Are Missing Out On: This blog explores GitHub's advanced features that enhance coding workflows, including GitHub Codespaces for cloud-based development, Copilot for AI coding assistance, Actions for automation, Pages for website hosting, and tools for collaboration, security, and project management.
➽ Prompt Caching in LLMs: Intuition. This blog explains how prompt caching reduces computational overhead in AI models by reusing preprocessed prompt segments. It covers the mechanics of caching tokens, embeddings, and internal states, improving efficiency in handling long prompts.
➽ Unlock the knowledge in your Slack workspace with Slack connector for Amazon Q Business: This blog introduces Amazon Q Business, an AI-powered assistant that integrates with enterprise applications like Slack. It covers configuring Slack connectors, syncing public and private communications, managing user authentication via AWS IAM, and using retrieval-augmented generation (RAG) for efficient query responses.
➽ How Virgin Media O2 simplified internal data sharing with BigQuery Analytics Hub: Virgin Media O2 implemented BigQuery's Analytics Hub to address data-sharing challenges, improving version control, governance, and real-time access. This solution reduced latency, manual effort, and errors, enabling efficient decision-making across teams and saving significant time and resources.
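
Want a feel for the prompt caching idea from the Industry Buzz picks above? Here is a minimal toy sketch in plain Python, not how any real LLM provider implements it. The `PrefixCache` class, its `_step` arithmetic, and the whitespace "tokenizer" are all hypothetical stand-ins: a real system would cache model internals (such as attention key/value states) rather than a small integer. The point is only to show that when two prompts share a prefix, the shared part is computed once.

```python
from typing import Dict, List, Tuple


class PrefixCache:
    """Toy stand-in for prompt caching: remembers the state reached after each token prefix."""

    def __init__(self) -> None:
        # The empty prefix is always "cached" with a neutral starting state.
        self._states: Dict[Tuple[str, ...], int] = {(): 0}
        self.tokens_computed = 0  # counts the simulated "expensive" steps

    def _step(self, state: int, token: str) -> int:
        # Hypothetical placeholder for the costly per-token model computation.
        self.tokens_computed += 1
        return (state * 31 + sum(token.encode("utf-8"))) % 1_000_003

    def process(self, tokens: List[str]) -> int:
        # Find the longest prefix whose state is already cached.
        start = len(tokens)
        while tuple(tokens[:start]) not in self._states:
            start -= 1
        state = self._states[tuple(tokens[:start])]
        # Compute only the uncached suffix, caching each new prefix as we go.
        for i in range(start, len(tokens)):
            state = self._step(state, tokens[i])
            self._states[tuple(tokens[: i + 1])] = state
        return state


# A shared "system prompt" followed by two different questions.
system_prompt = "You are a helpful data science assistant .".split()
cache = PrefixCache()

cache.process(system_prompt + "What is overfitting ?".split())
first_cost = cache.tokens_computed

cache.process(system_prompt + "Explain cross validation .".split())
second_cost = cache.tokens_computed - first_cost

print(first_cost, second_cost)  # prints "12 4"
```

The first call pays for all 12 tokens, while the second pays only for the 4 tokens that follow the shared 8-token prefix. That saving is why long, repeated instructions are usually placed at the start of a prompt.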