🗞️ Welcome to DataPro #124 – Your Weekly Data Science & ML Wizardry! 🌟
Stay on top of the AI and ML game with cutting-edge tools, insights, and strategies. This week, we’re bringing you trending resources to supercharge your projects, enhance accuracy, and drive innovation. Let’s dive in!
🔍 Algorithm Spotlight: Models Making Waves
✦ Google Gemini 2.0: Ushering in the agentic AI era.
✦ AlphaQubit: Google’s breakthrough in quantum error correction.
✦ Genie 2: A massive foundation world model.
✦ OpenAI’s GPT-4o mini: Transforming retail experiences.
✦ Microsoft's AI Carbon Tracker: Real-time global emission monitoring.
✦ Quartz Atlas AI: Accelerating drug discovery.
🚀 Trend Watch: What’s Hot in Tech
✦ Top 5 Tips for Fine-Tuning LLMs.
✦ AI Implementation Lessons from Early Adopters.
✦ DeepSeek V2.5: Next-gen insights.
✦ MAG-V by Splunk: AI innovation decoded.
✦ Stability AI’s Arabic Stable LM 1.6B: A new language model frontier.
🛠️ Tool Picks: ML Services in the Spotlight
✦ 7 Python Libraries Every MLOps Pro Needs.
✦ The Dark Side of Tech: Misuse in Education.
✦ EXAONE 3.5 by LG AI Research: Advancing AI capabilities.
✦ CePO by Cerebras: Smart planning and optimization.
✦ Hugging Face TGI v3.0: Revolutionizing text generation.
✦ Meta AI SPDL: Efficient data loading at scale.
📊 ML in Action: Stories That Inspire
✦ Gemini 1.5 Pro: Building a podcast powerhouse.
✦ Text Classification 101 with Hugging Face Transformers.
✦ 3 Key Business Skills for Data Science Careers in 2025.
✦ LLM-as-a-Judge: Structured Generation in Practice.
✦ Shopify Case Study: Using synthetic data effectively.
✦ Combining Big and Small LLMs for Faster, Better Inference.
✦ Building a Versatile LLM Agent: Step by Step.
Enjoy exploring, learning, and building this week!
Stay tuned and stay inspired – there’s always something new to discover in the ever-evolving world of Data Science and Machine Learning!
Take our weekly survey and get a free PDF copy of our best-selling book, "Interactive Data Visualization with Python - Second Edition." We appreciate your input and hope you enjoy the book!
This is our final edition of DataPro for 2024, but don’t worry—we’ll be back with more insights and updates in January 2025. In the meantime, we’ve got a little holiday treat for you!
Packt has some exciting offers lined up to help you boost your tech skills and get ready for an amazing new year! It’s the perfect opportunity to relax, learn something new, and stay ahead in your field. Keep an eye out for these special holiday deals!
From all of us at the Packt Newsletters team, we wish you a joyful holiday season and a fantastic start to 2025. See you next year! 🎄✨
Cheers,
Merlyn Shelley
Editor-in-Chief, Packt.
➽ RAG-Driven Generative AI: This new title is aimed at engineers and database developers who want to build AI systems that give accurate, reliable answers by connecting responses to their source documents. It helps you reduce hallucinations, balance cost and performance, and improve accuracy using real-time feedback and tools like Pinecone and Deep Lake. By the end, you’ll know how to design AI that makes decisions grounded in real-world data, which is useful for scaling projects and staying competitive. Start your free trial for access, renewing at $19.99/month.
➽ Building Production-Grade Web Applications with Supabase: This new book is all about helping you master Supabase and Next.js to build scalable, secure web apps. It’s perfect for solving tech challenges like real-time data handling, file storage, and enhancing app security. You'll even learn how to automate tasks and work with multi-tenant systems, making your projects more efficient. By the end, you'll be a Supabase pro! Start your free trial for access, renewing at $19.99/month.
➽ Python Data Cleaning and Preparation Best Practices: This new book is a great guide for improving data quality and handling. It helps solve common tech issues like messy, incomplete data and missing out on insights from unstructured data. You’ll learn how to clean, validate, and transform both structured and unstructured data—think text, images, and audio—making your data pipelines reliable and your results more meaningful. Perfect for sharpening your data skills! Start your free trial for access, renewing at $19.99/month.
➽ Google introduces Gemini 2.0: A new AI model for the agentic era. Google has introduced Gemini 2.0, its most advanced AI model yet, with groundbreaking multimodal capabilities, agentic features for enhanced reasoning, and integration across products like Search. It’s faster, smarter, and redefines AI’s role as a universal assistant.
➽ AlphaQubit: Google’s research on quantum error correction. Google DeepMind and Quantum AI introduce AlphaQubit, a groundbreaking AI decoder that improves quantum error correction with unmatched accuracy. This innovation brings us closer to reliable quantum computing, unlocking possibilities in drug discovery, material design, and fundamental science.
➽ Genie 2: A large-scale foundation world model. Google DeepMind unveils Genie 2, a cutting-edge world model generating endless 3D environments for training AI and interactive gameplay. From a single image prompt, it creates action-controllable worlds, accelerating embodied agent development and advancing AI research.
➽ Boosting the customer retail experience with GPT-4o mini: Zalando, Europe’s leading online fashion platform, partnered with OpenAI to enhance its AI-powered Zalando Assistant. Upgraded to GPT-4o mini, the Assistant now delivers personalized recommendations in 25 markets, raising product clicks by 23% and wishlist additions by 41% while reducing costs.
➽ Microsoft Research Introduces AI-Powered Carbon Budgeting Method: A Real-Time Approach to Tracking Global Carbon Sinks and Emission. Microsoft Research Asia, in collaboration with global institutions, introduces an AI-powered method for near-real-time carbon budgeting. Using satellite data and machine learning, the model predicts global carbon sinks with unprecedented speed and accuracy, addressing critical climate change challenges.
➽ Quartz Atlas AI for Drug Discovery: Quartz Atlas AI™, developed by Deloitte and AWS, revolutionizes drug discovery by streamlining data connectivity, enhancing insights with domain-specific AI models, and simplifying accessibility for researchers. This AI-powered workbench accelerates R&D while reducing reliance on costly, unproductive trials.
➽ Top 5 Tips for Fine-Tuning LLMs: Fine-tuning large language models (LLMs) can unlock domain-specific performance for tasks in medicine, law, and beyond. By prioritizing data quality and selecting the right architecture, like GPT for generation or BERT for comprehension, models become more robust and effective.
➽ Overcoming AI Implementation Challenges: Lessons from Early Adopters. Implementing AI is transformative but challenging, with hurdles like data quality, accessibility, and talent shortages. Early adopters share valuable lessons in overcoming these issues, emphasizing robust data management, scalable infrastructure, and fostering skilled talent for successful AI adoption.
➽ DeepSeek AI Just Released DeepSeek-V2.5-1210: DeepSeek AI introduces DeepSeek-V2.5-1210, an enhanced model excelling in mathematics, coding, writing, and reasoning. With improved accuracy, live coding capabilities, and user-friendly features, it’s a versatile tool for researchers, developers, and professionals across diverse fields.
➽ Splunk Researchers Introduce MAG-V: Splunk Inc. introduces MAG-V, a multi-agent framework addressing challenges in AI trajectory verification and synthetic data generation. By combining machine learning and deterministic methods, MAG-V ensures accuracy, scalability, and privacy while outperforming traditional LLM-based solutions in reliability and cost-efficiency.
➽ Stability AI Releases Arabic Stable LM 1.6B Base and Chat Models: Stability AI's Arabic Stable LM 1.6B offers a resource-efficient solution for Arabic NLP, balancing cultural alignment and performance. With fine-tuning on over 100 billion tokens, it excels in tasks like question answering and cultural context recognition, advancing inclusivity in language AI.
➽ 7 Essential Python Libraries for MLOps: This blog explores seven essential Python libraries for MLOps, enabling users to streamline machine learning workflows, from experiment tracking and orchestration to model serving and performance monitoring, with tools like MLflow and Prefect.
➽ Accusatory AI: How misuse of technology is harming students. This blog discusses the flaws of AI-powered cheating detection tools in education, highlighting their potential for false accusations against students. It emphasizes the importance of transparency, evidence, and fairness, urging educators to use these tools constructively rather than as punitive measures.
➽ LG AI Research Releases EXAONE 3.5: LG AI Research's EXAONE 3.5 introduces advanced bilingual models excelling in English and Korean tasks, offering long-context processing, scalability, and cost-efficiency. With three versions optimized for diverse applications, EXAONE 3.5 sets new benchmarks in language AI performance.
➽ Cerebras Introduces CePO (Cerebras Planning and Optimization): Cerebras introduces CePO, an AI framework enhancing Llama models with embedded planning and reasoning capabilities. CePO streamlines complex decision-making in industries like logistics and healthcare, combining neural-symbolic methods for adaptability, efficiency, and scalability in advanced optimization tasks.
➽ Hugging Face Releases Text Generation Inference (TGI) v3.0: Hugging Face's Text Generation Inference (TGI) v3.0 improves text generation efficiency. Hugging Face reports that it processes long prompts up to 13x faster than vLLM and fits about 3x more tokens in the same GPU memory. It also simplifies deployment with zero-configuration defaults, enabling scalable, high-performance NLP for long prompts and dynamic contexts.
➽ Meta AI Introduces SPDL (Scalable and Performant Data Loading): Meta AI's SPDL (Scalable and Performant Data Loading) optimizes AI training by accelerating data delivery to GPUs. With thread-based architecture, prefetching, and caching, SPDL reduces training times, cuts costs, and boosts efficiency, making it ideal for large-scale, distributed AI workflows.
➽ Learn how to build a podcast with Gemini 1.5 Pro: Google Cloud's Gemini 1.5 Pro and Text-to-Speech API enable creators to generate custom podcasts by transforming written content into engaging audio formats. With diverse voices, multilingual support, and script generation, this approach expands reach, boosts engagement, and repurposes content effortlessly.
➽ How to Build a Text Classification Model with Hugging Face Transformers? This article explains how to train a transformer-based text classification model using Hugging Face Transformers in five simple steps. It covers loading data, tokenizing, initializing model architecture, and fine-tuning with ease for custom tasks.
➽ 3 Business Skills You Need to Progress Your Data Science Career in 2025: This blog highlights the essential business and strategic skills data scientists need as they transition into leadership roles. It emphasizes the importance of financial fluency, staying updated on AI/ML trends, and aligning technical expertise with business impact for career growth.
➽ How to Use Structured Generation for LLM-as-a-Judge Evaluations? This blog explores the concept of structured generation, a method to guide large language model (LLM) outputs into specific formats using schemas like context-free grammars (CFG). It demonstrates how structured generation enhances tasks such as hallucination detection and content validation in LLM-based evaluations.
➽ Synthetic Data in Practice: A Shopify Case Study: This blog examines the practical utility of synthetic data through a side-by-side comparison of 30,000 real Shopify transactions and their synthetic counterparts. It evaluates how closely synthetic data mirrors real trends, identifies discrepancies, and highlights when it’s reliable for decision-making.
➽ Combining Large and Small LLMs to Boost Inference Time and Quality: This blog explores efficient and high-quality text generation strategies using contrastive decoding, combining large and small language models. It demonstrates how optimizing token selection improves inference speed and output reliability in large language models like GPT-2.
➽ How to Build a General-Purpose LLM Agent? This blog explains how to build a general-purpose LLM agent, a versatile system capable of executing user queries with adaptable workflows. It covers selecting the right LLM, defining agent control logic, and leveraging agentic architectures for diverse, flexible use cases.
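For readers curious about the contrastive decoding idea mentioned in the "Combining Large and Small LLMs" item above, here is a minimal sketch of a single decoding step. It follows the commonly described formulation (score each token by the large model's log-probability minus the small model's, restricted to tokens the large model already finds plausible) and is not taken from the linked blog. The function name `contrastive_decode_step`, the `alpha` default, and the toy probability tables are all assumptions made for illustration; a real setup would read these distributions from two actual models.

```python
import math


def contrastive_decode_step(expert_probs, amateur_probs, alpha=0.1):
    """Pick the next token for one step of contrastive decoding.

    expert_probs / amateur_probs: dicts mapping token -> probability, standing
    in for the next-token distributions of a large and a small model.
    alpha: plausibility cutoff, relative to the expert's most likely token.
    """
    # Plausibility constraint: keep only tokens the expert itself rates highly
    # enough, so rare tokens cannot win just because the amateur dislikes them.
    cutoff = alpha * max(expert_probs.values())
    candidates = [tok for tok, p in expert_probs.items() if p >= cutoff]

    # Contrastive score: log p_expert - log p_amateur. Tokens that the large
    # model prefers far more than the small one get the highest score.
    def score(token):
        amateur_p = max(amateur_probs.get(token, 0.0), 1e-12)  # avoid log(0)
        return math.log(expert_probs[token]) - math.log(amateur_p)

    return max(candidates, key=score)


# Toy distributions (hypothetical numbers, not from any real model).
expert = {"Paris": 0.55, "the": 0.30, "London": 0.10, "zzz": 0.05}
amateur = {"Paris": 0.20, "the": 0.60, "London": 0.199, "zzz": 0.001}

next_token = contrastive_decode_step(expert, amateur)
unconstrained = contrastive_decode_step(expert, amateur, alpha=0.0)
print(next_token, unconstrained)
```

With the cutoff in place, the junk token "zzz" is filtered out before scoring. Setting `alpha=0.0` disables the filter and shows why it matters: the raw log-ratio alone would favor a token that the large model barely supports.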