We're looking for data professionals to join a quick 30-minute chat about their learning needs. The first 25 respondents in a data-specific role will have the opportunity to speak with our team, share their insights, and receive a free Packt credit to claim any eBook of their choice! Hurry – submit your interest now and keep an eye out for our team's meeting invite. You could be one of the chosen ones!
GRC is no longer just a checkbox; it’s a competitive advantage.
Hyperproof’s 6th Annual IT Risk & Compliance Benchmark Report reveals a major shift: organizations are maturing their GRC practices, centralizing teams, and increasing budgets. With 91% of companies now prioritizing compliance, the landscape is evolving fast.
The key takeaway? Governance, risk, and compliance are now drivers of operational excellence and strategic growth. Hyperproof’s industry insights and new GRC Maturity Model equip organizations to stay ahead.
📊 Get the full report & start building a stronger, more resilient GRC strategy today.
Sponsored
📢 Welcome to DataPro #126 ~ Your Weekly Dose of Data Science & ML Innovation!
The world of data science and machine learning is advancing at lightning speed, and we’re here to keep you ahead of the curve! Whether it’s breakthrough AI frameworks, game-changing open-source tools, or must-know industry updates, this edition packs everything you need to stay informed, innovate, and lead in the ML space.
📚 New Releases You Can't Miss:
✅ Hands-On Machine Learning with C++ - Build smart models with modern C++ libraries.
✅ Biostatistics with Python - Apply Python to real-world biomedical & biotech projects.
✅ Data Engineering with Databricks Cookbook - Master Apache Spark, Delta Lake & Databricks.
🔍 This Week’s Deep Dive:
✅ Support Vector Machine (SVM) Algorithm - A fundamental yet powerful ML technique.
✅ OpenAI’s Deep Research Agent - How it’s revolutionizing data-driven discovery.
✅ Yandex’s Open-Source Perforator - Optimizing server performance like never before.
✅ Meta AI’s MILS - A training-free multimodal AI framework pushing zero-shot learning to new heights.
✅ No-Code ML with Amazon SageMaker Canvas - Predict heart disease with an intuitive workflow.
✅ Vertex AI Gen AI Evaluation Service - A smarter way to assess and improve AI agents.
🧠 Featured Insights:
✅ Mistral AI Releases Mistral-Small-24B-Instruct-2501 - A low-latency 24B-parameter model under Apache 2.0.
✅ Improving Agent Systems & AI Reasoning - Smarter, more reliable AI solutions.
Whether you’re a data scientist, ML engineer, or AI enthusiast, DataPro keeps you informed, inspired, and ahead of the curve. Stay tuned for more updates next week!
💡 Got a topic you'd love to see covered? Let us know! 🚀
Cheers,
Merlyn Shelley
Growth Lead, Packt.
❯❯❯❯ Hands-On Machine Learning with C++: Written by Kirill Kolodiazhnyi, this book equips machine learning engineers with practical ML and deep learning techniques using modern C++ libraries. You will learn about model selection, tuning, and deployment on mobile and embedded devices, real-time object detection, transfer learning, MLflow for experiment tracking, and Optuna for hyperparameter tuning, providing a complete guide to building efficient ML systems. Start your free trial for access, renewing at $19.99/month.
❯❯❯❯ Biostatistics with Python: Written by Darko Medin, this book simplifies biostatistics with Python through hands-on biomedical and biotechnology projects. You will learn about data cleaning, hypothesis testing, effect size analysis, predictive modeling, survival analysis, and meta-analysis, making it easier to apply statistical methods in biological research. With real-world case studies, this guide helps life science professionals and researchers confidently integrate biostatistical analysis into their work. Start your free trial for access, renewing at $19.99/month.
❯❯❯❯ Data Engineering with Databricks Cookbook: Written by Pulkit Chadha, this cookbook provides a practical, recipe-based guide to mastering data engineering with Databricks, Apache Spark, and Delta Lake. You will learn about data ingestion, transformation, and optimization, as well as orchestrating pipelines, implementing DataOps/DevOps, and enforcing data governance with Unity Catalog. Designed for data engineers and practitioners, this book offers hands-on techniques to build scalable, high-performance data solutions in modern cloud environments. Start your free trial for access, renewing at $19.99/month.
❯❯❯❯ Support Vector Machines: A Progression of Algorithms: This blog explores the Support Vector Machine (SVM) algorithm, a powerful tool for classification problems. It explains the progression from the Maximal Margin Classifier (MMC) to the Support Vector Classifier (SVC) and finally to SVM, highlighting how each step improves decision boundary flexibility and robustness.
❯❯❯❯ Are Public Agencies Letting Open-Source Software Down? This blog explores the impact of open-source software on technology, innovation, and democracy. It highlights its role in AI advancements, geospatial mapping, and public collaboration. Through personal anecdotes and practical examples, it underscores how open access, transparency, and shared knowledge drive progress across industries and global communities.
❯❯❯❯ Improving Agent Systems & AI Reasoning: This blog explores the rise of AI Agents and the limitations of large language models (LLMs) in reasoning. It examines how new Reasoning Language Models (RLMs), like DeepSeek-R1 and OpenAI’s o1 and o3, improve AI reasoning through post-training and test-time compute scaling, reshaping AI agent development.
❯❯❯❯ What OpenAI’s Deep Research Means for the Future of Data Science? This blog introduces OpenAI’s Deep Research Agent, a tool designed to streamline complex data gathering and analysis for data scientists. It automates multi-step research, synthesizes information from diverse sources, ensures accuracy with verified citations, and enhances efficiency in problem-solving across domains like healthcare, finance, and AI development.
❯❯❯❯ Meta AI Introduces MILS: A Training-Free Multimodal AI Framework for Zero-Shot Image, Video, and Audio Understanding. This blog introduces Meta AI’s MILS, a training-free multimodal AI framework that enables large language models (LLMs) to perform image, video, and audio reasoning without task-specific training. Using an iterative optimization process with a generator and scorer, MILS enhances zero-shot performance across diverse modalities, improving multimodal AI adaptability.
❯❯❯❯ 4 Open-Source Alternatives to OpenAI’s $200/Month Deep Research AI Agent. This blog explores four open-source AI research agents that serve as cost-effective alternatives to OpenAI’s Deep Research AI Agent. These tools leverage advanced search, extraction, and reasoning capabilities, offering researchers customizable, self-hostable solutions for automating in-depth research without the high cost of proprietary AI systems.
❯❯❯❯ Mistral AI Releases the Mistral-Small-24B-Instruct-2501: A Latency-Optimized 24B-Parameter Model Released Under the Apache 2.0 License: This blog introduces Mistral-Small-24B-Instruct-2501, a compact yet high-performing language model designed for efficiency and accessibility. With 24 billion parameters, multilingual capabilities, and a 32k context window, it rivals larger models like Llama 3 while supporting local deployment and open-source flexibility under the Apache 2.0 license.
❯❯❯❯ Yandex Develops and Open-Sources Perforator: An Open-Source Tool that can Save Businesses Billions of Dollars a Year on Server Infrastructure. This blog introduces Perforator, an open-source tool from Yandex designed for real-time server and application performance monitoring. By identifying resource-intensive code and enabling profile-guided optimization, Perforator helps businesses cut infrastructure costs by up to 20%, making it a powerful solution for efficiency and scalability.
❯❯❯❯ Advances to low-bit quantization enable LLMs on edge devices: This blog explores advancements in low-bit quantization for deploying large language models (LLMs) on edge devices. Microsoft Research introduces T-MAC, Ladder, and LUT Tensor Core, three solutions optimizing mixed-precision matrix multiplication (mpGEMM) to improve AI efficiency. These innovations enhance model performance, reduce memory demands, and enable real-time AI processing on resource-constrained hardware.
❯❯❯❯ Trellix lowers cost, increases speed, and adds delivery flexibility with cost-effective and performant Amazon Nova Micro and Amazon Nova Lite models: This blog explores how Trellix Wise, an AI-powered cybersecurity platform, integrates Amazon Nova Micro to enhance threat investigation speed and cost efficiency. By leveraging generative AI and Retrieval-Augmented Generation (RAG), Trellix automates security event analysis, reducing investigation time while maintaining accuracy, improving scalability, and optimizing operational costs.
❯❯❯❯ OfferUp improved local results by 54% and relevance recall by 27% with multimodal search on Amazon Bedrock and Amazon OpenSearch Service: This blog explores how OfferUp modernized its search architecture by adopting Amazon Titan Multimodal Embeddings and Amazon OpenSearch Service. By integrating multimodal search, OfferUp improved search relevance, user engagement, and local discovery, enabling users to search with both text and images for a more intuitive marketplace experience.
❯❯❯❯ Use generative AI on AWS for efficient clinical document analysis: This blog explores how Clario leverages generative AI on AWS to streamline clinical trial document analysis. By integrating Amazon Textract, OpenSearch, Bedrock, and SageMaker, Clario automates parsing, retrieval, classification, and analysis, significantly reducing review time and accelerating drug development while maintaining regulatory compliance.
❯❯❯❯ Build a multi-interface AI assistant using Amazon Q and Slack with Amazon CloudFront clickable references from an Amazon S3 bucket: This blog explores how Amazon Q Business and Slack enable multi-interface AI assistants for seamless user interaction. By integrating Retrieval Augmented Generation (RAG) with Amazon Kendra and CloudFront, organizations can enhance AI accessibility, provide context-aware responses, and improve productivity without requiring users to switch applications.
❯❯❯❯ No-Code ML Approach to Predict Heart Disease with Amazon SageMaker Canvas: This blog explores how Amazon SageMaker Canvas enables no-code predictive modeling for heart disease detection. By integrating SageMaker Data Wrangler for data preparation and machine learning for classification, healthcare professionals can analyze biomedical data, identify key indicators, and improve early diagnosis without extensive coding expertise.
❯❯❯❯ OpenAI Introducing data residency in Europe: This blog introduces data residency in Europe for ChatGPT Enterprise, ChatGPT Edu, and the API Platform, enhancing data sovereignty compliance for organizations. OpenAI ensures secure, private AI usage with in-region data processing, encryption, and GDPR compliance, empowering businesses and institutions across Europe to integrate AI confidently.
❯❯❯❯ Create a 360-degree master data management patient view solution using Amazon Neptune and generative AI: This blog explores how Amazon Neptune and generative AI enable a 360-degree patient view, integrating electronic health records (EHRs), lab results, prescriptions, and social determinants. By unifying healthcare data, providers can enhance personalized care, improve early disease detection, and support clinical decision-making, leading to better patient outcomes.
❯❯❯❯ Build a brand logo with Imagen 3 and Gemini: This post explores how Imagen 3, Gemini, and the Python library Pillow work together to help businesses create branded marketing visuals. Using AI-powered image generation, selection, and integration, companies can design unique brand identities and logos tailored to their aesthetic. Learn how this AI workflow can enhance your creative process and deliver high-quality promotional visuals efficiently.
❯❯❯❯ Evaluate your AI agents with Vertex Gen AI evaluation service: Vertex AI Gen AI Evaluation Service is now in public preview, enabling rigorous AI agent assessment. It offers final response and trajectory analysis metrics to improve decision-making. Compatible with LangChain, LangGraph, CrewAI, and Google Cloud services, it supports native agent inference and automatic logging in Vertex AI Experiments.
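💡 Quick sketch: For readers curious about the low-bit quantization item above, here is a minimal illustration of the general idea behind running models with low-bit weights. It is not T-MAC, Ladder, or LUT Tensor Core; it is a plain symmetric 4-bit quantizer written for this newsletter, and the function names (`quantize_int4`, `dequantize`, `mixed_precision_dot`) are hypothetical. It assumes one scale per weight list and shows how a dot product can mix integer weight codes with floating-point activations, which is roughly what "mixed-precision matrix multiplication" refers to.

```python
from typing import List, Tuple


def quantize_int4(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric 4-bit quantization: map floats to integer codes in [-7, 7]."""
    max_abs = max(abs(w) for w in weights)
    # One scale for the whole list; fall back to 1.0 for an all-zero input.
    scale = max_abs / 7 if max_abs > 0 else 1.0
    codes = [max(-7, min(7, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from the integer codes."""
    return [c * scale for c in codes]


def mixed_precision_dot(codes: List[int], scale: float,
                        activations: List[float]) -> float:
    """Dot product of low-bit weight codes with float activations."""
    # Accumulate using the integer codes, then apply the scale once.
    return scale * sum(c * a for c, a in zip(codes, activations))


if __name__ == "__main__":
    codes, scale = quantize_int4([1.75, -0.5, 0.25, 0.0])
    print(codes, scale)  # [7, -2, 1, 0] 0.25
    print(mixed_precision_dot(codes, scale, [1.0, 2.0, 4.0, 8.0]))  # 1.75
```

Each weight is stored as a small integer plus a shared scale, so memory drops sharply while the rounding error per weight stays within half a scale step. Production systems typically use per-group scales and specialized kernels, which this sketch leaves out.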