





















































Data doesn’t have to be overwhelming. Join our webinar to learn about Data Storytelling and turn complex information into actionable insights for faster decision-making.
Click below to check the schedule in your time zone and secure your spot. Can't make it? Register to get the recording instead.
Sponsored
Happy Friday! 🌟
Welcome to DataPro #110—Your Ultimate Data Science & ML Update! 🚀
In the world of AI and ML, sharp reasoning is the key to smarter decisions and impactful leadership. Our latest insights and strategies will help you boost model accuracy, optimize performance, and cut costs with scalable solutions. Dive in for cutting-edge tips and real-world techniques to elevate your data game.
📚 Book Haven: Top Reads & Author Insights
◽"Data Science for Decision Makers": Elevate your leadership with data science and AI prowess by Jon Howells.
◽"Data Science for IoT Engineers": Unlock data science techniques and ML applications for innovative IoT solutions by P. G. Madhavan.
◽"Bash for Data Scientists": Master shell scripting for data science tasks with Oswald Campesato.
◽"Angular and Machine Learning Pocket Primer": Get the essentials on integrating ML with Angular, also by Oswald Campesato.
◽"AI, ML, and Deep Learning": Explore advanced AI techniques with Oswald Campesato’s practical guide.
🔍 Model Breakdown: Algorithm of the Week
◽Custom Tokenizers for Non-English Languages: Dive into Hugging Face Transformers for multilingual models.
◽Concrete ML Privacy: Secure end-to-end privacy in model training and inference.
◽Multilingual Multi-Agent Chat with LangGraph: Build diverse language chat applications.
◽Approximating Stochastic Functions: Techniques for multivariate output functions.
🪐Trendspotting: Hot Tech Trends
◽Legal Reasoning Engines: How reasoning drives legal arguments.
◽R Clinical Flowcharts with shinyCyJS: Use R for clinical flowcharting.
◽Claude for Enterprise: Explore Anthropic's latest.
◽IBM Quantum Update: Qiskit SDK v1.2 release news!
🛠️ Platform Showdown: ML Tools & Services
◽FastAPI for ML Web Apps: Build powerful web apps with FastAPI.
◽DetoxBench: Benchmarking large language models for fraud and abuse detection.
◽Llama-3.1-Storm-8B & CausalLM/miniG: New Hugging Face models.
◽Build RAG Pipelines: Combine LlamaIndex with Amazon Bedrock for robust pipelines.
📊 Success Stories: ML in Action
◽Ecommerce Data Quality: Strategies for improving data quality.
◽Essential Python Modules: Must-know Python modules for data engineers.
◽Avoiding Data Science Mistakes: Tips to steer clear of common pitfalls.
◽Thomson Reuters Labs: Accelerating AI/ML innovation with AWS MLOps.
◽Galxe & AlloyDB: Cost-cutting success story.
🌍 ML Newsflash: Industry Buzz & Discoveries
◽GPT-4 for Customer Service: Redefining standards with GPT-4.
◽HYGENE: A novel diffusion-based hypergraph generation method.
◽Yi-Coder: Meet a compact yet powerful LLM for code.
◽Guided Reasoning: New approaches to enhance multi-agent system intelligence.
Enjoy the newsletter and have a fantastic weekend! ✨
DataPro Newsletter is not just a publication; it’s a complete toolkit for anyone serious about mastering the ever-changing landscape of data and AI. Grab your copyand start transforming your data expertise today!
Calling Data & ML Enthusiasts!
Want to share your insights and build your online reputation? Contribute to our new Packt DataPro column! Discuss tools, share experiences, or ask questions. Gain recognition among 128,000+ data professionals and boost your CV. Simply reply with your Google Docs link or use our feedback form. Whether you’re looking for visibility or a discreet approach, we’re here to support you.Share your content today and engage with our vibrant community! We’re excited to hear from you!Take our weekly survey and get a free PDF copy of our best-selling book,"Interactive Data Visualization with Python - Second Edition."We appreciate your input and hope you enjoy the book!
200+ hours of research on AI-led career growth strategies & hacks packed in 3 hours
The only AI Crash Course you need to master 20+ AI tools, multiple hacks & prompting techniques in just 3 hours
You’ll save 16 hours every week & find remote jobs using AI that will pay you upto $10,000/mo
Sponsored
Did you know? “Books are the quietest, most constant friends, holding the world’s treasured wisdom. They offer gentle guidance and timeless lessons, passing their rich inheritance from one generation to the next.”
We’re thrilled to bring you this week’s must-have new releases, straight from the experts to your bookshelf! Whether you're eager to enhance your skills or explore new horizons, now is the perfect moment to add these invaluable resources to your collection.
For a limited time,enjoy 30% off all eBooks at Packtpub.com. These books are thoughtfully crafted by industry insiders with hands-on experience, offering unique insights you won’t find anywhere else.
Don’t let these Packt-exclusive deals slip away—seize the opportunity to learn from the best at an unbeatable price!
Data Science for Decision Makers: Enhance your leadership skills with data science and AI expertise
Struggling to bridge the gap between data science and business leadership? Our new book is here to help!
What you’ll gain:
✔️ Master statistics and ML to interpret models and drive decisions.
✔️ Identify AI opportunities and oversee data projects from start to finish.
✔️ Empower teams to tackle complex problems and build AI solutions.
Elevate your leadership and make data work for you! Get the book now—just $24.99, down from $35.99!
By Mercury Learning and Information, P. G. Madhavan
Dive into our new book, crafted for engineers, physicists, and mathematicians eager to bridge the gap between theory and practice!
What’s inside:
✔️ Integrate systems theory and machine learning seamlessly.
✔️ Apply practical solutions like digital twins to real-world problems.
✔️ Progress from basics to advanced techniques with ease.
Whether you're tackling IoT challenges or modeling complex systems, this workbook with MATLAB code will guide you every step of the way. Get the eBook now for just $34.98, down from $49.99! Elevate your skills and tackle IoT and complex systems with confidence.
Bash for Data Scientists: A Comprehensive Guide to Shell Scripting for Data Science Tasks
By Mercury Learning and Information, Oswald Campesato
Unlock the power of Bash for your data science projects with our latest book!
What’s inside:
✔️ Master Bash for efficient data processing with practical, real-world examples.
✔️ Learn to integrate with Pandas and databases for advanced data handling.
✔️ Get hands-on with grep, sed, and awk to clean and manage datasets effectively.
Grab the eBook now for just $37.99, originally $54.99! Elevate your scripting skills and streamline your data tasks today!
By Mercury Learning and Information, Oswald Campesato
Ready to elevate your Angular apps with machine learning? Our latest Pocket Primer has you covered!
What’s inside:
✔️ Seamless integration of Angular and machine learning using TensorFlow.js and Keras.
✔️ Practical, step-by-step tutorials and real-world examples.
✔️ Comprehensive coverage of Angular basics, UI development, and machine learning models.
Get the eBook now for just $27.98, originally $39.99! Transform your skills and build sophisticated applications with ease.
By Mercury Learning and Information, Oswald Campesato
Discover the world of AI with our new book, perfect for expanding your skills from basics to advanced techniques!
What’s inside:
✔️ In-depth coverage of AI, machine learning, and deep learning.
✔️ Practical examples and hands-on tutorials with Keras, TensorFlow, and Pandas.
✔️ Explore classifiers, deep learning architectures, NLP, and reinforcement learning.
Get the eBook now for just $41.98, down from $59.99! Transform your understanding and apply these cutting-edge concepts in real-world scenarios.
➽ How to Create a Custom Tokenizer for Non-English Languages with Hugging Face Transformers? This blog explains the importance of tokenization in NLP and provides a detailed guide on training a custom tokenizer for non-English languages using Hugging Face libraries, ensuring improved model performance for diverse datasets.
➽ End-to-end privacy for model training and inference with Concrete ML: This blog explores how to achieve end-to-end privacy in collaborative machine learning using federated learning and fully homomorphic encryption (FHE). It details a demo with scikit-learn and Concrete ML for secure model training and inference.
➽ Building a Multilingual Multi-Agent Chat Application Using LangGraph: This blog details the development of a multilingual chat application to bridge language barriers in workplaces. It covers building features using LangChain and LangGraph, including agent design, translation workflows, and deployment with FastAPI.
➽ Approximating Stochastic Functions with Multivariate Outputs: The article describes an enhanced method for training generative machine learning models, named Pin Movement Training (PMT). It extends the original PMT, which approximated single-output stochastic functions, to handle multiple-output functions. The approach uses a neural network and a hypersphere-based Z-space to map and approximate multidimensional outputs, like autoencoders but with uniform sampling for better results.
Developing for iOS? Setapp's 2024 report on the state of the iOS market in the EU is a must-see
How do users in the EU find apps? What's the main source of information about new apps? Would users install your app from a third-party app marketplace?
Set yourself up for success with these and more valuable marketing insights in Setapp Mobile's report iOS Market Insights for EU.
Sponsored
➽ Reasoning as the Engine Driving Legal Arguments: The article explores how tribunals assess evidence in legal cases, focusing on three key stages: determining evidence relevance, evaluating trustworthiness, and weighing competing evidence. It highlights the role of "reasoning sentences" in explaining decision-making and discusses machine learning techniques for identifying these sentences in legal documents.
➽ Use R to build Clinical Flowchart with shinyCyJS: The blog discusses creating Clinical Flowcharts for visualizing clinical trials, focusing on various methods, particularly using R. It details challenges and solutions in drawing flowcharts, including software limitations and customizations with shinyCyJS for precise visual representation.
➽ Claude for Enterprise \ Anthropic: The Claude Enterprise plan now offers enhanced features for secure collaboration, including a 500K context window, GitHub integration, and advanced security measures. This allows teams to leverage internal knowledge while safeguarding data.
➽ IBM Quantum Computing - Release news: Qiskit SDK v1.2 is here! Qiskit SDK v1.2 introduces major updates, including Rust-based circuit infrastructure for faster performance, improved synthesis and transpilation, and new features. It also ends support for Python 3.8, requiring Python 3.9 or later.
➽ Using FastAPI for Building ML-Powered Web Apps: This tutorial demonstrates building a machine learning web app using FastAPI and Jinja2 templates. It covers creating a prediction API for a Random Forest model and integrating it with a web interface for user interaction.
➽ DetoxBench: Benchmarking large language models for multitask fraud & abuse detection. This paper introduces a benchmark suite to evaluate large language models (LLMs) for detecting and mitigating fraud and abuse in various real-world scenarios, highlighting performance gaps and offering a tool for improving LLMs in high-stakes applications.
➽ Llama-3.1-Storm-8B · Hugging Face: The Llama-3.1-Storm-8B model outperforms Meta’s Llama-3.1-8B-Instruct and Hermes-3 across multiple benchmarks. It improves instruction-following, QA, reasoning, and function-calling via self-curation, fine-tuning, and model merging techniques.
➽ CausalLM/miniG · Hugging Face: The miniG model has two versions: standard and "alt," the latter trained with masked context to improve stability. Trained on a large dataset with text and image support, it performs best with Hugging Face Transformers for minimal performance degradation.
➽ Build powerful RAG pipelines with LlamaIndex and Amazon Bedrock: This blog explores using Retrieval Augmented Generation (RAG) techniques to enhance large language models (LLMs) by integrating external knowledge sources. It discusses building advanced RAG pipelines with LlamaIndex and Amazon Bedrock, covering topics like query routing, sub-question handling, and stateful agents.
➽ Improving ecommerce data quality: This blog details how Lowe’s enhanced its website search accuracy by fine-tuning OpenAI’s GPT-3.5 model. By applying advanced prompt engineering, Lowe’s improved product data quality, reduced associate workload, and achieved a 20% accuracy boost in product tagging.
➽ 10 Built-In Python Modules Every Data Engineer Should Know: This article highlights essential Python modules for data engineering, including tools for file management, data serialization, database interaction, and text processing. It covers how modules like `os`, `pathlib`, `shutil`, and `csv` can enhance data engineering tasks.
➽ 5 Common Data Science Mistakes and How to Avoid Them: This blog outlines five common mistakes in data science projects, such as unclear objectives, neglecting basics, poor visualizations, lack of feature engineering, and overemphasizing accuracy. It offers practical solutions to avoid these pitfalls and improve project outcomes.
➽ How Thomson Reuters Labs achieved AI/ML innovation at pace with AWS MLOps services? This post details how Thomson Reuters Labs developed a standardized MLOps framework using AWS SageMaker to streamline ML processes. It highlights the creation of TR MLTools and MLTools CLI to enhance efficiency, standardize practices, and accelerate AI/ML innovation.
➽ Galxe migrates to AlloyDB for PostgreSQL, cutting costs by 40%: This blog explains how Galxe is addressing Web3 challenges by using AlloyDB for PostgreSQL and Google Cloud services. It highlights Galxe's innovations in decentralized identity, gamified user experiences, and scalable infrastructure to enhance Web3 adoption and performance.
➽ Using GPT-4 to deliver a new customer service standard: Ada, valued at $1.2B with $200M in funding, is leading a $100B shift in customer service with its AI-native automation platform. Since its 2016 inception, Ada has doubled resolution rates using OpenAI’s GPT-4, achieving up to 80% resolution and setting new industry standards for effectiveness.
➽ HYGENE: A Diffusion-based Hypergraph Generation Method. The paper introduces HYGENE, a diffusion-based method for generating realistic hypergraphs. Using a bipartite representation, it iteratively expands nodes and hyperedges through a denoising process, effectively modeling complex hypergraph structures. This is the first deep learning approach for hypergraph generation.
➽ Meet Yi-Coder: A Small but Mighty LLM for Code. Yi-Coder is an open-source series of coding-focused LLMs, available in 1.5B and 9B parameter sizes. It offers advanced coding performance with up to 128K token context modeling, surpassing models like CodeQwen1.5 and DeepSeek-Coder, and excels in benchmarks such as LiveCodeBench and HumanEval.
➽ Guided Reasoning: A New Approach to Improving Multi-Agent System Intelligence. Gregor Betz from Logikon AI introduces Guided Reasoning, a multi-agent system where a guide agent helps client agents improve their reasoning through structured methods. This approach, using argument maps and pros/cons evaluations, aims to enhance clarity and accuracy in AI decision-making and explanations.
See you next time!