The Future of AI & Data Is Unfolding: Here’s What You Need to Know in DataPro #131!
This week’s edition is packed with AI breakthroughs, data strategy debates, and hands-on tools to elevate your workflow. LlamaIndex is now part of the Gen AI Toolbox for Databases, streamlining AI-powered queries, while AutoGluon makes AutoML more accessible than ever. Meanwhile, the Platform-Mesh vs. Hub-and-Spoke vs. Centralized data team debate heats up: what’s the right structure for scaling AI?
AI observability is the next big frontier: 2026 will mark a turning point as businesses move beyond experimentation to large-scale deployment. We also explore AWS & NVIDIA’s generative AI impact, how EliseAI is revolutionizing housing & healthcare, and why spurious regression in time series analysis remains a critical challenge.
For hands-on practitioners, we’re covering heatmaps for time series, advanced DBeaver SQL tips, and a guide to integrating Google Analytics 4 with Amazon Redshift using AppFlow. Plus, the latest on Elon Musk’s lawsuit against OpenAI, and why the courts aren’t buying his claims.
Keep scrolling for the full scoop!
Cheers,
Merlyn Shelley
Growth Lead, Packt
📚 Limited-Time Offer: 30% Off Bestselling eBooks!
🔹 LlamaIndex is on Gen AI Toolbox for Databases: Google Cloud announced the integration of LlamaIndex with Gen AI Toolbox for Databases, an open-source server simplifying AI tool management for databases. LlamaIndex enhances AI agent development by structuring data and enabling powerful query engines. This collaboration streamlines security, scaling, and deployment for AI applications.
🔹 Building Agentic Application Using Streamlit and Langchain: This guide explains how to build an agentic application using Streamlit and LangChain by integrating AI agents for answering queries, web searches, computations, and data visualization. It leverages Tavily Search, Python REPL, and Llama 3.3 LLM to create an interactive AI-driven workflow.
🔹 Do I Need to Learn MicroPython as a Data Scientist? MicroPython is a lightweight version of Python optimized for microcontrollers and constrained environments. Data scientists can benefit from it for IoT, edge computing, prototyping, and robotics. As AI integrates with hardware, learning MicroPython can enhance data collection and processing capabilities.
🔹 Getting Started with AutoGluon: Your First Steps in Automated Machine Learning: This blog introduces AutoGluon, an open-source AutoML library that simplifies machine learning by automating model selection, hyperparameter tuning, and ensembling. It walks through installation, training a model on the Titanic dataset, evaluating performance, and making predictions, making AutoML accessible for beginners.
🔹 Build Your First Python Extension for VS Code in 7 Easy Steps: This blog provides a step-by-step guide to building a custom Python extension for VS Code. It covers setting up the environment, writing extension logic, testing, packaging, and publishing the extension to the marketplace, making it easy for developers to enhance their IDE.
🔹 Reduce cost and improve your AI workloads: This blog provides five practical tips to optimize AI workloads on Google Cloud, covering platform selection, inference startup time, storage solutions, resource reservations, and custom disk images. It helps developers improve efficiency, reduce costs, and streamline AI model deployment and training processes.
🔹 The Impact of GenAI and Its Implications for Data Scientists: Anthropic’s study of Claude.ai conversations reveals how GenAI is transforming workplaces, especially in data science. Rather than replacing jobs, GenAI enhances productivity by augmenting tasks. The blog emphasizes the importance of adaptability, critical thinking, and collaboration skills in the evolving AI landscape.
🔹 Mastering Prompt Engineering with Functional Testing: A Systematic Guide to Reliable LLM Outputs: Functional testing in prompt engineering provides a structured approach to optimizing LLM outputs. By automating validation, running multiple iterations, and using algorithmic scoring, this method enhances reliability, reduces trial-and-error, and ensures consistent, accurate responses for complex AI workflows and tasks.
🔹 Effortless Spreadsheet Normalisation With LLM: Large Language Models (LLMs) automate spreadsheet normalization by analyzing structure, estimating schemas, and generating transformation code. This improves data quality, tidiness, and usability. A structured workflow ensures efficiency, accuracy, and adaptability, enabling seamless machine-readable formats for better insights and analysis.
🔹 2026 Will Be the Year of Data + AI Observability: The blog observes that 2026 will be the tipping point for data + AI observability, as enterprise AI moves from experimentation to large-scale deployment. Key challenges include data readiness, system sprawl, feedback loops, and cost concerns. Without a standardized architecture, teams struggle to maintain reliability while integrating structured and unstructured data, AI models, and SaaS systems. Observability must be end-to-end, covering data, system performance, and AI outputs. Organizations with strong foundations in data reliability will gain a competitive edge, while those lacking observability risk inefficiency, poor AI performance, and potential failure in an evolving AI landscape.
🔹 Ingest data from Google Analytics 4 and Google Sheets to Amazon Redshift using Amazon AppFlow: This blog explains how to ingest data from Google Analytics 4 (GA4) and Google Sheets into Amazon Redshift using Amazon AppFlow. It covers setting up data flows, configuring authentication, and establishing a seamless integration for efficient data analysis in Redshift.
🔹 7 Powerful DBeaver Tips and Tricks to Improve Your SQL Workflow: This blog shares seven practical DBeaver tips to enhance your SQL workflow. It covers hidden features like the command palette, custom SQL formatting, column statistics, SQL templates, advanced copying options, and more to improve efficiency when working with databases.
🔹 The court rejects Elon’s latest attempt to slow OpenAI down: This blog discusses the court’s rejection of Elon Musk’s attempt to hinder OpenAI, highlighting his alleged self-interest. It refutes claims about OpenAI’s structure, defends its nonprofit mission, and criticizes Musk’s legal tactics while reaffirming OpenAI’s commitment to long-term public benefit.
🔹 How to Develop Complex DAX Expressions: This blog explores best practices for developing complex DAX expressions in Power BI. It emphasizes understanding requirements, defining logic, and managing filter contexts. Using step-by-step examples, it demonstrates how to build and refine calculations for accurate, scalable data analysis.
🔹 From innovation to impact: How AWS and NVIDIA enable real-world generative AI success: This blog explores how AWS and NVIDIA enable real-world generative AI adoption at scale. It highlights customer success stories across industries, emphasizing infrastructure, optimization strategies, and the role of domain-specific AI in transforming workflows, healthcare, and enterprise applications with reliable, high-performance AI solutions.
🔹 Heatmaps for Time Series: This blog explores how heatmaps visualize time series data, focusing on trends and outliers using non-linear color scales. It recreates the WSJ’s measles heatmap with Python’s Matplotlib, demonstrating data preprocessing, colormap design, and effective visualization techniques for analyzing and communicating complex datasets.
🔹 Platform-Mesh, Hub and Spoke, and Centralised | 3 Types of data team: This blog explores three data team structures (Centralized, Hub-and-Spoke, and Platform Mesh) and their impact on data and AI success. It explains how organizations evolve from centralized control to decentralized collaboration, emphasizing visibility, governance, and efficiency in scaling AI-driven workflows across teams.
🔹 Linear Regression in Time Series: Sources of Spurious Regression: This blog explores the issue of spurious regression in time series analysis, highlighting how autocorrelated errors can lead to misleading statistical results. It explains key concepts like random walks, ARIMA processes, and Durbin-Watson statistics, using Python simulations to illustrate and prevent erroneous conclusions.
🔹 EliseAI improves housing and healthcare efficiency with AI: This blog features an interview with EliseAI CEO Minna Song on how AI improves efficiency in housing and healthcare. It discusses AI adoption strategies, key technical breakthroughs, success metrics, and how the company stays competitive in a rapidly evolving AI landscape.
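Want a quick feel for the spurious regression item above? Here is a minimal sketch of our own (not the linked blog's code) using only the Python standard library. It regresses one random walk on another, unrelated one, then repeats the fit on first differences and compares the Durbin-Watson statistic of the residuals. The function names, the seed, and the series length of 500 are all assumptions chosen for illustration.

```python
import random


def random_walk(n, rng):
    """Return a random walk: the running sum of n standard-normal steps."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path


def ols(x, y):
    """Fit y = a + b*x by least squares; return (a, b, residuals, r_squared)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    total_ss = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1.0 - sum(e * e for e in residuals) / total_ss
    return a, b, residuals, r_squared


def durbin_watson(residuals):
    """Durbin-Watson statistic: near 2 means little first-order autocorrelation."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


def diff(series):
    """First differences of a series."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


rng = random.Random(0)  # fixed seed (an arbitrary choice) for repeatability
x = random_walk(500, rng)
y = random_walk(500, rng)  # generated independently of x

# Regression in levels: residuals inherit the random-walk persistence.
_, _, resid_levels, r2_levels = ols(x, y)
# Regression in first differences: the persistence is removed.
_, _, resid_diffs, r2_diffs = ols(diff(x), diff(y))

dw_levels = durbin_watson(resid_levels)
dw_diffs = durbin_watson(resid_diffs)

print(f"levels:      R^2={r2_levels:.3f}  DW={dw_levels:.3f}")
print(f"differences: R^2={r2_diffs:.3f}  DW={dw_diffs:.3f}")
```

A Durbin-Watson value far below 2 in the levels regression is the warning sign: the residuals are strongly autocorrelated, so any apparent fit between the two series should not be trusted. Differencing is only one common remedy, and the linked post discusses the topic in more depth.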
We’ve got more great things coming your way. See you soon!