
PythonPro

27 Articles
Divya Anne Selvaraj
11 Feb 2025

PythonPro #62: Python 3.14’s New Interpreter: 9–15% Faster; Pydantic.ai Agent Framework; and Unified TTS Wrapper

Bite-sized actionable content, practical tutorials, and resources for Python programmers.

#62: Python 3.14’s New Interpreter: 3~30% Faster; Pydantic.ai Agent Framework; and Unified TTS Wrapper

Hi,
Welcome to a brand new issue of PythonPro!

Here are today's News Highlights: py3-TTS-Wrapper 0.9.18 simplifies speech synthesis across AWS, Google, Azure, IBM, and ElevenLabs; the Pydantic.ai beta framework supports OpenAI, Anthropic, Gemini, and real-time debugging; and Python 3.14 promises a new interpreter with 3~30% speed boosts.

My top 5 picks from today’s learning resources:
How I Built a Deep Learning Library from Scratch Using Only Python, NumPy & Math🔢
Django PDF Actions: How to Export PDF from Django Admin📑
From Scratch to Masterpiece: The VAE’s Journey to Generate Stunning Images🧑‍🎨
The Best Pre-Built Toolkits for AI Agents🕵️‍♂️
Permutation Generation in PyTorch on GPU: Statistic Based Decision Rule for randperm vs. argsort and rand ⚙️

And, in From the Cutting Edge, we introduce HintEval, a Python library that streamlines hint generation and evaluation by integrating datasets, models, and assessment tools. It provides a structured and scalable framework for AI-driven question-answering systems.

Stay awesome!
Divya Anne Selvaraj
Editor-in-Chief

PS: We're conducting market research to better understand the evolving landscape of software engineering and architecture, including how professionals like you learn, grow, and adapt to the impact of AI.
We think your insights would be incredibly valuable, and we would love to hear what you have to say in a quick 1:1 conversation with our team.
What's in it for you?
✅ A brief 20–30 minute conversation at a time that’s convenient for you
✅ An opportunity to share your experiences and shape the future of learning
✅ A free credit to redeem any eBook of your choice from our library as a thank-you
How to Participate:
Schedule a quick call at your convenience using the link provided after the form: https://forms.office.com/e/Bqc7gaDCKq
Looking forward to speaking with you soon!
Thank you,
Team Packt.
Note: Credits may take up to 15 working days to be applied to your account.

🐍 Python in the Tech 💻 Jungle 🌳

🗞️News
Unified TTS Interface: py3-TTS-Wrapper 0.9.18 Simplifies Speech Synthesis Across APIs: The library provides a single interface to services such as AWS Polly, Google, Microsoft Azure, IBM Watson, and ElevenLabs.
Pydantic.ai: Python agent framework from Pydantic team: The framework is inspired by FastAPI’s success and is currently in early beta. It supports multiple AI models (OpenAI, Anthropic, Gemini, etc.), real-time debugging via Pydantic Logfire, and more.
Python 3.14 Lands A New Interpreter With 3~30% Faster Python Code: Alpha 5 is slated for release today. Word is that Python may be receiving a new interpreter with a 9–15% speedup on PyPerformance benchmarks.

💼Case Studies and Experiments🔬
Let's compile Python 1.0: Details the process of compiling Python 1.0 using Podman and an old Debian container. It reveals that despite its age, 1.0 already had high-level data structures, process control, file handling, and more.
How I Built a Deep Learning Library from Scratch Using Only Python, NumPy & Math: Explains the motivation, abstraction layers, and technical design. It compares the library with PyTorch, covering key components like tensors, autograd, neural network modules, and optimizers.

📊Analysis
WebAssembly and Python Ecosystem: Explores the current state of Python in WASM, its challenges, and the tools available. It also compares performance with Rust, Go, and Docker for serverless computing.
Data Analysis Showdown: Comparing SQL, Python, and esProc SPL: Compares SQL, Python, and esProc SPL on various data analysis tasks, including session counting, player scoring, and user retention.

🎓Tutorials and Guides🤓
Choose Your Fighter • Let's Play (#1 in Inheritance vs Composition Pair): Provides a step-by-step tutorial on building a simple shooting game with Python's turtle module. It touches on OOP concepts, particularly inheritance.
From Scratch to Masterpiece: The VAE’s Journey to Generate Stunning Images: Covers the key VAE components (encoder, decoder, reparameterization trick, and loss function). It then demonstrates how to train a VAE on MNIST to generate synthetic images.
Installing and using DeepSeek AI on a Linux system: Covers CUDA setup, Ollama installation, model download, Chatbox integration, and Python scripting, and highlights the advantages of running AI models offline.
Build Your Own DeepSeek-R1 ChatBot That Can Search Web: Covers Ollama installation, DeepSeek model setup, Docker-based SearXNG search integration, and a Gradio-based UI. The result is a locally run chatbot augmented with real-time web search.
Data Analysis with Python Pandas and Matplotlib (Advanced): Covers data analysis with Python, Pandas, and Matplotlib, including data manipulation, importing CSV files, filtering, grouping, and visualization.
Django PDF Actions: How to Export PDF from Django Admin: Introduces a package that simplifies exporting data to PDFs from Django Admin. It addresses challenges like multilingual support, layout consistency, and styling.
Elisp Cheatsheet for Python Programmers: Maps common Python constructs to their Elisp equivalents. It covers collections, looping, file I/O, string operations, and data structures like lists, vectors, and hash tables.

🔑Best Practices and Advice🔏
The One About the £5 Note and the Trip to the Coffee Shop • The Difference Between `is` and `==` in Python: Explains how Python handles equality and identity, when to use `is` vs. `==`, and how to define custom equality rules in classes with __eq__().
The Best Pre-Built Toolkits for AI Agents: Explores toolkits such as CrewAI, LangChain, Agno, and Vercel AI SDK, which let developers extend AI agent capabilities.
LangChain vs LlamaIndex: designing RAG and choosing the right framework for your project: Demonstrates side-by-side implementations of a chatbot in both frameworks. Each integrates a vector database (Qdrant), OpenAI embeddings, and PDF processing.
Permutation Generation in PyTorch on GPU: Statistic Based Decision Rule for randperm vs. argsort and rand: Analyzes the trade-offs between torch.randperm() and torch.argsort(torch.rand()). It introduces a statistical decision rule for deciding when batching with argsort(rand()) is acceptable.
Stop Creating Bad DAGs — Optimize Your Airflow Environment By Improving Your Python Code: Covers best practices like limiting top-level code and avoiding XComs and Variables. It also introduces airflow-parse-bench, an open-source tool for measuring and comparing DAG parse times.

🔍From the Cutting Edge: HintEval: A Comprehensive Framework for Hint Generation and Evaluation for Questions💥
In "HintEval: A Comprehensive Framework for Hint Generation and Evaluation for Questions," Mozafari et al. introduce a Python library for hint generation and evaluation in question-answering tasks. The framework consolidates scattered resources into a unified toolkit for developing and assessing hints.
Context
Integrating LLMs into Information Retrieval (IR) and Natural Language Processing (NLP) has improved access to information, but it can also hinder critical thinking. Hint Generation mitigates this by guiding users towards answers rather than providing them outright. Hint Evaluation ensures that hints remain effective without revealing the answers.
Existing datasets and tools for hint research are fragmented and often incompatible, making comparisons difficult.
HintEval addresses this by integrating multiple datasets, hint generation methods, and evaluation metrics into a single framework.
Key Features of HintEval
Access to preprocessed datasets: Provides a collection of preprocessed datasets designed for fact-based question answering, including TriviaHG, WikiHint, HintQA, and KG-Hint.
Support for two hint generation models: Includes an Answer-Aware model, which generates hints based on a known answer, and an Answer-Agnostic model, which generates hints without requiring one.
Comprehensive hint evaluation system: Includes five evaluation metrics (relevance, readability, convergence, familiarity, and answer leakage) to ensure hints remain useful, clear, and non-revealing.
Integration with advanced language models: Supports state-of-the-art LLMs such as GPT-4, LLaMA, Gemini, and others, so researchers can experiment with different hint-generation techniques.
Freely available and open-source: Accessible on GitHub and PyPI, with detailed documentation and example implementations.
What This Means for You
HintEval is useful for researchers, developers, and educators working with AI-driven question-answering systems. Researchers can use it to test and compare models. Developers can integrate smart hints into their applications. Educators can create interactive learning experiences that encourage critical thinking.
Examining the Details
HintEval offers a structured approach to generating, evaluating, and testing hints. Users can load preprocessed datasets or create custom ones, so it adapts to different research needs. The framework also makes it easy to run hint evaluations at scale, and its capabilities can be extended with custom models and methods.
Designed to work locally or in the cloud, it integrates smoothly with modern AI workflows and suits a range of NLP and machine learning applications.
You can learn more by reading the entire paper or accessing the library on GitHub.

And that’s a wrap.
We have an entire range of newsletters with focused content for tech pros. Subscribe to the ones you find the most useful here. The complete PythonPro archives can be found here.
If you have any suggestions or feedback, or would like us to find you a Python learning resource on a particular subject, just respond to this email!
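One of this issue's Best Practices picks covers `is` versus `==`, which comes down to identity versus equality. The sketch below is a minimal illustration and is not code from the linked article. It uses a hypothetical `Banknote` class in which two notes compare equal when their values match, even though they are different objects. It also shows the usual companion to a custom `__eq__()`, which is a matching `__hash__()`.

```python
class Banknote:
    """Hypothetical banknote: equal value means equal, regardless of serial number."""

    def __init__(self, value: int, serial: str) -> None:
        self.value = value
        self.serial = serial

    def __eq__(self, other: object) -> bool:
        # Returning NotImplemented lets Python try the reflected comparison
        # and, for ==, fall back to an identity check if neither side handles it.
        if not isinstance(other, Banknote):
            return NotImplemented
        return self.value == other.value

    def __hash__(self) -> int:
        # Defining __eq__ sets __hash__ to None; keep hashing consistent with __eq__.
        return hash(self.value)


mine = Banknote(5, "AA01")
yours = Banknote(5, "ZZ99")
alias = mine

print(mine == yours)  # True: same value
print(mine is yours)  # False: two distinct objects
print(mine is alias)  # True: both names refer to one object
print(mine == 5)      # False: no comparison defined, so == falls back to identity
```

As a rule of thumb, use `is` only to ask "is this the very same object?", as in `x is None`. Use `==` whenever you care about value.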

Divya Anne Selvaraj
11 Mar 2025

PythonPro #66: Visualizing NLP with displaCy, Python’s ESP32-S3 Integration, and LM Studio’s AI SDKs

Bite-sized actionable content, practical tutorials, and resources for Python programmers.

#66: Visualizing NLP with displaCy, Python’s ESP32-S3 Integration, and LM Studio’s AI SDKs

Hi,
Welcome to a brand new issue of PythonPro!

In today’s Expert Insight we bring you an excerpt from the book Mastering spaCy. It introduces displaCy, spaCy's built-in visualizer, and demonstrates how to visualize syntactic structures and named entities through interactive demos, local servers, and Jupyter notebooks.

News Highlights: LM Studio launches open-source SDKs for Python and TypeScript with an .act() API for AI tasks; Python now runs on ESP32-S3 with NuttX RTOS, enabling direct hardware interaction; and a malicious Python package ‘set-utils’ targeted Ethereum wallets, exfiltrating keys via the Polygon RPC endpoint.

My top 5 picks from today’s learning resources:
Performance of the Python 3.14 tail-call interpreter🧐
Python Data Processing in Microsoft Fabric — End-to-End Transformation and Visualization📊
Isolating Python and Jupyter using firejail🔥
Guarantee a Locked & Reproducible Environment with Every Python Run🔒
Exposing string types to maximize user happiness🌀

And, in From the Cutting Edge, we introduce optimizn, a Python library that helps users build customized optimization algorithms such as simulated annealing and branch and bound. These algorithms efficiently solve complex combinatorial optimization problems, and the library supports continuous training and customization.

Stay awesome!
Divya Anne Selvaraj
Editor-in-Chief

🐍 Python in the Tech 💻 Jungle 🌳

🗞️News
Introducing lmstudio-python and lmstudio-js: LM Studio's open-source SDKs let developers integrate AI capabilities into Python and TypeScript applications. They feature an .act() API for agent-oriented tasks and support for complex system management features.
Running Python on the ESP32-S3 using NuttX RTOS is now possible: This integration supports complex Python applications through a POSIX-compliant interface, allowing direct interaction with hardware components.
Python package ‘set-utils’ targets Ethereum wallets: The package masqueraded as a utility for Python sets and targeted developers using Python-based wallet libraries like 'eth-account'. It exfiltrated keys via the Polygon RPC endpoint and remains a threat even after uninstallation.

💼Case Studies and Experiments🔬
Serverless Python Websites: Details a hobbyist's journey to convert a Strava activity visualization project to a serverless architecture using AWS Lambda. The motivations included cost-efficiency, the desire to use existing Python skills, and a personal learning goal.
The 'eu' in eucatastrophe – Why SciPy builds for Python 3.12 on Windows are a minor miracle: Explains how the SciPy development community overcame complex challenges in Python packaging and compiler limitations through collective effort and strategic foresight.

📊Analysis
Performance of the Python 3.14 tail-call interpreter: Critically evaluates the performance of Python 3.14's tail-call interpreter. It reveals that the initial gains were largely due to a regression in LLVM 19, and that actual improvements range from only 1–5%.
Python monorepo with uv and pex: Discusses the transition to a serverless architecture for a Python-based hobby project using AWS Lambda, aimed at reducing the costs and complexity of traditional server setups.

🎓Tutorials and Guides🤓
Python Data Processing in Microsoft Fabric — End-to-End Transformation and Visualization: Provides step-by-step instructions for using Python to transform and visualize data within Microsoft Fabric. It includes how to save data as Delta tables, perform SQL transformations with DuckDB, and integrate with Power BI.
Isolating Python and Jupyter using firejail: Provides a step-by-step tutorial on running Python and Jupyter under Firejail for enhanced security, focusing on the technical setup and execution.
📖Open Source Book | PySDR: A Guide to SDR and DSP using Python by Dr. Marc Lichtman: Features detailed explanations supported by diagrams, animations, and code examples, making complex concepts accessible.
Kubernetes End-to-End Testing Using Testcontainers and Python: Introduces Testcontainers, which enables realistic, automated, and efficient testing environments by simulating Kubernetes clusters.
Integrate Model Context Protocol (MCP) Servers in Python LLM Code: Provides a guide to building a Python MCP client for dynamic, context-aware application development.
Using Python to Measure Immigration Trends: Shows how to use Python and the American Community Survey to analyze a town's immigration trends across different periods.
OTP in Django Without Saving it in the Database (Using Redis): Provides step-by-step instructions for implementing an OTP system in Django that caches OTPs in Redis instead of storing them in the database.

🔑Best Practices and Advice🔏
Loss of Words • A Nested Journey (#1 in The `itertools` Series • `product()`): Explores creative uses of Python's itertools.product() and itertools.starmap(). The story follows Yteria, who must adapt her coding strategies after a mugging leaves her unable to use the word "for".
Guarantee a Locked & Reproducible Environment with Every Python Run: Introduces Fetter, a command-line tool that enforces locked and reproducible Python environments by validating dependencies before every Python execution.
Exposing string types to maximize user happiness: Demonstrates how string-backed Enums or StrEnum can provide better error handling, clearer documentation, and more maintainable code while still accepting plain string inputs for ease of use.
The features of Python's help() function: Explores Python's built-in help() function, which provides documentation for functions, modules, classes, and other objects directly within the Python environment.
Explicitly About Implicit Functions: Through examples like parabolas and circles, the author explains how implicit functions can define complex shapes that explicit functions cannot easily represent.

🔍From the Cutting Edge: optimizn: a Python Library for Developing Customized Optimization Algorithms 💥
In "optimizn: a Python Library for Developing Customized Optimization Algorithms," Sathiya and Pandey from Azure Core Insights Data Science introduce a Python library that makes it easier to create tailored optimization algorithms for combinatorial problems.
Context
Combinatorial optimization involves selecting the best possible arrangement or ordering of a finite set of elements. It is crucial for solving complex operational and logistical problems in many industries, including cloud computing. Such problems are often NP-hard, meaning no known algorithm solves them exactly in polynomial time, so exact methods quickly become impractical as problems grow.
Traditional algorithms may either fail to find optimal solutions or require excessive computational resources, which makes heuristic approaches essential. Two common optimization paradigms are used in these contexts. Simulated annealing is inspired by the controlled cooling of metals: it iteratively improves a solution while occasionally accepting worse steps, so that it does not get trapped in local minima. Branch and bound systematically examines subsets of possible solutions by branching them into more constrained versions. It prunes any branch whose bound shows it cannot beat the best solution found so far.
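The simulated annealing idea described above can be sketched in a few lines of plain Python. This is a generic illustration on a tiny traveling salesman instance and is not optimizn's API. The function names, the 2-opt neighbor move, the geometric cooling schedule, and the default parameters are all assumptions chosen for brevity.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def tour_length(tour: Sequence[int], points: Sequence[Point]) -> float:
    """Length of the closed tour that visits points in the order given by tour."""
    n = len(tour)
    return sum(math.dist(points[tour[k]], points[tour[(k + 1) % n]]) for k in range(n))


def two_opt_neighbor(tour: List[int], rng: random.Random) -> List[int]:
    """Return a copy of tour with one randomly chosen segment reversed (a 2-opt move)."""
    i, j = sorted(rng.sample(range(len(tour)), 2))
    return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]


def simulated_annealing(
    points: Sequence[Point],
    temperature: float = 10.0,
    cooling: float = 0.995,
    steps: int = 5000,
    seed: int = 0,
) -> Tuple[List[int], float]:
    """Search for a short tour; returns the best tour seen and its length."""
    rng = random.Random(seed)
    current = list(range(len(points)))  # initial solution: visit points in input order
    current_cost = tour_length(current, points)
    best, best_cost = current[:], current_cost

    for _ in range(steps):
        candidate = two_opt_neighbor(current, rng)
        candidate_cost = tour_length(candidate, points)
        delta = candidate_cost - current_cost
        # Always accept improvements; accept a worse tour with probability
        # exp(-delta / T), which shrinks as the temperature T cools.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current[:], current_cost
        temperature = max(temperature * cooling, 1e-12)  # geometric cooling schedule

    return best, best_cost


if __name__ == "__main__":
    # Corners of a unit square listed in a "crossing" order; the best tour is the perimeter.
    square = [(0.0, 0.0), (1.0, 1.0), (1.0, 0.0), (0.0, 1.0)]
    tour, length = simulated_annealing(square)
    print(tour, length)  # the length should be 4.0, the square's perimeter
```

The paper describes optimizn users supplying the initial solution, candidate generation, and cost by subclassing. This sketch hard-codes those pieces as plain functions to keep the acceptance rule in focus.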
These methods are effective at producing good, near-optimal solutions in practical, resource-constrained settings.
The optimizn library introduced in this study fills a gap: there are few comprehensive Python tools for implementing these techniques, particularly for combinatorial and constrained optimization.
Key Features of optimizn
Optimization Paradigms: Supports general optimization approaches, including simulated annealing and both traditional and look-ahead branch and bound.
Continuous Training: Allows algorithms to run periodically, retain results and parameters from previous runs, and progressively get closer to optimal solutions.
Customization: Offers customizable components that let users tailor optimization algorithms to their specific problems.
Constraint Handling: Handles both constrained and unconstrained optimization problems.
What This Means for You
The optimizn library is particularly relevant for data scientists, engineers, and researchers facing complex optimization problems. In practice, it targets real-world combinatorial optimization problems, especially where computational resources and time are limited or available only intermittently. Industries with large-scale, complex decision-making tasks, such as cloud computing, could benefit from its efficiency and customizability.
Examining the Details
The authors validated optimizn's performance through experiments on two notable NP-hard problems: the symmetric traveling salesman problem and Azure's "environment design" problem.
In both cases, simulated annealing and various branch and bound algorithms were implemented and compared against existing solutions, such as python-tsp.
For the traveling salesman problem, optimizn’s depth-first-best-first branch and bound approach substantially improved on the initial solutions and closely tracked the python-tsp library's simulated annealing implementation, indicating competitive performance. For Azure's environment design problem, optimizn's simulated annealing algorithms consistently produced better solutions than the other tested methods.
Users implement optimizn by creating subclasses for specific problems. Each subclass defines essential methods for obtaining an initial solution and computing the cost, plus problem-specific methods such as next_candidate, branch, and lbound. These implementations allow precise customization according to problem constraints and input types. The library's base classes encapsulate standard optimization logic while leaving users significant flexibility to customize the algorithm.
You can learn more by reading the entire paper or accessing the library on GitHub.

🧠 Expert insight💥
Here’s an excerpt from “Chapter 1: Getting Started with spaCy” in the book Mastering spaCy by Déborah Mesquita and Duygu Altınok, published in February 2025.
Visualization with displaCy
Visualization is the easiest way to explain some concepts to your colleagues, your boss, and any technical or non-technical audience. Visualizing language data is especially useful because it lets you identify patterns in your data at a glance.
There are many Python libraries and plugins for visualization, such as matplotlib, seaborn, tensorboard, and so on. spaCy also comes with its own visualizer, displaCy. In this subsection, you’ll learn how to spin up a displaCy server on your machine, in a Jupyter notebook, and in a web application.
We’ll start with the easiest way: displaCy’s interactive demo.
Getting started with displaCy
Go ahead and navigate to https://demos.explosion.ai/displacy to use the interactive demo. Enter your text in the Text to parse box and then click the search icon on the right to generate the visualization. The result might look like Figure 1.3.
Figure 1.3 – displaCy’s online demo
The visualizer performs two kinds of syntactic analysis: POS tagging and dependency parsing. We’ll explore them in the upcoming chapters. For now, just think of the result as a sentence structure. Now, let’s see how to visualize named entities with displaCy.
Entity visualizer
displaCy’s entity visualizer highlights the named entities in your text. The online demo is at https://demos.explosion.ai/displacy-ent. We haven’t covered named entities yet, but you can think of them as proper nouns for important entities such as people’s names, company names, dates, city and country names, and so on.
The online demo works like the syntactic parser demo. Enter your text into the textbox and hit the Search button. Figure 1.4 shows an example.
Figure 1.4 – displaCy’s named entity visualizer
The right side contains checkboxes for entity types. You can check the boxes that match your text type, for instance, MONEY and QUANTITY for a financial text. Just like in the syntactic parser demo, you can choose from the available models.
Using displaCy with pure Python
The displaCy visualizers are integrated into the core library. This means that you can start using displaCy immediately after installing spaCy on your machine.
Let’s go through some examples.
First, let’s import spaCy and displaCy, load the model, and process a sentence:

    import spacy
    from spacy import displacy

    nlp = spacy.load('en_core_web_sm')
    doc = nlp("One step forward and you're no longer in the same place")

Now, we can use the displaCy serve() method to run a server locally, specifying that we want to see the dependency parser visualization:

    displacy.serve(doc, style='dep')

After firing up this code, you should see a response from displaCy as in Figure 1.5.
Figure 1.5 – Firing up displaCy locally
We can see that http://0.0.0.0:5000 is the local address where displaCy renders your visualization. You can open this URL in a browser to see it. When you want to shut down the server, press Ctrl+C to stop the displaCy server and return to the command line.
If you want to use another port, or if you get an error because port 5000 is already in use, you can pass a different number to displaCy's port parameter. Replacing the last line of the preceding code block with the following line will suffice:

    displacy.serve(doc, style='dep', port=5001)

Here, we provide the port number 5001 explicitly. In this case, displaCy will render the graphics on http://0.0.0.0:5001.
Creating an entity recognizer visualization works the same way. We pass ent to the style parameter instead of dep:

    import spacy
    from spacy import displacy

    nlp = spacy.load('en_core_web_sm')
    doc = nlp("spaCy's main developers are Matthew Honnibal and Ines Montani, the founders of the software company Explosion")
    displacy.serve(doc, style='ent')

Let’s move on to other platforms we can use for displaying the results.
Using displaCy in Jupyter notebooks
Jupyter Notebook is an important part of daily data science work. To display the visualizations in Jupyter notebooks, we can change the serve() method to render().
The rest of the code stays the same. Figure 1.6 shows the result of running displaCy in a Jupyter notebook.
Figure 1.6 – displaCy rendering results in a Jupyter notebook
To learn how to use different background images, background colors, and fonts, visit the displaCy documentation at http://spacy.io/usage/visualizers.
Mastering spaCy was published in February 2025. Packt library subscribers can continue reading the entire book for free.
Get the eBook for $24.99 (was $35.99)
Get the Print Book for $44.99

And that’s a wrap.
We have an entire range of newsletters with focused content for tech pros. Subscribe to the ones you find the most useful here. The complete PythonPro archives can be found here.
If you have any suggestions or feedback, or would like us to find you a Python learning resource on a particular subject, just respond to this email!
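displaCy's style parameter takes plain strings such as 'dep' and 'ent'. This issue's pick "Exposing string types to maximize user happiness" recommends string-backed Enums for parameters like that. The sketch below uses a hypothetical `Style` enum and a hypothetical `describe()` function, not anything from spaCy. It accepts either an enum member or its plain-string value and gives a clear error for anything else. It uses the `(str, Enum)` mixin, which works on all Python 3 versions. On Python 3.11+, `enum.StrEnum` is the built-in equivalent.

```python
from enum import Enum
from typing import Union


class Style(str, Enum):
    """Hypothetical set of accepted styles; each member is also a real str."""
    DEP = "dep"
    ENT = "ent"


def describe(text: str, style: Union[Style, str] = Style.DEP) -> str:
    """Hypothetical function accepting either a Style member or its string value."""
    try:
        style = Style(style)  # "ent" -> Style.ENT; Style.ENT -> Style.ENT
    except ValueError:
        allowed = ", ".join(repr(s.value) for s in Style)
        raise ValueError(f"unknown style {style!r}; expected one of {allowed}") from None
    return f"[{style.value}] {text}"


print(describe("hello", "ent"))      # plain strings still work
print(describe("hello", Style.DEP))  # and so do enum members
```

Callers who pass strings keep working, while the enum documents the valid choices and gives editors something to autocomplete.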

Divya Anne Selvaraj
05 Nov 2024

PythonPro #54: Global Forecasting Models, Python Overtakes JavaScript, and Hidden Python Libraries

Bite-sized actionable content, practical tutorials, and resources for Python programmers.

#54: Global Forecasting Models, Python Overtakes JavaScript, and Hidden Python Libraries

Hi,
Welcome to a brand new issue of PythonPro!

In today’s Expert Insight we bring you an excerpt from the recently published book Modern Time Series Forecasting with Python - Second Edition. It explains the shift from traditional, isolated time series models to global forecasting models, which leverage related datasets to improve scalability and accuracy and to reduce overfitting in large-scale applications.

News Highlights: Python has overtaken JavaScript on GitHub, driven by its role in AI and data science, according to GitHub's Octoverse 2024 report; and IBM’s Deep Search team has released Docling v2, a Python library for document extraction with models on Hugging Face.

My top 5 picks from today’s learning resources:
Hidden Python Libraries That Will Blow Your Mind🌟
Python threading and subprocesses explained🧵
Books are Datasets: Mapping 12 Sacred Texts with Python and D3.js📖
Python Closures: Common Use Cases and Examples➿
ChatGPT-4o cannot run proper Generalized Additive Models currently— but can correctly interpret results from R🧠

And today’s Featured Study introduces SafePyScript, a machine-learning-based tool for detecting vulnerabilities in Python code, developed by researchers at the University of Passau, Germany.

Stay awesome!
Divya Anne Selvaraj
Editor-in-Chief

🐍 Python in the Tech 💻 Jungle 🌳

🗞️News
Python has overtaken JavaScript on GitHub: GitHub’s Octoverse 2024 report shows Python as the most popular language on GitHub, driven by its role in AI, data science, and machine learning. Jupyter Notebooks usage has also surged.
Docling: Document extraction Python library from the Deep Search team at IBM: IBM’s Deep Search team released Docling v2, an MIT-licensed Python library for document extraction with custom models available on Hugging Face.

💼Case Studies and Experiments🔬
Programmed differently? Testing for gender differences in Python programming style and quality on GitHub: The study confirms that programming style can predict gender, but these differences do not affect code quality.
Tune your guitar with python: Demonstrates using Python’s sounddevice and matplotlib modules to build a real-time guitar tuner with a custom interface. A live spectrogram identifies the key bass guitar note frequencies for tuning.

📊Analysis
Package compatibility tracker: Python 3.13 free-threading and subinterpreters: The tracker shows that 83% of the 500 most downloaded Python packages are compatible with Python 3.13’s new free-threading feature, while 73% support importing without the GIL in Docker tests.
Hidden Python Libraries That Will Blow Your Mind: Introduces six powerful yet lesser-known Python libraries, including Streamlit for quick app-building, PyWhatKit for task automation, and Typer for simplified CLIs.

🎓Tutorials and Guides🤓
Python threading and subprocesses explained: Details Python’s threading and multiprocessing methods for handling parallel tasks. It uses thread pools for I/O-bound tasks and process pools for CPU-bound tasks.
Tiny GraphRAG in 1000 lines of Python: Introduces a simplified, local implementation of the GraphRAG algorithm. It uses a graph-based structure to enhance RAG for more contextually aware information retrieval.
Building AI chatbots with marimo: Covers how to create adaptable chatbots capable of sophisticated tasks, including visualizing data and processing diverse inputs.
Explanation of Vision Transformer with implementation: Provides an in-depth explanation and step-by-step implementation of the Vision Transformer, covering key concepts such as patch embedding, with code examples.
PyD-Net: Advancing Real-Time Depth Estimation for Mobile and Embedded Devices: Showcases PyD-Net's practical applications in autonomous navigation, augmented reality, assistive technology, and design.
Explore Solvable and Unsolvable Equations with Python: Delves into solving equations in Python, discussing when closed-form solutions are feasible and when numerical methods become necessary.
Books are Datasets: Mapping 12 Sacred Texts with Python and D3.js: Explores using Python and D3.js to analyze and visualize 12 major sacred texts as datasets. It showcases text-processing techniques that reveal connections and patterns within and between these religious texts.

🔑Best Practices and Advice🔏
Variables in Python: Usage and Best Practices: Covers variable creation, dynamic typing, expressions, and best practices for naming and using variables in various scopes. It also covers parallel assignment and iterable unpacking.
The Python Square Root Function: Details the sqrt() function from Python's math module, explaining its use for calculating the square roots of positive numbers and zero, and how it raises errors for negative inputs.
Python Closures: Common Use Cases and Examples: Explains Python closures, which are inner functions that capture variables from their surrounding scope. Closures enable state retention, function-based decorators, and encapsulation.
Python ellipses considered harmful: Argues that using ellipses (...) to declare unimplemented methods in Python’s abstract classes can lead to hidden errors, and advocates raise NotImplementedError instead.
ChatGPT-4o cannot run proper Generalized Additive Models currently— but it can correctly interpret results from R: Highlights ChatGPT-4o's limitations in advanced statistical modeling, and tells Python users about workarounds and considerations when working with similar tools.

🔍Featured Study: SafePyScript💥
In "SafePyScript: A Web-Based Solution for Machine Learning-Driven Vulnerability Detection in Python," Farasat et al., researchers from the University of Passau, Germany, introduce SafePyScript, a machine-learning-based web tool designed to detect vulnerabilities in Python code.
Context
In software development, identifying vulnerabilities is a major concern because of the security risks posed by cyberattacks.
Vulnerabilities, or flaws in code that can be exploited by attackers, require constant detection and correction. Traditionally, vulnerability detection relies on:
>Static Analysis: This rule-based approach scans code for known vulnerability patterns but often produces many false positives.
>Dynamic Analysis (Penetration Testing): This approach tests code in a runtime environment, relying on security experts to simulate potential attacks, which makes it resource-heavy and dependent on professional expertise.
Machine learning offers a data-driven alternative, enabling automated vulnerability detection with improved accuracy. Despite its popularity, Python lacks dedicated machine-learning-based tools for this purpose, and SafePyScript aims to fill that gap. It combines a BiLSTM (Bidirectional Long Short-Term Memory) model with the ChatGPT API not only to detect vulnerabilities but also to propose secure code.
Key Features of SafePyScript
BiLSTM Model for Vulnerability Detection: Trained on word2vec embeddings, this model achieved an accuracy of 98.6% and a ROC of 99.3% on Python code vulnerabilities.
Integration with ChatGPT API: SafePyScript uses ChatGPT (GPT-3.5 Turbo) to analyse vulnerable code and generate secure alternatives.
Common Vulnerabilities Addressed: These include SQL Injection, Cross-Site Scripting (XSS), Remote Code Execution, Cross-Site Request Forgery (XSRF), and Open Redirect.
User-Friendly Interface: Built with Django (backend) and HTML, CSS, and JavaScript with Ajax (frontend) for a responsive, accessible user experience.
Report Generation: Users can download detailed reports on vulnerabilities detected in their code, making it easier to track and resolve issues systematically.
Feedback Mechanism: Users can provide feedback, allowing the tool to improve and adapt to new security threats.
What This Means for You
SafePyScript is most useful for Python developers and software engineers who need an efficient way to detect vulnerabilities in their code without relying on traditional, labour-intensive methods. Its machine-learning foundation and ChatGPT integration make it practical for real-world use: it not only points out vulnerabilities but also generates secure code alternatives.
Examining the Details
SafePyScript’s effectiveness rests on its BiLSTM model. Using word2vec embeddings, the model achieved 98.6% accuracy, 96.2% precision, and a ROC of 99.3% in vulnerability detection. The researchers tuned the BiLSTM’s hyperparameters (such as a learning rate of 0.001 and a batch size of 128) through extensive testing to establish reliable benchmark results.
Additionally, SafePyScript uses ChatGPT’s language model to generate secure code alternatives. The research team applied careful prompt engineering to get the most out of ChatGPT when analysing Python code vulnerabilities.
SafePyScript’s frontend, built with HTML, CSS, and JavaScript (with Ajax) on top of a Django backend, lets developers log in, upload or import code, select detection models, review reports, and access secure code, all within one accessible platform.
You can learn more by reading the entire paper or accessing SafePyScript.
🧠 Expert insight💥
Here’s an excerpt from “Chapter 6: Time Series Forecasting as Regression” in the book, Modern Time Series Forecasting with Python - Second Edition by Manu Joseph and Jeffrey Tackes, published in October 2024.
Global forecasting models – a paradigm shift
Traditionally, each time series was treated in isolation. Because of that, traditional forecasting has always looked at the history of a single time series alone in fitting a forecasting function.
But recently, because of the ease of collecting data in today's digital-first world, many companies have started collecting large amounts of time series from similar sources, or related time series.
For example, retailers such as Walmart collect data on sales of millions of products across thousands of stores. Companies such as Uber or Lyft collect the demand for rides from all the zones in a city. In the energy sector, energy consumption data is collected across all consumers. All these sets of time series have shared behavior and are hence called related time series.
We can consider that all the time series in a set of related time series come from separate data generating processes (DGPs), and thereby model them all separately. We call these the local models of forecasting. An alternative to this approach is to assume that all the time series are coming from a single DGP. Instead of fitting a separate forecast function for each time series individually, we fit a single forecast function to all the related time series. This approach has been called global or cross-learning in the literature.
The term global was introduced by David Salinas et al. in the DeepAR paper, and cross-learning by Slawek Smyl.
...having more data will lead to lower chances of overfitting and, therefore, lower generalization error (the difference between training and testing errors). This is exactly one of the shortcomings of the local approach. Traditionally, time series are not very long, and in many cases, it is difficult and time-consuming to collect more data as well. Fitting a machine learning model (with all its expressiveness) on small data is prone to overfitting. This is why time series models that enforce strong priors were used to forecast such time series, traditionally.
But these strong priors, which restrict the fitting of traditional time series models, can also lead to a form of underfitting and limit accuracy.
Strong and expressive data-driven models, as in machine learning, require a larger amount of data to have a model that generalizes to new and unseen data. A time series, by definition, is tied to time, and sometimes, collecting more data means waiting for months or years, which is not desirable. So, if we cannot increase the length of the time series dataset, we can increase its width. If we add multiple time series to the dataset, we increase the width of the dataset, and thereby increase the amount of data the model is trained with.
Figure 5.7 shows the concept of increasing the width of a time series dataset visually:
Figure 5.7 – The length and width of a time series dataset
This works in favor of machine learning models: with more flexibility in fitting a forecast function and more data to work with, the machine learning model can learn, in a completely data-driven way, a more complex forecast function than traditional time series models, one that is typically shared between the related time series.
Another shortcoming of the local approach revolves around scalability. In the case of Walmart we mentioned earlier, there are millions of time series that need to be forecasted, and it is not possible to have human oversight on all these models. If we think about this from an engineering perspective, training and maintaining millions of models in a production system would give any engineer a nightmare.
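A minimal sketch of the local/global distinction in plain Python, assuming (for simplicity) that every series is a zero-mean AR(1) process sharing one true coefficient. The "model" here is just the least-squares AR(1) coefficient, so the contrast is purely about how many models are kept and how much data each one sees. The function names and the simulated data are hypothetical illustrations, not code from the book.

```python
import random


def lagged_pairs(series):
    """Turn one series into (previous value, next value) training pairs."""
    return list(zip(series[:-1], series[1:]))


def fit_ar1(pairs):
    """Least-squares AR(1) coefficient without intercept: sum(x*y) / sum(x*x)."""
    num = sum(x * y for x, y in pairs)
    den = sum(x * x for x, _ in pairs)
    return num / den


def fit_local(all_series):
    """Local approach: one model (one coefficient) per series."""
    return [fit_ar1(lagged_pairs(s)) for s in all_series]


def fit_global(all_series):
    """Global approach: pool the pairs from every series (a 'wider'
    dataset) and fit a single shared model."""
    pooled = [pair for s in all_series for pair in lagged_pairs(s)]
    return fit_ar1(pooled)


def simulate(n_series, length, phi, noise, seed=0):
    """Hypothetical data: many short related series with the same true phi."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_series):
        x = [rng.gauss(0, 1)]
        for _ in range(length - 1):
            x.append(phi * x[-1] + rng.gauss(0, noise))
        out.append(x)
    return out


if __name__ == "__main__":
    data = simulate(n_series=200, length=12, phi=0.7, noise=0.5)
    local = fit_local(data)
    print("local: models to maintain =", len(local))
    print("local: estimates range from", round(min(local), 2), "to", round(max(local), 2))
    print("global: 1 model, phi =", round(fit_global(data), 3))
```

With 200 series of 12 points each, the local approach keeps 200 models, each estimated from only 11 lagged pairs, so their estimates scatter widely; the global approach keeps one model fitted on 2,200 pooled pairs. Real global models use far richer features than a single lag, but the trade of width for length is the same.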
But under the global approach, we only train a single model for all these time series, which drastically reduces the number of models we need to maintain and yet can generate all the required forecasts.
This new paradigm of forecasting has gained traction and has consistently been shown to improve on local approaches in multiple time series competitions, mostly in datasets of related time series. In Kaggle competitions such as Rossmann Store Sales (2015), Wikipedia Web Traffic Time Series Forecasting (2017), Corporación Favorita Grocery Sales Forecasting (2018), and the M5 Competition (2020), the winning entries were all global models: machine learning, deep learning, or a combination of both. The Intermarché Forecasting Competition (2021) also had global models as the winning submissions. Links to these competitions are provided in the Further reading section.
Although we have many empirical findings where global models have outperformed local models for related time series, global models are still a relatively new area of research. Montero-Manso and Hyndman (2020) showed a few very interesting results, including that any local method can be approximated by a global model with the required complexity; the most interesting finding they put forward is that the global model will perform better, even with unrelated time series. We will talk more about global models and strategies for global models in Chapter 10, Global Forecasting Models.
Modern Time Series Forecasting with Python - Second Edition was published in October 2024.
Get the eBook for $46.99 $31.99!
Get the Print Book for $57.99!
And that’s a wrap.
We have an entire range of newsletters with focused content for tech pros. Subscribe to the ones you find the most useful here.
The complete PythonPro archives can be found here.
If you have any suggestions or feedback, or would like us to find you a Python learning resource on a particular subject, just respond to this email!
Subscribe to Packt PythonPro
PythonPro is a weekly newsletter that brings you the latest developments in the Python landscape along with handpicked tutorials, guides, and analyses from experts.

Divya Anne Selvaraj
04 Mar 2025

PythonPro #65: PyPy v7.3.19 Updates, FastRTC for AI Communication, and AutoML with mljar-supervised

Bite-sized actionable content, practical tutorials, and resources for Python programmers.
#65
PyPy v7.3.19 Updates, FastRTC for AI Communication, and AutoML with mljar-supervised
Hi,
Welcome to a brand new issue of PythonPro!
In today’s Expert Insight we bring you an excerpt from the book, Learn Quantum Computing with Python and IBM Quantum, Second Edition, which describes how to visualize quantum circuits using Qiskit's circuit_drawer function.
News Highlights: PyPy v7.3.19 fixes JIT bugs and introduces a Python 3.11 beta interpreter, FastRTC launches for AI-driven real-time communication, Google's free Gemini-powered agent automates data analysis on Colab, and mljar-supervised automates the full ML pipeline with its AutoML framework.
My top 5 picks from today’s learning resources:
The Secret Life of __init__.py: Why This Tiny File Holds the Key to Python’s Magic🔑
Demand Forecasting with Darts: A Tutorial🎯
It’s About Time: An Empirical Study of Date and Time Bugs in Open-Source Python Software⏳
uv + Ray: Pain-Free Python Dependencies in Clusters🌐
Modern Good Practices for Python Development🛠️
And, in From the Cutting Edge, we introduce evclust, a Python library that enhances traditional clustering methods by incorporating the Dempster-Shafer theory to effectively manage and represent uncertainty in cluster memberships.
Stay awesome!
Divya Anne Selvaraj
Editor-in-Chief
🐍 Python in the Tech 💻 Jungle 🌳
🗞️News
PyPy v7.3.19 released: The release primarily addresses JIT-related bug fixes and introduces a Python 3.11 beta interpreter, alongside continued support for Python 2.7 and 3.10, with plans to drop support for 3.10 in the next update.
FastRTC: The Real-Time Communication Library for Python: The library simplifies building audio and video AI applications with features like automatic voice detection, a built-in Gradio UI, and the ability to connect via phone.
Google launches free Gemini-powered Data Science Agent on its Colab Python platform: The free Gemini 2.0-powered agent automates data analysis with AI-generated Jupyter notebooks to streamline workflows for researchers, data scientists, and developers.
AutoML Open Source Framework with Python API and GUI: The framework, mljar-supervised, automates the entire ML pipeline, including data preprocessing, feature engineering, model selection, and hyperparameter tuning.
💼Case Studies and Experiments🔬
It’s About Time: An Empirical Study of Date and Time Bugs in Open-Source Python Software: Systematically analyzes date and time computation bugs in Python projects using data from GitHub and suggests improvements to software practices.
I Uploaded a 27-Year-Old EXE File to Claude 3.7 and What Happened Next Blew My Mind: Describes how Claude 3.7 successfully analyzed a Visual Basic EXE and converted it into a functional Python application using Pygame.
📊Analysis
Embedding Python in Elixir, it's Fine: Analyzes the integration of Python into Elixir through Pythonx, which improves interoperability and functionality within Elixir's ecosystem.
A peek into a possible future of Python in the browser: Discusses breakthroughs in running Python on the web, focusing on the SPy project, which seeks to compile Python-like code into WebAssembly.
🎓Tutorials and Guides🤓
Performing K-means Clustering with Python and Scikit-learn: Explains the concept and steps involved in K-means clustering, including choosing the number of clusters, assigning data points to the nearest cluster, updating cluster centers, and assessing convergence using the inertia metric.
The Secret Life of __init__.py: Why This Tiny File Holds the Key to Python’s Magic: Explains the purpose and functionality of the __init__.py file, highlighting its role in marking directories as packages and organizing modules efficiently within a Python project.
Controlling Ableton Live with Python: Provides step-by-step instructions for setting up MIDI in Ableton using the IAC driver, sending MIDI commands using the rtmidi library, generating melodies with Markov chains, and more.
Project Setup with Python: Covers modern practices for Python project setup, including pyproject.toml for configuration, the src layout for directory structure, virtual environments for development, and requirements files for package management.
Affinity Propagation with Python and Scikit-learn: Explores both the theory behind the algorithm (such as how it mimics social group formation and determines the number of clusters) and the practical implementation steps, including creating and running an example model.
How to deploy Python or Flask apps on Plesk: Covers installing mod_python and Phusion Passenger, updating Plesk components, configuring Python support in domain settings, adding WSGI application settings to your code, and managing Apache and Nginx settings.
Demand Forecasting with Darts: A Tutorial: Offers a comprehensive tutorial on demand forecasting using Python and Darts, focusing on the TiDE and TFT models for retail scenarios.
🔑Best Practices and Advice🔏
uv + Ray: Pain-Free Python Dependencies in Clusters: Discusses how integrating the uv package manager with Ray improves Python dependency management in distributed systems by enabling consistent and efficient environment setup across cluster nodes.
Modern Good Practices for Python Development: Covers code formatting, linting, type hinting, and testing (primarily with pytest), alongside packaging advice and the use of data classes, enums, f-strings, and datetime objects.
Counting: How Hard Can it Be?: Explores common misunderstandings about object identity and equality, using the simple act of counting biscuits as a teaching tool.
Hello FastHTML and MonsterUI: Introduces Jeremy Howard's FastHTML and Isaac Flath's MonsterUI, Python-based tools that simplify building modern, responsive web applications with features for easy prototyping and scalability.
Python packaging: Why we can't have nice things: Strongly advises against running pip with administrative privileges and recommends installing only wheel distributions to improve security and reduce the risk of running unintended code.
🔍From the Cutting Edge: evclust (Python library for evidential clustering)💥
In "evclust: Python library for evidential clustering," Soubeiga and Antoine present evclust, a Python library for evidential clustering. The approach advances traditional clustering methods by integrating the Dempster-Shafer theory to manage uncertainty in cluster memberships.
Context
Evidential clustering extends traditional clustering methods by allowing objects to belong to multiple clusters, each with varying degrees of belief quantified using mass functions. This approach leverages the Dempster-Shafer Theory, a mathematical framework designed for modeling and reasoning with uncertainty. By incorporating this theory, evclust quantifies and manages the uncertainty of cluster memberships more effectively than traditional hard or fuzzy clustering methods. The result is a Credal Partition, a structured set of mass functions that represents uncertain cluster memberships and enables a more nuanced interpretation of data.
This framework is particularly valuable in applications where uncertainty or overlap between clusters is a concern, providing a robust tool for complex data analysis scenarios.
Key Features of evclust
Comprehensive Library: Includes state-of-the-art evidential clustering algorithms and tools for analysis.
Integration with Python Ecosystem: Works seamlessly with libraries like numpy, pandas, matplotlib, and scikit-learn.
Visualization and Evaluation Tools: Facilitates the understanding of credal partitions through various visual and analytical tools.
Cross-platform Compatibility: Supports Windows, macOS, and Linux operating systems.
Open Source: Available under the MIT License with ongoing community contributions on GitHub.
What This Means for You
evclust is particularly relevant for data scientists, researchers, and developers working on data analysis and clustering problems where uncertainty and ambiguity are factors. It extends traditional clustering methods with the ability to represent and manage uncertainty effectively.
Examining the Details
The evclust library's architecture supports a variety of evidential clustering algorithms, catering to different data types and clustering complexities. Algorithms such as Evidential c-Means (ECM), Relational Evidential c-Means (RECM), and Credal c-Means (CCM) extend existing clustering methods to handle uncertainty in data more effectively. These methods use the Dempster-Shafer theory to assign belief levels to cluster memberships, offering a more nuanced interpretation than hard or fuzzy clustering. This approach can capture complex uncertainty patterns and better handle overlapping cluster memberships and outliers.
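To show what a single row of a credal partition looks like, here is a small sketch in plain Python. It does not use evclust's API, and the mass values are made up for illustration. For one object in a two-cluster problem, mass is spread over every subset of the clusters: mass on {w1, w2} expresses ambiguity between overlapping clusters, and mass on the empty set is how ECM-style methods represent outlier evidence. Belief (Bel) and plausibility (Pl) then give lower and upper bounds on the object's support for each cluster.

```python
from itertools import combinations

# Hypothetical frame of discernment: two clusters.
FRAME = ("w1", "w2")


def all_subsets(frame):
    """Every subset of the frame, from the empty set to the full frame."""
    return [frozenset(c) for r in range(len(frame) + 1) for c in combinations(frame, r)]


def belief(mass, a):
    """Bel(A): total mass committed to non-empty subsets of A."""
    return sum(m for s, m in mass.items() if s and s <= a)


def plausibility(mass, a):
    """Pl(A): total mass of focal sets that intersect A."""
    return sum(m for s, m in mass.items() if s & a)


# One object's mass function (illustrative values; they sum to 1).
mass = {
    frozenset(): 0.05,              # supports "no cluster" (outlier evidence)
    frozenset({"w1"}): 0.55,
    frozenset({"w2"}): 0.10,
    frozenset({"w1", "w2"}): 0.30,  # cannot tell w1 from w2
}

if __name__ == "__main__":
    assert abs(sum(mass.values()) - 1.0) < 1e-12
    for subset in all_subsets(FRAME):
        if subset:
            print(sorted(subset),
                  "Bel =", round(belief(mass, subset), 2),
                  "Pl =", round(plausibility(mass, subset), 2))
```

Here Bel({w1}) is 0.55 while Pl({w1}) is 0.85: the 0.30 assigned to {w1, w2} could support w1 but is not committed to it, which is the kind of uncertainty a single hard or fuzzy membership value cannot express.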
The library's comprehensive toolset for visualizing, evaluating, and analyzing credal partitions helps in making more informed decisions based on clustering results.
You can learn more by reading the entire paper or accessing the library on GitHub.
🧠 Expert insight💥
Here’s an excerpt from “Chapter 8: Optimizing and Visualizing Quantum Circuits” in the book, Learn Quantum Computing with Python and IBM Quantum, Second Edition by Robert Loredo, published in February 2025.
Visualizing and enhancing circuit graphs
This section will focus on the various visualizations available in Qiskit. The graphs we have been using so far were from the default visualization library in Qiskit. However, we can specify other drawing tools that may be better suited for your documentation purposes. Say, for example, that you are authoring a research paper with LaTeX and you want to use the LaTeX content.
By simply adding style parameters from the Qiskit visualization library, you can then leverage the many features included with the visualization library. We’ll cover a few of those now to get you started.
Learning about customized visual circuits
When rendering a circuit, it is often necessary or convenient to have the results in a format that suits your document. This is where Qiskit's circuit_drawer comes in handy, with its various features. Let’s begin with a simple quantum circuit to illustrate the various visual rendering examples.
First, let’s create a quantum circuit with various operators to get a good representation of all the visual components in the various formats:
from qiskit import QuantumCircuit
from qiskit.visualization import circuit_drawer

# Sample quantum circuit
qc = QuantumCircuit(4)
qc.h(0)
qc.cx(0, 1)
qc.barrier()
qc.cx(0, 2)
qc.cx(0, 3)
qc.barrier()
qc.cz(3, 0)
qc.h(0)
qc.measure_all()

# Draw the circuit with the Matplotlib ('mpl') renderer
circuit_drawer(qc, output='mpl')
This will render the following circuit drawing, which is just a random representation of gates. This circuit does not do anything special; it’s just used to represent various components.
As an option, you can use the random_circuit method to create a random circuit.
Figure 8.17: Circuit rendering using the default library
Next, we will render the preceding circuit using LaTeX:
circuit_drawer(qc, output='latex')
This will render the LaTeX version of the circuit. If you’re running this on your local machine and not on the platform, you may see warnings or errors indicating that you need to install some dependencies, such as pylatexenc. To install this library, run pip install pylatexenc in a cell first, and then restart the kernel.
Figure 8.18: Circuit rendering using the latex library
If you are planning to post your circuit onto a website, blog, or social media and would like to include some styles on the image, you can do that as well by passing in the style contents as a parameter, such as backgroundcolor, gatetextcolor, and fontsize, just to name a few:
# Define the style to render the circuit and components
style = {'backgroundcolor': 'lightblue', 'gatefacecolor': 'white',
         'gatetextcolor': 'black', 'fontsize': 9}
# Draw with the mpl renderer using the specified style
circuit_drawer(qc, style=style, output='mpl')
The preceding code adjusts the background, gate color scheme, and font size, as illustrated here:
Figure 8.19: Rendered circuit with the custom style dictionary on matplotlib
To use the style setting, you must use the Matplotlib output (output='mpl'), as it is the only renderer that supports styles.
Note: Details on the available list of styles can be found in the Style Dict Details section of the Qiskit API documentation (https://docs.quantum-computing.ibm.com/api/qiskit/qiskit.visualization.circuit_drawer).
Learn Quantum Computing with Python and IBM Quantum, Second Edition was published in February 2025. Packt library subscribers can continue reading the entire book for free.
Get the eBook for $35.99 $24.99
Get the Print Book for $44.99
And that’s a wrap.
We have an entire range of newsletters with focused content for tech pros.
Subscribe to the ones you find the most useful here. The complete PythonPro archives can be found here.
If you have any suggestions or feedback, or would like us to find you a Python learning resource on a particular subject, just respond to this email!

Divya Anne Selvaraj
29 Oct 2024

PythonPro #53: FastAPI on Docker, Python-CUDA Integration with Numbast, and Concurrent Requests with httpx vs aiohttp

Bite-sized actionable content, practical tutorials, and resources for Python programmers.
#53
FastAPI on Docker, Python-CUDA Integration with Numbast, and Concurrent Requests with httpx vs aiohttp
Hi,
Welcome to a brand new issue of PythonPro!
In today’s Expert Insight we bring you an excerpt from the recently published book, FastAPI Cookbook, which explains how to deploy FastAPI apps using Docker, covering Dockerfile creation, image building, and container generation.
News Highlights: Numbast simplifies Python-CUDA C++ integration by auto-generating Numba bindings for CUDA functions, and DJ Beat Drop improves onboarding for new Django developers with a streamlined project initializer.
My top 5 picks from today’s learning resources:
Concurrent Requests in Python: httpx vs aiohttp🚦
Python Thread Safety: Using a Lock and Other Techniques🔐
Time-Series Data Meets Blockchain: Storing Time-Series Data with Solidity, Ganache and Python⛓️
Let's Eliminate General Bewilderment • Python's LEGB Rule, Scope, and Namespaces🧩
Optimization of Iceberg Table In AWS Glue🧊
And today’s Featured Study introduces LSS-SKAN, a Kolmogorov–Arnold Network (KAN) variant that uses a single-parameter function (Shifted Softplus) for efficient accuracy and speed.
Stay awesome!
Divya Anne Selvaraj
Editor-in-Chief
P.S.: Thank you to those who participated in this month's survey. With this issue, we have tried to fulfill at least one request made by each participant.
Keep an eye out for next month's survey.
🐍 Python in the Tech 💻 Jungle 🌳
🗞️News
Bridging the CUDA C++ Ecosystem and Python Developers with Numbast: Numbast streamlines the integration of CUDA C++ libraries with Python by automatically generating Numba bindings for CUDA functions.
Improving the New Django Developer Experience: Introduces DJ Beat Drop, a streamlined project initializer that improves the onboarding experience for new Django developers.
💼Case Studies and Experiments🔬
Concurrent Requests in Python: httpx vs aiohttp: Describes how switching from the httpx library to aiohttp resolved high-concurrency issues and improved stability in a computer vision application.
From Python to CPU instructions: Part 1: Explains how rewriting a Python program in C exposes low-level details that Python abstracts away, highlighting the manual effort required for tasks like input handling.
📊Analysis
Python 3.13, what didn't make the headlines: Highlights Python 3.13's understated but impactful improvements, focusing on debugging enhancements, filesystem fixes, and minor concurrency updates.
When should you upgrade to Python 3.13?: Advises waiting until December 2024 to upgrade to Python 3.13, giving libraries and tools time to become compatible and bug-fix releases time to land.
🎓Tutorials and Guides🤓
Python Thread Safety: Using a Lock and Other Techniques: Explains how to address issues like race conditions and introduces synchronization techniques such as locks and semaphores to ensure safe concurrent code execution.
Time-Series Data Meets Blockchain: Storing Time-Series Data with Solidity, Ganache and Python: Walks you through the steps to set up Ethereum locally, deploy a smart contract, and store and retrieve data points.
Beautiful Soup: Build a Web Scraper With Python: Covers how to inspect site structure, scrape HTML content, and parse data using Requests and Beautiful Soup to build a script that extracts and displays job listings.
🎥Advanced Web Scraping Tutorial! (w/ Python Beautiful Soup Library): Covers using Requests to retrieve and parse data, especially from dynamic pages like Walmart's, with enhancements such as modified headers.
Fuzzy regex matching in Python: Introduces the orc library, which simplifies fuzzy matching with a human-friendly interface that highlights edits and can invert changes, making complex text correction tasks easier.
Achieving Symmetrical ManyToMany Filtering in Django Admin: Covers using Django's RelatedFieldWidgetWrapper and a custom ModelForm to allow consistent filtering on both sides of a ManyToMany relationship.
Get started with the free-threaded build of Python 3.13: Details installation, usage in Python programs, compatibility with C extensions, and how to detect GIL status programmatically.
🔑Best Practices and Advice🔏
Let's Eliminate General Bewilderment • Python's LEGB Rule, Scope, and Namespaces: Details how variables are resolved in local, enclosing, global, and built-in scopes, using accessible examples to clarify potential pitfalls.
🎥Robust LLM pipelines (Mathematica, Python, Raku): Given the unreliable and often slow nature of LLMs, this presentation outlines methods to improve pipeline efficiency, robustness, and usability.
A new way of Python Debugging with the Frame Evaluation API: Introduces Python's Frame Evaluation API, a tool that allows real-time monitoring and control of program execution at the frame level.
Buffers on the edge: Python and Rust: Explains how Python's buffer protocol, which enables memory sharing between objects, can lead to undefined behavior due to data races in C, and the challenges Rust faces in maintaining soundness.
Optimization of Iceberg Table In AWS Glue: Discusses how AWS Glue offers built-in optimization, but a Python-based solution using boto3 and Athena SQL scripts provides customizable, cost-effective automation.
🔍Featured Study: LSS-SKAN💥
In "LSS-SKAN: Efficient Kolmogorov–Arnold Networks based on Single-Parameterized Function," Chen and Zhang from South China University of Technology present a refined Kolmogorov–Arnold Network (KAN) variant. Their study introduces a new design principle for neural networks that improves accuracy and computational speed while preserving model interpretability.
Context
KANs are neural networks based on the Kolmogorov–Arnold representation theorem, which breaks down complex multivariate functions into simpler univariate ones, aiding visualisation and interpretability. This makes them valuable in critical decision-making applications, where understanding a model's decision process is crucial. Unlike typical neural networks such as Multilayer Perceptrons (MLPs), which combine linear weights with fixed activation functions on nodes, KANs place learnable functions on network edges, creating a more interpretable structure. Over time, several KAN variants, such as FourierKAN and FastKAN, have emerged, each with its own basis functions to balance speed and accuracy.
LSS-SKAN builds on these advancements with the Efficient KAN Expansion (EKE) Principle, a new approach that scales networks by allocating parameters to the network's size rather than to more complex basis functions.
This principle is central to LSS-SKAN's efficiency and shows how a simpler basis function can yield high accuracy at a lower computational cost.
Key Features of LSS-SKAN
EKE Principle: Scales the network by prioritising size over basis function complexity, making LSS-SKAN faster and more efficient.
Single-Parameter Basis Function: Utilises the Shifted Softplus function, which requires only one learnable parameter per function, simplifying the network and reducing training time.
Superior Accuracy: Outperforms other KAN variants on the MNIST dataset, with improvements of 1.65% over Spl-KAN, 2.57% over FastKAN, 0.58% over FourierKAN, and 0.22% over WavKAN.
Reduced Training Time: Trains considerably faster, running 502.89% faster than MLP+rKAN and 41.78% faster than MLP+fKAN.
What This Means for You
For those working in machine learning or fields requiring interpretable AI, LSS-SKAN offers a practical way to improve neural network accuracy and speed while keeping model decision-making transparent. It is particularly beneficial in image classification, scientific computing, and scenarios demanding high interpretability, such as the medical or financial sectors, where model explainability is crucial.
Examining the Details
The researchers ran detailed experiments on the MNIST dataset to compare LSS-SKAN with other KAN variants. They tested both short-term (10-epoch) and long-term (30-epoch) training cycles, focusing on two key metrics: accuracy and execution speed.
In these tests, LSS-SKAN consistently outperformed the other KAN models in accuracy, with a 1.65% improvement over Spl-KAN, 2.57% over FastKAN, and 0.58% over FourierKAN, while also running 502.89% faster than MLP+rKAN and 41.78% faster than MLP+fKAN.
The LSS-SKAN Python library is available on GitHub, along with experimental code, so you can replicate and build on their findings.
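To make the edge-function idea concrete, here is a toy KAN-style layer in plain Python. Each input-to-output edge carries its own univariate function with a single learnable parameter, and each output node simply sums its incoming edge functions. The edge function used here, w * (softplus(x) - ln 2), is an assumption chosen to illustrate a one-parameter shifted softplus; it is not claimed to be the exact basis function defined in the LSS-SKAN paper, and `ToyKANLayer` is a hypothetical name, not part of the LSS-SKAN library. Training is omitted.

```python
import math
import random


def shifted_softplus(x):
    """softplus(x) - ln 2, so the curve passes through the origin.
    Uses the stable form max(x, 0) + log1p(exp(-|x|)) for softplus."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x))) - math.log(2.0)


class ToyKANLayer:
    """Hypothetical KAN-style layer: one learnable scalar per edge."""

    def __init__(self, n_in, n_out, seed=0):
        rng = random.Random(seed)
        # self.w[j][i] parameterizes the edge from input i to output j.
        self.w = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)]
                  for _ in range(n_out)]

    def forward(self, x):
        """Each output sums its edge functions phi_ji(x_i) = w_ji * ssp(x_i).
        Unlike an MLP, there is no weight matrix followed by a fixed
        node activation: the learnable part lives on the edges."""
        return [sum(w_ji * shifted_softplus(x_i) for w_ji, x_i in zip(row, x))
                for row in self.w]

    def n_params(self):
        """Exactly one parameter per edge: n_in * n_out in total."""
        return sum(len(row) for row in self.w)


if __name__ == "__main__":
    layer = ToyKANLayer(n_in=3, n_out=2)
    print("parameters:", layer.n_params())
    print("output:", layer.forward([0.5, -1.0, 2.0]))
```

Because each edge costs exactly one parameter, a layer with n_in inputs and n_out outputs has n_in × n_out parameters. Under a fixed parameter budget, that leaves room for more edges (a wider or deeper network) than a basis with several coefficients per edge would, which is the kind of trade-off the EKE principle describes.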
The authors recommend a learning rate between 0.0001 and 0.001 for best results, because KANs are sensitive to the learning rate.
You can learn more by reading the entire paper and accessing LSS-SKAN.
🧠 Expert insight💥
Here’s an excerpt from “Chapter 12: Deploying and Managing FastAPI Applications” in the book, FastAPI Cookbook by Giunio De Luca, published in August 2024.
Running FastAPI applications in Docker containers
Docker is a useful tool that lets developers wrap applications and their dependencies into a container. This makes sure that the application runs reliably in different environments, avoiding the common "works on my machine" issue. In this recipe, we will see how to write a Dockerfile and run a FastAPI application inside a Docker container. By the end of this guide, you will know how to put your FastAPI application into a container, making it more flexible and simpler to deploy.
Getting ready
Some knowledge of container technology, especially Docker, will help you follow the recipe. But first, check that Docker Engine is set up properly on your machine. You can see how to do it at this link: https://docs.docker.com/engine/install/.
If you use Windows, it is better to install Docker Desktop, which is a Docker virtual machine distribution with a built-in graphical interface.
Whether you have Docker Engine or Docker Desktop, make sure the daemon is running by typing this command:
$ docker images
If you don’t see any error about the daemon, Docker is installed and working on the machine. The way to start the Docker daemon depends on the installation you chose. Look at the related documentation to see how to do it.
You can use the recipe for your own applications or follow along with the Live Application application that we introduced in the first recipe, which we are using throughout the chapter.
How to do it…
It is not very complicated to run a simple FastAPI application in a Docker container.
The process consists of three steps:
Create the Dockerfile.
Build the image.
Generate the container.
Then, you just have to run the container to have the application working.
Creating the Dockerfile
The Dockerfile contains the instructions for building the image: the base image to start from and the files we want to add to it.
It is good practice to create a separate Dockerfile for the development environment. We will name it Dockerfile.dev and place it in the project root folder.
We start the file by specifying the base image, as follows:
FROM python:3.10
This pulls an image from Docker Hub that already comes with Python 3.10 installed. Then, we set /code as the working directory that will host our code (Docker creates the folder if it does not exist):
WORKDIR /code
Next, we copy requirements.txt into the image and install the packages inside the image:
COPY ./requirements.txt /code/requirements.txt
RUN pip install --no-cache-dir -r /code/requirements.txt
The pip install command runs with the --no-cache-dir parameter so that pip does not keep a cache of downloaded packages, which would only add size to the image without any benefit inside a container. Also, for larger applications in production, it is recommended to pin exact package versions in requirements.txt to avoid compatibility issues caused by package upgrades.
Then, we can copy the app folder containing the application into the image with the following command:
COPY ./app /code/app
Finally, we define the server startup instruction as follows:
CMD ["fastapi", "run", "app/main.py", "--port", "80"]
This is all we need to create our Dockerfile.dev file.
Building the image
Once we have Dockerfile.dev, we can build the image by running the following from the command line in the project root folder:
$ docker build -f Dockerfile.dev -t live-application .
Because our file is named Dockerfile.dev rather than the default Dockerfile, we pass it explicitly with the -f flag. The -t flag tags the image as live-application, and the final . sets the current folder as the build context.
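Following the advice to pin exact versions in requirements.txt, a small pre-build check can catch unpinned lines before docker build bakes them into an image. The sketch below is a hypothetical helper, not part of the book's Live Application. It only looks for an exact == pin and skips blank lines, comments, and pip options, so it is a simplified check rather than a full parser of pip's requirements-file format.

```python
from typing import List


def unpinned_requirements(text: str) -> List[str]:
    """Return requirement lines that lack an exact '==' version pin (simplified check)."""
    unpinned = []
    for raw_line in text.splitlines():
        # Drop inline comments and surrounding whitespace; skip blank lines.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        # Skip pip options such as "-r base.txt" or "--index-url ...".
        if line.startswith("-"):
            continue
        if "==" not in line:
            unpinned.append(line)
    return unpinned


# Hypothetical requirements file; the version numbers are illustrative only.
example = """\
fastapi==0.115.0
uvicorn          # no version: could change on the next build
pydantic>=2.0    # a range, not an exact pin
"""
print(unpinned_requirements(example))  # ['uvicorn', 'pydantic>=2.0']
```

A check like this could run in CI just before the docker build step. Whether ranges such as >=2.0 are acceptable is a project decision; this sketch treats anything other than == as unpinned.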
Once the build is finished, you can check that the image has been built correctly by running the following:
$ docker images live-application
You should see the details of the image in the output, like this:
REPOSITORY         TAG      IMAGE ID       CREATED          SIZE
live-application   latest   7ada80a535c2   43 seconds ago   1.06GB
With the image built, we can proceed to create the container.
Creating the container
To create the container and run it, simply run the following:
$ docker run -p 8000:80 live-application
This creates the container and runs it. The -p 8000:80 option maps port 8000 on your machine to port 80 inside the container, where the server listens. We can see the container by running the following:
$ docker ps -a
Since we didn’t specify a container name, Docker automatically assigns it a randomly generated one. Mine, for example, is bold_robinson.
Open the browser at http://localhost:8000 and you will see the home page response of our application.
This is all you need to run a FastAPI application inside a Docker container. Running a FastAPI application in a Docker container is a great way to use the advantages of both technologies. You can easily scale, update, and deploy your web app with minimal configuration.
See also
The Dockerfile can be used to specify several features of the image. Check the list of commands in the official documentation:
Dockerfile reference: https://docs.docker.com/reference/dockerfile/
Docker CLI documentation: https://docs.docker.com/reference/cli/docker/
FastAPI in Containers - Docker: https://fastapi.tiangolo.com/deployment/docker/
FastAPI Cookbook was published in August 2024.
Get the eBook for $35.99 $24.99!
Get the Print Book for $44.99 $30.99!
And that’s a wrap.
We have an entire range of newsletters with focused content for tech pros. Subscribe to the ones you find the most useful here.
The complete PythonPro archives can be found here.
If you have any suggestions or feedback, or would like us to find you a Python learning resource on a particular subject, just respond to this email!

Divya Anne Selvaraj
03 Dec 2024

PythonPro #58: HTTP Requests Demystified, Goat vs. Car 🐐🚗, and Python's Dependency Dilemma

Bite-sized actionable content, practical tutorials, and resources for Python programmers.
#58
HTTP Requests Demystified, Goat vs. Car 🐐🚗, and Python's Dependency Dilemma
Hi,
Welcome to a brand new issue of PythonPro!
In today’s Expert Insight, we bring you an excerpt from the recently published book, Learn Python Programming - Fourth Edition, which introduces HTTP requests with Python's requests library, including examples of GET and POST requests against the httpbin.org API.
News Highlights: Ollama 0.4 adds support for Python functions as tools and improved JSON schema generation with Pydantic; Preswald simplifies metrics setup in VSCode with AI-powered data engineering.
My top 5 picks from today’s learning resources:
Goodbye Make and Shell, Hello... Python?🛠️➡️🐍
🚪🐐🚗Goat or Car? Solving The Monty Hall Problem With Python and NumPy
Continuous Integration and Deployment for Python With GitHub Actions🔄📦🚀
🔥Python dependency management is a dumpster fire
How to Check if a Python String Contains a Substring🔍🔗📜
And today’s Featured Study introduces Stateful Behaviour Trees (SBTs), an evolution of Behaviour Trees (BTs) tailored for dynamic and safety-critical systems, and BehaVerify, a tool for scalable formal verification that integrates with Python libraries and tools like nuXmv.
Stay awesome!
Divya Anne Selvaraj
Editor-in-Chief
🐍 Python in the Tech 💻 Jungle 🌳
🗞️News
Ollama Python library 0.4 with function calling improvements: The version introduces enhanced functionality, including the ability to use Python functions as tools, improved JSON schema generation using Pydantic, and more.
Preswald – AI Data Engineer in VSCode for Local Testing, Metrics: Preswald is an AI-powered data engineer integrated into VSCode, designed to simplify setting up and managing startup metrics within the codebase.
💼Case Studies and Experiments🔬
Vector animations with Python: Demonstrates creating dynamic vector animations in Python, using Gizeh for vector graphics and MoviePy for video and GIF generation.
AnyChart integration for the Financial Trading Dashboard with Python Django: Details integrating AnyChart into a Django-based Financial Trading Dashboard, replacing D3.js for treemaps and adding interactive stock charts.
📊Analysis
Goodbye Make and Shell, Hello... Python?: Advocates using Python for build and project tooling instead of traditional Makefiles and shell scripts, emphasizing its benefits for maintainability, debuggability, and flexibility.
Constraints are Good: Python's Metadata Dilemma: Discusses Python's metadata challenges, emphasizing the lack of constraints in its system compared to JavaScript's more structured approach.
🎓Tutorials and Guides🤓
Goat or Car? Solving The Monty Hall Problem With Python and NumPy: Explains the Monty Hall problem and demonstrates with Python and NumPy that switching doors raises the probability of winning from 1/3 to 2/3.
Continuous Integration and Deployment for Python With GitHub Actions: Covers workflow creation, automated testing, secure credential handling, dependency updates, and deployment to PyPI, with practical examples.
Python's F-String for String Interpolation and Formatting: Demonstrates how f-strings interpolate variables and expressions into strings and apply formatting efficiently, with better readability and performance.
Basic Input and Output in Python: Explains Python's basic input and output functions, focusing on input() for capturing user input and print() for displaying output, along with advanced features like formatting and enhanced input collection.
Advanced Python Development Workflow in Emacs: Explores how Emacs's extensibility allows for a personalized development setup comparable to established IDEs like PyCharm or VS Code.
Augmented Reality with Python and OpenCV (part 3): Describes improving an augmented reality (AR) application built with Python and OpenCV by implementing a Kalman filter for tracking and stabilization.
🎥Let's build a AI Photo Generator with Python and FastAPI: Demonstrates fine-tuning the open-source Flux image generation model with Replicate's AI services to generate personalized images at minimal cost.
Django and Postgres for the Busy Rails Developer: Shares insights from a Rails developer’s experience with Python, Django, and Postgres, highlighting differences and similarities in runtime management, library use, and more.
🔑Best Practices and Advice🔏
What Does if __name__ == "__main__" Do in Python?: Details the idiom's purpose of controlling code execution when a file runs as a script versus when it’s imported as a module, along with usage scenarios and best practices.
How to Check if a Python String Contains a Substring: Explains methods to check whether a Python string contains a substring, focusing on the in operator for simplicity, along with alternatives like .count(), .index(), and pandas.
Python Exceptions: An Introduction: Introduces Python exceptions, explaining how to handle errors with try, except, else, and finally blocks, raise exceptions, and create custom ones for robust error management.
Python dependency management is a dumpster fire: Advocates best practices such as using virtual environments, managing dependencies explicitly with tools like Poetry, and avoiding global package installations.
Some notes on my experiences with Python type hints and mypy: Discusses the limitations of type aliases versus NewType for preventing type confusion, the inability to use NewType with certain operations, and more.
🔍Featured Study: Formalising Stateful Behaviour Trees for Advanced System Verification💥
In the paper, "Formalising Stateful Behaviour Trees," presented at FMAS 2024, Serbinowska et al. explore the formalisation and verification of SBTs. The study aims to expand Behaviour Trees' capabilities, ensuring their reliability in dynamic and safety-critical applications through enhanced computational modelling and verification techniques.
Context
BTs are modular, hierarchical controllers widely used in robotics and AI for managing complex systems.
They organise tasks into a tree structure, enabling flexible and scalable behaviour design. However, traditional BTs lack persistent memory, which limits their use in state-dependent or dynamic environments.
SBTs address this gap by incorporating a shared memory (blackboard), allowing them to track auxiliary variables and adapt to environmental changes. This makes them suitable for advanced applications, such as autonomous systems, where predictability and safety are crucial. The study also introduces BehaVerify, a tool designed to formalise and verify SBTs, which integrates with Python libraries and supports model-checking tools.
Key Features of SBTs
Shared Blackboard Memory: SBTs include a persistent shared memory, called a blackboard, which tracks auxiliary variables across tasks and ticks. This lets dynamic systems adapt to changes in their environment.
Enhanced Computational Power: The study establishes that SBTs are computationally equivalent to Turing machines when the blackboard has unbounded memory, and to finite state automata when memory is bounded. This range allows SBTs to model a wide variety of system behaviours.
Domain-Specific Language (DSL): The authors introduce a DSL specifically designed for creating SBT models. This DSL generates Python-compatible implementations and integrates with tools like nuXmv for formal verification.
Scalability: BehaVerify can verify trees with up to 20,000 nodes, outperforming existing tools such as MoVe4BT, which struggles beyond 250 nodes.
Fast-Forwarding Mechanism: To reduce computational overhead, BehaVerify condenses the execution of multiple tree ticks into single computational steps, significantly speeding up verification.
Versatility in Applications: SBTs can model deterministic systems like finite state machines as well as complex, nondeterministic behaviours, making them suitable for safety-critical applications in robotics and AI.
What This Means for You
This study is highly relevant for developers and researchers in robotics, AI, and safety-critical systems. For Python programmers, BehaVerify's integration with libraries like PyTrees simplifies the design and testing of stateful, autonomous behaviours. The ability to verify temporal logic specifications helps ensure robust system behaviour, making SBTs a powerful tool for applications ranging from autonomous vehicles to robotic mission planning.
Examining the Details
In key experiments, such as the “Bigger Fish” and “Simple Robot” scenarios, BehaVerify verifies trees with up to 20,000 nodes and handles extensive state spaces, with the fast-forwarding mechanism significantly improving verification speed.
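To make the role of the blackboard concrete, here is a toy behaviour tree in plain Python whose nodes share a dictionary that persists across ticks. It is a sketch for illustration only: the Status, Action, and Sequence classes and the two leaf functions are hypothetical, and this is neither the PyTrees API nor the BehaVerify DSL.

```python
from enum import Enum
from typing import Callable, Dict, List


class Status(Enum):
    SUCCESS = "success"
    FAILURE = "failure"
    RUNNING = "running"


# The blackboard: shared memory that survives from one tick to the next.
Blackboard = Dict[str, int]


class Action:
    """Leaf node that runs a function against the shared blackboard."""

    def __init__(self, fn: Callable[[Blackboard], Status]):
        self.fn = fn

    def tick(self, bb: Blackboard) -> Status:
        return self.fn(bb)


class Sequence:
    """Ticks children in order and stops at the first child that does not succeed."""

    def __init__(self, children: List[Action]):
        self.children = children

    def tick(self, bb: Blackboard) -> Status:
        for child in self.children:
            status = child.tick(bb)
            if status is not Status.SUCCESS:
                return status
        return Status.SUCCESS


def count_tick(bb: Blackboard) -> Status:
    # Auxiliary variable stored on the blackboard: the "stateful" part.
    bb["ticks"] = bb.get("ticks", 0) + 1
    return Status.SUCCESS


def wait_three_ticks(bb: Blackboard) -> Status:
    # Succeeds only once the counter written by count_tick reaches 3.
    return Status.SUCCESS if bb["ticks"] >= 3 else Status.RUNNING


tree = Sequence([Action(count_tick), Action(wait_three_ticks)])
blackboard: Blackboard = {}
history = [tree.tick(blackboard) for _ in range(3)]
print([s.value for s in history])  # ['running', 'running', 'success']
```

Without the blackboard, wait_three_ticks would have no way of knowing how many ticks had passed, which is the kind of state-dependent behaviour that memoryless BTs struggle to express.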
The paper's real-world examples, including a drone tracking moving targets in dynamic environments, illustrate the tool’s practicality and relevance for safety-critical systems.
You can learn more by reading the entire paper and accessing BehaVerify.
🧠 Expert insight💥
Here’s an excerpt from “Chapter 8: Files and Data Persistence” in the book, Learn Python Programming - Fourth Edition by Fabrizio Romano and Heinrich Kruger.
Making HTTP requests
In this section, we explore two examples of HTTP requests. We will use the requests library for these examples, which you can install with pip; it is included in the requirements file for this chapter.
We are going to perform HTTP requests against the httpbin.org API, which, interestingly, was developed by Kenneth Reitz, the creator of the requests library itself. The requests library is among the most widely adopted, and using it takes only a few lines:
# io_examples/reqs.py
import requests

urls = {
    "get": "https://httpbin.org/get?t=learn+python+programming",
    "headers": "https://httpbin.org/headers",
    "ip": "https://httpbin.org/ip",
    "user-agent": "https://httpbin.org/user-agent",
    "UUID": "https://httpbin.org/uuid",
    "JSON": "https://httpbin.org/json",
}


def get_content(title, url):
    resp = requests.get(url)
    print(f"Response for {title}")
    print(resp.json())


for title, url in urls.items():
    get_content(title, url)
    print("-" * 40)
The preceding snippet should be straightforward. We declare a dictionary of URLs against which we want to perform HTTP requests. We have encapsulated the code that performs the request in the get_content() function. As you can see, we perform a GET request (by using requests.get()), and we print the title and the JSON-decoded version of the body of the response. Let us spend a few words on this last bit.
When we perform a request to a website, or to an API, we get back a response object encapsulating the data that was returned by the server we performed the request against. The body of some responses from httpbin.org happens to be JSON encoded, so instead of getting the body as it is (by reading resp.text) and manually decoding it by calling json.loads() on it, we simply combine the two by using the json() method of the response object. There are plenty of reasons why the requests package has become so widely adopted, and one of them is its ease of use.
Now, when you perform a request in your application, you will want a much more robust approach to dealing with errors and so on, but for this chapter, a simple example will do. We will see more examples of requests in Chapter 14, Introduction to API Development.
Going back to our code, at the end we loop over the dictionary and request each URL. When you run it, you will see the result of each call printed on your console, which should look like this (prettified and trimmed for brevity):
$ python reqs.py
Response for get
{
    "args": {"t": "learn python programming"},
    "headers": {
        "Accept": "*/*",
        "Accept-Encoding": "gzip, deflate",
        "Host": "httpbin.org",
        "User-Agent": "python-requests/2.31.0",
        "X-Amzn-Trace-Id": "Root=1-123abc-123abc",
    },
    "origin": "86.14.44.233",
    "url": "https://httpbin.org/get?t=learn+python+programming",
}
… rest of the output omitted …
Notice that you might get a slightly different output in terms of version numbers and IPs, which is fine. Now, GET is only one of the HTTP verbs, albeit one of the most commonly used. Let us also look at how to use the POST verb. This is the type of request you make when you need to send data to the server, for example to request the creation of a resource. Most web forms are submitted with a POST request.
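A dictionary passed as form data is sent with the application/x-www-form-urlencoded content type. As a rough illustration of what actually goes over the wire, the sketch below reproduces that encoding with the standard library's urllib.parse. It assumes requests encodes a flat dictionary in the same way; it is not requests' own code.

```python
from urllib.parse import parse_qs, urlencode

data = dict(title="Learn Python Programming")

# application/x-www-form-urlencoded: key=value pairs joined by "&",
# with spaces encoded as "+" and reserved characters percent-escaped.
body = urlencode(data)
print(body)                # title=Learn+Python+Programming
print(len(body.encode()))  # 30 (the body size in bytes)

# A server reverses the process to rebuild the form fields.
print(parse_qs(body))      # {'title': ['Learn Python Programming']}
```

A 30-byte body is consistent with the Content-Length header that httpbin.org echoes back for this request, and parse_qs returns lists because a form may repeat the same key.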
Let us try to make a POST request programmatically:
# io_examples/reqs_post.py
import requests

url = "https://httpbin.org/post"
data = dict(title="Learn Python Programming")

resp = requests.post(url, data=data)
print("Response for POST")
print(resp.json())
The preceding code is very similar to what we saw before, only this time we call post() instead of get(), and because we want to send some data, we pass it in the call through the data argument. The requests library offers much more than this. It is a project that we encourage you to check out and explore, as it is quite likely you will be using it too.
Running the previous script (and applying some prettifying magic to the output) yields the following:
$ python reqs_post.py
Response for POST
{
    "args": {},
    "data": "",
    "files": {},
    "form": {"title": "Learn Python Programming"},
    "headers": {
        "Accept": "*/*",
        "Accept-Encoding": "gzip, deflate",
        "Content-Length": "30",
        "Content-Type": "application/x-www-form-urlencoded",
        "Host": "httpbin.org",
        "User-Agent": "python-requests/2.31.0",
        "X-Amzn-Trace-Id": "Root=1-123abc-123abc",
    },
    "json": None,
    "origin": "86.14.44.233",
    "url": "https://httpbin.org/post",
}
Notice how the headers are now different: the request carries a Content-Type of application/x-www-form-urlencoded and a Content-Length. The data we sent appears as a key/value pair under the form key of the response body.
We hope these short examples are enough to get you started, especially with requests. The web changes every day, so it is worth learning the basics and then brushing up every now and then.
Learn Python Programming was published in November 2024.
Get the eBook for $35.99 $24.99
Get the Print Book for $44.99
And that’s a wrap.
We have an entire range of newsletters with focused content for tech pros. Subscribe to the ones you find the most useful here.
The complete PythonPro archives can be found here.
If you have any suggestions or feedback, or would like us to find you a Python learning resource on a particular subject, just respond to this email!
Divya Anne Selvaraj
11 Sep 2024

Master Python for Data, AI, and API Development

New Python books, designed for today’s needs
Master Python for Data, AI, and API Development
Hi,
Python powers some of the fastest-growing fields in tech today. According to the latest Python Developer Survey results, 47% of Python users apply it in data analysis, 42% in machine learning, and 39% in web development. With Python’s influence only expanding, staying ahead means mastering these key areas.
Packt's August 2024 releases offer the practical expertise you need to enhance your Python skills, whether you're working with big data, building machine learning models, or developing high-performance APIs.
Python Feature Engineering Cookbook - Third Edition by Soledad Galli
A complete guide to crafting powerful features for your machine learning models
Equips you with practical techniques for handling complex datasets and crafting features that improve model performance.
Learn to impute missing values, transform numerical variables, and extract powerful features from complex datasets like time series and transactional data.
Get the eBook for $35.99 $24.99!
Get the Print Book for $44.99!
Polars Cookbook by Yuki Kakegawa
Over 60 practical recipes to transform, manipulate, and analyze your data using Python Polars 1.x
Optimise data analysis tasks with Python Polars, a blazingly fast alternative to pandas.
Ideal for data professionals looking to improve performance across a variety of datasets, solve common data problems, perform complex transformations, and analyse time-series data.
Get the eBook for $35.99 $24.99!
Get the Print Book for $44.99!
FastAPI Cookbook by Giunio De Luca
Develop high-performance APIs and web applications with Python
FastAPI is gaining ground rapidly, with 25% of Python developers now using it for web development.
Learn how to use FastAPI’s modern, async-friendly features, and take your backend development to the next level with custom middleware and WebSockets.
Get the eBook for $35.99 $24.99!
Get the Print Book for $44.99!

Divya Anne Selvaraj
13 May 2025

PythonPro #70: Python Hits All-Time High, New Type Checker ‘ty’, SQL-Ready ML Pipelines, and Debugging RAG with raggy

Bite-sized actionable content, practical tutorials, and resources for Python programmers.
#70
Python Hits All-Time High, New Type Checker ‘ty’, SQL-Ready ML Pipelines, and Debugging RAG with raggy
Live Webinar | Scale AppSec with Security Champions – May 15
Security Champions programs are a proven way to scale AppSec across dev teams. Join Snyk’s live webinar on May 15 @ 11AM ET:
✓ Defining the role of security champions
✓ Designing a scalable, tailored program
✓ Recognizing, rewarding & growing your champions
🎓 BONUS: Earn CPE credits for attending!
Save your spot now!
Hi,
Welcome to a brand new issue of PythonPro!
News Highlights: Python hits an all-time high in the Tiobe Index, solidifying its dominance; Astral unveils ty, a fast new type checker built to scale alongside Ruff and uv; Python 3.14 enters beta with t-strings and key PEPs for type checking and debugging; Orbital lets developers run scikit-learn pipelines as pure SQL directly inside databases.
My top 5 picks from today’s learning resources:
What’s Happening to Embeddings During Training?🧠
How to Build an MCP Server in 5 Lines of Python🔌
Unleashing gst-python-ml: Python-powered ML analytics for GStreamer pipelines🎥
Engineer Python projects like a PRO🛠️
Top Python Code Quality Tools to Improve Your Development Workflow🧹
And, in From the Cutting Edge, we introduce raggy, a developer tool that enables real-time, interactive debugging of Retrieval-Augmented Generation (RAG) pipelines by combining a Python library of composable components with a visual interface for rapid iteration and evaluation.
Stay awesome!
Divya Anne Selvaraj
Editor-in-Chief
Practical workshops and technical sessions with 20+ ML engineers and researchers.
• Sebastian Raschka: Live AMA on Large Language Models
• Khuyen Tran: GPTs for time series forecasting
• Luca Massaron, Thomas Nield, and others: Applied ML at scale
Use code EARLY40 for 40% off.
Register with EARLY40
🐍 Python in the Tech 💻 Jungle 🌳
🗞️News
Python popularity climbs to highest ever – Tiobe: Python has reached its highest-ever Tiobe Index rating, 25.35% in May 2025, a level no language has reached since Java’s peak in 2001.
ty: Astral's New Type Checker (Formerly Red-Knot) - Talk Python to Me Ep. 506: Developed as a complement to Astral’s popular tools Ruff and uv, ty aims to offer faster, more scalable, and more beginner-friendly type checking. It focuses on performance, better editor integration, and smoother adoption in large codebases. ty will be released as a standalone tool, not a drop-in replacement for mypy or Pyright.
Python's T-Strings Coming Soon and Other Python News for May 2025: Python 3.14 enters beta with PEP 750 introducing reusable template strings (t-strings) and PEPs 751, 768, and 781 enhancing dependency tracking, debugging safety, and type-checking support.
Orbital for Python released: Orbital converts trained scikit-learn pipelines into pure SQL, enabling machine learning model execution directly within databases, with no Python runtime needed.
💼Case Studies and Experiments🔬
An Empirical Study on the Performance and Energy Usage of Compiled Python Code: Evaluates Python compilers across seven benchmarks using eight compilation tools.
Codon, PyPy, and Numba showed over 90% improvement in speed and energy, while Nuitka consistently reduced memory use.
I Taught My Fridge Inventory to Text Me When I’m Out of Milk: Combines a Raspberry Pi, Python, OCR (Tesseract), and Twilio to automate fridge inventory tracking.
📊Analysis
What’s Happening to Embeddings During Training?: Investigates how embedding vectors evolve during training by analyzing metrics like Gini index, Hoyer sparsity, vector entropy, and spectral entropy.
PyTorch Tensors Explained: Explains how PyTorch handles tensors (memory layout, strides, and autograd) to help developers understand efficient tensor operations and automatic differentiation.
🎓Tutorials and Guides🤓
How to Build an MCP Server in 5 Lines of Python: Shows you how to turn a Python function into an LLM-compatible tool by launching an MCP server with Gradio in just five lines of code. It covers setup, deployment, and integration with MCP clients like Claude Desktop and Cursor.
Data Profiling in Python: common ways to explore your data (part 2): Introduces practical techniques for data profiling, focusing on using value_counts() to analyze categorical variables and understand dataset composition.
5 steps to N-body simulation: Teaches beginners to build efficient N-body gravity simulations in Python through initial setup, implementing gravity, basic simulation, higher-order methods, and adaptive time-stepping.
Unleashing gst-python-ml: Python-powered ML analytics for GStreamer pipelines: Introduces a new Python framework that enables real-time video analytics in GStreamer pipelines using Python tools, with support for object detection, tracking, captioning, and more.
The Python Profilers: Explains how to use Python’s deterministic profilers, cProfile and profile, to analyze performance by measuring function call frequency and duration, with usage examples.
Automating code deletion with Gemini (and a little Python): Details how the author used Gemini 2.0 Flash and Python to automate the removal of outdated docgen code from 235 GN build files after migrating Pigweed’s documentation system to Bazel.
📖Open Source Book | Causal Inference for The Brave and True by Matheus Facure Alves: Offers a Python-based, practical introduction to causal inference, balancing rigorous theory with humour and real-world examples. Part I covers foundational methods like causal graphs and regression; Part II explores modern, tech-focused approaches like CATE and meta-learners.
🔑Best Practices and Advice🔏
Engineer Python projects like a PRO: Guides AI engineers on structuring Python projects using modern tools like uv, ruff, and Docker Compose, while advocating for a monorepo setup to improve code quality, reproducibility, and scalability in real-world development.
Top Python Code Quality Tools to Improve Your Development Workflow: Covers linters, formatters, type checkers, security scanners, test coverage, profiling, and CI/CD integration.
Kate and Python language server: Explains how to configure python-lsp-server in the Kate editor to work smoothly with Python virtual environments, using a custom bash script (pylsp_in_env) and enabling the ruff plugin for linting.
"AI Coffee" Grand Opening This Monday • A Story About Parameters and Arguments in Python Functions: Uses a coffee shop analogy to explain Python function parameters, covering positional and keyword arguments, *args and **kwargs, default values, and more.
What does @Slot() do?: Explains the role of the @Slot() decorator in PySide6, showing that while it's optional for most signal-slot connections, it's required for thread-safe execution and slightly improves memory efficiency.
🔍From the Cutting Edge: Raggy – RAG Without the Lag💥
In RAG Without the Lag: Interactive Debugging for Retrieval-Augmented Generation Pipelines, Lauro et al. introduce raggy, a developer tool designed to simplify debugging and iterative development of Retrieval-Augmented Generation (RAG) pipelines.
The study comes from researchers at the University of Pittsburgh and UC Berkeley.
Context
RAG is a technique that combines a retriever and an LLM to generate responses based on external documents. It's widely used to build AI assistants that require domain-specific knowledge, with 86% of enterprise LLM deployments reportedly using it as of 2024.
However, RAG pipelines are notoriously hard to debug. Retrieval and generation are deeply intertwined, and developers must tune many parameters (chunk size, retrieval method, prompt wording, etc.) while enduring long feedback loops, often involving time-intensive re-indexing. Existing tools don’t support rapid iteration or show how changes in one part affect the whole pipeline.
Key Features of raggy
Composable Python primitives for defining RAG pipelines (e.g., Query, Retriever, LLM, Answer).
Interactive debugging interface that visualises chunk retrieval quality and generated outputs.
Real-time parameter editing for chunk size, retrieval methods, LLM prompts, and more.
Versioned checkpoints to roll back and test alternative pipeline states.
Support for manual overrides, allowing direct selection of chunks or editing of LLM responses.
Evaluation tools, including the ability to save “golden” answers and compare outputs against them.
What This Means for You
raggy is especially relevant for machine learning engineers, LLM application developers, and data scientists working on question-answering systems, enterprise chatbots, or knowledge-intensive assistants. With raggy, you can debug your RAG pipeline interactively, isolate the root causes of errors, and iterate without costly delays. It is designed to fit within Python-based workflows and to support both experienced and novice developers.
Examining the Details
To evaluate raggy’s effectiveness, the authors conducted a user study involving 12 developers with prior experience building production-grade RAG pipelines.
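The re-indexing cost mentioned in the Context section arises because each chunk size produces a different set of chunks to index. A toy sketch of the alternative is to pay the indexing cost once for every candidate chunk size, so that switching sizes while debugging becomes a lookup. All names here are hypothetical, and simple word-overlap scoring stands in for the vector indexes and retrieval methods that a real tool such as raggy would use.

```python
from typing import Dict, List


def chunk_words(text: str, size: int) -> List[str]:
    # Split a document into consecutive chunks of `size` words.
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def build_indexes(docs: List[str], sizes: List[int]) -> Dict[int, List[str]]:
    # Paid once, up front: every candidate chunk size gets its own "index".
    return {
        size: [chunk for doc in docs for chunk in chunk_words(doc, size)]
        for size in sizes
    }


def retrieve(indexes: Dict[int, List[str]], query: str, size: int, k: int = 1) -> List[str]:
    # Cheap at debugging time: look up the pre-built index for this chunk size
    # and rank its chunks by how many query words they share (toy scoring).
    q = set(query.lower().split())
    ranked = sorted(
        indexes[size],
        key=lambda chunk: len(q & set(chunk.lower().split())),
        reverse=True,
    )
    return ranked[:k]


docs = ["Visiting hours are 9am to 5pm daily. Parking is free for patients."]
indexes = build_indexes(docs, sizes=[4, 8])
# Switching chunk size is a dictionary lookup, not a re-index.
print(retrieve(indexes, "what are visiting hours", size=4))  # ['Visiting hours are 9am']
print(retrieve(indexes, "what are visiting hours", size=8))  # ['Visiting hours are 9am to 5pm daily. Parking']
```

The trade-off is memory and up-front build time in exchange for instant feedback. It pays off when, as in the study, most parameter changes would otherwise trigger re-indexing.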
Participants were asked to improve a baseline question-answering system over a corpus of 220 hospital documents. The study followed a think-aloud protocol, with participants engaging in tasks such as debugging poorly performing queries, handling noisy inputs, and rejecting irrelevant questions. The authors observed that developers consistently started by validating the retrieval component (manually inspecting and adjusting chunk size, retrieval methods, or number of chunks) before moving on to LLM generation. This retriever-first strategy persisted even when LLM components preceded retrieval in the pipeline, underscoring the centrality of retrieval quality in RAG debugging.
raggy’s low-latency feedback was particularly well received. On average, 71.3% of parameter changes would have required document re-indexing in traditional workflows, yet participants could implement and test these changes instantly within raggy. The tool’s pre-materialisation of hundreds of vector indexes (across chunk sizes and retrieval methods) and its checkpointing mechanism for preserving intermediate pipeline states enabled this rapid iteration. Participants also appreciated how the tool integrated seamlessly with their existing Python code, automatically generating an interactive UI without requiring manual configuration. This reduced context switching and allowed them to stay focused on the debugging task.
You can learn more by reading the entire paper or looking at the source code on GitHub.
And that’s a wrap.
We have an entire range of newsletters with focused content for tech pros. Subscribe to the ones you find the most useful here.
The complete PythonPro archives can be found here.
If you have any suggestions or feedback, or would like us to find you a Python learning resource on a particular subject, just respond to this email!
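The raggy study above credits much of its fast feedback to pre-materialising indexes for every chunk size and retrieval method, so that a parameter change becomes a lookup rather than a re-index. The sketch below illustrates only that idea, assuming a toy word-overlap retriever in place of real vector search; `chunk_text`, `KeywordIndex`, and `prematerialise` are hypothetical names for illustration, not raggy's API.

```python
from dataclasses import dataclass, field


def chunk_text(text: str, size: int) -> list[str]:
    """Split text into consecutive chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


@dataclass
class KeywordIndex:
    """Hypothetical stand-in for a vector index: ranks chunks by word overlap."""
    chunks: list[str]
    tokens: list[set[str]] = field(init=False, repr=False)

    def __post_init__(self) -> None:
        # Tokenise every chunk once, when the index is built.
        self.tokens = [set(chunk.lower().split()) for chunk in self.chunks]

    def retrieve(self, query: str, k: int = 1) -> list[str]:
        """Return the k chunks sharing the most words with the query."""
        query_tokens = set(query.lower().split())
        ranked = sorted(
            range(len(self.chunks)),
            key=lambda i: len(query_tokens & self.tokens[i]),
            reverse=True,  # sorted() is stable, so ties keep document order
        )
        return [self.chunks[i] for i in ranked[:k]]


def prematerialise(docs: list[str], chunk_sizes: list[int]) -> dict[int, KeywordIndex]:
    """Build one index per chunk size up front, so switching sizes needs no re-indexing."""
    return {
        size: KeywordIndex([chunk for doc in docs for chunk in chunk_text(doc, size)])
        for size in chunk_sizes
    }


# Usage: trying a different chunk size while debugging is now a dictionary lookup.
docs = [
    "Visiting hours are 9am to 8pm daily. Parking is free for patients.",
    "The cafeteria opens at 7am and closes at 6pm.",
]
indexes = prematerialise(docs, chunk_sizes=[4, 8])
for size, index in indexes.items():
    print(size, index.retrieve("when are visiting hours"))
```

The trade-off is memory for latency: every candidate configuration is built once ahead of time, which is what makes interactive "what if I change the chunk size?" experiments cheap.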

Divya Anne Selvaraj
03 Sep 2024

PythonPro #45: Converting DataFrames, Python Developer Survey, DBSCAN in 5 Minutes, and Web Scraping with Scrapy

Bite-sized actionable content, practical tutorials, and resources for Python programmers.
#45: Converting DataFrames, Python Developer Survey, DBSCAN in 5 Minutes, and Web Scraping with Scrapy
Hi,
Welcome to a brand new issue of PythonPro!
In today’s Expert Insight we bring you an excerpt from the recently published Polars Cookbook, which shows you how to convert DataFrames and Series between Polars and pandas.
News Highlights: Python Developer Survey: 55% use Linux, 6% still on Python 2; SuperTree enables interactive decision tree visuals in Jupyter; and OneBusAway launches Python and JavaScript SDKs for seamless data integration.
Here are my top 5 picks from our learning resources today:
Exploring the National Park Service API - Harvesting and Visualizing Data for National Parks🌲
Web Scraping With Scrapy and MongoDB🕸️
DBSCAN, Explained in 5 Minutes🧩
Python packaging is a MESS📦
Why I Still Use Python Virtual Environments in Docker🛳️
And today’s Featured Study highlights how process mining, using tools like pm4py, can uncover insights into workflow efficiency, variability, and algorithmic performance.
Stay awesome!
Divya Anne Selvaraj
Editor-in-Chief
P.S.: This month’s survey is now live.
Do take the opportunity to tell us what you think of PythonPro, request learning resources, and earn your one Packt Credit for this month.
🐍 Python in the Tech 💻 Jungle 🌳
🗞️News
Python Developer Survey - 55% Use Linux, 6% Use Python 2: The 7th annual Python Developers Survey, which gathered responses from over 25,000 developers worldwide, also found that Visual Studio Code is the leading IDE.
supertree - Interactive Decision Tree Visualization: This Python package is designed to create interactive visualizations of decision trees within Jupyter Notebooks, Jupyter Lab, Google Colab, and similar environments that support HTML rendering.
OneBusAway Launches Official Python and JavaScript SDKs: Developed as part of the Google Summer of Code, these SDKs simplify the incorporation of OneBusAway's data, offer consistent API usage across platforms, and include comprehensive documentation.
💼Case Studies and Experiments🔬
Exploring the National Park Service API - Harvesting and Visualizing Data for National Parks: Provides a step-by-step guide to accessing the API, retrieving data such as park entrance fees, and organizing it into a pandas DataFrame for analysis.
Code Without Any Syntax: Discusses an experiment in which the author uses an LLM to convert natural language instructions into functional Python code without traditional syntax.
📊Analysis
Make magic with Mesop - python based web apps: Reviews Mesop, a newly released Python-based framework for building web apps. Read for tips to get started.
Why I Prefer Django for My Projects: While acknowledging the strengths of Node.js and Express.js, the author of this article finds Django's holistic, secure, and efficient approach better suited to their needs in web development.
🎓Tutorials and Guides🤓
Web Scraping With Scrapy and MongoDB: Guides you through setting up a Scrapy project, building a web scraper, extracting data, and storing it in MongoDB.
Read to learn about testing and debugging techniques as well.
Generate Images With DALL·E and the OpenAI API: Covers setting up the necessary environment, making API calls to create images from text prompts, handling image variations, and converting Base64 JSON responses to PNG files.
Primer on Jinja Templating: Covers installation, basic usage, and advanced features like loops, conditional statements, and macros. Read to learn how to integrate Jinja with Flask to build a basic web project with dynamic web pages.
How to Install Python on Your System - A Guide: Provides a comprehensive guide to installing Python on various systems, including Windows, macOS, Linux, iOS, and Android.
Adventures building a spreadsheet engine in Python: Demonstrates using the Lark Python package to parse formulas and compute dependencies, employing a topological sort algorithm to determine the order of cell evaluation.
How to write your first Genetic Algorithm - Knapsack Problem: Guides you through implementing a genetic algorithm using Python.
Read to learn how to apply genetic algorithms to solve complex optimization problems.
Density-Based Spatial Clustering of Applications with Noise (DBSCAN), Explained in 5 Minutes: Provides a concise explanation of the DBSCAN algorithm, which identifies clusters in data based on spatial distance and detects outliers without needing to predefine the number of clusters.
🔑Best Practices and Advice🔏
Escaping from Anaconda's Stranglehold on macOS: Provides simple, non-technical instructions to move the .zshrc file, allowing users to switch between Anaconda and official Python installations without terminal commands.
Why I Still Use Python Virtual Environments in Docker: Argues that virtual environments simplify the management of Python applications, particularly in production settings, by ensuring consistent and isolated environments across different stages of development.
Python Classes - The Power of Object-Oriented Programming: Covers defining classes, creating objects, managing attributes and methods, and the benefits of using classes. Read to learn about advanced topics like inheritance.
Python packaging is a MESS: Stress-tests nine Python package managers, including pip, conda, poetry, and newer tools like pixi and hatch, highlighting the historical issues and modern solutions in Python packaging.
Use python -m http.server in SSL: Provides a custom script, ssl_server.py, that wraps http.server to enable serving static sites over HTTPS using a self-signed SSL certificate. Read to learn how to serve static content securely.
🔍Featured Study: Navigating Process Mining with pm4py💥
In "Navigating Process Mining: A Case Study using pm4py," Kovács et al. explore the application of the pm4py library in analysing road traffic fine management processes.
The study aims to demonstrate how process mining can uncover key insights into process efficiency and optimisation.
Context
Process mining is a technique that combines data mining and business process management to analyse event logs generated by information systems. It is particularly effective for uncovering hidden patterns, identifying bottlenecks, and optimising workflows. The study focuses on applying the pm4py library, an open-source Python tool, to a real-world road traffic fine management process. This approach offers a deeper understanding of process execution compared to traditional business intelligence tools.
Key Findings
The study's application of process mining to road traffic fine management revealed significant insights into process variability, algorithmic performance, and workflow complexity:
Process Variants: The analysis identified 231 distinct process variants, with one variant accounting for 56,482 cases (approximately 37.6% of the total 150,370 cases), indicating a dominant workflow path.
Algorithm Performance: Three process mining algorithms were evaluated:
Alpha Miner: Revealed causal dependencies between activities, achieving simplicity and precision scores of 0.66.
Inductive Miner: Employed a recursive approach to construct process models, scoring 0.62 in simplicity and 0.58 in precision.
Heuristic Miner: Utilised heuristics to infer process models from event data, achieving a perfect precision score of 1.0 but a lower simplicity score of 0.54.
Start and End Events: The process log analysis showed that 'Create Fine' was the most frequent start event, occurring 150,370 times. Multiple end events, such as 'Send Fine', 'Payment', and 'Send for Credit Collection,' were identified, indicating diverse process pathways.
Process Discovery and Visualisation: The discovered models allowed a detailed understanding of workflow structures and dependencies.
Each mining approach had strengths and limitations in capturing the process dynamics, with pm4py proving effective in facilitating process mining tasks.
What This Means for You
This study is relevant to data scientists, business analysts, and operations managers interested in optimising business processes. The pm4py library, as demonstrated in this case study, provides practical tools for analysing complex workflows, identifying inefficiencies, and improving operational efficiency. The insights gained can be applied to other business processes, making the library a valuable resource for those aiming to enhance process performance.
Examining the Details
The study used the pm4py library to analyse an event log related to the management of road traffic fines, covering activities such as creating fines, sending fines, adding penalties, managing appeals, and handling payments. The analysis applied three process mining algorithms (Alpha Miner, Inductive Miner, and Heuristic Miner) to discover process models from the event log data. The evaluation of simplicity and precision across these algorithms revealed that the Heuristic Miner achieved the highest precision score of 1.0, while the Alpha Miner provided a balance between simplicity and precision.
You can learn more by reading the entire paper and accessing the pm4py library.
🧠 Expert insight💥
Here’s an excerpt from “Chapter 10: Interoperability with Other Python Libraries” in the Polars Cookbook, by Yuki Kakegawa, published in August 2024.
Converting to and from a pandas DataFrame
Many of you have used pandas before, especially in your day-to-day work.
Although pandas and Polars are often compared as one-or-the-other tools, you can use them to supplement each other.
📚Related Titles from Packt
Understand key data science algorithms with Python-based examples
Increase the impact of your data science solutions by learning how to apply existing algorithms
Take your data science solutions to the next level by learning how to create new algorithms
Get the eBook for $35.99 $24.99!
Conduct Bayesian data analysis with step-by-step guidance
Gain insight into a modern, practical, and computational approach to Bayesian statistical modeling
Enhance your learning with best practices through sample problems and practice exercises
Get the eBook for $55.99 $38.99!
Polars allows you to convert between pandas and Polars DataFrames, which is exactly what we’ll cover in this recipe.
Getting ready
You need pandas and pyarrow installed for this recipe to work. Run the following command to make sure that you have them installed:
pip install pandas pyarrow
How to do it...
Here’s how to convert to and from pandas DataFrames.
We’ll first create a Polars DataFrame and then go through ways to convert back and forth between Polars and pandas:
Create a Polars DataFrame from a Python dictionary:
import polars as pl
df = pl.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
type(df)
The preceding code will return the following output:
>> polars.dataframe.frame.DataFrame
Convert a Polars DataFrame to a pandas DataFrame using the .to_pandas() method:
pandas_df = df.to_pandas()
type(pandas_df)
The preceding code will return the following output:
>> pandas.core.frame.DataFrame
Convert a pandas DataFrame to a Polars DataFrame using the pl.from_pandas() function:
df = pl.from_pandas(pandas_df)
type(df)
The preceding code will return the following output:
>> polars.dataframe.frame.DataFrame
If you want to allow zero-copy operations, then you need to enable the use_pyarrow_extension_array parameter:
df.to_pandas(use_pyarrow_extension_array=True).dtypes
The preceding code will return the following output:
>>
a    int64[pyarrow]
b    int64[pyarrow]
dtype: object
You can also create a Polars DataFrame by wrapping a pandas DataFrame using pl.DataFrame():
type(pl.DataFrame(pandas_df))
The preceding code will return the following output:
>> polars.dataframe.frame.DataFrame
How it works...
Polars has built-in methods to interoperate with pandas, such as .from_pandas() and .to_pandas(). Each name is descriptive enough that you can see that .from_pandas() is used for reading data into Polars from pandas, whereas .to_pandas() is used to convert Polars objects into pandas.
The use_pyarrow_extension_array parameter of the .to_pandas() method uses PyArrow-supported arrays instead of NumPy arrays for the columns within the pandas DataFrame.
This enables zero-copy operations and maintains the integrity of null values.
There’s more...
You can also convert between pandas and Polars Series:
s = pl.Series([1, 2, 3])
type(s.to_pandas())
The preceding code produces the following:
>> pandas.core.series.Series
The pl.from_pandas() function returns a Series object when a pandas Series is passed in:
type(pl.from_pandas(s.to_pandas()))
The preceding code produces the following:
>> polars.series.series.Series
Packt library subscribers can continue reading the entire book for free. You can buy the Polars Cookbook, by Yuki Kakegawa, here.
Get the eBook for $35.99 $24.99!
And that’s a wrap.
We have an entire range of newsletters with focused content for tech pros. Subscribe to the ones you find the most useful here. The complete PythonPro archives can be found here.
If you have any suggestions or feedback, or would like us to find you a Python learning resource on a particular subject, take the survey or just respond to this email!
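The process-variant and start/end-event figures in the pm4py study above come down to grouping an event log by case and counting activity sequences. The stdlib sketch below illustrates that computation on a tiny, invented event log; it is not pm4py's implementation, and `traces_by_case`, `variant_counts`, and `start_end_counts` are hypothetical helpers. On a real log such as the 150,370-case road traffic fine log, you would use pm4py's own utilities instead.

```python
from collections import Counter, defaultdict

# Hypothetical toy event log: (case_id, activity, timestamp) tuples.
EVENTS = [
    ("c1", "Create Fine", 1), ("c1", "Send Fine", 2),
    ("c2", "Create Fine", 1), ("c2", "Payment", 3),
    ("c3", "Create Fine", 2), ("c3", "Send Fine", 4),
    ("c4", "Create Fine", 1), ("c4", "Send Fine", 2),
    ("c4", "Add penalty", 5), ("c4", "Send for Credit Collection", 9),
]


def traces_by_case(events):
    """Group events into one time-ordered activity sequence (trace) per case."""
    traces = defaultdict(list)
    for case_id, activity, _ts in sorted(events, key=lambda e: (e[0], e[2])):
        traces[case_id].append(activity)
    return {case_id: tuple(acts) for case_id, acts in traces.items()}


def variant_counts(events):
    """A process variant is a distinct trace; count how many cases follow each one."""
    return Counter(traces_by_case(events).values())


def start_end_counts(events):
    """Count the first and last activity of every trace."""
    traces = traces_by_case(events).values()
    return Counter(t[0] for t in traces), Counter(t[-1] for t in traces)


variants = variant_counts(EVENTS)
top_variant, top_count = variants.most_common(1)[0]
share = top_count / sum(variants.values())
print(f"{len(variants)} variants; top {top_variant} covers {share:.1%} of cases")
starts, ends = start_end_counts(EVENTS)
print(starts, ends)
```

The dominant-variant share printed here is the same kind of statistic as the study's 56,482 of 150,370 cases (about 37.6%), just on toy data.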
Divya Anne Selvaraj
15 Oct 2024

PythonPro #51: Python 3.13 REPL Enhancements, Python 3.12 vs. 3.13, and Visualizing Named Entities in Text

Bite-sized actionable content, practical tutorials, and resources for Python programmers.
#51
Notion for Startups
Thousands of startups use Notion as a connected workspace to create and share docs, take notes, manage projects, and organize knowledge, all in one place.
We’re offering 6 months of new Plus plans, including unlimited Notion AI, so you can try it all for free!
To redeem the Notion for Startups offer:
1. Submit an application using our custom link: https://ntn.so/packt and select Packt on the partner list.
2. Include our partner key: STARTUP4110P19151
Get your Free 6-month Notion Plus Access!
Hi,
Welcome to a brand new issue of PythonPro!
In today’s Expert Insight we bring you an excerpt from the recently published book, Python Natural Language Processing Cookbook - Second Edition, which explains how to use the displaCy library from spacy to visualize named entities in text.
News Highlights: PEP 762 in Python 3.13 adds multi-line editing, coloured prompts and tracebacks, and REPL-specific commands to the REPL, and Pyinstrument 5 introduces a flamegraph timeline view for better code execution visualization.
Here are my top 5 picks from our learning resources today:
Python 3.12 vs Python 3.13 – performance testing⚡️
Exploring Infrastructure as Code (IaC) with Python: AWS CDK, Terraform CDK, and Pulumi🏗️
lintsampler: a new way to quickly get random samples from any distribution🎲
Python and SysV shared memory🧠
Gradient-Boosting anything (alert: high performance)🚀
And today’s Featured Study presents a method using LLMs to generate precise, transparent code transformations, improving accuracy and efficiency for compiler optimizations and legacy refactoring.
Stay awesome!
Divya Anne Selvaraj
Editor-in-Chief
P.S.: This month's survey is still live, so do take the opportunity to leave us your feedback, request a learning resource, and earn your one Packt credit for this month.
Your cloud deserves dedicated data protection
94% of cloud tenants were targeted last year, and 62% were successfully compromised. The
hard truth is that organizations are struggling to secure their cloud data, and cyberattackers are ready to exploit that challenge.
Here’s a handy resource you’ll want with you as you map out your plan: Orchestrating the Symphony of Cloud Data Security.
You’ll learn how to overcome the challenges of securing data in the cloud, navigate multi-cloud data security, and balance data security with cloud economics.
Download Your Complimentary Copy Now
🐍 Python in the Tech 💻 Jungle 🌳
🗞️News
PEP 762 – REPL-acing the default REPL: As of Python 3.13, the default REPL has been replaced with a Python-based version (PEP 762), offering modern features like multi-line editing, coloured prompts and tracebacks, and REPL-specific commands such as exit and quit that work without parentheses.
Pyinstrument 5 - Flamegraphs for Python: The new version of the Python statistical profiler introduces a flamegraph-style timeline view for visualizing code execution, improves on previous timeline modes, and more.
💼Case Studies and Experiments🔬
Moving all our Python code to a monorepo: pytendi: Describes the migration of Attendi’s Python codebase into a monorepo using the Polylith architecture to improve code discoverability, reusability, and developer experience.
How Maintainable is Proficient Code?
A Case Study of Three PyPI Libraries: Aims to help you recognize when proficient coding might hinder future maintenance efforts.
📊Analysis
In the Making of Python Fitter and Faster: Provides insights into how Python's evolving interpreter architecture enhances execution speed, memory efficiency, and overall performance for modern applications.
Python 3.12 vs Python 3.13 – performance testing: Tests on AMD Ryzen 7000 and Intel 13th-gen processors show Python 3.13 generally performs faster, especially in asynchronous tasks, but there are slowdowns in certain areas.
🎓Tutorials and Guides🤓
Build a Contact Book App With Python, Textual, and SQLite: Covers creating the app’s text-based interface (TUI), setting up a SQLite database for contact storage, and integrating both elements.
Syntactic Sugar: Why Python Is Sweet and Pythonic: Covers various Pythonic constructs like operators, assignment expressions, loops, comprehensions, and decorators, and shows how they simplify code.
The Ultimate Guide to Error Handling in Python: Provides a comprehensive guide to Python error handling, exploring common patterns like "Look Before You Leap" (LBYL) and "Easier to Ask Forgiveness than Permission" (EAFP).
Exploring Infrastructure as Code (IaC) with Python: AWS CDK, Terraform CDK, and Pulumi: Explains how Python integrates with IaC tools to automate cloud infrastructure management.
Web scraping of a dynamic website using Python with HTTP Client: Walks you through analyzing sites with JavaScript-rendered content and using the Crawlee framework to extract data in JSON format.
lintsampler: a new way to quickly get random samples from any distribution: Introduces a Python package designed to easily and efficiently generate random samples from any probability distribution.
Mastering Probability with Python: A Step-by-Step Guide with Simulations: Through examples like coin tosses, dice rolls, and event probabilities, this tutorial shows you how to simulate and analyze real-world scenarios.
🔑Best
Practices and Advice🔏
What's In A List - Yes, But What's *Really* In A List: Explains common pitfalls when multiplying lists and why it matters when working with mutable versus immutable data types.
Yes, you need to duplicate your frontend business logic on the server: Explains why backend validation is essential to protect data integrity, regardless of frontend sophistication.
Python and SysV shared memory: Explains how to wrap C functions like shmget, shmat, and shmctl to manage shared memory, handle void pointers, and perform basic operations like writing to shared memory.
Gradient-Boosting anything (alert: high performance): Explores using Gradient Boosting with various machine learning models, adapting LSBoost in the Python package mlsauce for both regression and classification tasks.
Code Generation with ChatGPT o1-preview as a Story of Human-AI Collaboration: Through experiments in Python and C++, the author demonstrates that human-AI collaboration improves code generation, specifically in building sentiment analysis tools.
🔍Featured Study: Don't Transform the Code, Code the Transforms💥
In "Don't Transform the Code, Code the Transforms: Towards Precise Code Rewriting using LLMs," Cummins et al., researchers from Meta, introduce a novel method called Code the Transforms (CTT), which leverages LLMs to generate precise code transformations rather than directly rewriting code.
Context
Code transformation refers to rewriting or optimising existing code, a task essential for compiler optimisations, legacy code refactoring, or performance improvements. Traditional rule-based approaches to code transformations are difficult to implement and maintain. LLMs offer the potential to automate this process, but direct code rewriting by LLMs lacks precision and is challenging to debug.
This study introduces the CTT method, where LLMs generate the transformation logic, making the process more transparent and adaptable.
Key Features of the CTT Method
Chain-of-thought process: The method synthesises code transformations by iterating through input/output examples to create precise transformation logic rather than rewriting code directly.
Improved transparency and adaptability: The generated transformations are explicit, making them easier to inspect, debug, and modify when necessary.
Higher precision: The method achieved perfect precision in 7 out of 16 Python code transformations, significantly outperforming traditional direct rewriting approaches.
Reduced computational costs: By generating transformation logic instead of rewriting code, the method requires less compute and review effort compared to direct LLM rewriting.
Iterative feedback loop: The method incorporates execution and feedback to ensure the generated transformations work as expected, leading to more reliable outcomes.
What This Means for You
This study is particularly beneficial for software engineers, developers, and those working on compiler optimisations or legacy code refactoring. By using this method, teams can reduce the time spent on manual code review and debugging, while improving the precision of code transformations.
Examining the Details
The study's methodology involved testing 16 different Python code transformations across a variety of tasks, ranging from simple operations like constant folding to more complex transformations such as converting dot products to PyTorch API calls. The CTT method achieved an overall F1 score of 0.97, compared to the 0.75 achieved by the direct rewriting method. The precision of transformations ranged from 93% to 100%, with tasks like dead code elimination and redundant function elimination reaching near-perfect performance.
In contrast, the traditional direct LLM rewriting approach showed an average precision of 60% and was prone to more frequent errors, requiring manual correction.
You can learn more by reading the entire paper.
🧠 Expert insight💥
Here’s an excerpt from “Chapter 7: Visualizing Text Data” in the book, Python Natural Language Processing Cookbook - Second Edition by Zhenya Antić and Saurabh Chakravarty, published in September 2024.
Visualizing NER
Named entity recognition, or NER, is a very useful tool for quickly finding people, organizations, locations, and other entities in texts. In order to visualize them better, we can use the displacy package to create compelling and easy-to-read images.
After working through this recipe, you will be able to create visualizations of named entities in a text using different formatting options and save the results in a file.
Getting ready
The displaCy library is part of the spacy package. You need at least version 2.0.12 of the spacy package for displaCy to work. The version in the poetry environment and requirements.txt file is 3.6.1.
The notebook is located at https://github.com/PacktPublishing/Python-Natural-Language-Processing-Cookbook-Second-Edition/blob/main/Chapter07/7.3_ner.ipynb.
How to do it...
We will use spacy to parse the sentence and then the displacy engine to visualize the named entities:
Import both spacy and displacy:
import spacy
from spacy import displacy
Run the language utilities file:
%run -i "../util/lang_utils.ipynb"
Define the text to process:
text = """iPhone 12: Apple makes jump to 5G
Apple has confirmed its iPhone 12 handsets will be its first to work on faster 5G networks.
The company has also extended the range to include a new "Mini" model that has a smaller 5.4in screen.
The US firm bucked a wider industry downturn by increasing its handset sales over the past year.
But some experts say the new features give Apple its best opportunity for growth since 2014, when it revamped its line-up with the iPhone 6.
"5G will bring a new level of performance for
downloads and uploads, higher quality video streaming, more responsive gaming, real-time interactivity and so much more," said chief executive Tim Cook.
There has also been a cosmetic refresh this time round, with the sides of the devices getting sharper, flatter edges.
The higher-end iPhone 12 Pro models also get bigger screens than before and a new sensor to help with low-light photography.
However, for the first time none of the devices will be bundled with headphones or a charger."""
In this step, we process the text using the small model. This gives us a Doc object. We then modify the object to contain a title. This title will be part of the NER visualization:
doc = small_model(text)
doc.user_data["title"] = "iPhone 12: Apple makes jump to 5G"
Here, we set up color options for the visualization display. We set green for the ORG-labeled text and yellow for the PERSON-labeled text. We then set the options variable, which contains the colors. Finally, we use the render command to display the visualization. As arguments, we provide the Doc object and the options we previously defined. We also set the style argument to "ent", as we would like to display just entities. We set the jupyter argument to True in order to display directly in the notebook:
colors = {"ORG": "green", "PERSON": "yellow"}
options = {"colors": colors}
displacy.render(doc, style='ent', options=options, jupyter=True)
The output should look like that in Figure 7.4.
Figure 7.4 – Named entities visualization
Now we save the visualization to an HTML file. We first define the path variable. Then, we use the same render command, but we set the jupyter argument to False this time and assign the output of the command to the html variable.
We then open the file, write the HTML, and close the file:
path = "../data/ner_vis.html"
html = displacy.render(doc, style="ent", options=options, jupyter=False)
html_file = open(path, "w", encoding="utf-8")
html_file.write(html)
html_file.close()
This will create an HTML file with the entities visualization.
Packt library subscribers can continue reading the entire book for free. You can buy Python Natural Language Processing Cookbook - Second Edition here.
Get the eBook for $35.99 $17.99!
Get the Print Book for $44.99 $30.99!
And that’s a wrap.
We have an entire range of newsletters with focused content for tech pros. Subscribe to the ones you find the most useful here. The complete PythonPro archives can be found here.
If you have any suggestions or feedback, or would like us to find you a Python learning resource on a particular subject, take the survey or just respond to this email!
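The CTT study above has the LLM write a transformation program instead of the rewritten code itself. To make that distinction concrete, here is a hand-written sketch of one of the simple transformations the study mentions, constant folding, using Python's standard `ast` module (Python 3.9+ for `ast.unparse`). It is not code from the paper; `ConstantFolder` and `fold_constants` are hypothetical names, and only `+`, `-`, and `*` on numeric literals are folded.

```python
import ast
import operator

# Binary operators this sketch is willing to evaluate at transform time.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
}


class ConstantFolder(ast.NodeTransformer):
    """Replace binary operations on two numeric literals with their result."""

    def visit_BinOp(self, node: ast.BinOp) -> ast.AST:
        # Fold children first so nested expressions like (2 + 3) * 4 collapse fully.
        self.generic_visit(node)
        op = _BIN_OPS.get(type(node.op))
        if (
            op is not None
            and isinstance(node.left, ast.Constant)
            and isinstance(node.right, ast.Constant)
            and isinstance(node.left.value, (int, float))
            and isinstance(node.right.value, (int, float))
        ):
            folded = ast.Constant(op(node.left.value, node.right.value))
            return ast.copy_location(folded, node)
        return node


def fold_constants(source: str) -> str:
    """Parse source, apply the transform, and return the rewritten source."""
    tree = ConstantFolder().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


print(fold_constants("x = (2 + 3) * 4"))
print(fold_constants("y = a + 1 * 2"))
```

Because the transform is an explicit program, its limits are easy to inspect: for example, `a + 1 + 2` is left alone because Python parses it as `(a + 1) + 2`, so neither operation has two literal operands. That inspectability is the property the study argues direct LLM rewriting lacks.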

Divya Anne Selvaraj
22 Oct 2024

PythonPro #52: AI-Powered Vulnhuntr for Python, SageMaker Core SDK, and Exploring User Behaviour with Python

Bite-sized actionable content, practical tutorials, and resources for Python programmers.
#52 AI-Powered Vulnhuntr for Python, SageMaker Core SDK, and Exploring User Behaviour with Python
Hi,
Welcome to a brand new issue of PythonPro!
In today’s Expert Insight we bring you an excerpt from the recently published book, Building AI Applications with OpenAI APIs - Second Edition, which discusses how to create a language translation desktop app using OpenAI's ChatGPT API and Microsoft Word.
News Highlights: Protect AI to release Vulnhuntr, an AI tool for detecting Python zero-day vulnerabilities; Amazon launches SageMaker Core, a Python SDK simplifying machine learning with object-oriented interfaces; and PyCharm becomes the official IDE of OpenCV as JetBrains joins as a Silver Member.
Here are my top 5 picks from our learning resources today:
Comprehensive Python Cheatsheet📚
Exploring User Behavior: A Python Case Study of Bike-Sharing Company Dataset🚴‍♂️
Python's property(): Add Managed Attributes to Your Classes🔧
Python approach to the Semantic Web: exploring linked data and RDF🌐
Assert vs.
Raise: When to Use Each in Your ML/AI Projects⚠️And, today’s Featured Study, presents ChangeGuard, a tool designed to compare code behaviour before and after changes to detect functionality modifications.Stay awesome!Divya Anne SelvarajEditor-in-ChiefP.S.:This month's survey is still live, do take the opportunity to leave us your feedback, request a learning resource, and earn your one Packt credit for this month.Sign Up|Advertise🐍 Python in the Tech 💻 Jungle 🌳🗞️NewsOpen source LLM tool primed to sniff out Python zero-days: Researchers with Seattle-based Protect AI will soon release Vulnhuntr, an AI-powered open-source tool that uses Claude AI to detect zero-day vulnerabilities in Python codebases by analyzing entire call chains for security issues.Introducing SageMaker Core: A new object-oriented Python SDK for Amazon SageMaker: The SDK will simplify the machine learning lifecycle by replacing complex JSON structures with object-oriented interfaces.Press Release: PyCharm Becomes Official IDE of OpenCV, JetBrains Joins as Silver Member: As a Silver Member, JetBrains will financially support OpenCV, ensuring its resources remain free.💼Case Studies and Experiments🔬Part 2: Data Quality Dashboard: A Visual Approach to Monitoring Expectations in Databricks: Explains how to quickly identify issues using graphical representations like pie charts and bar charts.Exploring User Behavior: A Python Case Study of Bike-Sharing Company Dataset: UsesPython to uncover user behaviour patterns and develop strategies to convert casual riders into annual members.📊Analysis🎥Russell Keith-Magee on Beeware, packaging, GUI & money in Python: Focuses on the challenges of cross-platform Python packaging, particularly for desktop and mobile platforms and discusses how BeeWare helps developers.Should you use uv’s managed Python in production?: Advises careful consideration of uv’s production readiness, noting recent improvements but recommending thorough evaluation based on project-specific 
risks.🎓Tutorials and Guides🤓Python's property(): Add Managed Attributes to Your Classes: Covers creating read-only, read-write, and computed properties, logging, and more, while maintaining a stable public API for your classes.A Multi-Agent AI Chatbot App using Databutton and Swarm: Explains how different agents can collaborate and hand off tasks, with an example of a multi-agent healthcare chatbot that connects users to specialized agents.Understanding Pluggable Authentication Module (PAM) and Creating a Custom One in Python: Covers PAM’s architecture, module stacks, and control flags and walks you through building and integrating a custom PAM.Python approach to the Semantic Web: exploring linked data and RDF: Covers creating RDF triples, querying SPARQL endpoints, and visualizing relationships using NetworkX.Understanding Web Scraping in Python and Scrapy: Explains what web scraping is, its significance, and the tools required, such as BeautifulSoup, Requests, and Scrapy.🎥A hand-holding guide to writing FUSE-based filesystems in Python: Covers the process of creating Python-based FUSE file systems, from basic functionality to more advanced features like file attributes.Adding syntax to the cpython interpreter: Demonstrates how to add new syntax to Python, specifically making ternary statements default to None when no else condition is provided, similar to Ruby.🔑Best Practices and Advice🔏What I Learned from Making the Python Backend for YouTube Transcript Optimizer: Explains the process of building the Python backend for a YouTube Transcript Optimizer using FastAPI and SQLmodel.Comprehensive Python Cheatsheet: An extensive resource covering a wide array of Python topics, including syntax, data structures, and advanced concepts.How to Use Lambda Functions in Python: Covers their syntax, common use cases with functions like map(), filter(), and sorted(), along with advantages, limitations, and best practices for effective use in simplifying code.Assert vs. 
Raise: When to Use Each in Your ML/AI Projects: Discusses when to use assert for internal checks during development and raise for handling user-facing errors in ML/AI projects to ensure robust error handling.Structural Pattern Matching in Python: Explores customizing pattern matching for classes, extracting nested data, and common limitations in Python’s implementation.🔍Featured Study: ChangeGuard - Validating Code Changes via Pairwise Learning-Guided Execution💥In "ChangeGuard: Validating Code Changes via Pairwise Learning-Guided Execution," Gröninger et al. present a tool called ChangeGuard, which compares code behaviour before and after changes to determine whether the modifications alter functionality.ContextValidating whether code changes preserve intended behaviour is a key challenge in software development, particularly when changes are deep within complex projects. Developers may make modifications to improve readability, performance, or to fix bugs, but unintended changes in functionality can lead to errors. Current methods, such as regression testing, often fail to catch these subtle changes. This study is relevant because it introduces a more reliable approach—ChangeGuard, which uses pairwise learning-guided execution. 
This approach involves running the old and new versions of a code snippet side by side and predicting missing values so that both versions can execute, even in complex scenarios.
Key Features of ChangeGuard
Pairwise learning-guided execution: Simultaneously executes old and new versions of code to compare their runtime behaviour.
Value injection: Predicts and injects missing or uninitialised values, ensuring the code executes smoothly and reaches all relevant paths.
High precision and recall: Achieves 77.1% precision and 69.5% recall in identifying behaviour-altering code changes.
Extensive evaluation: Tested on 224 manually annotated code changes and on datasets generated by automated refactoring tools.
Outperforms regression tests: Traditional regression tests only achieved 7.6% recall in identifying semantics-changing code modifications.
What This Means for You
This paper will be most useful for software developers, especially those working with large and complex codebases. It provides practical insights into validating code changes more effectively than existing methods, offering a way to catch unintended behaviour early in the development process. Developers using automated refactoring tools or large language models like GPT-4 will particularly benefit from ChangeGuard's ability to detect subtle, behaviour-altering modifications.
Examining the Details
ChangeGuard's methodology is based on pairwise learning-guided execution, an extension of an existing technique. It predicts missing values dynamically, covering more execution paths than previous approaches. The tool was evaluated on 224 annotated code changes from popular Python open-source projects, showing high accuracy in detecting semantics changes. Additionally, ChangeGuard was applied to code changes produced by automated refactoring tools and by large language models like GPT-3.5 and GPT-4, where it found 87 out of 187 and 143 out of 258 code changes to unexpectedly alter behaviour.
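The core "pairwise" idea, running the old and new versions on identical inputs and comparing what each one does, can be illustrated with a plain differential-testing sketch. This is not ChangeGuard's implementation: it leaves out the learned value prediction and injection that let ChangeGuard execute deep, partially initialised code, and it assumes you supply the inputs yourself. The helper names (observe, pairwise_diff) and the clamp example are hypothetical.

```python
from typing import Any, Callable, Iterable, List, Tuple

Outcome = Tuple[str, Any]


def observe(fn: Callable[..., Any], args: Tuple[Any, ...]) -> Outcome:
    """Run fn on args and record either its return value or the exception type it raised."""
    try:
        return ("returned", fn(*args))
    except Exception as exc:  # raising an exception counts as observable behaviour too
        return ("raised", type(exc).__name__)


def pairwise_diff(
    old: Callable[..., Any],
    new: Callable[..., Any],
    inputs: Iterable[Tuple[Any, ...]],
) -> List[Tuple[Tuple[Any, ...], Outcome, Outcome]]:
    """Execute the old and new versions on the same inputs and collect every divergence."""
    divergences = []
    for args in inputs:
        before, after = observe(old, args), observe(new, args)
        # Comparison uses ==, so return values need a meaningful __eq__ (NaN never equals itself).
        if before != after:
            divergences.append((args, before, after))
    return divergences


# Hypothetical "refactoring" that looks equivalent but changes behaviour for negative x.
def clamp_old(x: int, limit: int) -> int:
    return x if x < limit else limit


def clamp_new(x: int, limit: int) -> int:
    return min(abs(x), limit)


if __name__ == "__main__":
    cases = [(1, 5), (7, 5), (-3, 5), (0, 5)]
    for args, before, after in pairwise_diff(clamp_old, clamp_new, cases):
        print(f"{args}: old -> {before}, new -> {after}")
```

Only the input (-3, 5) is reported here, because the refactored version returns 3 where the original returned -3. With hand-picked inputs, a divergence is found only if someone thinks to try it, which is exactly the gap that ChangeGuard's learned value injection is meant to close.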
Overall, the evaluation provides strong evidence for ChangeGuard's reliability and robustness.
You can learn more by reading the entire paper and accessing ChangeGuard.
🧠 Expert insight💥
Here’s an excerpt from “Chapter 6: Language Translation Desktop App with the ChatGPT API and Microsoft Word” in the book, Building AI Applications with OpenAI APIs - Second Edition by Martin Yanev, published in October 2024.
Integrating the ChatGPT API with Microsoft Office
In this section, we will explore how to set up our project and install the python-docx library (imported in code as docx) to extract text from Word documents. The docx library is a Python package that allows us to read and write Microsoft Word (.docx) files and provides a convenient interface to access information stored in these files.
The first step is to create a new directory called Translation App and open it in VSCode. This gives you a dedicated area in which to write and organize your translation app code. Activate your virtual environment from the terminal window following the steps outlined in Chapter 1, Getting Started with the ChatGPT API for NLP Tasks.
To run the language translation desktop app, you will need the following libraries:
openai: The openai library allows you to interact with the OpenAI API and perform various NLP tasks
docx: The docx library (installed as python-docx) allows you to read and write Microsoft Word .docx files using Python
tkinter: The tkinter library is a built-in Python library that allows you to create Graphical User Interfaces (GUIs) for your desktop app
As tkinter ships with most Python installations, there is usually no need to install it (some Linux distributions package it separately, for example as python3-tk). To install the openai and python-docx libraries, open the VSCode terminal and execute the following commands:
pip install openai
pip install python-docx
To access and read the contents of a Word document, you will need to create a sample Word file inside your project.
Here are the steps to create a new Word file:
1. In your project, right-click on the project directory, select New Folder, and name it files.
2. Right-click on the files folder and select New File.
3. In the edit field that appears, enter a filename with the .docx extension – for example, info.docx.
4. Press the Enter key to create the file.
5. Once the file is created, open it using Microsoft Word.
You can now add some text or content to this file, which we will later access and read using the docx library in Python. For this example, we have created an article about New York City. You can find the complete article here: https://en.wikipedia.org/wiki/New_York_City. However, you can choose any Word document containing text that you want to analyze:
The United States’ most populous city, often referred to as New York City or NYC, is New York. In 2020, its population reached 8,804,190 people across 300.46 square miles, making it the most densely populated major city in the country and over two times more populous than the nation’s second-largest city, Los Angeles. The city’s population also exceeds that of 38 individual U.S. states. Situated at the southern end of New York State, New York City serves as the Northeast megalopolis and New York metropolitan area’s geographic and demographic center - the largest metropolitan area in the country by both urban area and population. Over 58 million people also live within 250 miles of the city. A significant influencer on commerce, health care and life sciences, research, technology, education, politics, tourism, dining, art, fashion, and sports, New York City is a global cultural, financial, entertainment, and media hub. It houses the headquarters of the United Nations, making it a significant center for international diplomacy, and is often referred to as the world’s capital.
Now that you have created the Word file inside your project, you can move on to the next step, which is to create a new Python file called app.py inside the Translation App root directory.
This file will contain the code to read and manipulate the contents of the Word file using the docx library. With the Word file and the Python file in place, you are ready to start writing the code to extract data from the document and use it in your application.
To test whether we can read Word files with the python-docx library, we can implement the following code in our app.py file:
import docx

doc = docx.Document("<full_path_to_docx_file>")
text = ""
for para in doc.paragraphs:
    # Add a newline after each paragraph so adjacent paragraphs don't run together
    text += para.text + "\n"
print(text)
Make sure to replace <full_path_to_docx_file> with the actual path to your Word document file. You can obtain the path by right-clicking on your .docx file in VSCode and selecting the Copy Relative Path option from the drop-down menu; a relative path such as files/info.docx works as long as you run app.py from the project root.
Once you have done that, run the app.py file and verify the output. This code will read the contents of your Word document and print them to the console. If the text extraction works correctly, you should see the text of your document printed in the console (see Figure 6.1). The text variable now holds the data from info.docx as a Python string.
Figure 6.1 – Word text extraction console output
Packt library subscribers can continue reading the entire book for free. You can buy Building AI Applications with OpenAI APIs - Second Edition here.
Get the eBook for $31.99 $21.99!
Get the Print Book for $39.99!
And that’s a wrap.
We have an entire range of newsletters with focused content for tech pros. Subscribe to the ones you find the most useful here.
The complete PythonPro archives can be found here.
If you have any suggestions or feedback, or would like us to find you a Python learning resource on a particular subject, take the survey or just respond to this email!

Divya Anne Selvaraj
03 Jun 2025

PythonPro #71: Pandas 3.0 Ditches NumPy, Pyrefly vs. ty, and HuggingFace for Object Detection

Bite-sized actionable content, practical tutorials, and resources for Python programmers.
#71
Pandas 3.0 Ditches NumPy, Pyrefly vs. ty, and HuggingFace for Object Detection
Hi,
Welcome to a brand new issue of PythonPro!
News Highlights: Pandas 3.0 adopts PyArrow for faster string handling; Meta releases Pyrefly, a Rust-based type checker for large Python codebases; String Grouper gets 8× faster; and Muffin tops new ASGI benchmarks, beating FastAPI on JSON throughput.
My top 5 picks from today’s learning resources:
Pyrefly vs. ty: Comparing Python’s Two New Rust-Based Type Checkers⚙️
Building an MCP server as an API developer🛰️
Object Detection with Python and HuggingFace Transformers🖼️
Matplotlib Alternatives That Actually Save You Time⏱️
What's the Difference Between Zipping and Unzipping Your Jacket? • Unzipping in Python🧥
And, in From the Cutting Edge, we introduce dro, a Python library that makes state-of-the-art distributionally robust optimization techniques practical and scalable for machine learning by unifying 79 methods into a single modular framework compatible with scikit-learn and PyTorch.
Stay awesome!
Divya Anne Selvaraj
Editor-in-Chief
🐍 Python in the Tech 💻 Jungle 🌳
🗞️News
Python Pandas Ditches NumPy for Speedier PyArrow: Pandas 3.0 introduces PyArrow as a required dependency and the default for string data, marking a shift toward faster, columnar data processing, though fully replacing NumPy as the backend remains experimental.
Meta Open-Sources Pyrefly, a High-Performance Python Type Checker in Rust: The type checker is designed to replace the OCaml-based Pyre and to support responsive, scalable IDE type checking, especially for large codebases like Instagram's.
Even Faster String matching in Python: The latest version of String Grouper, a Python library for fuzzy string matching using TF-IDF and cosine similarity, is now 8× faster than its original release.
Benchmarks for MicroPie v0.9.9.8: A benchmark comparing seven ASGI frameworks using a simple JSON "hello world" response showed that Muffin delivered the highest performance while FastAPI trailed with the lowest throughput.
MonsterUI: Bringing Beautiful UI to FastHTML: MonsterUI is a Python library that simplifies frontend development for FastHTML apps by providing pre-styled, responsive UI components with smart defaults.
💼Case Studies and Experiments🔬
Rhyme Analysis of Virgil’s Æneid in English translation — Part 2: Uses Python and CMUDict to detect rhyme patterns in Edward Fairfax Taylor’s English translation of Virgil’s Æneid, achieving over 92% accuracy in capturing the Spenserian stanza structure.
A Python frozenset interpretation of Dependent Type Theory: Illustrates how Python can serve as an intuitive metatheory for understanding complex type-theoretic concepts through executable, computable analogues.
📊Analysis
Pyrefly vs. ty: Comparing Python’s Two New Rust-Based Type Checkers: Compares two emerging Rust-based Python type checkers, pyrefly (by Meta) and ty (by Astral), based on speed, design goals, incrementalization strategies, and type inference behavior.
From Rows to Vectors: Under the Hood of DFEmbedder — A DataFrame Vector Store: Introduces DFEmbedder, an open source Python library that transforms tabular data into a low-latency vector store using static CPU-based embeddings.
🎓Tutorials and Guides🤓
Making C and Python Talk to Each Other: Covers locating and including Python.h, initializing and finalizing the Python interpreter, loading Python modules, calling Python functions (with and without arguments), and managing memory using PyObject references.
Building an MCP server as an API developer: Walks you through building and deploying a stateless MCP server using Python, FastAPI, and AWS services, illustrating how to integrate OAuth-secured Strava APIs and support Streamable HTTP transport for LLM-assisted applications.
Object Detection with Python and HuggingFace Transformers: Walks you through building an object detection pipeline, explaining how Transformer-based models like Detection Transformer (DETR) work and demonstrating a complete implementation.
Expected Goals on Target (xGOT) 101: Explains a post-shot metric that improves on xG by factoring in shot placement, power, and trajectory, and demonstrates how analysts use it to evaluate strikers’ finishing skill and goalkeepers’ shot-stopping, with a Python template.
Regression Trees Explained: The Most Intuitive Introduction: Offers a step-by-step explanation and Python implementation of regression trees, illustrating how they partition feature space and make predictions through recursive variance minimization.
Efficiently dissolving adjacent polygons by attributes in a large GIS database: Demonstrates a step-by-step method with SQL and Python to cluster, merge, and reduce over 750,000 land-use records into fewer, generalized geometries.
Tracking Urban Expansion Through Satellite Imagery: Covers selecting satellite imagery, preparing training data, computing indices, running classification, interpreting outputs, and validating results.
🔑Best Practices and Advice🔏
Matplotlib Alternatives That Actually Save You Time: Compares five modern Python visualization libraries (Plotly, Seaborn, Vega-Altair, Bokeh, and Plotnine) as more efficient, interactive, and expressive alternatives to Matplotlib.
Automate Your Life: Five Everyday Tasks Made Easy With Python: Showcases five simple, real-world Python scripts: generating QR codes, converting text to speech, translating text, taking screenshots, and censoring profanity.
Serving Deep Learning in AdTech: Offers practical guidance on choosing a model-serving approach based on system constraints, latency, and deployment needs.
What's the Difference Between Zipping and Unzipping Your Jacket? • Unzipping in Python: Explains how Python’s zip() function not only combines multiple iterables into grouped tuples but can also be used in reverse, together with unpacking, to "unzip" them back into separate iterables.
The Chores Rota (#3 in The `itertools` Series • `cycle()` and Combining Tools): Uses a fictional story to teach Python's itertools.cycle() and zip() functions, illustrating how to create synchronized infinite iterators for task rotation.
🔍From the Cutting Edge: DRO for ML💥
In "DRO: A Python Library for Distributionally Robust Optimization in Machine Learning," Liu et al. introduce dro, a Python library that brings together state-of-the-art distributionally robust optimization (DRO) techniques into a single, modular, and scalable software package for supervised learning tasks.
Context
DRO is a technique used in machine learning to build models that remain reliable under uncertainty, especially when there's a mismatch between training and deployment data distributions. This is crucial in high-stakes domains like healthcare, finance, and supply chain systems. DRO typically addresses this challenge by minimizing the worst-case expected loss over an ambiguity set: a collection of distributions close to the empirical training data under some metric.
However, despite its theoretical promise, DRO has seen limited practical adoption due to the computational complexity of solving min-max problems and the lack of general-purpose libraries. Existing tools often either focus on a narrow subset of formulations or require users to manually reformulate and solve optimisation problems using external solvers.
The dro library directly addresses these gaps. It offers the first comprehensive, ML-ready implementation of diverse DRO formulations within a unified, modular Python package. Compatible with both scikit-learn and PyTorch, dro abstracts away the need for manual optimisation reformulations and enables scalable training, evaluation, and experimentation with robust models.
This makes cutting-edge DRO techniques accessible to both practitioners and researchers, and usable in real-world workflows.
Key Features of dro
Comprehensive coverage: The library supports 79 DRO method combinations across 14 formulations and 9 model backbones, covering linear, kernel-based, tree-based, and neural models.
Seamless integration: All components follow the scikit-learn estimator interface and are compatible with PyTorch, enabling easy integration into existing machine learning workflows.
Significant speed improvements: The library applies vectorisation, kernel approximation, and constraint reduction techniques to achieve 10× to 1000× speedups over baseline implementations.
Flexible customisation: Users can customise loss functions, model architectures, and robustness parameters through a modular design that supports both exact and approximate optimisation.
Built-in diagnostics: The package includes tools to generate worst-case distributions and evaluate out-of-sample performance, supporting principled model assessment under distribution shift.
What This Means for You
The dro library is especially relevant for machine learning researchers, applied data scientists, and engineers working in high-stakes or shift-prone domains such as healthcare, finance, and logistics. It offers a practical pathway to integrate distributional robustness into real-world pipelines without requiring manual optimisation reformulations or deep expertise in convex programming.
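To make "minimizing the worst-case loss over an ambiguity set" concrete, here is a minimal plain-Python sketch. It does not use the dro library or its API. It assumes the CVaR ambiguity set (one of the metrics the paper lists), whose inner maximisation has the closed-form dual min over eta of eta + E[(loss - eta)+] / alpha, and it replaces a real model with a single constant prediction fitted by grid search. The function names cvar_worst_case_loss and fit_constant are hypothetical.

```python
from typing import Sequence


def cvar_worst_case_loss(losses: Sequence[float], alpha: float) -> float:
    """Worst-case expected loss over the CVaR ambiguity set at level alpha.

    Uses the dual form  min_eta  eta + mean((loss - eta)_+) / alpha  under the
    empirical distribution. The objective is convex and piecewise linear in eta
    with kinks at the observed losses, so its minimum is attained at one of them.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    n = len(losses)

    def dual_objective(eta: float) -> float:
        return eta + sum(max(loss - eta, 0.0) for loss in losses) / (alpha * n)

    return min(dual_objective(eta) for eta in losses)


def fit_constant(ys: Sequence[float], alpha: float, grid_steps: int = 1001) -> float:
    """Toy DRO fit: pick the constant prediction minimising worst-case squared loss.

    Outer minimisation = grid search over theta; inner maximisation = CVaR dual above.
    """
    lo, hi = min(ys), max(ys)
    best_theta, best_risk = lo, float("inf")
    for i in range(grid_steps):
        theta = lo + (hi - lo) * i / (grid_steps - 1)
        risk = cvar_worst_case_loss([(y - theta) ** 2 for y in ys], alpha)
        if risk < best_risk:
            best_theta, best_risk = theta, risk
    return best_theta


if __name__ == "__main__":
    ys = [0.0, 0.0, 0.0, 0.0, 10.0]  # four typical observations and one rare one
    print("ERM (alpha=1):", fit_constant(ys, alpha=1.0))
    print("CVaR-DRO (alpha=0.2):", fit_constant(ys, alpha=0.2))
```

With alpha=1 the ambiguity set contains only the empirical distribution, so the fit is ordinary empirical risk minimisation and lands on the sample mean, 2.0. With alpha=0.2 and five points, the worst case puts all weight on the single largest loss, so the fit moves to 5.0, the value that minimises the worst squared error. Real models need the kind of solvers the paper describes (CVXPY for exact problems, PyTorch or gradient boosting for approximate ones) rather than a grid search.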
By unifying a wide range of DRO methods within a standardised, high-performance framework, dro enables users to develop models that remain reliable under uncertainty, experiment with robustness techniques at scale, and bridge the gap between theoretical advances and practical deployment.
Examining the Details
The dro library operationalises Distributionally Robust Optimization by solving min–max problems in which the outer minimisation spans a model class and the inner maximisation ranges over an ambiguity set of plausible distributions. This ambiguity set is defined using distance metrics such as Wasserstein distances, f-divergences (KL, χ², Total Variation, CVaR), kernel-based distances like Maximum Mean Discrepancy (MMD), and hybrid measures including Sinkhorn and Moment Optimal Transport distances.
Exact optimisation is handled through disciplined convex programming using CVXPY, applicable to linear and kernel-based models with standard losses such as hinge, logistic, ℓ₁, and ℓ₂. For more complex architectures like neural networks and tree ensembles, the library employs approximate optimisation strategies using PyTorch, LightGBM, and XGBoost.
To enhance scalability, the authors implement performance-optimisation techniques such as constraint vectorisation, Nyström kernel approximation, and constraint subsampling or sparsification, significantly reducing computational overhead without sacrificing accuracy. The methodology is underpinned by modular abstractions that isolate model type, loss function, and robustness metric, making the framework both extensible and maintainable.
Additional tooling supports synthetic and real-world dataset generation, worst-case distribution derivation, and corrected out-of-sample evaluation.
You can learn more by reading the entire paper here and accessing the library on GitHub.
And that’s a wrap.
We have an entire range of newsletters with focused content for tech pros. Subscribe to the ones you find the most useful here.
The complete PythonPro archives can be found here.
If you have any suggestions or feedback, or would like us to find you a Python learning resource on a particular subject, just respond to this email!