Sunday, June 23, 2024

PyCon 2024

 

The main conference days of PyCon 2024 were held in Pittsburgh over the weekend of May 17-19. This year I stayed at the hotel connected to the conference center, which made everything much more convenient. A walkway over Penn Street links the hotel to the David L. Lawrence Convention Center. I will admit, though, that after getting to the hotel I got lost trying to find this walkway. That is pretty much par for the course, as it takes about a day to get used to navigating the hotel and learning where everything is in the convention center.

Day 1, Friday, May 17th

On Friday, the first talk I attended was by Jeff Glass, who introduced interactive documentation with PyScript. Imagine having a REPL right inside your documentation! Jeff demonstrated how developers can integrate interactive Python sessions into any webpage using PyScript. This not only enhances user engagement but also makes learning more dynamic and practical. It was fascinating to see how tools like Sphinx, MkDocs, and Read the Docs can be integrated with it, making the examples in your documentation come alive. I found the concept of a fully serverless interactive learning session particularly interesting.

Next up was Swarna Swapna Ilamathy’s session on tackling the issue of dead code. She shed light on the importance of identifying and removing unused code to maintain project health and agility. The session was packed with practical solutions, focusing on Python libraries like Vulture and Deadcode. Swarna emphasized the significance of integrating these tools into CI/CD pipelines for streamlined dead code removal. 
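To give a feel for what a dead code check looks for, here is a minimal sketch using only the standard library's ast module. It is not from the talk and it is not how Vulture or Deadcode are implemented; the function name find_unused_functions and the SAMPLE source are made up for illustration. It simply reports functions that are defined but never referenced by name in the same source text.

```python
import ast


def find_unused_functions(source: str) -> list[str]:
    """Return names of functions defined in `source` but never referenced in it."""
    tree = ast.parse(source)

    # Every function (sync or async) defined anywhere in the module.
    defined = {
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }

    # Every bare name or attribute name that is read or called anywhere.
    used = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            used.add(node.id)
        elif isinstance(node, ast.Attribute):
            used.add(node.attr)

    return sorted(defined - used)


# Hypothetical module source with one helper that nothing calls.
SAMPLE = '''
def used_helper():
    return 1

def forgotten_helper():
    return 2

print(used_helper())
'''

print(find_unused_functions(SAMPLE))
```

A real tool has to deal with dynamic access, entry points, and code used only from other modules, which is why dedicated tools let you maintain allowlists and tune how aggressively they report.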

By the time of the next talk, I had started to become comfortable finding my way around the conference center. The next talk was about Ruff, a Python linter and code formatter written in Rust. I have attended talks that covered Ruff before and came away impressed. Ruff promises to replace dozens of static analysis tools while being orders of magnitude faster. The speaker, Charlie Marsh, took us through the internals of Ruff, highlighting specific optimizations and design decisions that contribute to its remarkable performance. Learning about how Ruff powers static analysis for major Python projects like NumPy and pandas was inspiring. Even though Ruff is written in Rust, Charlie made the session accessible to everyone, focusing on the broader aspects of building performant developer tools.

By this time, I was getting pretty hungry and headed down to the convention hall for lunch. They had a lot of great options. After I finished eating I wandered around the hall listening to the various exhibitors. The big names were there: Snowflake, Microsoft, MongoDB, Nvidia, Google, JetBrains, and many more. Most were running short scheduled demos, such as Microsoft showing GitHub Copilot and Snowflake showing Streamlit.

After lunch, the first session I went to was Sydney Runkle's talk on Pydantic. As a software engineer at Pydantic, she was well versed in talking about the optimizations available with Pydantic V2, now leveraging Rust for core validation logic. The performance improvements seemed substantial, with speedups ranging from 5 to 50 times compared to V1. Sydney shared a range of tips, from one-line fixes to larger design modifications, to get the most out of Pydantic. I think her focus on tagged unions for efficient validation of union types was particularly useful. 
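For readers who have not used tagged unions, here is a minimal sketch of the idea, assuming Pydantic V2 is installed. It is not code from the talk; the Cat, Dog, and Pet names are made up for illustration. The discriminator field tells Pydantic which member of the union to validate against, so it does not have to try each model in turn.

```python
from typing import Annotated, Literal, Union

from pydantic import BaseModel, Field, TypeAdapter, ValidationError


class Cat(BaseModel):
    kind: Literal["cat"]  # the tag value that selects this model
    lives: int


class Dog(BaseModel):
    kind: Literal["dog"]
    breed: str


# The "kind" field is the discriminator: validation jumps straight to the
# model whose Literal matches instead of attempting Cat first, then Dog.
Pet = Annotated[Union[Cat, Dog], Field(discriminator="kind")]

adapter = TypeAdapter(Pet)
pet = adapter.validate_python({"kind": "dog", "breed": "collie"})
print(type(pet).__name__, pet.breed)
```

A side benefit is clearer error messages: an unknown tag fails with a single error about the tag rather than one error per union member.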

Next up was a talk on game development by Esther Alter. This is a topic I'm not well versed in, but it was fun and informative. She walked us through the process of developing a procedurally generated monster-collecting game, which she successfully released on Steam. Unlike typical game dev talks that focus on minimal examples, Esther covered a wide array of topics including fast development, robust testing, unique graphical effects, and engaging gameplay. Her insights into why Python is an excellent choice for game development were eye-opening, making me consider Python for future game projects.

The next talk I went to was by Valerio Maggio, who expanded on Jeff Glass’s earlier session by exploring the use of PyScript for interactive documentation. Valerio emphasized the importance of good documentation in open-source projects and demonstrated how PyScript can enhance the interactivity of software docs. With features like multiple interpreters, a Pythonic API for the DOM, and non-blocking execution threads, PyScript offers a robust platform for creating engaging documentation. Seeing these features in action was particularly useful, showcasing how they can transform static documentation into a dynamic learning environment.

The final talk of the day was by Juan Altmayer Pizzorno, who introduced SlipCover, a new tool for Python code coverage with minimal overhead. Traditional tools like coverage.py can slow down tests significantly, but SlipCover reduces this overhead to just 5%. 

Day 2, Saturday, May 18th

Day two started with a session by Junya Fukuda, who talked about the intricacies of using asyncio for event-driven programming. Junya shared insights on how Python’s asynchronous capabilities are harnessed to streamline decision-making processes in robotics. The talk covered various asynchronous libraries, including trio, trio-util, anyio, and asyncio. It was cool to see how these tools can operate efficiently within a single thread, making them ideal for resource-constrained environments. 
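The single-thread point is easy to see in a small example. The sketch below is not from the talk; the read_sensor coroutine and the sensor names are hypothetical stand-ins for real robot I/O. It uses plain asyncio to wait on three simulated sensors concurrently and records which thread each one completed on.

```python
import asyncio
import threading


async def read_sensor(name: str, delay: float, events: list) -> str:
    # Hypothetical sensor read: asyncio.sleep stands in for waiting on hardware.
    await asyncio.sleep(delay)
    # Record which thread handled the result once the wait finished.
    events.append((name, threading.get_ident()))
    return name


async def main() -> list:
    events: list = []
    # All three waits overlap; the event loop switches between them
    # at each await instead of using one thread per sensor.
    await asyncio.gather(
        read_sensor("lidar", 0.06, events),
        read_sensor("camera", 0.02, events),
        read_sensor("bumper", 0.04, events),
    )
    return events


events = asyncio.run(main())
for name, thread_id in events:
    print(name, thread_id)
```

The sensors finish in order of their delays rather than the order they were started, and every entry carries the same thread id, which is what makes this style attractive on small devices.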

As a heavy FastAPI user, I was very interested in Mia Bajić’s talk on combining Django ORM with FastAPI. Mia shared her unique experience of using Django ORM within a FastAPI application, a setup not commonly documented. She highlighted the practical implementation, benefits, and challenges of this hybrid approach. The session was aimed at developers of all levels and provided valuable insights into integrating asynchronous and synchronous frameworks. This talk made me think about some possibilities for leveraging the strengths of both Django and FastAPI in a single application.

Next up for me was Sebastián Flores, who presented an engaging session on enhancing presentations with Python and data storytelling. He showcased various tools and libraries, such as Jupyter Notebook, RISE, Streamlit, and Quarto, alongside visualization tools like matplotlib, seaborn, altair, Power BI, Tableau, and Vizzu. This talk was great in that it encouraged people to think beyond traditional PowerPoint slides. I personally think PowerPoint is a necessary evil because it is what everyone expects, but data communication should be so much more than talking over a bunch of slides.

It was now time for lunch. I quickly grabbed something on the convention floor and headed outside to explore downtown a little bit. I was specifically looking for a bookstore, and my phone showed a Barnes & Noble within a few blocks of the convention center that I think is associated with Duquesne University. I wanted something to read on the plane ride home. Unfortunately, there was really nothing in the store for me. The basement had textbooks and supplies for the school, and the top floor was just some shelves of discounted books. So I left empty-handed. I'm sure there's a bookstore somewhere downtown that I could have found with more time.

After lunch I went to Jodie Burchell’s session on large language models (LLMs), which addressed the critical issue of hallucinations. I honestly almost skipped this for a talk about using LLMs as part of graph flow, but this talk had a lot of interesting information she had compiled, including statistics on the types of hallucinations. Jodie explained the reasons behind these hallucinations and introduced tools to measure and mitigate them: the TruthfulQA dataset, Hugging Face’s datasets and transformers packages, and LangChain. The talk also covered retrieval augmented generation (RAG) as a technique to reduce hallucinations and improve the reliability of LLMs.

I'm very familiar with the set of data tools in the next talk and am a big fan of both Dask and Polars. Patrick Hoefler’s talk compared the latest developments in pandas and Dask with other big data tools like Spark, DuckDB, and Polars. Patrick highlighted the improvements in Dask, including a new shuffle algorithm, a logical query planning layer, and a more efficient data model thanks to pandas 2.0. The session used TPC-H benchmarks to demonstrate the performance gains and robustness of the new Dask DataFrame.

The next talk I went to was different. It was a fun mix of art and mathematics. Alastair Stanley demonstrated how the principles of paper folding can be used to perform complex calculations, from basic arithmetic to solving cubic equations and proving irrationality. Using a custom Python library to simulate fold sequences, Alastair showed the surprising power of origami in computational contexts. This session was very cool.

The day concluded with Naveed Mahmud’s session on hybrid quantum-classical machine learning using Qiskit. I have had an interest in quantum computing for years, and in Qiskit in particular, so I was very interested to see how Naveed used it. Naveed explained how to build quantum circuits and integrate them with classical models for tasks like text classification and sentiment analysis. I thought sentiment analysis was an interesting test case for quantum computing and machine learning.

Day 3, Sunday, May 19th

The first talk I attended on the third day was by Alla Barbalat, who tackled the critical issue of intellectual property (IP) in the context of AI-generated code. As AI becomes more prevalent in code generation, the question of ownership arises. Alla addressed the legal implications of using open-source code as training data and the potential infringement of IP rights by AI systems. This talk was particularly enlightening, as it demystified the complex legal landscape surrounding AI and IP law. 

Next up was Saksham Sharma’s session on optimizing Python performance with Cython. It was a deep dive into the practicalities of speeding up Python code. Saksham demonstrated how to write Python data analysis code in Cython, explore the generated code, and run microbenchmarks to understand performance bottlenecks. It was interesting to see how simple hints to Cython can significantly enhance performance. 

The final session I went to was Arun Suresh Kumar’s highly informative talk on synchronous and asynchronous programming in Python. Arun explained the key differences between sync and async programming, explored tools like Uvicorn and uvloop, and provided benchmarks to compare their performance. The session also covered the evolution and importance of ASGI and WSGI servers in Python web development. Practical insights from real-world scenarios highlighted effective strategies for both sync and async programming.
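The WSGI and ASGI difference is easiest to see side by side. The sketch below is not from the talk; it shows the two callable shapes defined by the WSGI and ASGI specifications, plus two small hypothetical helpers (call_wsgi and call_asgi) that play the role of a server so the example runs with only the standard library. In practice a WSGI server or an ASGI server such as Uvicorn would do the calling.

```python
import asyncio


def wsgi_app(environ, start_response):
    # WSGI: a plain synchronous callable; one call handles one request.
    body = b"hello from WSGI"
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [body]


async def asgi_app(scope, receive, send):
    # ASGI: an async callable that exchanges event dicts with the server.
    if scope["type"] != "http":
        return
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain")],
    })
    await send({"type": "http.response.body", "body": b"hello from ASGI"})


def call_wsgi(app):
    # Hypothetical stand-in for a WSGI server handling a single GET request.
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    chunks = app({"REQUEST_METHOD": "GET", "PATH_INFO": "/"}, start_response)
    return captured["status"], b"".join(chunks)


async def call_asgi(app):
    # Hypothetical stand-in for an ASGI server handling a single GET request.
    messages = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        messages.append(message)

    await app({"type": "http", "method": "GET", "path": "/"}, receive, send)
    return messages


wsgi_result = call_wsgi(wsgi_app)
asgi_messages = asyncio.run(call_asgi(asgi_app))
print(wsgi_result)
print([message["type"] for message in asgi_messages])
```

Because the ASGI app awaits on send and receive, a server can interleave many requests on one event loop, while a WSGI app holds its worker until the function returns.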

At that point, it was time to head back home, but not before picking up my PyCon 2024 t-shirt. I don't think I have a purple t-shirt, but I really like this one.

