First day – (Almost) all about data science
As both of us work in the field of data and analytics, we were happy to notice how well represented data science was at PyCon Sweden this year. The first day was mostly dedicated to talks around this field. One of the talks that engaged us both was Ravi Singh’s presentation on a question that comes up often in machine learning projects: how did this black box known as a machine learning model arrive at its prediction?
The answer Singh gave in his presentation was SHAP (SHapley Additive exPlanations), which shows how much each predictor contributes to the model’s output and can be calculated for any tree-based model. You can apply it not only on a global scale but also to see what affected an individual prediction, which separates it from traditional feature importance algorithms. These kinds of methods can increase the transparency, auditability and stability of the model.
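To make the idea of per-prediction contributions concrete, here is a minimal sketch of the Shapley calculation that SHAP builds on. It is not code from the talk and it does not use the shap library: it is a brute-force version that only works for a handful of features, and the toy model, the feature names and the baseline row are all made up for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for a single prediction.

    Features outside a coalition are replaced by their baseline value.
    The cost grows exponentially with the number of features.
    """
    n = len(x)

    def value(coalition):
        # Build an input that uses x for coalition members, baseline otherwise.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                # Marginal contribution of feature i to this coalition.
                phi[i] += weight * (value(members | {i}) - value(members))
    return phi


def toy_model(z):
    # Hypothetical price model with one interaction term (rooms * area).
    rooms, area, age = z
    return 50 * rooms + 2 * area - age + 0.5 * rooms * area


x = [3, 80, 10]         # the individual prediction we want to explain
baseline = [2, 60, 20]  # assumed reference row, e.g. a typical input

phi = shapley_values(toy_model, x, baseline)
for name, contribution in zip(["rooms", "area", "age"], phi):
    print(f"{name}: {contribution:+.2f}")
```

The contributions add up to the difference between the prediction for x and the prediction for the baseline, which is the "additive" part of the name. In practice you would use a library rather than enumerate every coalition like this.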
Errol Koolmeister, the head of AI and architecture at H&M, gave a keynote about scaling AI. It was interesting to hear not just about the technology stack but also about the processes they had developed around AI projects. H&M had hired a person to consider ethical issues and had, for example, a checklist to go through with the legal department for every new project. Errol pointed out the importance of fast proof-of-concept projects and pilots. Only after those should you move on to the “slowed growth, optimization, refinement and complex models” phase where you roll out, scale and improve. You shouldn’t accumulate technical debt along the way, though, and should plan the machine learning pipelines carefully.
It’s also important to understand that there is a difference between traditional software development, where your goal is to meet a functional specification, and machine learning projects, where you are trying to optimize a metric. Optimization requires you to tune input data and parameters and to try to find better libraries, models and algorithms. This work takes time. It’s also easier for teams to make decisions if some architecture principles have been defined at company level. For example, H&M had decided to recommend separation of concerns and stateless, automated, cloud-native and serverless designs in their projects.
There were also other interesting topics for those of us working with data, such as Ludvig Hult’s talk about causal inference. At first glance, this seemed pretty trivial if you’ve worked with statistics and data science before. However, there were very good insights and discussions about causality on a technical and philosophical level. On top of that, when you’re working with machine learning techniques and don’t familiarize yourself with their implications and interpretations, you might easily forget the more traditional, yet highly important aspect of statistical modelling known as causal inference.
The first day also included talks about using machine learning for data protection purposes and about scaling your solution to different industries. In addition, Daniel Roos talked about how machine learning and Python have typically been used in the field of finance. A talk about generative art, using computing power to produce elegant artwork, was the cherry on top. If only there were more hours in a day!
Second day – Building solid software
The second day started with an interesting keynote about machine vision. Tess Ferrandez from Microsoft presented an example use case where machine learning served the purpose perfectly. She discussed how to recognize activity from video with machine learning tools in order to automatically create a video showing the highlights of a football match. Given the huge number of frames and pixels in video material, this type of video processing wouldn’t even be possible without machine learning techniques, such as facial and image recognition combined with audio signals and the speed and direction of people and objects. Ferrandez emphasized that machine learning shouldn’t be the final goal in itself (as it sometimes seems to be, when machine learning is used just for the sake of doing machine learning), but rather a means to an end in complicated problems.
The rest of the day included topics like property-based testing, mutation testing, code optimization and microservices. Isaac Bernat made a very good point about the balance between keeping code readable and maintainable and optimizing it for processing speed. There’s always a tradeoff between readability and performance. Sometimes you might want to consider whether further optimization is a reasonable use of your time. For instance, if the code runs once a night, how much do you actually gain by investing in making it run super fast? It helps if you recognize the point after which your optimizations start affecting maintainability. Bernat showed some common tricks to optimize performance and pointed out that in Python, many of these tricks are already handled by existing modules and compilers, e.g. PyPy, Cython or Transonic. You might want to use these instead of reinventing the wheel.
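One way to act on this advice is to measure before you optimize. The sketch below is our own illustration, not an example from Bernat’s talk: it uses the standard library’s timeit module to compare two equally readable ways of computing the same result. The function names and the input size are arbitrary.

```python
import timeit


def sum_of_squares_loop(values):
    # Explicit loop: easy to read and easy to step through in a debugger.
    total = 0
    for v in values:
        total += v * v
    return total


def sum_of_squares_builtin(values):
    # Same calculation expressed with the built-in sum() and a generator.
    return sum(v * v for v in values)


data = list(range(10_000))

# Check that both versions agree before comparing their speed.
loop_result = sum_of_squares_loop(data)
builtin_result = sum_of_squares_builtin(data)

# Time 50 calls of each version; the numbers depend on machine and interpreter.
loop_time = timeit.timeit(lambda: sum_of_squares_loop(data), number=50)
builtin_time = timeit.timeit(lambda: sum_of_squares_builtin(data), number=50)

print(f"loop:    {loop_time:.4f} s for 50 runs")
print(f"builtin: {builtin_time:.4f} s for 50 runs")
```

If the difference turns out to be a few milliseconds in a job that runs once a night, the more readable version is probably the better choice.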
This year’s conference-goers could probably be divided roughly into two categories – people more focused on data and ML and those whose interests lie more in software development. During the closing get-together, we heard that many programmers preferred the second day’s program whereas more data science-oriented people seemed to be excited about the first day’s topics. All in all, there were plenty of interesting talks for each group.
We didn’t participate in the panel discussions and workshops, even though they covered interesting topics like Apache Kafka, debugging tools and DevOps. Overall, the experience was great, and after the first day we were very inspired and eager to try some of the new things in our own projects. The second day showed us how much there actually is to learn and how great it is to have a diversity of people with many amazing skills working with Python. We met some very interesting and talented people, and building your network and being able to create something greater together is one of the best things about events like these. A big thanks to the PyCon Sweden organizers! We’re looking forward to seeing everyone again soon!
PyCon Sweden took place on October 31st and November 1st in Stockholm.