Data Science with Python: An Unexpected Tech Evolution

Written by Steve Gifford

October 31, 2024

In the expansive world of data science with Python, I’ve learned that technological success isn’t about mastering every tool but about understanding how to make computers do exactly what you want. My approach has always been refreshingly direct: transform raw data into meaningful insights using Python as the primary vehicle for computational exploration.

I’ve never been particularly motivated by development technology. It’s great when it works, annoying when it doesn’t, and you can spend a lot of time in that sort of in-between state.

What motivates me is making computers do what I want. Whether that’s servers in the cloud or handheld devices, I want them to process or display my data! Everything else is in service of those goals.

Sometimes, I buck trends, but mostly, I use the dominant tooling for my work.

The Path to Data Science with Python

I didn’t arrive with preconceived notions when I first delved into weather data processing. Instead, I asked a straightforward question: “What are the standard tools in this field?” This approach has served me well: I have adapted to the ecosystem rather than fighting against it.

Python emerged as the clear favorite in meteorology, offering an impressive array of specialized packages. From processing individual NEXRAD messages to performing radar advection, Python provides data scientists with powerful, accessible tools.

How did we end up here? I should have paid more attention to the journey. I’ve had a general sense that scientists like Python, and they’ve been building it up in their respective fields for a long time. But that’s vibes, not history.

We use Python for Boxer, our server-side product, because meteorologists (and atmospheric modelers) use Python. We use other things for Terrier, our front end.

Unexpected Trajectories

When the iPad first came out, I thought it was a bit dumb. Pilots, my clients at the time, came to absolutely adore the things. They saw something I didn’t, but I eventually came around and hopefully learned something.

Years ago, if you’d asked me to predict the next significant data science language, I would have outlined criteria like:

  • Be easy to write interactively
  • Have a compiled form that’s nearly as fast as C++

What actually happened was far more interesting. Python met the first criterion and, rather than meeting the second, reimagined it.

Interpreters and Ease of Use in Data Science with Python

Rather than a compiled language or one that you can interpret and compile, we have something else. In data science with Python, we have an easy on-ramp and a way to make things faster if you know what you’re doing.

Python is easy to use because of the interpreter. If you want to work out the syntax, you do so in the interpreter. This lets you move past the quirks every language has and hack something together. Notebooks, such as Jupyter, let you formalize that hacking process into something reproducible.

Optimizing Performance in Data Science with Python

I was initially skeptical about performance in data science with Python. How could an interpreted language compete? The answer is that libraries like NumPy do their heavy lifting in compiled C code. That’s great, and it’s not dissimilar to Android’s split between C++ for implementation libraries and Java for app development. But Python does it better.

If I’m working in C++, the straightforward approach is usually fast enough: you can loop over data points, pixels, or what have you and get the job done. You can’t really do that in Python. If you’re iterating over your smallest data elements, or even representing them as individual Python objects, you’ve got a problem.
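
To make that cost concrete, here is a minimal sketch, assuming NumPy and a hypothetical 2-D grid of radar reflectivity in dBZ. It converts dBZ to linear reflectivity, Z = 10^(dBZ/10), twice: once pixel by pixel in a Python loop, and once as a single array expression that NumPy evaluates in compiled code. The function names are illustrative only and do not come from Boxer or any library.

```python
import timeit

import numpy as np


def dbz_to_z_loop(dbz_grid):
    """Convert dBZ to linear Z one pixel at a time (the slow, C++-style approach)."""
    rows, cols = dbz_grid.shape
    out = np.empty((rows, cols), dtype=np.float64)
    for i in range(rows):
        for j in range(cols):
            # Every element access creates a scalar object in the interpreter,
            # and every iteration runs Python bytecode: this is where time goes.
            out[i, j] = 10.0 ** (dbz_grid[i, j] / 10.0)
    return out


def dbz_to_z_vectorized(dbz_grid):
    """Same conversion as one array expression; the loop runs inside NumPy's C code."""
    return np.power(10.0, dbz_grid / 10.0)


if __name__ == "__main__":
    # Hypothetical input: random reflectivity values between 0 and 60 dBZ.
    rng = np.random.default_rng(0)
    dbz = rng.uniform(0.0, 60.0, size=(300, 300))
    assert np.allclose(dbz_to_z_loop(dbz), dbz_to_z_vectorized(dbz))
    print("loop:      ", timeit.timeit(lambda: dbz_to_z_loop(dbz), number=1))
    print("vectorized:", timeit.timeit(lambda: dbz_to_z_vectorized(dbz), number=1))
```

Both functions return the same values; the exact timing gap depends on the machine, but the vectorized version avoids creating one interpreter-level object per pixel.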

The library developers know this and give you a lot of knobs to twist, along with very rich ways of specifying what you want and how to do it. For data scientists using Python, that typically means we do it the dumb way first, and when it’s too slow, we spend a bunch of time tweaking it. So far, we haven’t had to drop down to C to get the speed we need.
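
The dumb-way-first, tweak-later cycle might look something like this minimal sketch, assuming NumPy and hypothetical inputs: two sets of points (say, storm cell centroids and station locations) and the squared distance between every pair. The first version loops in Python. The second uses broadcasting and `np.einsum` so the work runs in compiled code. The third rewrites the math to skip a large temporary array and uses the `out=` argument to clamp values in place. All function names are illustrative.

```python
import numpy as np


def sq_dists_dumb(a, b):
    """First pass: nested Python loops over every pair of points."""
    out = np.empty((len(a), len(b)))
    for i in range(len(a)):
        for j in range(len(b)):
            diff = a[i] - b[j]
            out[i, j] = np.dot(diff, diff)
    return out


def sq_dists_broadcast(a, b):
    """Tweak 1: broadcasting builds an (n, m, d) array of differences in C."""
    diff = a[:, None, :] - b[None, :, :]
    # Sum diff * diff over the last axis, giving an (n, m) result.
    return np.einsum("ijk,ijk->ij", diff, diff)


def sq_dists_matmul(a, b):
    """Tweak 2: use |x - y|^2 = |x|^2 + |y|^2 - 2 x.y to avoid the (n, m, d) temporary."""
    a_sq = np.einsum("ij,ij->i", a, a)  # squared norm of each row of a
    b_sq = np.einsum("ij,ij->i", b, b)  # squared norm of each row of b
    out = a_sq[:, None] + b_sq[None, :]
    out -= 2.0 * (a @ b.T)
    # Round-off can leave tiny negative values; clamp them in place via out=.
    np.maximum(out, 0.0, out=out)
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    a = rng.normal(size=(200, 3))  # hypothetical point set A
    b = rng.normal(size=(150, 3))  # hypothetical point set B
    ref = sq_dists_dumb(a, b)
    assert np.allclose(ref, sq_dists_broadcast(a, b))
    assert np.allclose(ref, sq_dists_matmul(a, b))
```

The third version trades a little floating-point accuracy (the expanded form can lose precision when two points are very close) for lower memory use, which is the kind of judgment call the tweaking stage tends to involve.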

Technological Evolution and Adaptability

Technology doesn’t progress linearly. We’ve seen this with Node.js: running JavaScript on the server seemed improbable at first, but it now works remarkably well. Currently, the tech community is exploring Rust with similar enthusiasm.

The lesson? Keep an open mind, be willing to adapt, and understand that today’s seemingly quirky tool might be tomorrow’s standard in data science.

Looking Ahead in Python Data Science

In computational analysis, flexibility trumps rigid expertise. Python exemplifies this philosophy: it evolves with its community’s needs, bridging ease of use and powerful data processing.

Our technological journey is about continuous learning, adaptation, and maintaining curiosity. Who knows what data science paradigms we’ll embrace in the next decade?