In recent years, Python has become one of the most popular programming languages on the planet. This trend was partially driven by the rise of machine learning and data science, both areas in which Python excels. (The beginner-friendly syntax didn’t hurt either.)
This focus on Python as a data science tool or scripting language sometimes overshadows its potential for software engineering. When I started out with Python, I had a hard time finding material on how to structure a larger project and which packages to use. So here is my complete list of “must haves” for each new project.
There are many Python IDEs out there, and it is important to realize that not all of them are equally good for each use case. While Jupyter is great for rapid prototyping or visualization and Anaconda’s Spyder has powerful features when it comes to data analysis, neither of them is particularly good for software engineering. Here are my favorites:
- Visual Studio Code: Free, cross-platform, cross-language, many extensions, great Python support and very lightweight. The best choice if you have projects in multiple languages and don’t want to switch your IDE all the time.
- PyCharm: A pure Python IDE, which is available as a free Community edition or as a commercial Professional edition. The Professional edition comes with nice additional features like built-in performance profiling.
Personally, I prefer Visual Studio Code.
Environment and dependency management
Package management is one of the first things new Pythonistas have to learn before they can create reasonably sized projects or contribute to them.
- Venv: Each new project should start with a new virtual environment. This allows you to isolate the dependencies of the package you’re creating.
- Virtualenvwrapper: A very handy little package that makes it easier to create, activate and switch between multiple virtual environments.
- Poetry is a Swiss Army knife when it comes to organizing your project. It helps you to create a basic directory structure, to organize the package dependencies and to publish your package to PyPI. It comes with a bit of a learning curve though.
- Pipenv is a commonly used alternative to Poetry and simplifies dependency management. It automatically tracks the dependencies in your virtual environment and helps you to lock them down for deterministic builds (e.g. to make sure you’re deploying the same setup that you tested before).
I personally like to have a lot of control over the structure of my projects and usually stick to the use of basic pip + venv without any of the extensions. But they are definitely interesting projects and worth checking out.
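To make the venv idea a bit more concrete, here is a minimal sketch that creates a project-local environment with the standard library `venv` module. The `.venv` directory name and the `create_project_env` helper are my own assumptions for illustration; on the command line most people would simply run `python -m venv .venv` instead.

```python
import tempfile
import venv
from pathlib import Path


def create_project_env(project_dir: Path) -> Path:
    """Create an isolated environment in <project_dir>/.venv (hypothetical layout)."""
    env_dir = project_dir / ".venv"
    # with_pip=False keeps this sketch fast; pass with_pip=True to bootstrap pip.
    venv.create(env_dir, with_pip=False)
    return env_dir


# Try it out in a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    env_dir = create_project_env(Path(tmp))
    # Every environment created by venv contains a pyvenv.cfg marker file.
    created = (env_dir / "pyvenv.cfg").exists()

print("environment created:", created)
```

Once the environment is activated, anything installed with pip stays inside it and does not touch other projects.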
Code quality & static code checking
One thing that makes Python different from many other programming languages is that there is a clear definition of what good code looks like: PEP 8, the official style guide for Python. This fact is used by the many code style linters out there and I highly recommend using at least one in each of your projects. Besides the style, make sure you have a security linter enabled.
- Pylint is the default linter I have enabled for any code I write. It is highly configurable and nicely integrated into many IDEs, including my preferred Visual Studio Code.
- Bandit is a security linter that checks your code for common pitfalls like SQL injection, weak encryption methods, etc.
- If you like beautiful code, but are too lazy to write it yourself, you’ll love Black. The uncompromising code formatter takes even the ugliest code and auto-formats it to be compliant with PEP 8.
- Flake8 is a common alternative to Pylint. I’ve even seen people use both style linters in parallel to make sure their code really shines.
- SonarLint is a powerful IDE extension for VS Code and other IDEs that takes care of style and security linting at the same time. It is a bit too resource intensive for my taste, but does a very good job at checking your code.
If you want to go all the way when it comes to ensuring the quality of your product, you might also have a look at products like SonarQube or SonarCloud. These server-side solutions scan your entire code and give you an overall status report including found vulnerabilities.
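To illustrate the kind of pitfall a security linter is meant to catch, here is a small sketch of an SQL injection and its fix. The `users` table and both helper functions are made up for this example, and I use the standard library `sqlite3` module only to keep it self-contained. A query assembled with string formatting, as in the first function, is the sort of pattern a tool like Bandit is designed to flag.

```python
import sqlite3

# In-memory database with a hypothetical users table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", 1), ("bob", 0)])


def find_user_unsafe(name):
    # Risky: the input becomes part of the SQL text and can rewrite the query.
    query = "SELECT name FROM users WHERE name = '%s'" % name
    return conn.execute(query).fetchall()


def find_user_safe(name):
    # Better: a parameterized query treats the input strictly as data.
    return conn.execute("SELECT name FROM users WHERE name = ?", (name,)).fetchall()


malicious = "nobody' OR '1'='1"
print(len(find_user_unsafe(malicious)))  # the injected condition matches every row
print(len(find_user_safe(malicious)))    # no user has this literal name
```

The parameterized version is only slightly longer to write, which is why having a linter nag about the first form is usually worth it.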
There are a few important differences between pytest and unittest. Most importantly, pytest can run test cases written in the unittest framework, but not the other way around. So you don’t have to migrate your entire test suite if you decide to go with pytest.
I recommend giving pytest a shot, especially due to the large number of available extensions for every scenario imaginable:
- pytest-cov for test coverage.
- pytest-flask for testing of web services built with Flask.
- pytest-mock for more convenient mocking.
Both test frameworks integrate nicely with all commonly used IDEs, so you don’t have to worry about compatibility.
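Here is a small sketch of the two styles side by side. The `slugify` function and the test names are invented for this example. The class follows the unittest conventions, while the plain function with a bare `assert` is the style pytest encourages. Placed in a file named like `test_slugify.py`, pytest should collect both, whereas the unittest runner would only pick up the class.

```python
import unittest


def slugify(title: str) -> str:
    """Hypothetical function under test: lowercase and join words with hyphens."""
    return "-".join(title.lower().split())


class SlugifyTestCase(unittest.TestCase):
    # unittest style: a TestCase subclass with assert* helper methods.
    def test_spaces_become_hyphens(self):
        self.assertEqual(slugify("Hello World"), "hello-world")


def test_lowercases_input():
    # pytest style: a plain function with a bare assert statement.
    assert slugify("PyTest") == "pytest"
```

The pytest version needs less ceremony, which adds up quickly in a larger test suite.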
Setup, build and continuous integration
This is a very large topic and I’m only going to scratch the surface here. Make sure to read up on anything that doesn’t sound familiar.
- Check out the tox automation project. This essentially gives you an isolated CI pipeline on your local machine and allows you to test and build your Python package in multiple environments. It takes some time to configure, but once you have it set up you’ll never want to work without it again.
- When you’re planning on developing your package continuously, make sure to adhere to the official versioning conventions for Python as written in PEP 440. This will make your life easier when you start pushing alpha and beta versions to PyPI.
- Versioneer: Like a rocketeer, but for versions! – You just have to love their slogan. This nice little helper takes care of the semantic versioning for you by scanning your git repository for appropriate tags.
- Wheels are the “new” (2012) recommended distribution format for Python packages. They are more lightweight and speed up the installation process. An excellent read about this topic is “What are Python wheels and why should you care.”
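To show why the PEP 440 conventions matter for alpha and beta releases, here is a deliberately simplified sketch of how such version strings sort. It only understands plain release numbers with an optional `a`, `b` or `rc` suffix; the full specification also covers epochs, post-releases, development releases and local versions. The `version_key` helper is my own illustration, and for real projects a dedicated library such as `packaging` is the safer choice.

```python
import re

# Simplified pattern: N(.N)* followed by an optional a/b/rc pre-release segment.
_VERSION_RE = re.compile(r"^(\d+(?:\.\d+)*)(?:(a|b|rc)(\d+))?$")

# Pre-releases sort before the final release with the same number.
_PRE_ORDER = {"a": 0, "b": 1, "rc": 2, None: 3}


def version_key(version: str):
    """Return a sort key for a small subset of PEP 440 version strings."""
    match = _VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"unsupported version string: {version!r}")
    release, pre_label, pre_number = match.groups()
    parts = [int(part) for part in release.split(".")]
    # Drop trailing zeros so that "1.0" and "1.0.0" compare as equal.
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return (tuple(parts), _PRE_ORDER[pre_label], int(pre_number or 0))


versions = ["1.0", "1.0rc1", "1.0a2", "0.9.1", "1.0b1", "1.0a1"]
ordered = sorted(versions, key=version_key)
print(ordered)
```

Alphas come before betas, betas before release candidates, and all of them before the final release with the same number.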
When it comes to web frameworks you’ll have to choose between two categories: microframeworks and full-stack web frameworks. I’ll provide you with the most common example for each:
- Flask is a very lightweight framework which is pretty easy to learn. The hello world example has only five lines of code.
This minimalistic approach shouldn’t be confused with limited functionality. Flask relies heavily on a variety of extensions that you can install on demand. This is nice in many ways: it makes the deployed web service more lightweight, it simplifies your code base and it allows you to make use of powerful third-party extensions written by fans.
If you’re looking into developing a RESTful API, I recommend having a look at Flask-RESTx, which essentially adds a Swagger UI to your service for free.
- Django is a very powerful, but also quite heavy full-stack web framework. It comes with a lot of configuration options and relies heavily on conventions. Learning it takes significantly longer compared to Flask, but it will give you a more complete feature set out of the box.
If you’re just starting out with web development, my recommendation is to go with Flask.
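For reference, the Flask hello world mentioned above looks roughly like this. It assumes Flask is installed in your environment; the route and the function name are free to choose.

```python
from flask import Flask

# The application object is the central registry for routes and configuration.
app = Flask(__name__)


@app.route("/")
def hello():
    # The return value becomes the body of the HTTP response.
    return "Hello, World!"
```

Saved as `app.py`, this can be started with `flask run` and then answers requests to the root URL on the local development server.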
To set up a connection to a relational database you essentially have two options:
- Pyodbc is the default package to establish a database connection and is compatible with all common database providers. It includes all the basic features like transaction management (as long as the ODBC driver supports it), execution of multiple queries and many more.
- SQLAlchemy is your go-to solution if you’re looking for a Pythonic ORM. It can use pyodbc under the hood and exposes the core functionality to you as a user, so you’re not losing any flexibility, which is really nice.
If you’re looking to establish a connection to a NoSQL database or an HDFS cluster, you’ll find the proper tool set easily with the search engine of your choice. There usually is a dominant package for each of them and the decision won’t be difficult.
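As a rough sketch of what transaction management looks like in practice, the following example uses the standard library `sqlite3` module as a stand-in, because pyodbc needs an installed ODBC driver and a connection string. Both packages follow the Python DB-API, so the `cursor()`, `commit()` and `rollback()` calls should carry over. The `accounts` table and the `transfer` helper are invented for this illustration.

```python
import sqlite3

# sqlite3 stands in for a pyodbc connection here (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(connection, src, dst, amount):
    """Move money between two accounts as a single transaction."""
    cur = connection.cursor()
    try:
        cur.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        cur.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
        cur.execute("SELECT balance FROM accounts WHERE name = ?", (src,))
        if cur.fetchone()[0] < 0:
            raise ValueError("insufficient funds")
        connection.commit()  # make both updates permanent together
    except Exception:
        connection.rollback()  # undo both updates if anything went wrong
        raise


def balances(connection):
    return dict(connection.execute("SELECT name, balance FROM accounts"))


transfer(conn, "alice", "bob", 30)
try:
    transfer(conn, "alice", "bob", 500)  # would overdraw alice, so it is rolled back
except ValueError:
    pass

print(balances(conn))
```

Either both updates are stored or neither is, which is exactly the guarantee you want from transaction support.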
This has been a lot and if you made it all the way here I admire your perseverance. You’ve earned yourself a good cup of tea or coffee.
While you’re enjoying your hot beverage: What are your favorite tools for software engineering in Python? Are you missing anything in the list or do you disagree with my selection? Please let me know in the comments! I’m looking forward to the discussion.