Welcome to the Contributor Guide! Here you will find useful resources that will help you start contributing to the Scientific Python ecosystem.
Your time is the most valuable thing you have, so when taking part in volunteer activities it is always worth asking “why?”. Here are a few reasons:
First, Scientific Python is about science. And if you believe that science makes the world better, then improving scientific tooling is extremely important! By putting better open source tools in the hands of researchers, we can help them to produce accurate results, do so in a transparent way, while also improving reproducibility. We believe that scientific tools should be open, and that they should belong to those who use them.
Second, Scientific Python is about openness. When you take part in building open source software, your work may be used by thousands, sometimes millions of people. Your software may only be a tiny cog in a big machine, but it could help fly the next space mission, decipher the origins of the universe, or help invent radically new medical treatments. That is real impact!
Of course, it’s not just about science and researchers, but also about you: the volunteer contributor. And as we said before, we strongly believe that scientific tools should be developed and owned by those that use them. This is the best way to ensure that tools meet the needs of science.
But, even if you are not a scientist, you can contribute and benefit from contributing.
Being part of the open source community, you will work with some of the very best programmers in the world. Through their feedback, you will become a better developer and also learn how to be an excellent collaborator and team member. You’ll learn best practices of software development and engineering, and how to best present and communicate your ideas.
Last, but not least, you will likely work with and make friends with people from around the globe!
These are but a few of the reasons why we contribute to open source Scientific Python.
Shaping the tools you and others use has been a transformative experience for many of us, and we hope it will be for you too.
We cannot wait to welcome you to the Scientific Python community!
Scientific Python is code designed by scientists and engineers for science and engineering. All projects have a straightforward license that determines what you can and cannot do; typically, you may use and modify the software, as long as you give credit to the original authors. The entire ecosystem relies on peer review and community production, so your contribution is really important. There are many ways to contribute outside of coding, and we’ll discuss a few.
Every Scientific Python project has its own issue tracker where users report bugs, suggest UX improvements, and discuss technical problems they are having. This lets developers support users and track improvement to the projects. One way in which you can contribute is by verifying and triaging issues.
For example:
Pull requests (PRs) are the way in which Scientific Python projects incorporate new code. You can help, even if you’re not familiar with them, by:
Another way to contribute to a project is by improving its documentation. Documentation is crucial for every Scientific Python project, since that is how users learn to use it. This doesn’t mean you need to write new documentation (which you can do by following the docs contributing guide); there are other ways you can help too.
Most Scientific Python projects are developed in English, but an increasing number use online platforms such as Crowdin to translate their interface, webpage, and documentation. If you speak a language other than English and feel comfortable translating, this is yet another way you can help.
Every Scientific Python project has a community of volunteers that you can be part of. You can get involved in online conversations and discussions about the projects, offer help to newcomers, come to community meetings, or teach others about the project. You can even help with community outreach by sharing content on Twitter, organizing code sprints, posting newsletter updates, or writing blogs.
As you’ve seen, there are many ways to contribute to Scientific Python! No matter what you have to offer, go ahead and reach out to project maintainers: they will be happy to receive all the help they can get.
[DRAFT] This video has not been recorded yet.
A common question from new contributors is: “how do I choose which project to contribute to?” Some people end up contributing to many different projects, while others tend to focus their effort on a single project. And while projects in the ecosystem have a lot in common, each has its own community, so there may be differences in culture, style, and decision-making processes. Ultimately, which projects you contribute to will depend a lot on your own personal interests and goals.
Some projects like NumPy are used by many projects in the ecosystem. Such projects are mature and relatively full-featured. Given their central role in the ecosystem, working on these projects can have a huge impact. However, making changes to these projects may be more challenging than in newer and less central projects. It’s not uncommon even for core developers to have their pull requests go through iterations for months before being merged.
For example, because NumPy affects pretty much the entire ecosystem, contributing larger features to it is difficult and usually requires a NumPy Enhancement Proposal (NEP) to be approved before work starts. Enhancement Proposals are fairly common for core projects in the ecosystem and consist of a writeup of the planned changes, including a summary of the implementation, its pros and cons, and sometimes a coded proof of concept. The proposal is then discussed and iterated on before a decision is made.
On the other hand, projects such as NetworkX may just require a review or two and basic tests before your changes are merged.
It’s worth remembering this distinction when deciding how much time you’d like to invest.
The open source Scientific Python community functions differently from a normal work environment because it is largely made up of people contributing in their free time, from different time zones. As such, contributors and maintainers may not always be able to get back to you immediately.
Since so many community members are volunteers, any and all contributions are highly valued. Maintainers always want to help, but they are often over-subscribed and may miss notifications or read something and forget to respond. If you haven’t heard back from them in a few days, it’s usually safe to give them a friendly ping to check.
Getting to know the developer community is a great way to learn more about the projects and find a great fit. There are many ways to begin interacting with project communities:
[DRAFT] This video has not been recorded yet.
How should you choose which project to work on? There are many projects in the ecosystem to choose from, so it’s important to find one that relates to something you’re interested in or that you already use. For example, if you’re interested in working with images, it might be worth looking into implementing algorithms in scikit-image.
Typically, it is easier to contribute to smaller projects, but you also want to choose a project that’s active enough so that the developers can review your code and provide mentorship. There may also be more issues and ideas to work on.
Before diving into a project, take a look at its open issues and pull requests, see how maintainers interact with the community, and decide if it would be a good fit for you.
For a more detailed discussion, also take a look at our Choosing a project video, linked below.
As with any trade, there are certain fundamental tools you should learn. Since the ecosystem is built in Python, you’ll need to know how to program in that language. Other tools we use daily include:
Take a look below for links on how to learn these tools.
Now that you’ve chosen a project to contribute to, it’s time to get set up. Most projects have a file called CONTRIBUTING in the root of the repository that will tell you how to set up your development environment, propose changes, etc. Developer documentation will also explain testing and review procedures, and whatever else you need to know.
When first contributing to a project, it’s best to start with small, self-contained issues. Often, maintainers will label issues with the “good first issue” label, so take a look at those first. Examples of a good first issue include fixing a small bug, adding tests, fixing documentation typos, or writing up simple documentation.
It is not uncommon to get stuck while making your first contribution. Don’t panic! Try to find the real-time chat or discussion forum for the project, and ask for assistance there. The maintainers will be happy to help!
For more details, also check out our First contribution video.
Once you’re comfortable making small changes to the project, you can start taking on bigger features. There are many different ways to help: you may, e.g., implement new features, write documentation, refactor and clean up code, improve testing, work on build infrastructure, and so forth.
No matter what you contribute, or whether your contribution is big or small, it is much appreciated.
[DRAFT] This video has not been recorded yet.
Before you start, make sure you have the following:
There are some links below the video to help you get these elements ready in case you are missing some.
Now, we can get started.
Go to the project’s repository and click the “Fork” button at the top right of the page. This will create a copy of the repository in your own account.
On your new fork, click the green “Code” button and copy the link that appears there to get the URL for cloning it.
Now, open your terminal (or Git Bash, if you’ve installed Git for Windows), type the command “git clone”, and paste the URL you just copied after it. With this, you now have a local copy of your fork.
Finally, change to the directory of the repo you just cloned and add the project’s repo as the “upstream” remote repository by typing the following:
git remote add upstream https://github.com/.git
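To confirm that both remotes are set up after the steps above, you can run git remote -v and inspect its output. The following is a minimal sketch that parses such output in Python; the helper name parse_remotes and the account and project names in the sample text are hypothetical placeholders, not part of any real project.

```python
def parse_remotes(remote_output):
    """Map each remote name to its fetch URL, given `git remote -v` output."""
    remotes = {}
    for line in remote_output.splitlines():
        parts = line.split()
        # Each line has the form: <name> <url> (fetch)  or  <name> <url> (push)
        if len(parts) == 3 and parts[2] == "(fetch)":
            remotes[parts[0]] = parts[1]
    return remotes


# Hypothetical output; the account and project names are placeholders.
sample_output = (
    "origin\thttps://github.com/your-username/example-project.git (fetch)\n"
    "origin\thttps://github.com/your-username/example-project.git (push)\n"
    "upstream\thttps://github.com/example-org/example-project.git (fetch)\n"
    "upstream\thttps://github.com/example-org/example-project.git (push)\n"
)

remotes = parse_remotes(sample_output)
missing = {"origin", "upstream"} - set(remotes)
print("configured remotes:", sorted(remotes))
print("missing remotes:", sorted(missing))
```

Here “origin” should point at your fork and “upstream” at the project’s own repository; if either shows up as missing, repeat the corresponding step.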
Most open source projects have their own contributing guide, which explains the steps needed for setting up your development environment. You’ll usually find it in the root directory of the repo. We recommend that you create a new environment for this.
To create and activate a new Conda environment, type the following commands in your terminal (or Anaconda Prompt on Windows):
conda create -n [NAME] python=3
conda activate [NAME]
After you have created your new Conda environment, you need to install the project’s necessary dependencies (This depends on which project we will be using for this video):
conda install …
Now we need to select the issue we want to fix from the repository’s issue tracker (Add link of Project’s issue tracker to display in video here) and reproduce it in the development version of our project. (Not sure this applies, again it depends on the project).
First create a branch for your work. Run the following command in your command line:
git checkout -b [BRANCH NAME]
Open the file you need to change in your editor or IDE, make the fix, and save your changes.
(Not sure this applies)
Now, you are ready to add and commit your changes with a descriptive message. Type the following command in your terminal:
git commit -a -m "descriptive message"
Finally, push your new branch with your changes to your fork on GitHub:
git push -u origin [BRANCH NAME]
Enter your GitHub username and a personal access token if requested; GitHub no longer accepts account passwords for Git operations over HTTPS.
Now, you can submit your changes to the project’s repo.
Go to the project’s repository on GitHub, and you will see the option to open a pull request. Make sure that you select the correct branch to merge your changes into.
You have now made your first contribution to open source!
[DRAFT] This video has not been recorded yet.
The Scientific Python ecosystem is a collection of open-source scientific software packages written in Python. It is a broad and ever-expanding set of algorithms and data structures that grew around NumPy, SciPy, and matplotlib.
The ecosystem includes a wide variety of tools: some more specialized to specific domains such as biological imaging or astronomy, and others quite general for tasks such as data management and high-performance computing.
It includes projects such as Pandas (for data analysis), NetworkX (for graph computation), scikit-learn (for machine learning), and scikit-image (for image processing).
Here is a curated selection of packages available in the ecosystem:
[DRAFT] This video has not been recorded yet.
Before installing Scientific Python libraries, you need to have Python itself installed. There are two, largely equivalent, ways of doing that, and we describe both below.
If you have a working version of Python on your system already (check by running python3), you can skip to setting up a virtual environment.
This is the official Python distribution, which uses the pip package manager. pip installs packages from the Python Package Index, or PyPI for short.
Download the installer from https://www.python.org/downloads/.
A virtual environment is a workspace into which you can install Python libraries, separate from what is being used by your operating system.
Create a new virtual environment in a directory called py3:
python -m venv py3
Start using it as follows:
source py3/bin/activate
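If you are unsure whether the activation worked, Python itself can tell you. The sketch below relies on the fact that, inside an environment created with venv, sys.prefix points at the environment directory while sys.base_prefix points at the Python installation it was created from. The function name in_virtual_environment is our own, and the check applies to venv-style environments only, not to Conda or Mamba environments.

```python
import sys


def in_virtual_environment():
    """Return True when the interpreter runs inside a venv-style environment."""
    # Outside a virtual environment, these two values are identical.
    return sys.prefix != sys.base_prefix


print("interpreter:", sys.executable)
print("inside a virtual environment:", in_virtual_environment())
```

After activating py3, the interpreter path printed should sit inside the py3 directory.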
Also, make sure you have pip installed, since it is Python’s default package manager:
python -m ensurepip
You are now ready to install Scientific Python packages using pip! For example:
pip install ipython numpy scipy
You should now be able to run IPython (the interactive Python shell) to try out NumPy:
$ ipython
In [1]: import numpy as np
In [2]: np.linspace(0, 10, 5)
Out[2]: array([ 0. , 2.5, 5. , 7.5, 10. ])
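To see what the np.linspace call above computes, here is a pure-Python sketch of the same arithmetic: num evenly spaced values from start to stop, with both endpoints included. This is an illustration only, not how NumPy implements it, and the name linspace_sketch is our own.

```python
def linspace_sketch(start, stop, num):
    """Return num evenly spaced floats from start to stop, endpoints included."""
    if num == 1:
        return [float(start)]
    # num points leave num - 1 equal gaps between start and stop.
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


print(linspace_sketch(0, 10, 5))  # [0.0, 2.5, 5.0, 7.5, 10.0]
```

The values match the array shown in the IPython session: five points leave four gaps of 2.5 each.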
Mambaforge is a small Python distribution based around the mamba package manager, and installs packages from the community repository conda-forge.
Mamba is a bit different from Python’s pip package manager in that it can, in addition to Python libraries, also install compilers, libraries, and so forth.
Download the latest version from GitHub. Run the installer, and when it asks you “Do you wish the installer to initialize Mambaforge?” enter “yes”.
A virtual environment is a workspace into which you can install Python libraries, separate from what is being used by your operating system.
Create a new virtual environment in a directory called py3:
mamba create -p ./py3
Mamba uses conda to switch between virtual environments. Start using the new environment as follows:
conda activate ./py3
You are now ready to install Scientific Python packages using mamba! For example:
mamba install ipython numpy scipy
You should now be able to run IPython (the interactive Python shell) to try out NumPy:
$ ipython
In [1]: import numpy as np
In [2]: np.linspace(0, 10, 5)
Out[2]: array([ 0. , 2.5, 5. , 7.5, 10. ])
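Whichever installer you used, you can check from Python which of the packages ended up in the active environment. The sketch below uses the standard library’s importlib.metadata module (available from Python 3.8); the helper name installed_version is our own.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


for package in ("ipython", "numpy", "scipy"):
    print(package, installed_version(package) or "not installed")
```

If a package is reported as not installed, make sure the environment is activated before running the install command again.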
[DRAFT] This video has not been recorded yet.
Scientific Python is built on the Python programming language. Using Scientific Python therefore requires having a firm grasp of Python itself. We suggest reading through the official tutorial, doing an online tutorial on exercism, or using any of the countless resources that exist online or in print.
Learning a new language can be challenging, but Python is fun—so keep trying and hang in there! The community is there to help you along the way.
So let’s cover some basics.
Python is an interpreted language: that means that it reads a text file with instructions and executes those one by one.
The easiest way to create a text file is in a text editor, like Spyder or VSCode.
We can do that right now. Let’s create a file called hello.py:
print("Hello world")
And then run it:
python hello.py
Hello world
That’s it, your first Python program!
You can also play around with Python code interactively in IPython:
[launch IPython and run:]
In [1]: def fibonacci(n):
   ...:     a, b = 0, 1
   ...:     for i in range(0, n):
   ...:         a, b = b, a + b
   ...:     return a
   ...:
In [2]: fibonacci(10)
Out[2]: 55
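One way to gain confidence in a function like this is to compare it against values you already know, which is also what a small test in a project would do. A minimal sketch, repeating the function so that it runs on its own as a script:

```python
def fibonacci(n):
    """Return the n-th Fibonacci number, counting from fibonacci(0) == 0."""
    a, b = 0, 1
    for _ in range(n):
        # Slide the pair (a, b) one step along the sequence.
        a, b = b, a + b
    return a


first_ten = [fibonacci(i) for i in range(10)]
print(first_ten)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

# Each number is the sum of the two before it.
assert all(
    fibonacci(i) == fibonacci(i - 1) + fibonacci(i - 2) for i in range(2, 20)
)
```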
Another way to play with Python code is in Jupyter Lab. This is an interactive web application for typing in and executing Python code. Let me show you how to do a simple plot in Jupyter:
[Open Jupyter Lab; create notebook; import matplotlib.pyplot as plt; plt.plot([1, 2, 3])]
You can head over to https://try.jupyter.org to test it out.
What distinguishes most scientific codes from general ones is that they operate on collections of numbers. These are often represented as NumPy arrays—they are fast, and they have convenient syntax.
Let’s generate 1000 random numbers and square them:
[In IPython]
import numpy as np
import matplotlib.pyplot as plt
# Generate 1000 random numbers, store in x
x = np.random.random(size=1000)
# Square them and store in y
y = x**2
# Plot the results as points; x is unsorted, so a connected line would zigzag
plt.plot(x, y, '.')
plt.show()
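The convenient syntax mentioned above comes from the fact that arithmetic on a NumPy array applies to every element at once. The sketch below compares that with an explicit Python loop over a list; it assumes NumPy is installed, and the seed value is arbitrary and only there to make the run repeatable.

```python
import numpy as np

# A seeded generator makes the random numbers repeatable between runs.
rng = np.random.default_rng(seed=0)
x = rng.random(1000)  # 1000 floats in the interval [0, 1)

# NumPy: one expression squares every element of the array.
y = x**2

# Plain Python: the same result needs an explicit loop over a list.
y_loop = [value**2 for value in x.tolist()]

print(np.allclose(y, y_loop))  # True
print(y.shape)                 # (1000,)
```

Because every value lies between 0 and 1, each squared value is no larger than the original, which is why the plotted points stay on or below the diagonal.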
We’ll post a list of links below the video where you can learn more:
By far the best way to learn, however, is to start coding!
The first thing to do when stuck is to read the documentation. Note that almost all libraries ship with documentation right at your fingertips!
[illustrate how to look up the docstring for np.linspace]
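In IPython, typing np.linspace? shows the docstring, and in any Python session help(np.linspace) does the same. The sketch below does the lookup programmatically with the standard library’s inspect module; it uses statistics.mean purely as a stand-in so that it runs without NumPy installed.

```python
import inspect
import statistics

# Fetch the cleaned-up docstring and the call signature of a function.
doc = inspect.getdoc(statistics.mean)
signature = inspect.signature(statistics.mean)

# The first docstring line is usually a one-sentence summary.
summary = doc.splitlines()[0]
print(f"mean{signature}")
print(summary)
```

The same two calls work on np.linspace or any other documented function.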
If you are still stuck, join the community forum at https://discuss.scientific-python.org or reach out to the relevant package on its mailing list.
Good luck!