Online Repositories for Code and Data
Open science is a collaborative research movement designed to make scientific processes and outputs accessible, transparent, and reusable for everyone. Online repositories play a big role in this because they allow us to share our code, data, and results with others. In this notebook, you are going to learn about two such repositories: GitHub and the Open Science Framework (OSF). GitHub is primarily a platform for hosting and collaborating on code repositories, while OSF is a platform for managing research projects. You are not expected to write any code in this notebook; the purpose is to familiarize yourself with the interfaces of GitHub and OSF and build a mental model of how these services work.
Section 1: Creating a GitHub Repository
Background
GitHub is a platform for hosting code repositories that is built on the version control system Git. Git records every change to the files in a repository as a commit, so GitHub can show exactly who changed what and when. In this section, you are going to create a GitHub repository and add some data to it. When creating a repository, you can choose between public (everyone can access it) and private (only you and explicitly added collaborators can access it). If your project permits it (i.e., it contains no sensitive data that you can’t share), it’s usually a good idea to make the repository public.
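If it helps your mental model, the idea that a commit records who changed what and when can be sketched in a few lines of Python. This is an optional toy model and not Git's actual storage format: the `Commit` class, the `changed_files` helper, and the example authors and files are all made up for illustration. You don't need to run it to complete the exercises.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Commit:
    """Toy commit: who changed what, when, and on top of which earlier commit."""
    author: str
    timestamp: str            # ISO 8601 string, e.g. "2024-01-01T10:00:00Z"
    message: str
    files: dict               # snapshot: file name -> file content
    parent: Optional[str]     # id of the previous commit (None for the first one)

    @property
    def commit_id(self) -> str:
        # Hash everything the commit records (including the parent id), so
        # changing any earlier commit would change every later id as well.
        # Simplified: real Git hashes a different, more compact representation.
        payload = json.dumps(
            {
                "author": self.author,
                "timestamp": self.timestamp,
                "message": self.message,
                "files": self.files,
                "parent": self.parent,
            },
            sort_keys=True,
        )
        return hashlib.sha1(payload.encode("utf-8")).hexdigest()


def changed_files(old: dict, new: dict) -> list:
    """Names of files that were added, removed, or modified between two snapshots."""
    return sorted(name for name in old.keys() | new.keys() if old.get(name) != new.get(name))


# Hypothetical two-commit history.
first = Commit(
    author="alice",
    timestamp="2024-01-01T10:00:00Z",
    message="Initial commit",
    files={"README.md": "# My project"},
    parent=None,
)
second = Commit(
    author="bob",
    timestamp="2024-01-02T09:30:00Z",
    message="Describe the data",
    files={"README.md": "# My project\nData lives in data.csv.", "data.csv": "x,y\n1,2"},
    parent=first.commit_id,
)

# Print a miniature "commit history": short id, author, time, and changed files.
for commit, previous_files in [(first, {}), (second, first.files)]:
    print(commit.commit_id[:7], commit.author, commit.timestamp,
          changed_files(previous_files, commit.files))
```

The commit history page you open in the last exercise of this section shows the same kind of information for your real repository.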
Exercise: If you don’t have a GitHub account yet, sign up at github.com. Then, create a new public repository with a README.md as shown in the screenshots below.
Exercise: Edit the README.md file and commit the changes as shown in the screenshots below.
Exercise: Upload a file to your repository as shown in the screenshot below.
Exercise: Open the commit history of your repository by clicking on the field marked in the screenshot below. You can click on any commit to see exactly what was changed.
Section 2: Collaborating on Open Source Repositories
Background
Open source software is inherently collaborative, and popular open source libraries often have thousands of contributors. To contribute to an existing project, you first fork it, which creates a copy of the repository that you own. After you make modifications to your fork, you can create a pull request, which asks the owner of the original repository to merge your changes into their project. Contributing to open source projects is a great way to learn more about programming. In many open source repositories, you can go to the issues and search for those labeled good first issue to get started.
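The search for beginner-friendly issues can also be expressed as a query. The optional sketch below only builds a URL for GitHub's issue search endpoint and does not send any request; the function name `good_first_issue_url` and the `example-owner/example-repo` repository are placeholders invented for illustration. It assumes the project uses the exact label name `good first issue`, which not every project does.

```python
from urllib.parse import urlencode

# GitHub's REST endpoint for searching issues and pull requests.
GITHUB_SEARCH_ISSUES = "https://api.github.com/search/issues"


def good_first_issue_url(owner: str, repo: str) -> str:
    """Build a search URL for open issues labeled "good first issue" in one repository."""
    # The same qualifiers can be typed into the search box on a repository's Issues tab.
    query = f'repo:{owner}/{repo} is:issue is:open label:"good first issue"'
    return f"{GITHUB_SEARCH_ISSUES}?{urlencode({'q': query})}"


# Hypothetical repository name; replace it with a project you are interested in.
url = good_first_issue_url("example-owner", "example-repo")
print(url)
```

In the browser, typing `is:open label:"good first issue"` into the search box of a repository's Issues tab gives the equivalent result.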
Exercise: Go to the iBOTS-Collaboration repository and fork it as shown in the screenshot below. This will create a copy of the repository on your own account.
Exercise: On your fork of the repository, open the participants folder and add a new file named <your name>.txt with a little message and commit the changes.
Exercise: Open a pull request (PR). A PR asks the author of the original repository to incorporate your changes into their repository.
Section 3: Creating a Research Project on OSF
Background
GitHub is great for hosting code, but it is less suited to storing research data: it is built around text files and rejects individual files larger than 100 MB. The Open Science Framework (OSF) works naturally with non-code assets like datasets, presentations, or questionnaires. OSF also provides persistent data storage with rich metadata and permanent identifiers that allow others to find and cite your work. This makes OSF a great tool for open science. In this section, you are going to create an OSF project, add files and metadata to it, and learn how to organize a project into independent components. Finally, you’ll see how to link a GitHub repository to your OSF project to get the best of both worlds: GitHub for programming workflows and OSF for organizing your research projects.
Exercise: If you don’t have an OSF account yet, sign up at osf.io. Then, on the dashboard, create a new project as shown in the screenshot below. If you are in the EU, it’s a good idea to select a storage location within the EU, which makes it easier to comply with data protection laws.
Exercise: Make your project public as shown in the screenshot below. When creating public projects, be careful not to expose sensitive data!
Exercise: Edit the Wiki as shown in the screenshots below to provide a short description of your project.
Exercise: Edit the metadata as shown in the screenshot below to add a license and DOI (digital object identifier) to your project. A DOI gives you a permanent identifier that others can use to cite your work, and a license tells others how they can use it. For open science projects, consider the CC-BY 4.0 license, which allows others to freely use your work as long as they give you attribution.
Exercise: OSF also allows you to upload data (up to 5 GB for a private project and up to 50 GB for a public project). Go to the Files menu as shown in the screenshot below, create two folders called data and manuscript, and add a file to each.
Exercise: Instead of organizing your data in one big project, you can create multiple subprojects or components. This allows you to organize your work in a modular fashion and store more data (each component is its own project with a separate data limit). Create a new component in your project. Then, click on that component and configure it to be public as well.
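To see why components help with storage, here is an optional back-of-the-envelope calculation. It assumes the limits quoted above (5 GB private, 50 GB public) apply separately to the top-level project and to each component; the function name and the example layout are made up for illustration, and the actual limits may change over time.

```python
# Storage limits as quoted in this notebook (assumption: they apply per project/component).
PRIVATE_LIMIT_GB = 5
PUBLIC_LIMIT_GB = 50


def total_capacity_gb(is_public_flags: list) -> int:
    """Sum the storage limits of a project and its components.

    Each entry is True for a public project/component and False for a private one.
    """
    return sum(PUBLIC_LIMIT_GB if is_public else PRIVATE_LIMIT_GB
               for is_public in is_public_flags)


# Hypothetical layout: a public project with two public components and one private one.
layout = {"project": True, "raw-data": True, "analysis": True, "drafts": False}
capacity = total_capacity_gb(list(layout.values()))
print(capacity)  # 3 * 50 + 1 * 5 = 155
```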
Exercise: You can also link your GitHub repository to the OSF project. Go to the Add-ons menu and select GitHub as shown in the screenshot below. Then follow the instructions to link your account. Once the account is linked, you can select a repository to link to the OSF project.
After adding the GitHub repository, it should appear in the file preview of the OSF project, as shown in the screenshot below.