Valeria Gelardi




Analysis of the Structure and the Collaborative Dynamics of GitHub Projects


The recent spread of social networks and ICT systems has made vast amounts of data on social phenomena and collective behaviour available. This has brought about a deep change in the field of social dynamics, which has moved from an essentially theoretical approach to a strongly data-driven one. Within this framework, the present work explores the collaboration dynamics and the organisational structures within the GitHub platform. It also uses success and popularity as feedback to check whether particular structures exist that are associated with greater efficiency, better results and, consequently, more innovative features in the development of the code. GitHub is based on the Git revision control system and is currently the most important platform for open source coding, counting millions of repositories and active users. Moreover, the complete timeline of GitHub activity is publicly accessible on the GitHub Archive website. GitHub is therefore a particularly suitable system in which to observe and analyse collective social behaviours and collaborative dynamics. The collaboration among users fosters an uninterrupted flow of new ideas, which materialise in many different events, such as the creation of new projects and the updating of existing ones through code modifications. The analysis required a preliminary selection of the data downloaded from GitHub Archive in order to create a database containing all the necessary information about project activity. The analysis carried out on this database was mostly inspired by previous research on innovation dynamics in the framework of complex systems. Every project was mapped onto a network structure in order to observe dynamically the development and the modifications of the code. Metrics were defined to estimate the degree of collaboration among users and the organisation of the workload within the development branches.
Other metrics were chosen to evaluate both the success and the popularity reached by a project and its potential for innovation. Correlation analysis between the metrics and the indices mentioned above allows for some evaluation of the interdependence between the attention received and the structural features of the projects. This thesis follows up on several quantitative analyses of GitHub presented in the literature and proposes a new visualisation of internal structures and collaborative dynamics within GitHub projects. Moreover, identifying successful patterns could help to highlight the most influential and pioneering projects and to encourage their development.
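
As a purely illustrative sketch of the kind of correlation analysis described above, the following Python fragment derives one structural quantity and one popularity quantity per project from a list of events and computes a rank correlation between them. Everything in it is an assumption made for illustration: the event records are invented, the number of distinct users pushing code stands in for a collaboration metric, the number of `WatchEvent` records (the GitHub Archive event emitted when a user stars a repository) stands in for popularity, and Spearman's coefficient is one possible choice of correlation measure. None of these is claimed to be the metric or the method actually used in the thesis.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical, simplified records: (repository, user, event_type).
# The event type names follow GitHub Archive; the data is invented.
EVENTS = [
    ("a/x", "u1", "PushEvent"), ("a/x", "u2", "PushEvent"),
    ("a/x", "u3", "PushEvent"), ("a/x", "u7", "WatchEvent"),
    ("a/x", "u8", "WatchEvent"), ("a/x", "u9", "WatchEvent"),
    ("b/y", "u4", "PushEvent"), ("b/y", "u7", "WatchEvent"),
    ("c/z", "u5", "PushEvent"), ("c/z", "u6", "PushEvent"),
    ("c/z", "u8", "WatchEvent"), ("c/z", "u9", "WatchEvent"),
]


def project_metrics(events):
    """Return {repo: (distinct pushing users, number of stars)}."""
    contributors = defaultdict(set)
    stars = defaultdict(int)
    for repo, user, kind in events:
        if kind == "PushEvent":
            contributors[repo].add(user)
        elif kind == "WatchEvent":
            stars[repo] += 1
    repos = sorted(set(contributors) | set(stars))
    return {r: (len(contributors[r]), stars[r]) for r in repos}


def average_ranks(values):
    """Rank values from 1 upwards, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # 1-based mean rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / sqrt(var_x * var_y)


metrics = project_metrics(EVENTS)
contributors_per_project = [m[0] for m in metrics.values()]
stars_per_project = [m[1] for m in metrics.values()]
rho = spearman(contributors_per_project, stars_per_project)
```

A rank-based coefficient is used in this sketch because quantities such as star counts are typically very unevenly distributed across projects, so a measure that depends only on ordering is less sensitive to a few extremely popular repositories. A real analysis would replace the toy records with events selected from GitHub Archive and the contributor count with the network-based metrics defined in the thesis.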