Social Network Analysis for Computer Scientists

This course deals with computer science (CS) aspects of social network analysis (SNA), and is open to all students in the master computer science programme at Leiden University.

Course information

Lectures: Fridays from 9:00 to 10:45 in Gorlaeus lecture room BM1.33
Lab sessions: Fridays from 11:00 to 12:45 in Gorlaeus lab room DM0.09 and DM0.17
Prerequisites: a CS bachelor with courses on Algorithms, Data Structures and Data Mining
Literature: provided papers and book chapters (free and digitally available)
Examination: based on presentation, paper, programming, peer review and participation (no exam)
Brightspace link: 2526-S1 Social Network Analysis for Computer Scientists
Study guide link: Social Network Analysis for Computer Scientists
Study points: 6 ECTS

Course staff: dr. Frank Takes (f.w.takes@liacs.leidenuniv.nl, room BE 3.07), Rachel de Jong MSc (room BE 3.03)
Assistants: Gamal Adel Elgamal MSc (room BE 2.23), Bart Holterman, Thanos Kalligeris and Bart Westhoff

Need help? Ask your questions during the lab sessions. If it is more urgent, drop by the office of the lecturer or one of the assistants. If they are not around, contact snacs@liacs.leidenuniv.nl.

Network with 1458 nodes and 1948 edges.

Course schedule

Date

Lecture (9:00-10:45)

Lab session (11:00-12:45)

Fri Sep 5, 2025

Lecture 0: Course information
Lecture 1: Introduction and small world phenomenon

Instruction: Introduction to Gephi
Work on Assignment 1

Fri Sep 12, 2025

Lecture 2: Advanced concepts and centrality
Lecture 2.5: Course project

Instruction: Introduction to NetworkX
Work on Assignment 1

Fri Sep 19, 2025

Lecture 3: Network projection and community detection
Example presentation

Work on Assignment 1

Fri Sep 26, 2025

Lecture 4: Propagation-based centrality and structure of the web

Work on Assignment 1

Mon Sep 29, 2025

Deadline for Assignment 1 (AoE; hand in via Brightspace)

Fri Oct 3, 2025

No lecture (university closed; 3 October)

No lab session

Fri Oct 10, 2025

Lecture 5: Network evolution and human traversal

Instruction: Course project contribution
Work on Assignment 2

Fri Oct 17, 2025

Lecture 6: Network processes and Network science challenges

Instruction: REL Compute Lab
Work on Assignment 2

Fri Oct 24, 2025

Track A (BW 0.17, Frank):
4 Anomaly detection
16 Community detection
21 Graph compression
28 Influence spread

Track B (DM 1.15, Rachel):
5 Anonymity in networks
25 Influence spread
37 Network embeddings
46 Shortest paths

Track C (DM 1.19, Gamal):
Starts at 10:00
3 Anomaly detection
39 Network embeddings

Work on Assignment 2

Mon Oct 27, 2025

Deadline for Assignment 2 (AoE; hand in via Brightspace)

Fri Oct 31, 2025

Track A (BW 0.17, Frank):
8 Anonymity in networks
20 Core/periphery structure
38 Network embeddings
48 Shortest paths

Track B (DM 1.15, Rachel):
1 Anomaly detection
9 Centrality estimation
12 Centrality estimation
14 Community detection

Track C (DM 1.19, Gamal):
15 Community detection
22 Graph compression
26 Influence spread
35 Network motifs

Instruction: Course paper planning
Work on Course project

Fri Nov 7, 2025

Track A (BW 0.17, Frank):
7 Anonymity in networks
10 Centrality estimation
29 Link prediction
49 Visualization algorithms

Track B (DM 1.15, Rachel):
34 Network motifs
42 Sampling from networks
47 Shortest paths
50 Visualization algorithms

Track C (DM 1.19, Gamal):
2 Anomaly detection
27 Influence spread
51 Visualization algorithms
52 Visualization algorithms

Work on Course project

Thu Nov 13, 2025

"Deadline" for a draft of the first half of the Course project paper (bring 2 printed copies to tomorrow's peer review session)


Fri Nov 14, 2025

Peer review session (in team pairs)

Work on Course project


Fri Nov 21, 2025

Track A (BW 0.17, Frank):
17 Core/periphery structure
33 Network motifs
41 Sampling from networks
44 Sampling from networks

Track B (DM 1.15, Rachel):
No session

Track C (DM 1.19, Gamal):
11 Centrality estimation
13 Community detection
31 Link prediction
45 Shortest paths

Work on Course project

Mon Nov 24, 2025

(Optional) Deadline for a preliminary version of the Course project paper for course staff feedback (AoE; hand in via Brightspace)


Fri Nov 28, 2025

Track "late" (BW 0.17, Frank), starts at 10:00:
6 Anonymity in networks
43A Sampling from networks

Track B (DM 1.15, Rachel):
18 Core/periphery structure
23 Graph compression
30 Link prediction
36 Network motifs

Track C (DM 1.19, Gamal):
19 Core/periphery structure
32 Link prediction
40 Network embeddings
43 Sampling from networks

Work on Course project

Thu Dec 4, 2025

"Deadline" for a substantial part of the Course project code and experimental pipeline (have it available digitally for review tomorrow)


Fri Dec 5, 2025

Code review session (in team pairs) - Collaborative document 2025

Work on Course project


Fri Dec 12, 2025

No lecture

Work on Course project

Dec 14, 2025

Deadline for final Course project paper and accompanying code (AoE; hand in via Brightspace)

Dec 17, 2025

Deadline for retake assignment to replace failed assignment(s) (AoE; hand in via Brightspace on top of failed assignment)

Dec 19, 2025

Course end. Grades are sent to student administration.

Jan 31, 2026

Course project retake deadline

Lab session - Week 8 - Course project paper - Team session

The main goal of this lab session is to ensure that your team is all set for making serious progress writing the course project paper in the coming weeks.

Course project paper
Earlier, you made a project plan. One upcoming deadline is the first version of your paper for the peer review session in 2 weeks.

Template. If you have not yet done so, download the paper template, make sure you can compile it in LaTeX, and understand how to use it. Remember to fill in the meta information.
Related work. If you have not yet done so, spend some time on Google Scholar investigating a) what other papers exist on this topic, b) which relevant papers cite your paper, and c) what important references are presented in your paper. You will likely want to include some of these in your paper as well. At least make sure that you are aware of the major works that cite your paper, and that you reference some of these works in your introduction when you sketch the context of your work.
Project contribution. For the course project, you go beyond the one paper that you were assigned, comparing techniques from multiple papers, for example comparing different algorithms or methods, using different validation metrics, or testing on (a) larger (number of diverse) datasets. Try to get your intended contribution written down in the appropriate section(s) of your paper.
Collaboration. How will you work together on the paper? Git? Overleaf (the university provides free pro licenses)? How will you share code, data and results with each other, and ensure integration of it all?
What is a good paper? What defines a good paper? Discuss with your project partner to align your expectations. When are you "done"? When is the paper "sufficient"? When is it "good"? And how about "excellent"?
Writing. Make serious progress with writing the first 3 to 4 sections of your course project paper in the coming two weeks, so that you have something substantial for the peer review session.
Planning. Look at the plan you made in early October. Have you reserved time to review each other's texts? Are adjustments necessary?

As always: please ask course staff for help if you are still unsure about certain aspects of your project.

Need to freshen up or improve your academic writing skills? See Academic writing: a practical guide and The academic phrasebank. There is also the Leiden Science Skills platform.

Lab session - Week 6 - Introduction to REL Compute

The main goal of this short lab session is to get to know the data science lab, nowadays known as REL Compute.

If you are working remotely, learn how to set up remote access to the LIACS Research and Education Laboratory (REL).

REL Compute Lab
The REL website provides the necessary information and documentation. The email address to contact in case of access issues or other technical problems is rel@liacs.leidenuniv.nl.

Become familiar with the lab and how to run code on it, and how to place data within the lab. This may be handy for Assignment 2 and/or the course project. Remember:

/home/sXXXXXX, your home directory ~, is for your own code (don't put large stuff there),
/local is for local storage on the machine you are currently on,
/data is for storing data across data science lab machines.

Think of setting up passwordless login, consider using ProxyJump to hop through the gateway in a single step, and perhaps use sshfs to mount your remote home directory. Some IDEs also offer all of this functionality.

Instructions on setting up remote SSH compute (Oct 2025)

Assignment 2
Continue with the practical part of Assignment 2.

Lab session - Week 5 - Course project planning

The main goal of this lab session is to get started, first with the Course project, and then with Assignment 2.

Course project
Below are some topics you can discuss and investigate together with your project team partner.

Project schedule. Have a look at the deadlines for the course project in the schedule on this website, and create your own sensible plan with your team partner.
Paper. Read your course project paper, and spend some time on Google Scholar investigating
a) what other papers exist on this topic (identify at least 3)
b) which relevant papers cite your paper, and for what reason (application, improvement, etc.)
c) what important references to previous work are presented in your paper (identify at least 3)
Project contribution. For the course project, you have to do something original. Ideally, this goes beyond the one paper that you were assigned, comparing techniques from multiple papers, for example comparing different algorithms or methods, using different validation metrics, or testing on (a) larger (number of diverse) datasets. Write down in at most 150 words what you plan to do for your course project, and feel free to discuss this with the lecturer or an assistant for feedback, this week or the next two. Also see the generic instructions for the Course project.

Done? Get started with Assignment 2.

Lab session - Week 2 - NetworkX

The main goal of this lab session is to become familiar with NetworkX (a Python package to analyze networks for research purposes).
All relevant information on NetworkX can be found in the NetworkX online documentation.

Your Python environment

For this lab session, you need a working Python environment. For this, there are two options:

Use your own self-installed Python environment, and choose your own editor and way of running code (via the command line, an IDE or an interactive notebook). Proceed to Instructions for today.
Alternatively, you can use the desktop machines in the student computer rooms. One easy way to create a Python environment on the university computers is to use conda. As of 2024, conda appears to be installed already. Therefore, you can likely skip steps 1-3 below, and start at step 4.

1. Download Miniconda by running the following command in your terminal:
   wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
2. Install Miniconda:
   bash Miniconda3-latest-Linux-x86_64.sh
   The installer will ask you some questions. Normally, the standard location suffices and you do not have to add conda to PATH.
3. After installation finishes, close your terminal so that the changes take effect. Upon opening a new terminal, activate the conda virtual environment with:
   source miniconda3/bin/activate
   If all goes well, you will see that (base) appears in front of your shell prompt. This means that conda's base environment is active.
4. Create a new conda environment:
   conda create --name snacs
   And activate it with:
   conda activate snacs
   You might use conda for other courses as well, and version conflicts can arise when you install many packages. Using separate environments prevents this.
   NOTE: some packages can make your conda environment take up a lot of space. So try not to use an unnecessary number of environments with duplicate packages. You can also remove an environment that you don't need anymore, see the conda cheatsheet.
5. Packages can easily be installed with conda. To install NetworkX, run:
   conda install networkx
Getting started
- Running code from the command line. Now you should be able to run your code from the command line by typing
  python3 scriptname.py
- Running code in interactive Jupyter notebook from the university environment:
  conda install -c conda-forge jupyterlab
  After installing that once, the notebook/lab can be started using:
  jupyter lab
  (When using Jupyter through SSH, so when it is not running on the machine that you are currently working on, make sure to use port forwarding whilst SSH-ing, using options -g -L 8888:127.0.0.1:8888. Also add options --no-browser and --port=8888 when starting up Jupyter.)

Instructions for today: Lab session on NetworkX

Take some time to do the NetworkX tutorial.
Have a look at the functionality to read and write graphs from/to disk, and in particular learn how to import an edge list. Understand the input format and ways of including things like weights and other attributes of edges.
A lot of the common network metrics you may want to compute are implemented as NetworkX functions or NetworkX algorithms. Become familiar with these, for example by computing measures such as degree assortativity, clustering, density, diameter and average distance.
While at it, why not take a look at how NetworkX relates to other data formats?
Try to load the network from the first lab session (small-gephiready.tsv) into NetworkX, and investigate some characteristic properties of the network, such as the degree distribution and distance distribution. Use the read_edgelist function and remember to select the correct separator (tab), and pay attention to the header of the file.
Get the Epinions network from the SNAP repository. You may need to fiddle with the precise header, but it is already in edge list format. Compute some common characteristics such as the degree distribution, and visualize them using appropriate figures or distributions, for example using pyplot/Matplotlib.
What are the differences between NetworkX and Gephi in terms of visualization and analysis capabilities?
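The loading and measuring steps above can be sketched as follows. This is a minimal sketch, not a reference solution. It assumes a tab-separated edge list with one header line and a single weight column. The sample data, the column names and the helper names (load_edge_list, summarize, degree_distribution, distance_distribution) are made up for illustration and do not reflect the actual contents of small-gephiready.tsv or the Epinions file.

```python
import io
from collections import Counter

import networkx as nx

# Hypothetical stand-in for a tab-separated edge list with a header line;
# the column names and values are invented for this example.
SAMPLE_TSV = (
    "Source\tTarget\tWeight\n"
    "a\tb\t1\n"
    "b\tc\t2\n"
    "c\ta\t1\n"
    "c\td\t3\n"
    "e\tf\t1\n"
)


def load_edge_list(handle):
    """Read a tab-separated, weighted edge list from a binary file handle."""
    next(handle)  # read_edgelist has no header option, so skip the header by hand
    return nx.read_edgelist(
        handle,
        delimiter="\t",
        create_using=nx.Graph,      # use nx.DiGraph for directed data
        data=(("weight", float),),  # third column becomes the 'weight' attribute
    )


def summarize(graph):
    """Compute a few common (unweighted) characteristics of an undirected graph."""
    # Distances are only defined within a component, so use the largest one.
    giant = graph.subgraph(max(nx.connected_components(graph), key=len))
    return {
        "nodes": graph.number_of_nodes(),
        "edges": graph.number_of_edges(),
        "density": nx.density(graph),
        "average_clustering": nx.average_clustering(graph),
        "giant_diameter": nx.diameter(giant),
        "giant_average_distance": nx.average_shortest_path_length(giant),
    }


def degree_distribution(graph):
    """Map each degree value to the number of nodes with that degree."""
    return dict(sorted(Counter(deg for _, deg in graph.degree()).items()))


def distance_distribution(graph):
    """Count unordered node pairs per shortest-path distance (undirected graph)."""
    counts = Counter()
    for _, lengths in nx.all_pairs_shortest_path_length(graph):
        counts.update(dist for dist in lengths.values() if dist > 0)
    # Every unordered pair was counted once from each endpoint.
    return {dist: n // 2 for dist, n in sorted(counts.items())}


if __name__ == "__main__":
    graph = load_edge_list(io.BytesIO(SAMPLE_TSV.encode("utf-8")))
    print(summarize(graph))
    print(degree_distribution(graph))
    print(distance_distribution(graph))
```

For a file on disk, open it in binary mode (open(path, "rb")) and pass the handle to load_edge_list. The two distribution dictionaries can be handed to pyplot for plotting. Degree assortativity is available as nx.degree_assortativity_coefficient, which is left out here because it additionally needs NumPy. Note that computing all pairwise distances is only feasible for relatively small networks.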

Done? Proceed with Exercise 2 of Assignment 1.

Looking for a challenge? Check out these three alternatives, which you can also use instead of NetworkX throughout the course if you prefer, although less help is available for them:

Python igraph, a Python version of the R igraph package.
Graph-Tool, a Python graph analysis toolkit that leaves the hard computation to parallel (OpenMP) C++ code.
SNAP, which is entirely written in C++ and has many interesting features. Nowadays, there is also a Python version.

Running into problems regarding prior knowledge of Python programming? See the reading material at the bottom of this course website.

Lab session - Week 1 - Gephi

About social network analysis tools and packages. There exist different tools and packages for social network analysis. In this course, we cover two of them, with complementary advantages:

Gephi, an easy-to-use tool with a graphical interface useful for visualization and quick analysis of relatively small network data (this week, see below).
NetworkX, an extensive Python package for network analysis that can handle larger network datasets and computations (next week).

Learning goals. The main goal of this lab session is to become familiar with Gephi (experimental beta-software to visualize networks for research purposes) and its input format. At the end of this session you should be able to:

Know how to use Gephi for social network analysis,
Import and visualize raw network data with labeled nodes and labeled and/or weighted edges (directed or undirected),
Understand how to map edge and node size and color to structural network properties such as the node degree and edge type,
Know how to apply filters to the visualization, for example to focus only on the giant component,
Export a vector graphic PDF of your network for reuse in for example a presentation or paper,
Export computed node data for reuse in another program.

There is no deliverable for this lab session, but you are assumed to know the tool afterwards. Practice more at home if needed.

Instructions for today: Walk through the complete Gephi tutorial.

Note that the tutorial briefly covers topics such as centrality and communities, which will not be covered extensively until Lecture 2.

Done? Get started with the practical part of Assignment 1. You can download the datafiles here. If you want to analyze huge.tsv, you will have to get it from the shared folder in the ISSC Linux or LIACS DS lab environment, as stated in the assignment.

Course project

Teams work on a course project for 60% of the course grade. The project is about a certain topic (see list below) related to social network analysis, and has a paper as end product. The explanation below was also given in Lecture 2. The project consists of:

Giving a 15-20 minute presentation (plus 5-8 minutes for questions) of a paper corresponding to the topic. At least a PowerPoint/PDF presentation has to be given. If well advanced into the project (later in the semester), some demonstration of an implementation or visualization could also be given. Teams are also expected to provide feedback on some of the presentations given by their fellow students during the lectures.
Presentation pre-check: it is highly recommended to gather feedback on a draft version of the slides of your presentation no later than the Tuesday before your presentation; you can also show us your draft slides in the lab session the week before, to receive feedback.
Making a small contribution, i.e., doing something new compared to the paper on which the project is based. For example, a new tweak to an existing algorithm, a large number of datasets to test the algorithms to find a relation between the network characteristics and the performance, a new performance metric to evaluate the algorithms, a new type of visualization of the algorithm or results, an improvement of a proof related to the algorithm, etc. In case of doubt about the contribution, contact the course staff well in advance; you can always ask for help.
Gathering and implementing the algorithms and/or techniques from the different papers, and running experiments on at least five large real-world network datasets. Teams will also give feedback on the code produced by other teams in the code review session. Some papers introduce multiple techniques. In that case, choose a logical subset to compare, and motivate your choice.
Datasets can for example be found at SNAP, BigDND, Networks Repository, Netzschleuder and (hopefully temporarily offline) KONECT and ICON.
Certain topics need particular datasets (e.g., with timestamps, signed links, etc.), which should of course be taken into account when selecting datasets.
Writing one 6 to 10 page paper. In the paper the different techniques are analyzed and compared in detail using extensive experiments. The paper, to be written in LaTeX, has to follow the format of an actual scientific paper. Your paper is divided into various logical sections and possibly subsections using at least 6, but no more than 10 two-column pages. Students will also give feedback on the paper produced by another team in the peer review session.
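The experimental part of the project (several methods, several datasets, and a record of network characteristics next to performance) could be organised along the following lines. This is only a sketch under assumptions: the function run_experiment and the row layout are hypothetical, the small generated graphs stand in for the large real-world datasets you are expected to use, and the two centrality functions stand in for the techniques from your papers.

```python
import time

import networkx as nx


def run_experiment(networks, methods):
    """Apply every method to every network and record basic statistics."""
    rows = []
    for net_name, graph in networks.items():
        for method_name, method in methods.items():
            start = time.perf_counter()
            result = method(graph)
            elapsed = time.perf_counter() - start
            rows.append({
                "network": net_name,
                "method": method_name,
                "nodes": graph.number_of_nodes(),
                "edges": graph.number_of_edges(),
                "density": nx.density(graph),
                "seconds": elapsed,
                "result_size": len(result),
            })
    return rows


if __name__ == "__main__":
    # Stand-ins only: small generated graphs instead of real datasets.
    networks = {
        "karate": nx.karate_club_graph(),
        "ring": nx.cycle_graph(50),
        "random": nx.gnp_random_graph(100, 0.05, seed=1),
    }
    # Stand-ins only: replace with the algorithms you are comparing.
    methods = {
        "degree": nx.degree_centrality,
        "closeness": nx.closeness_centrality,
    }
    for row in run_experiment(networks, methods):
        print(row)
```

Keeping one row per (network, method) pair makes it straightforward to write the results to a file and to relate network characteristics to running time or quality in the paper's tables and figures.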

Template

A template snacspaper.tex (with many instructions for content per section inside the TeX comments(!)) is available, which you can compile with pdflatex into a pdf. You also need this acmart.cls-file. We are using an adjusted version of the 2-column ACM conference proceedings template. Of course, feel free to use any additional LaTeX packages. Likely, you will need tikz, graphicx, etc.

Topics

The list of project topics is shown below. Please choose a topic (so, choose a number) with your team (consisting of two students). Register your choice in Brightspace. Topics can be chosen up to 3 times; we will split the group into 3 parallel tracks.
If you are retaking the course because you failed the project last year, you cannot choose the same topic as last year.

Anomaly detection:
OddBall: Spotting Anomalies in Weighted Graphs
A Community-Aware Approach for Identifying Node Anomalies in Complex Networks
Anonymity in networks:
Anonymizing Social Networks
Privacy and uniqueness of neighborhoods in social networks
Centrality estimation:
Approximating Betweenness Centrality
Fast approximation of betweenness centrality through sampling
Centrality Estimation in Large Networks
Computing top-k Closeness Centrality Faster in Unweighted Graphs
Community detection:
Fast unfolding of communities in large networks
An information-theoretic framework for resolving community structure in complex networks
From Louvain to Leiden: guaranteeing well-connected communities
Stochastic blockmodels and community structure in networks
Core/periphery structure:
Hierarchical core-periphery structure in networks (plus new code)
Graph compression:
The webgraph framework I: compression techniques
Influence spread and virality:
Efficient influence maximization in social networks
Cost-effective outbreak detection in networks
Link prediction:
New perspectives and methods in link prediction
Predicting positive and negative links in online social networks
Supervised temporal link prediction in large-scale real-world networks
Network motifs:
Finding Conserved Patterns in Multilayer Networks
A Faster Algorithm for Detecting Network Motifs
Local Higher-Order Graph Clustering
Network embeddings:
node2vec: Scalable Feature Learning for Networks
GraRep: Learning Graph Representations with Global Structural Information
Sampling from networks:
Sampling from large graphs
Effects of missing data in social networks
Shortest paths:
Fast shortest path distance estimation in large networks
Visualization algorithms:
ForceAtlas2, a Continuous Graph Layout Algorithm for Handy Network Visualization Designed for the Gephi Software
OpenOrd: an open-source toolbox for large graph layout
Efficient, High-Quality Force-Directed Graph Drawing

Note: scientific papers (ACM, Elsevier, etc.) can often only be opened from within the university domain (or from home via university SSH/Citrix/VPN/etc.). IEEE Xplore papers can often be opened by looking them up via computer.org. Alternative links and preprints of papers can often be found through Google Scholar by searching for "Title of the paper". Contact course staff if you have tried all of these options and are still not able to access the paper (do not pay!).

Reading material

In the past, students have expressed interest in additional reading material to help freshen up on skills and knowledge required for this course.

To brush up on your Python programming skills, you could do Lectures 0, 1 and 4 of these Scientific Python Lectures.
To freshen up your LaTeX skills, you could do for example this LaTeX tutorial by Overleaf.
MIT has a nice online course on discrete mathematics. Relevant chapters: 5, 6 and 7.
The (Dutch) Leiden courses Algoritmiek (Algorithms) and Datastructuren (Data structures) might also be interesting to browse through.
Basic knowledge of concepts in machine learning and data mining is provided, e.g., at the Wikipedia machine learning page.

Past editions

The course was also given in 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023 and 2024.