Social Network Analysis for Computer Scientists

This is the old course page! See the new page for recent information!

This course deals with computer science (CS) aspects of social network analysis (SNA), and is open to all students in the master computer science programme at Leiden University.

Next edition of the course is in Fall 2020. Information will be available near the end of summer break.
Comments from your evaluation forms of the fall 2019 edition of the course are available.

Course information

Lectures: Fridays from 11:15 to 13:00 (Sep. 6 - Dec. 13) in Snellius room 174
Lab sessions: Fridays from 9:15 to 11:00 in room 306/308 (not on Sep. 6)
Prerequisites: a CS bachelor with courses on Algorithms, Data Structures and Data Mining
Literature: provided papers and book chapters (free and digitally available)
Examination: based on presentation, paper, programming, peer review and participation (no exam)
Blackboard course code: 4343SNACS-1920FWN
Study points: 6 ECTS

Lecturer: dr. Frank Takes - f.w.takes@liacs.leidenuniv.nl, room 157b
Assistant lecturer: Anna Latour MSc - a.l.d.latour@liacs.leidenuniv.nl, room 123
Course assistant: Antonio Barata MSc - a.p.pereira.barata@liacs.leidenuniv.nl, room 126
Student assistants: Hanjo Boekhout Bsc - h.d.boekhout@umail.leidenuniv.nl and Jeroen Rook BSc - j.g.rook@umail.leidenuniv.nl.

Network with 1458 nodes and 1948 edges.

Need help? Ask your questions during the lab sessions. If it is more urgent, walk by the lecturer or assistant's office. If they are not around, contact snacs@liacs.leidenuniv.nl.

Course schedule

	Date	First slot (9:15-11:00)	Second slot (11:15-13:00)
1.	Sep 6, 2019	No activities	Lecture 0: Course organization Lecture 1: Introduction and small world phenomenon
2.	Sep 13, 2019	Gephi tutorial Work on Assigment 1	Lecture 2: Advanced concepts and centrality
3.	Sep 20, 2019	NetworkX tutorial Work on Assigment 1	Lecture 3: Network projection and community detection Presentation 0: Diameter computation
4.	Sep 27, 2019	Work on Assigment 1	Lecture 4: Web structure and propagation-based centrality
	Sep 30, 2019	Deadline for Assignment 1 (hand in via Blackboard)
	Oct 3, 2019	No lecture	No lecture
5.	Oct 11, 2019	Course project planning Work on Assigment 2	Lecture 5: Network dynamics & Processes on networks
6.	Oct 18, 2019	Work on Assigment 2	Two parallel sessions of two presentations each: Presentation 1 (405, 11:15): Sampling networks Presentation 2 (405, 12:15): Influence spread and virality 2 Presentation 3 (174, 11:15): Community detection 4 Presentation 4 (174, 12:15): Link prediction in signed networks
7.	Oct 25, 2019	Course project paper Work on Assigment 2	Presentation 5 (405, 11:15): Community detection 2 Presentation 6 (405, 12:15): De-Anonymization of networks Presentation 7 (174, 11:15): Shortest paths 1 Presentation 8 (174, 12:15): Data errors in networks
	Oct 28 29, 2019	Deadline for Assignment 2 (hand in via Blackboard)
8.	Nov 1, 2019	Work on course project	Presentation 9 (405, 11:15): Motifs in multilayer networks Presentation 10 (405, 12:15): Community detection 1 Presentation 11 (174, 11:15): Anomaly detection in networks Presentation 12 (174, 12:15): Community detection 3
	Nov 7, 2019	Deadline for bringing (printed, in two fold) a first version of the course project paper
9.	Nov 8, 2019	Work on course project	Peer review session
10.	Nov 15, 2019	Work on course project	Presentation 13 (405, 11:15): Link prediction Presentation 14 (174, 11:15): Closeness centrality 2 (top-k) Presentation 15 (174, 12:15): Motifs in temporal networks
	Nov 18, 2019	Deadline (optional) for handing in a preliminary version of the course project paper (via ~~Blackboard~~ e-mail)
11.	Nov 22, 2019	Work on course project	Code review session Final collaborative document A still editable version (visit at own risk)
12.	Nov 29, 2019	Work on course project Presentation 16 (402, 10:15): Network embeddings 1 Presentation 25 (403, 10:15): Motifs in networks	Presentation 17 (174, 11:15): Shortest paths 2 Presentation 18 (405, 12:15): Closeness centrality 1 Presentation 19 (174, 11:15): Influence spread and virality 1 Presentation 20 (174, 12:15): Resilience of networks
13.	Dec 6, 2019	Finalize course project	Presentation 21 (405, 11:15): Personalized PageRank Presentation 22 (405, 12:15): Betweenness centrality 1 Presentation 23 (174, 11:15): Neighborhood approximation Presentation 24 (174, 12:15): Visualization algorithms 2
	Dec 9, 2019	Deadline for Course Project
	Dec 16, 2019	Retake deadline for assignments
	Dec 16, 2019	Deadline for filling in the individual Course Project Evaluation Form (hand in via Blackboard)
	Dec. 20, 2018	Course end. Grades will be submitted to the student administration

Lab session - Week 7 - Course project paper

The main goal of this lab session is to get started with the course project paper.

Course project paper
During the lab session of two weeks ago, you have made a project planning. One upcoming deadline is the first version of your paper for the peer review session.

Template. Download the paper template and make sure you can compile it and have a way of collaborating on the paper with your team partner.
Related work. If you have not yet done so, spend some time at Google Scholar investigating a) what other papers exist on this topic, b) which relevant papers cite your paper, and c) what important references are presented in your paper. You will likely want to include some of these in your paper as well.
Project contribution. For the course project, you have to do something original. Ideally, this goes beyond the one paper that you were assigned, comparing techniques from multiple papers, for example comparing different algorithms or methods, using different validation metrics, or testing on (a) larger (number of diverse) datasets. Try to get this written down in the appropriate section(s) of your paper.
Writing. Make serious progress with writing the first 3 to 4 sections of your course project paper in the coming two weeks, so that you have something substantial for the peer review session.

Assignment 2
Continue with Assignment 2.

Lab session - Week 5 - Course project planning

The main goal of this lab session is to get started with both the course project and with Assignment 2, and to get to know the data science lab.

Data Science Lab
The data science lab website provides necessary information and documentation. Become familiar with the lab and how to run code on it, and how to place data within the lab. Remember: your homedirectory is for your own code, /local is for local storage on the current machine you are are on, and /data is for storing data across data science lab machines. You may want to use the lab's facilities for course assignments and your course project.

Course project
Below are some topics you can discuss and investigate together with your project team partner.

Project schedule. Have a look at the deadlines for the course project in the schedule on this website, and create a sensible planning with your team partner.
Paper. Read your course project paper, and spend some time at Google Scholar investigating a) what other papers exist on this topic, b) which relevant papers cite your paper, and c) what important references are presented in your paper.
Data repositories. Check out the data repositories of real (social) networks such as SNAP and KONECT and think of datasets suitable for your course project.
Project contribution. For the course project, you have to do something original. Ideally, this goes beyond the one paper that you were assigned, comparing techniques from multiple papers, for example comparing different algorithms or methods, using different validation metrics, or testing on (a) larger (number of diverse) datasets. Write down in at most 200 words what you plan to do for your course project, and feel free to discuss this with the lecturer or an assistant for feedback. Also see the generic instructions for the Course Project.

Assignment 2
Get started with the practical part of Assignment 2.

Lab session - Week 3 - NetworkX

The main goal of this lab session is to become familiar with NetworkX (a Python package to analyze networks for research purposes).
All relevant information on NetworkX can be found in the NetworkX online documentation.
We will again use the UNIX/Linux workstations for this lab session, and there is again no deliverable.

Installation.. To use Python in your own environment, first go (or make) the directory where you will work in for the course. For example ~/Documents/SNACS
Next, create a virtual environment with the following command:
virtualenv -p /usr/bin/python3.5 snacsenv
You can activate this environment with the following command:
source snacsenv/bin/activate
You can install the packages you want with the pip command:
pip install numpy matplotlib networkx

Now you should be able to run your code from the command line. Next time you log in and need to use the environment, you only need to use the source command
Bonus: If you want to use a jupyter notebook instead, this is possible with the following commands:
pip install jupyter jupyter notebook --ip=127.0.0.1 —port=8888

Take some time to do the NetworkX tutorial.
Have a look at functionality to read and write graphs from/to disk, and in particular learn how to import an edge list.
A lot of the common network metrics you may want to compute are implemented as NetworkX function or NetworkX algorithm. Become familiar with these, for example by computing measures such as degree assortativity, clustering, density, diameter and average distance.
Try to load the network from the first lab session (/vol/share/groups/liacs/scratch/SNACS/small-gephiready.tsv) into NetworkX, and investigate some characteristic properties of the network, such as the degree distribution. Use the read_edgelist function and remember to select the correct separator (tab).
Get the Epinions network from the SNAP repository. Compute some common characteristics such as the degree distribution, and visualize them using appropriate figures of distributions, for example using Matplotlib.
What are the differences between NetworkX and Gephi in terms of visualization and analysis capabilities?
Proceed with Exercise 2 of Assignment 1.

Looking for a challenge? Check out Graph-Tool, a python graph analysis toolkit that leaves the hard computation to parallel (OpenMP) C++ code. Or snap, which is entirely written in C++ (although there is now also a Python plugin).

Lab session - Week 2 - Gephi

We will use the UNIX/Linux Ubuntu workstations in the Snellius computer rooms for this lab session. This ICT FAQ may provide links to more information on how to use these workstations. Download Gephi 0.9.2 for windows or Gephi 0.9.2 for Linux (mirrors provided because of slow speed of gephi.org downloads).

Learning goals

The main goal of this lab session is to become familiar with Gephi (experimental beta-software to visualize networks for research purposes) and its input format. At the end of this session you should be able to import and visualize raw network data with labeled nodes and labeled and/or weighted edges (directed or undirected), and you should understand how to map edge and node size and color to structural network properties such as the node degree. You should be able to export computed node data and be able to export a vector graphic PDF of your network. There is no deliverable for this lab session.

Instructions

Extract and run Gephi locally on a student workstation (running it from your homedirectory is possible, but may be slow), for example on the local disk in /scratch/gephi. Installation instructions, if needed, can be found on the Gephi website.
Run Gephi by starting the Linux executable bin/gephi. If this does not immediately work, you may have to adjust one line in etc/gephi.conf so that Gephi can find the right Java version (and remove the # in front of the line):
jdkhome="/usr/lib/jvm/java-8-openjdk-amd64/"
The Gephi website has various official tutorials that you can walk through; in particular Quick start, Visualization and Layouts.
Note: The Quick Start Guide is for a slightly older version of Gephi. On p. 9-10, contrary to what is indicated in the getting started guide, the ranking module is part of the appearance module (since Gephi 0.9), and does not appear as a separate tab. The “Ranking result” table no longer exists, and you can skip that part of the tutorial. In the tutorial on Visualization, on p. 9 the 3-D visualisation option was bugged and has been removed in Gephi 0.9. You can skip that part of the tutorial.
You may instead benefit from using a more modern tutorial (courtesy of Derek Geene at UCD).
Try out some of the Gephi datasets and see if you can understand how they differ in terms of density, degree distribution, giant components, clustering and average distance.
Learn how to import raw data from this short data import tutorial. Visualize the network represented by the edge list /vol/share/groups/liacs/scratch/SNACS/small-gephiready.tsv, also found here.
Compute the degree for each node and export it to a file (which you can for example reuse in other plotting tools).
Make a new, very simple, edge list input file of a network with say 10 nodes and 20 edges, and import it into Gephi. Now add to the edge input file (in some order) link (un)direction, link weights and link labels.
Also import a node list file with node labels and other properties. Be sure to append it to your existing workspace.
For example, have nodes represent students, and link represent friendships, and choose node and edge labels accordingly. Make sure that you are able understand the difference in file input and visualization output, and that you are comfortable with importing and exporting data into and from Gephi.
Play around with some different visualization algorithms such as ForceAtlas 2 (for example, with scaling set to a higher value and with stronger gravity checked) and Fruchterman Reingold. Export your colored visualization to PDF so that you have it as a vector graphic.
Get started with Exercise 2 (the practical part) of Assignment 1. You can also download the smaller datafiles medium.tsv and large.tsv. If you want to analyze huge.tsv, you will have to get it from the shared folder as stated in the assignment.

Course project

Teams work on a course project for 60% of the course grade. The project is about a certain topic related to social network analysis, and the project consists of:

Giving a 30-minute presentation (+15 minutes for questions) of the paper corresponding to the topic. At least a Powerpoint/PDF presentation and some demonstration of an implementation or visualization has to be given. Teams are also expected to provide feedback on some of the presentations given by their fellow students during the lectures.
Gathering and implementing the algorithms and/or techniques from the different papers, and running experiments on at least five large real-world network datasets. Thus, some programming has to be done. Teams will also give feedback on the code produced by other teams in the code review session. Some papers introduce multiple techniques. In that case, choose a logical subset to compare, and motivate your choice.
Datasets can for example be found at SNAP, KONECT, BigDND, Networks Repository, ICON, LASAGNE and ASU. Certain topics need particular datasets (e.g. with timestamps, signed links, etc.), which should of course be taken into account when selecting datasets.
As a rule, the course project paper should always have one novel (new) aspect compared to the papers on which it is based. For example, a new tweak to an existing algorithm, a large number of datasets to test the algorithms to find a relation between the network characteristics and the performance, a new performance metric to evaluatie the algorithms, a new type of visualization of the algorithm or results, an improvement of a proof related to the algorithm, etc. In case of doubt about the contribution, contact the course staff; you can always ask for help.
Writing one 6 to 10 page paper. In the paper the different techniques are analyzed and compared in detail using extensive experiments. The paper, to be written in LaTeX, has to follow the format of an actual scientific paper. Students will also give feedback on the paper produced by another team in the peer review session.

Instructions for the course project paper are available.

Project topics

This list of projects is shown below. Take a look at the paper, before choosing a topic. E-mail the lecturer at f.w.takes@liacs.leidenuniv.nl with your topic preference and names of the two team members. First come, first serve. Your topic is not confirmed until you receive an e-mail from me. You have until September 26 to e-mail your preferred topic. After that, a final division of topics and formation of teams will be made on September 27 during the lecture. Prefix [Chosen] indicates a topic is no longer available.

[Chosen] Anomaly detection in networks
[Chosen] Betweenness centrality 1
Betweenness centrality 2
[Chosen] Closeness centrality 1
[Chosen] Closeness centrality 2 (top-k)
[Chosen] Community detection 1
[Chosen] Community detection 2
[Chosen] Community detection 3
[Chosen] Community detection 4
[Chosen] Data errors in networks
Diameter computation
[Chosen] De-Anonymization of networks
Graph compression
[Chosen] Influence spread and virality 1
[Chosen] Influence spread and virality 2
[Chosen] Link prediction
[Chosen] Link prediction in signed networks
[Chosen] Motifs in networks
Motifs in multilayer networks
[Chosen] Motifs in temporal networks
[Chosen] Neighborhood approximation
[Chosen] Network embeddings 1
Network embeddings 2
[Chosen] Personalized PageRank
[Chosen] Resilience of networks
[Chosen] Sampling networks
[Chosen] Shortest paths 1
[Chosen] Shortest paths 2
Triangle counting
Visualization algorithms 1
[Chosen] Visualization algorithms 2
Visualization algorithms 3

This list is still updated regularly as teams are formed and topics are chosen.

Note: scientific papers (ACM, Elsevier, etc.) can often only be opened from within the university domain (or from home via SSH/VPN/etc.). IEEE Explore papers can often be opened by looking them up via computer.org. Alternative links and preprints of papers can often be found through Google Scholar by searching for "title of the paper".

Reading material

Some students have expressed interest in additional reading material to help freshen up on skills and knowledge required for this course.

To brush up on your Python programming skills, you could do Lecture 0, 1 and 4 of these Scientific Python Lectures.
To freshen up your LaTeX skills, you could do for example this tutorial.
MIT has a nice online course on discrete mathematics. Relevant chapters: 5, 6 and 7.
The (Dutch) Leiden courses Algoritmiek (Algorithms) and Datastructuren (Data structures) might be also interesting to browse through.
Basic knowledge of the concepts in machine learning and data mining are provided, for example, at the Wikipedia machine learning page.

Course description

See the e-Studyguide for a more general description.

Topics include: SNA from a CS perspective (graph representation, complexity issues, examples), Graph Structure (power law, small world phenomenon, clustering coefficient, hierarchies), Paths and Distances (neighborhoods, radius, diameter), Spidering and Sampling (BFS, forest fire, random walks), Graph Compression (graph grammars, bitwise tricks, encryption, hashing), Centrality (degree centrality, closeness centrality, betweenness centrality, rating and ranking), Centrality and Webgraphs (HITS, PageRank, structure of the web), Community Detection (spectral clustering, modularity), Visualization (force-based algorithms, Gephi, NodeXL), Graph Models (random graphs, preferential attachment), Link Prediction (structure, semantics, prediction algorithms, graph mining), Contagion (diffusion of information, spreading activation, gossipping) and Privacy and Anonymity ((de-)anonymizing graphs, ethical aspects, privacy issues) and various other topics that have been added over the years but are not yet in the list above.

The course was also given in 2014, 2015, 2016, 2017 and 2018.