This course deals with computer science (CS) aspects of social network analysis (SNA), and is open to all students in the master computer science programme at Leiden University.
If you want to participate and are in a different programme, then you should contact the lecturer in advance.


Course information

Lectures: Fridays from 11:00 to 12:45 (Sep. 7 - Dec. 7) in Snellius room 174
Lab sessions: Fridays from 9:00 to 10:45 in room 306/308
Prerequisites: a CS bachelor with courses on Algorithms, Data Structures and Data Mining
Literature: provided papers and book chapters (free and digitally available)
Examination: based on presentation, paper, programming, peer review and participation (no exam)
Study points: 6 ECTS

Lecturer: dr. Frank Takes - f.w.takes@liacs.leidenuniv.nl, room 157b
Assistant lecturer: Anna Latour MSc - a.l.d.latour@liacs.leidenuniv.nl, room 123
Course assistant: Antonio Barata MSc - a.p.pereira.barata@liacs.leidenuniv.nl, room 150
Student assistant: Hanjo Boekhout Bsc - h.d.boekhout@umail.leidenuniv.nl

Need help? Ask your questions during the lab sessions. Otherwise contact snacs@liacs.leidenuniv.nl
[Network visualization image]

Network with 1458 nodes and 1948 edges.


Course schedule

  Date First slot (9:00-10:45) Second slot (11:00-12:45)
1. Sep 7, 2018 No activities Lecture 0: Course organization
Lecture 1: Introduction and small world phenomenon
2. Sep 14, 2018 Gephi tutorial
Work on Assigment 1
Lecture 2: Advanced concepts & centrality
(including course project information)
3. Sep 21, 2018 NetworkX tutorial
Work on Assigment 1
Lecture 3: Network projection & community detection
Lecture 3.1: Presentation 0
4. Sep 28, 2018 Work on Assigment 1 Lecture 4: Networks on the web
Oct 1, 2018 Deadline for Assignment 1: Click here to upload your assignment
5. Oct 5, 2018 DS Lab, Course project planning
Work on Assigment 2
Lecture 5: Network dynamics & Processes on networks
6. Oct 12, 2018 Work on Assigment 2
Lecture 6: Network motifs & Network science applications
7. Oct 19, 2018 Work on Assigment 2
Student presentation 1 (11:00): Network sampling
Student presentation 2 (12:00): Community detection 2
8. Oct 26, 2018 Work on Assigment 2
Work on course project paper
Student presentation 3 (11:00): Influence spread and virality 1
Student presentation 4 (room 174, 12:00): Top-k closeness centrality
Student presentation 5 (room 313, 12:00): Shortest paths 1
Oct 29, 2018 Deadline for Assignment 2
9. Nov 2, 2018 Work on course project paper Student presentation 6 (room 174, 11:00): Influence spread and virality 2
Student presentation 7 (room 174, 12:00): Betweenness centrality
Student presentation 8 (room 408, 11:00): Neighborhoods
Student presentation 9 (room 408, 12:00): Diameter computation
Nov 9, 2018 Deadline for the first half of the course project paper (for the peer review session)
10. Nov 9, 2018 Work on course project Peer review session
11. Nov 16, 2018 Work on course project Student presentation 10 (room 174, 11:00): Counting triangles
Student presentation 11 (room 174, 12:00): Visualization algorithms 2
Student presentation 12 (room 408, 11:00): Community detection 1
Student presentation 13 (room 408, 12:00): Link prediction
Nov 19, 2018 Deadline (optional) for preliminary version of course project paper
12. Nov 23, 2018 Work on course project Student presentation 14 (room 174, 11:00): Shortest paths 2
Student presentation 15 (room 174, 12:00): Graph compression
Student presentation 16 (room 408, 11:00): De-anonymization of networks
Student presentation 17 (room 408, 12:00): Network data errors
Nov 30, 2018 Deadline for a substantial amount of code / experimental pipeline of the course project (for the code review session)
13. Nov 30, 2018 Code review session Student presentation 18 (room 174, 11:00): Link prediction in signed networks
Student presentation 19 (room 174, 12:00): Personalized PageRank
Student presentation 21 (room 408, 11:00): Closeness centrality
Student presentation 22 (room 408, 12:00): Visualization algorithms 1
Dec 7, 2018 Work on course project paper No lecture
Dec 12, 2018 Deadline for the final version of the course project paper
Dec 19, 2018 Retake deadline for assignments
Dec. 21, 2018 Course end. Grades will be submitted to the student administration

Lab session Friday October 5 in room 306/308

The main goal of this lab session is to get started with both the course project and with Assignment 2, and to get to know the data science lab.

Data Science Lab
The data science lab website provides necessary information and documentation. Become familiar with the lab and how to run code on it, and how to place data within the lab. Remember: your homedirectory is for your own code, /local is for local storage on the current machine you are are on, and /data is for storing data across data science lab machines. You may want to use the lab's facilities for course assignments and your course project.

Course project
Below are some topics you can discuss and investigate together with your project team partner.

  1. Project schedule. Have a look at the deadlines for the course project in the schedule on this website, and create a sensible planning with your team partner.
  2. Paper. Read your course project paper, and spend some time at Google Scholar investigating a) what other papers exist on this topic, b) which relevant papers cite your paper, and 3) what important references are presented in your paper.
  3. Data repositories. Check out the data repositories of real (social) networks such as SNAP and KONECT and think of datasets suitable for your course project.
  4. Project contribution. For the course project, you have to do something original. Ideally, this goes beyond the one paper that you were assigned, comparing techniques from multiple papers, for example comparing different algorithms or methods, using different validation metrics, or testing on (a) larger (number of diverse) datasets. Write down in at most 300 words what you plan to do for your course project, and discuss this with the lecturer or an assistant for feedback.

Assignment 2
Get started with the practical part of Assignment 2.
Note that the large Twitter dataset are also available in the data science lab in the folder /data/SNACS/.


Lab session Friday September 21 in room 306/308

We will again use the UNIX/Linux workstations in Snellius room 306/304 for this lab session.
There is again no deliverable for this lab session.
The main goal of this lab session is to become familiar with NetworkX (a Python package to analyze networks for research purposes).
All relevant information on NetworkX can be found in the NetworkX online documentation.

Python instructions to use NetworkX: NetworkX should be installed on all machines.
Unfortunately, the past years have learned us that machines sometimes exhibit diversity in consistency in package availability on these machines. In some years, instructions to get NetworkX running could be found in /vol/share/software/datascience/2016-README.
If you run into issues, it may be productive to forget about preinstalled packages and use Anaconda and install required packages yourself.

To use Python in your own environment:
wget http://repo.continuum.io/archive/Anaconda3-4.3.0-Linux-x86_64.sh
bash Anaconda3-4.3.0-Linux-x86_64.sh

Accept questions with 'yes'.
source anaconda3/bin/activate
You should now have your own environment in which you can install packages as you wish.

  1. Take some time to do the NetworkX tutorial.
  2. Have a look at functionality to read and write graphs from/to disk.
  3. A lot of the common network metrics you may want to compute are implemented as NetworkX function. Become familiar with these.
  4. Try to load the network from the first lab session (/vol/share/groups/liacs/scratch/SNACS/small-gephiready.tsv) into NetworkX, and investigate some characteristic properties of the network, such as the degree distribution. Use the read_edgelist function.
  5. Get the Epinions network from the SNAP repository. Compute some common characteristics, and visualize them using appropriate distributions.
  6. What are the differences between NetworkX and Gephi in terms of visualization and analysis capabilities?
  7. Proceed with Exercise 2 of Assignment 1.

Looking for a challenge? Check out Graph-Tool, a python graph analysis toolkit that leaves the hard computation to parallel (OpenMP) C++ code. Or snap, which is entirely written in C++ (although there is now also a Python plugin).


Lab session Friday September 14 in room 306/308

We will use the UNIX/Linux Ubuntu workstations in the Snellius computer rooms for this lab session.
This ICT FAQ may provide links to more information on how to use these workstations.
If you really must, you can, at the risk of bugs and problems that course personnel can probably not help you solve, attempt to use Windows or the even more unstable version of Gephi for Macintosh.


Learning goals

The main goal of this lab session is to become familiar with Gephi (experimental beta-software to visualize networks for research purposes) and its input format. At the end of this session you should be able to import and visualize raw network data with labeled nodes and labeled and/or weighted edges (directed or undirected), and you should understand how to map edge and node size and color to structural network properties such as the node degree. You should be able to export computed node data and be able to export a vector graphic PDF of your network. There is no deliverable for this lab session.


Instructions

  1. Extract and run Gephi locally on a student workstation (running it from your homedirectory may be slow), for example on the local disk in /scratch/gephi. Installation instructions, if needed, can be found on the Gephi website.
  2. Run Gephi by starting the Linux executable bin/gephi. If this does not immediately work, you may have to adjust one line in etc/gephi.conf so that Gephi can find the right Java version (and remove the # in front of the line):
    jdkhome="/usr/lib/jvm/java-8-openjdk-amd64/"
  3. The Gephi website has various official tutorials that you can walk through; in particular Quick start, Visualization and Layouts.
    Note: The Quick Start Guide is for a slightly older version of Gephi. On p. 9-10, contrary to what is indicated in the getting started guide, the ranking module is part of the appearance module (since Gephi 0.9), and does not appear as a separate tab. The “Ranking result” table no longer exists, and you can skip that part of the tutorial. In the tutorial on Visualization, on p. 9 the 3-D visualisation option was bugged and has been removed in Gephi 0.9. You can skip that part of the tutorial.
    You may instead benefit from using a more modern tutorial (courtesy of Derek Geene at UCD).
  4. Try out some of the Gephi datasets and see if you can understand how they differ in terms of density, degree distribution, giant components, clustering and average distance.
  5. Learn how to import raw data from this short data import tutorial. Visualize the network represented by the edge list /vol/share/groups/liacs/scratch/SNACS/small-gephiready.tsv.
    Compute the degree for each node and export it to a file (which you can for example reuse in other plotting tools).
  6. Make a new, very simple, edge list input file of a network with say 10 nodes and 20 edges, and import it into Gephi. Now add to the input file (in some order) link (un)direction, link weights, node labels and link labels. For example, have nodes represent students, and link represent friendships, and choose node and edge labels accordingly. Make sure that you are able understand the difference in file input and visualization output.
  7. Play around with some different visualization algorithms such as ForceAtlas 2 (for example, with scaling set to a higher value and with stronger gravity checked) and Fruchterman Reingold. Export your colored visualization to PDF so that you have it as a vector graphic.
  8. Get started with Exercise 2 (the practical part) of Assignment 1.

Course project

Teams work on a course project for 60% of the course grade, is about a certain topic related to social network analysis, and consists of:

  • Giving a 30-minute presentation (+15 minutes for questions) on one of the papers on their topic. Some topics include overview papers, these should not be presented. A paper presenting at least some experiments, should be chosen. At least a Powerpoint/PDF presentation and some demonstration of an implementation or visualization will be required. Teams are also expected to provide feedback on some of the presentations given by their fellow students during the lectures.
  • Gathering and implementing the algorithms and/or techniques from the different papers, and running experiments on at least five large real-world network datasets. Thus, some programming has to be done. Teams will also give feedback on the code produced by other teams in the code review session. Some papers introduce multiple techniques. In that case, choose a logical subset to compare, and motivate your choice.
    Datasets can for example be found at SNAP, KONECT, BigDND, Networks Repository, ICON, LASAGNE and ASU. Certain topics need particular datasets (e.g. with timestamps, signed links, etc.), which should of course be taken into account when selecting datasets.
    As a rule, the course project paper should always have one novel (new) aspect compared to the papers on which it is based. For example, a new tweak to an existing algorithm, a large number of datasets to test the algorithms to find a relation between the network characteristics and the performance, a new performance metric to evaluatie the algorithms, a new type of visualization of the algorithm or results, an improvement of a proof related to the algorithm, etc. In case of doubt about the contribution, contact the course staff; you can always ask for help.
  • Writing one 6 to 10 page paper. In the paper the different techniques are analyzed and compared in detail using extensive experiments. The paper, to be written in LaTeX, has to follow the format of an actual scientific paper. Students will also give feedback on the paper produced by another team in the peer review session.

Instructions for the course project paper are available.


Project topics

This list of projects is shown below. Take a look at the paper, before choosing a topic. E-mail the lecturer at f.w.takes@liacs.leidenuniv.nl with your topic preference and names of the two team members. First come, first serve. Your topic is not confirmed until you receive an e-mail from me. You have until September 28 to e-mail your preferred topic. After that, a final division of topics and formation of teams will be made on October 6 during the lecture. Prefix [Chosen] indicates a topic is no longer available.

  1. [Chosen] Shortest paths 1
  2. [Chosen] Shortest paths 2
  3. [Chosen] Diameter computation
  4. [Chosen] Network sampling
  5. [Chosen] Graph compression
  6. Betweenness centrality 1
  7. Betweenness centrality 2
  8. [Chosen] Closeness centrality
  9. [Chosen] Top-k closeness centrality
  10. [Chosen] Link prediction
  11. [Chosen] Influence spread and virality 1
  12. [Chosen] Influence spread and virality 2
  13. [Chosen] Community detection 1
  14. [Chosen] Community detection 2
  15. [Chosen] Visualization algorithms 1
  16. [Chosen] Visualization algorithms 2
  17. Network motifs 1
  18. Temporal network motifs 2
  19. [Chosen] De-Anonymization of networks
  20. [Chosen] Neighborhoods
  21. [Chosen] Link prediction in signed networks
  22. [Chosen] Personalized PageRank
  23. [Chosen] Network data errors
  24. [Chosen] Counting triangles

This list is still updated regularly as teams are formed and topics are chosen.

Note: scientific papers (ACM, Elsevier, etc.) can often only be opened from within the university domain (or from home via SSH/VPN/etc.).
IEEE Explore papers can often be opened by looking them up via computer.org. Alternative links and preprints of papers can often be found through Google Scholar by typing in the title of the paper.


Reading material

Some students have expressed interest in additional reading material to help freshen up on skills and knowledge required for this course.


Course description

See the e-Studyguide for a more general description.

Topics include: SNA from a CS perspective (graph representation, complexity issues, examples), Graph Structure (power law, small world phenomenon, clustering coefficient, hierarchies), Paths and Distances (neighborhoods, radius, diameter), Spidering and Sampling (BFS, forest fire, random walks), Graph Compression (graph grammars, bitwise tricks, encryption, hashing), Centrality (degree centrality, closeness centrality, betweenness centrality, rating and ranking), Centrality and Webgraphs (HITS, PageRank, structure of the web), Community Detection (spectral clustering, modularity), Visualization (force-based algorithms, Gephi, NodeXL), Graph Models (random graphs, preferential attachment), Link Prediction (structure, semantics, prediction algorithms, graph mining), Contagion (diffusion of information, spreading activation, gossipping) and Privacy and Anonymity ((de-)anonymizing graphs, ethical aspects, privacy issues) and various other topics that have been added over the years but are not yet in the list above.

The course was also given in 2014, 2015, 2016 and 2017.