Social Network Analysis for Computer Scientists

This is the old course page! See the new page for recent information!

This course deals with computer science (CS) aspects of social network analysis (SNA), and is part of the master computer science programme at Leiden University.

News:
- The textual comments from the 2014 evaluation forms are available here.
- All papers have been graded. You should have received an e-mail containing your team's grade. Grades have also been submitted to the student administration.
- Looking for a skilled and ambitious student to do an 18 or 42 ECTS master project on Corporate Social Network Analysis. Contact me if you are interested!

Course information

Time and place: Fridays from 11:15 to 13:00 in the fall semester of 2014 in Snellius room 403.
Prerequisites: basic knowledge of Algorithms and Data Mining
Literature: provided papers (no book)
Examination: based on presentation, paper, programming, peer review and participation (no exam). Specifically, an individual homework assignment (20% of the grade) and a course project in teams, consisting of a presentation (30%), course project paper (50%) and small peer review assignment as main deliverables.
Study points: 6 ECTS
Lecturer: Frank Takes - ftakes@liacs.nl, Snellius room 156a

Course schedule

	Date	Topic(s)
1.	Sep 5, 2014	Lecture 1: Introduction and small world phenomenon
2.	Sep 12, 2014	Lecture 2: Centrality, distance measures and densest subgraphs Example presentation: Determining the Diameter of Small World Networks
	Sep 19, 2014	Lab session from 9.00 to 10.45 in room 302
3.	Sep 19, 2014	Lecture 3: Structure of the web and propagation-based centrality algorithms Deadline for choosing a course project topic
4.	Sep 26, 2014	Deadline for handing in the homework assignment Lecture 4: Temporal social network analysis
5.	Oct 10, 2014	Student presentation 1: Link prediction Homework evaluation and course project progress discussion
6.	Oct 17, 2014	Student presentation 2: Diameter computation Student presentation 3: Visualization
7.	Oct 24, 2014	Student presentation 4: Graph compression/representation Student presentation 5: Community detection
8.	Oct. 31, 2014	Deadline for first three paper sections (conditionally extended from Oct. 24) Peer review session On-site peer review in team pairs
9.	Nov 7, 2014	Student presentation 6: Graph partitioning Student presentation 7: Virality and outbreak detection (2)
10.	Nov 14, 2014	Student presentation 8: Virality and outbreak detection (1) Student presentation 9: Sampling methods
11.	Nov 21, 2014	Optional deadline for intermediary paper "check" Student presentation 12: Shortest path computation Student presentation 11: Neighborhoods
12.	Nov 28, 2014	Student presentation 12: Closeness centrality Course evaluation and feedback
13.	Dec. 4, 2014 Dec. 5, 2014	Evaluation sessions cf. schedule Evaluation sessions cf. schedule
	Dec. 12, 2014	23:59:59: Deadline for full course project paper (conditionally extended from Dec. 5)

Network with 1458 nodes and 1948 edges.

Lab session Friday September 19 in room 302

We will use the UNIX workstations in room 302 for this lab session. If you really must, you can attempt to use Windows, fully at your own risk (you will at least need to edit gephi.conf so that it points to the correct Windows JRE version 6 or 7 (and not 8)). Some information on the workstations is provided by the ISSC, see their pages on the workstations, SSH access and remote access to UNIX systems with a configuration similar to the workstations.
There is no deliverable for this lab session. The main goal of this lab session is to become familiar with Gephi (experimental beta-software to visualize networks for research purposes) and its input format.

Extract and run Gephi locally on a student workstation, for example in your homedirectory (might be slow) or on the local disk in /scratch/gephi
Take some time to do the three Gephi tutorials: Quick start, Visualization and Layouts.
Try out some of the Gephi datasets
Make a very simple edge list input file of a network with say 10 nodes and 20 edges, and import it into Gephi. Now add to the input file (in some order) link direction, link weights, node labels and link labels. Make sure that you are able understand the difference in file input and visualization output.
Play around with some different visualization algorithms such as ForceAtlas 2 and Fruchterman Reingold.
Compute the PageRank of each node, and then use the "Ranking" tab on the left side to visualize the node's PageRank values using node color or node size.
Use the built-in Modularity algorithm to do some community detection, and visualize the result using the Partition tab on the left side. Vary the resolution parameter and observe the result.
Get started with Exercise 3 of the homework assignment.

Course project

Teams work on a course project, consisting of:

One 30-minute presentation (+15 minutes for questions) on an algorithm from a recent CS paper on a course-related problem (30% of the grade). At least a Powerpoint/PDF presentation and some demonstration of an implementation or visualization will be required. Students are also expected to provide feedback on some of the presentations given by their fellow students during the lectures.
One 10 to 16 page paper (50% of the grade). Teams should study two to four additional papers that consider algorithms for solving the same problem as discussed in the presentation. The algorithms should be implemented by the students and then applied to at least five large social network datasets. Thus, some programming has to be done. In the paper the different techniques are analyzed and compared in detail using extensive experiments. The paper, to be written in LaTeX, has to follow the format of an actual scientific paper, and should be improved until students and lecturer agree on the contents. Students will also give feedback on the paper produced by another team. Datasets can for example be found at SNAP, KONECT, LASAGNE and ASU.

Topics

Suggestions for topics are listed below, and you are encouraged to pick one of these topics (as a team). You are also allowed to suggest your own problem (topic) in the field of social network analysis, assuming you are have found at least three scientific papers that suggest different techniques or algorithms for solving this problem (so, addressing this same topic). Please choose a topic together with your team mate before September 26, and inform the lecturer via e-mail of your choice. Also feel free to suggest a free date slot for your presentation. Your topic choices will be processed on a first-come-first-serve basis.

[Chosen by team "TB"] Shortest path computation: present
"Fast shortest path distance estimation in large networks" [Pot09] and study
"Fast and accurate estimation of shortest paths in large graphs" [Gub10],
"A highway-centric labeling approach for answering distance queries on large sparse graphs" [Jin12] and perhaps
"Fast exact shortest-path distance queries on large networks by pruned landmark labeling" [Tak13]
[Chosen by team "PL"] Diameter computation: present
"Finding the Diameter in Real-World Graphs" [Cre10] and study
"Fast and Simple Approximation of the Diameter and Radius of a Graph" [Boy06],
"Fast computation of empirically tight bounds for the diameter of massive graphs" [Mag09] and perhaps
"Determining the diameter of small world networks" [Tak11]
[Chosen by team "RS"] Sampling methods: present
"Sampling from Large Graphs" [Les06] and study
"Walking in Facebook: A Case Study of Unbiased Sampling of OSNs" [Gjo2010] and
"Crawling Online Social Graphs" [Ye2010]
[Chosen by team "TL"] Graph compression/representation: present
"The webgraph framework I: compression techniques" [Bol2004] and study
"GPS: a graph processing system" [Sal2013],
"On compressing social networks" [Chi2009] and perhaps
"GraphChi: Large-Scale Graph Computation on Just a PC" [Kyr2012]
[Chosen by team "FS"] Closeness centrality: present
"Fast Approximation of Centrality" [Epp04] and study
"Centrality estimation in large networks" [Bra07] and
"Ranking of Closeness Centrality for Large-Scale Social Networks" [Kaz08]
[Chosen by team "KY"] Neighborhoods: present
"ANF: a fast and scalable tool for data mining in massive graphs" [Pal02] and study
"R-MAT: A Recursive Model for Graph Mining" [Cha02] and perhaps
"Fast Approximation of Centrality" [Epp04]
[Chosen by team "SR"] Virality and outbreak detection (1): present
"Efficient influence maximization in social networks" [Che09] and study
"A data-based approach to social influence maximization" [Goy11] and
"How to win friends and influence people, truthfully: influence maximization mechanisms for social networks" [Sin12]
[Chosen by team "DG"] Virality and outbreak detection (2): present
"Scalable influence maximization for prevalent viral marketing in large-scale social networks" [Che10] and study
"Competitive Influence Maximization in Social Networks" [Bha07],
"Social Network Sensors for Early Detection of Contagious Outbreaks [Chr10] and perhaps
"CELF++: optimizing the greedy algorithm for influence maximization in social networks" [Goy11]
[Chosen by team "ID"] Visualization algorithms: present
"Graph drawing by force-directed placement" [Fru91] and study
"An algorithm for drawing general undirected graphs" [Kam89] and
"Topology Preserving Constrained Graph Layout" [Tol09],
[Chosen by team "The Chairmen"] Link prediction: present
"Link Prediction using Supervised Learning" [Has06] and study
"Link Prediction in Relational Data" [Tas03] and
"New perspectives and methods in link prediction" [Lic10] and
[Chosen by team "EN"] Community detection: present
"An Algorithm for Partitioning the Nodes of a Graph" [Bar81] and study
"A clustering algorithm based on graph connectivity" [Har00],
"Finding and evaluating community structure in networks" [New04] and perhaps
"Community structure in social and biological networks" [Gir02]
[Chosen by team "OB"] Graph Partitioning: present
"A fast and high quality multilevel scheme for partitioning irregular graphs" [Kar06] and study
"An Efficient Heuristic Procedure for Partitioning Graphs" [Ker70] and
"Partitioning graphs into balanced components" [Kra09]

If you choose a topic, then you are also encouraged to look up additional papers in this problem domain to study and compare to the paper that you presented, as the papers to study are merely "simple" suggestions. For some topics, there exists an overview/review paper that discusses many more papers that study the same (type of) problem.

Scientific papers (ACM, Elsevier, etc.) can often only be opened from within the university domain (or from home via SSH/nxClient/VPN/RDP).
IEEE Explore papers can be opened by looking them up via computer.org. Alternative links and preprints of papers can often be found through Google Scholar.

Instructions for the course project paper »

Course description 2014-2015

This course deals with the computer science aspects of social network analysis. With topics such as big data and data science becoming more and more popular, the study of large datasets of networks (or graphs), is becoming increasingly important. Examples of such networks include webgraphs, communication and collaboration networks and perhaps most notably (online) social networks (such as Facebook and Twitter). With millions of nodes and possible billions of links, traditional graph algorithms are often too complex and unable to solve trivial algorithmic and data mining related problems. Typical tasks in this field include community detection, clustering, outlier detection, link prediction but also more fundamental problems such as efficient retrieval, storage, and compression of graph data and computational problems such as computing shortest paths and other descriptive graph properties. Also see the description in the e-Studyguide.