Social Network Analysis for Computer Scientists

This is the old course page! See the new page for recent information!

News:
- The textual comments from the 2015 evaluation forms are available here.
- All work has been graded. The final grades have been sent to the student administration. You can also come pick up your grade until Friday December 18 in room 157b.

Course information

Lectures: Fridays from 11:15 to 13:00 in the fall semester of 2015 in Snellius room ~~403~~ 402
Lab sessions (not every week): Fridays from 9:00 to 10:45 in Snellius room 302/304
Lecturer: dr. Frank Takes - f.w.takes@liacs.leidenuniv.nl, Snellius room 157b
Student assistant: Anna Latour - a.l.d.latour@umail.leidenuniv.nl
Prerequisites: basic knowledge of Algorithms and Data Mining
Literature: provided papers and perhaps some free textbooks
Examination: based on presentation, paper, programming, peer review and participation (no exam)
Study points: 6 ECTS

Network with 1458 nodes and 1948 edges.

Course schedule

	Date	Lab session	Topic
1.	Sep 4, 2015	No lab session	Lecture 1: Introduction and small world phenomenon
2.	Sep 11, 2015	Gephi & Assignment 1	Lecture 2: Network topology
3.	Sep 18, 2015	NetworkX & Assignment 1	Lecture 3: Distance-based centrality measures Presentation 0: Determining the diameter of small world networks
4.	Sep 25, 2015	Work on Assignment 1	Lecture 4: Web structure and propagation-based centrality
5.	Oct 2, 2015	Work on Assignment 2	Deadline for Assignment 1 Lecture 5: Network evolution and Community detection
	Oct 9, 2015	No lab session	No lecture
6.	Oct 16, 2015	Work on Assignment 2 Presentation 1: Virality and Outbreak Detection (1) (Team: NetEx)	Presentation 2: Sampling Methods (Team: NS) Presentation 3: Shortest Paths (1) (Team: /)
7.	Oct 23, 2015	Work on Assignment 2 ~~Presentation 5: Community Detection (Team: Landscape)~~	Presentation 4: Personalized PageRank (Team: Team 0) Presentation 6: Shortest Paths (2) (Team: WY)
8.	Oct 30, 2015	Presentation 7: Neighborhoods (Team: Omega) Presentation 8: Betweenness Centrality (Team: :(){:\|&;:)	Guest Lecture by André Vermeij (Kenedict Analytics)
8.	~~Oct 30~~ Nov 2, 2015		Deadline for Assignment 2
9.	Nov 6, 2015	Presentation 9: Graph Partitioning (Team: m=1) Presentation 10: Link Prediction (Team: HJ)	On-site code review in teams
10.	Nov 13, 2015	Presentation 11: Visualization (Team: n=2) Presentation 12: Graph Compression (Team: Team 2)	On-site peer review in teams
	Nov 20, 2015	No lab session	No lecture
	Nov 20 23, 2015		Optional deadline for intermediary paper "check"
11.	Nov 27, 2015	Presentation 13: Virality & Outbreak Detection (2) (Team: DC) Presentation 14: Closeness Centrality (Team: "Group")	Presentation 15: Diameter (Team: Team 1) Presentation 16: Link prediction in signed networks (Team: "Homework")
	Dec 4 11, 2015		Deadline for course project paper
12.	Dec 4, 2015	Course evaluation sessions (according to schedule)
13.	Dec 11, 2015	Course evaluation sessions (according to schedule)

There is no lecture on October 9 and no lecture on November 20.

Assignments

About Assignment 2

- Assignment 2 asks you to apply some centrality measures. You should motivate how you apply these measures: to the giant component or the full network, and to the directed or undirected version of the network.
- Assignment 2 can also be handed in via Dropbox

About Assignment 1

- Assignment 1 datafiles are also available via http due to ongoing share access issues: medium.in and large.in (use Save-As option in your browser!).
- A clarification: exercise 3.4 should be read as "Judging from the datafile, could this network be undirected?" If you interpreted it differently, just write down your assumptions to avoid any confusion.
- If you want, you can reuse the LaTeX/TikZ source of the graph in part 2 of Assignment 1.
- Assignments can also be handed in via Dropbox (for example, when you have large visualization PDFs that cannot be e-mailed).

Lab session Friday September 18 in room 302/304

We will again use the UNIX/Linux workstations in Snellius room 302/304 and 305 for this lab session.
There is no deliverable for this lab session. The main goal of this lab session is to become familiar with NetworkX (a Python package to analyze networks for research purposes).
All relevant information on NetworkX can be found in the NetworkX online documentation.

Take some time to do the NetworkX tutorial.
Try to load the network from the first lab session (/vol/share/groups/liacs/scratch/SNACS/small-gephiready.in) into NetworkX, and investigate some characteristic properties of the network, such as the degree distribution.
Generate a visualization of this network with NetworkX, and output it to pdf. Play around with the visualization parameters.
Get the Epinions network from SNAP. Can you visualize it?
Compute the betweenness centrality (a measure of node centrality/importance, as we will learn later) and visualize the result by making the size of a node dependent on this measure.
What are the differences between NetworkX and Gephi in terms of visualization and analysis capabilities?
Proceed with Exercise 3 of Assignment 1.

Installation issues: it turned out that many machines unexpectedly did not have NetworkX installed. On the student workstations, this can be solved by installing it in a virtual environment or adding the NetworkX source folder to your homedirectory. The latter may again result in all kinds of issues. In such cases, it may be nice to try Anaconda.

Alternative tool: check out Graph-Tool, a python graph analysis toolkit that leaves the hard computation to parallel (OpenMP) C++ code.

Lab session Friday September 11 in room 302/304

We will use the UNIX/Linux workstations in Snellius room 302/304 for this lab session. If you really must, you can attempt to use Windows, fully at your own risk (you will at least need to edit gephi.conf so that it points to the correct Windows JRE version 6 or 7 (and not 8)). Some information on the workstations is provided by the ISSC, see their pages on the workstations, SSH access and ICT FAQ on remote access to Linux systems with a configuration similar to the workstations.
There is no deliverable for this lab session. The main goal of this lab session is to become familiar with Gephi (experimental beta-software to visualize networks for research purposes) and its input format.

Extract and run Gephi locally on a student workstation (running it from your homedirectory may be slow or not work at all), for example on the local disk in /scratch/gephi. Installation instructions, if needed, can be found at the Gephi website.
Take some time to do the three Gephi tutorials at the Gephi website:
- Quick start,
- Visualization and
- Layouts
Try out some of the Gephi datasets.
Visualize the network represented by the edge list /vol/share/groups/liacs/scratch/SNACS/small-gephiready.in
Play around with some different visualization algorithms such as ForceAtlas 2 and Fruchterman Reingold.
Compute the betweenness centrality (a measure of node centrality/importance, as we will learn later) of each node via the "Network Diameter" button, and then use the "Ranking" tab on the left side to visualize the node's betweenness values and degree values using node color and node size.
Make a new, very simple, edge list input file of a network with say 10 nodes and 20 edges, and import it into Gephi. Now add to the input file (in some order) link direction, link weights, node labels and link labels. Make sure that you are able understand the difference in file input and visualization output.
Get started with Exercise 3 of Assignment 1.

Course project

Teams work on a course project for 50% of the course grade. For this project they pick one of the topics listed below. Each topic has an associated list of papers which have to be studied and compared. One paper has to be presented. So the project consists of:

Giving one 30-minute presentation (+15 minutes for questions) on one of the papers (40% of the project grade). At least a Powerpoint/PDF presentation and some demonstration of an implementation or visualization will be required. Students are also expected to provide feedback on some of the presentations given by their fellow students during the lectures.
Writing one 6 to 10 page paper (60% of the project grade). Teams study papers that consider algorithms for solving the same problem as discussed in the presentation. The algorithms should be implemented by the students and then applied to at least five large social network datasets. Thus, some programming has to be done. In the paper the different techniques are analyzed and compared in detail using extensive experiments. The paper, to be written in LaTeX, has to follow the format of an actual scientific paper, and should be improved until students and lecturer agree on the contents. Students will also give feedback on the paper produced by another team. Datasets can for example be found at SNAP, KONECT, LASAGNE and ASU.

Topics

Suggestions for topics are listed below, and you are encouraged to pick one of these topics (as a team). You are also allowed to suggest your own problem (topic) in the field of social network analysis, assuming you are have found at least three scientific papers that suggest different techniques or algorithms for solving this problem (so, addressing this same topic). Please choose a topic together with your team mate before September 25, and inform the lecturer via e-mail of your choice. Also feel free to suggest a free date slot for your presentation. Your topic choices will be processed on a first-come-first-serve basis.

If you choose a topic, then you are also encouraged to look up additional papers in this problem domain to study and compare to the paper that you presented, as the papers to study are merely "simple" suggestions. For some topics, there exists an overview/review paper that discusses many more papers that study the same (type of) problem.

Scientific papers (ACM, Elsevier, etc.) can often only be opened from within the university domain (or from home via SSH/nxClient/VPN/etc.).
IEEE Explore papers can be opened by looking them up via computer.org. Alternative links and preprints of papers can often be found through Google Scholar.

Instructions for the course project paper »

Project Topics

(Team: /) Shortest paths (1) :
"Fast shortest path distance estimation in large networks" [Pot09]
"Fast and accurate estimation of shortest paths in large graphs" [Gub10],
"A highway-centric labeling approach for answering distance queries on large sparse graphs" [Jin12] and perhaps
"Fast exact shortest-path distance queries on large networks by pruned landmark labeling" [Tak13]
(Team: WY) Shortest paths (2) :
"IS-Label: an independent-set based labeling scheme for point-to-point distance querying" [Wai13]
"Approximate Shortest Distance Computing: A Query-Dependent Local Landmark Scheme" [Mia12],
"Orion: Shortest Path Estimation for Large Social Graphs" [Zha10]
(Team: Team 1) Diameter computation:
"Finding the Diameter in Real-World Graphs" [Cre10]
"Fast and Simple Approximation of the Diameter and Radius of a Graph" [Boy06],
"Fast computation of empirically tight bounds for the diameter of massive graphs" [Mag09] and perhaps
"Determining the diameter of small world networks" [Tak11]
(Team: NS) Sampling methods:
"Sampling from Large Graphs" [Les06]
"Walking in Facebook: A Case Study of Unbiased Sampling of OSNs" [Gjo2010] and
"Crawling Online Social Graphs" [Ye2010]
(Team: Team 2) Graph compression/representation:
"The webgraph framework I: compression techniques" [Bol2004]
"GPS: a graph processing system" [Sal2013],
"On compressing social networks" [Chi2009] and perhaps
"GraphChi: Large-Scale Graph Computation on Just a PC" [Kyr2012]
(Team: :(){:|&;:) Betweenness centrality:
"A measure of betweenness centrality based on random walks" [New05]
"Approximating Betweenness Centrality" [Bad07],
"Better Approximation of Betweenness Centrality" [Gei08] and perhaps
"A faster algorithm for betweenness centrality" [Bran01]
(Team: "Group") Closeness centrality:
"Fast Approximation of Centrality" [Epp04]
"Centrality estimation in large networks" [Bra07] and
"Ranking of Closeness Centrality for Large-Scale Social Networks" [Kaz08]
(Team: Omega) Neighborhoods:
"ANF: a fast and scalable tool for data mining in massive graphs" [Pal02]
"R-MAT: A Recursive Model for Graph Mining" [Cha02] and perhaps
"Fast Approximation of Centrality" [Epp04]
(Team: NetEx) Virality and outbreak detection (1):
"Efficient influence maximization in social networks" [Che09]
"A data-based approach to social influence maximization" [Goy11] and
"How to win friends and influence people, truthfully: influence maximization mechanisms for social networks" [Sin12]
(Team: DC) Virality and outbreak detection (2):
"Scalable influence maximization for prevalent viral marketing in large-scale social networks" [Che10]
"Competitive Influence Maximization in Social Networks" [Bha07],
"Social Network Sensors for Early Detection of Contagious Outbreaks [Chr10] and perhaps
"CELF++: optimizing the greedy algorithm for influence maximization in social networks" [Goy11]
(Team: Team 0) Personalized PageRank:
"Quick Detection of Top-k Personalized PageRank Lists" [Avr11]
"Fast algorithms for Top-k personalized PageRank queries" [Gup08],
"Fast incremental and personalized PageRank" [Bah10] and perhaps
(Team: n=2) Visualization algorithms:
"Graph drawing by force-directed placement" [Fru91]
"An algorithm for drawing general undirected graphs" [Kam89] and
"Topology Preserving Constrained Graph Layout" [Tol09],
(Team: HJ) Link prediction:
"Link Prediction using Supervised Learning" [Has06]
"Link Prediction in Relational Data" [Tas03] and
"New perspectives and methods in link prediction" [Lic10]
(Team: "Homework") Link prediction in signed networks:
"Predicting positive and negative links in online social networks" [Les10]
"Predicting Trust and Distrust in Social Networks " [DuB11] and
" Friend or frenemy?: predicting signed ties in social networks " [Yan12]
(Team: Landscape) Community detection:
"An Algorithm for Partitioning the Nodes of a Graph" [Bar81]
"A clustering algorithm based on graph connectivity" [Har00],
"Finding and evaluating community structure in networks" [New04] and perhaps
"Community structure in social and biological networks" [Gir02]
(Team: m=1) Graph Partitioning:
"A fast and high quality multilevel scheme for partitioning irregular graphs" [Kar06]
"An Efficient Heuristic Procedure for Partitioning Graphs" [Ker70] and
"Partitioning graphs into balanced components" [Kra09]

Course description

This course deals with the computer science aspects of social network analysis. With topics such as big data and data science becoming more and more popular, the study of large datasets of networks (or graphs), is becoming increasingly important. Examples of such networks include webgraphs, communication and collaboration networks and perhaps most notably (online) social networks (such as Facebook and Twitter). With millions of nodes and possible billions of links, traditional graph algorithms are often too complex and unable to solve trivial algorithmic and data mining related problems. Typical tasks in this field include community detection, clustering, outlier detection, link prediction but also more fundamental problems such as efficient retrieval, storage, and compression of graph data and computational problems such as computing shortest paths and other descriptive graph properties. Also see the description in the e-Studyguide.

Topics include: SNA from a CS perspective (graph representation, complexity issues, examples), Graph Structure (power law, small world phenomenon, clustering coefficient, hierarchies), Paths and Distances (neighborhoods, radius, diameter), Spidering and Sampling (BFS, forest fire, random walks), Graph Compression (graph grammars, bitwise tricks, encryption, hashing), Centrality (degree centrality, closeness centrality, betweenness centrality, rating and ranking), Centrality and Webgraphs (HITS, PageRank, structure of the web), Community Detection (spectral clustering, modularity), Visualization (force-based algorithms, Gephi, NodeXL), Graph Models (random graphs, preferential attachment), Link Prediction (structure, semantics, prediction algorithms, graph mining), Contagion (diffusion of information, spreading activation, gossipping) and Privacy and Anonymity ((de-)anonymizing graphs, ethical aspects, privacy issues).

The course was also given in 2014, see SNACS 2014