The JUNG (Java Universal Network/Graph) Framework

Technical Report UCI-ICS 03-17
School of Information and Computer Science
University of California, Irvine

Joshua O'Madadhain, Danyel Fisher, Scott White, and Yan-Biao Boey
School of Information and Computer Science
University of California, Irvine
Irvine, CA 92697-3425
jmadden@ics.uci.edu, danyelf@acm.org, scottw@eml.cc, yboey@uci.edu


Table of Contents


Introduction

Overview

JUNG is an open-source software library that provides a common and extendible language for the modeling, analysis, and visualization of data that can be represented as a graph or network. It is written in Java, which allows JUNG-based applications to make use of the extensive built-in capabilities of the Java API, as well as those of other existing third-party Java libraries.

The JUNG architecture is designed to support a variety of representations of entities and their relations, such as directed and undirected graphs, multi-modal graphs, graphs with parallel edges, and hypergraphs. It provides a mechanism for annotating graphs, entities, and relations with metadata. This facilitates the creation of analytic tools for complex data sets that can examine the relations between entities as well as the metadata attached to each entity and relation.

The current distribution of JUNG includes implementations of a number of algorithms from graph theory, data mining, and social network analysis, such as routines for clustering, decomposition, optimization, random graph generation, statistical analysis, and calculation of network distances, flows, and importance measures (centrality, PageRank, HITS, etc.).

JUNG also provides a visualization framework that makes it easy to construct tools for the interactive exploration of network data. Users can use one of the layout algorithms provided, or use the framework to create their own custom layouts. In addition, filtering mechanisms are provided which allow users to focus their attention, or their algorithms, on specific portions of the graph.

Related Work

JUNG was created out of a perceived need for a general, flexible, and powerful API for manipulating, analyzing, and visualizing graphs and networks. There exist several other tools for visualizing and manipulating networks, some of the more prominent of which are UCINET, Pajek, R, and GFC.

UCINet (http://www.analytictech.com/ucinet_5_description.htm) and Pajek (http://vlado.fmf.uni-lj.si/pub/networks/pajek/) are stand-alone applications that each provide a number of tools for visualizing and analyzing networks. However, they cannot be conveniently addressed programmatically by other applications, which makes them not well-suited to process large numbers of graphs. Furthermore, they are tools rather than libraries, so users cannot write their own routines that take advantage of the capabilities of existing code.

R (http://www.r-project.org) is a programming language geared primarily towards the statistics community, which provides many advanced statistical routines. However, it doesn't have convenient access to the extensive Java API (for such functions as database connectivity, and Web support), and therefore it is difficult to build real-world applications on top of R. Furthermore, R does not currently provide native sparse graph data structures, which are necessary to write efficient algorithms for large networks, which are often found in real-world data sets.

GFC (http://www.alphaworks.ibm.com/tech/gfc) is a Java graph drawing-oriented API released by IBM. It is specific to using Java's AWT/Swing mechanisms for rendering, contains few graph manipulation algorithms, is no longer actively supported, and is not open-source. (In this, it is similar to a number of other network-related software libraries.)


Graphs, Vertices, and Edges

Basic Properties and Operations

Graphs, vertices, and edges each have several properties that can be extracted from them, and operations that they can perform (or have performed upon them). The operations listed below are all guaranteed to be defined and to behave as specified for all JUNG graphs, vertices, and edges. Depending on the specific type of graph, vertex, or edge, and on the implementation used, a given graph, vertex, or edge object may have other available properties and/or operations.

Graphs:

Vertices:

Edges:

Types

JUNG defines types using Java interfaces (which specify what methods any implementations of the interface must provide), abstract classes (which provide generalized skeletal implementations of the interfaces to speed the development of new implementations, but which cannot be instantiated by users), and implementation classes (which are what users create and use).

The graph package contains specifications (in the form of Java interfaces), at various levels of abstraction, for graphs, vertices, and edges.

Interfaces

The ArchetypeGraph, ArchetypeVertex, and ArchetypeEdge interfaces specify the behavior of generalized graphs, vertices, and edges; they are designed to encompass all types of graphs, including directed and undirected graphs, graphs with attached data (e.g., weighted edges), hypergraphs, and graphs with parallel edges. All graph, vertex, and edge implementations should implement the appropriate one of these interfaces (or an interface which inherits from these interfaces). The methods listed above are those available to objects which implement one of these interfaces.

The Graph, Vertex, and Edge interfaces inherit from the Archetype interfaces, and specify the behavior for (binary) graphs in which each edge connects exactly two vertices; this specialization allows a number of additional methods to be defined.

The Directed and interfaces specify the behavior and capabilities of directed graphs and edges. A DirectedEdge is a type of Edge which imposes an ordering on its incident vertices. DirectedGraph is a tagging interface for implementations of Graph whose edge set consists of implementations of DirectedEdge.

The UndirectedGraph and UndirectedEdge interfaces are the corresponding interfaces for undirected graphs and edges.

Abstract Classes

The AbstractSparseGraph, AbstractSparseVertex, and AbstractSparseEdge classes are designed for sparse graphs (ones in which the number of edges is only a few times as large as the number of vertices). They may not be the best implementations for representing and manipulating dense graphs (ones in which most vertices are connected to most other vertices).

Implementation Classes

The DirectedSparse{Graph, Edge, Vertex} and UndirectedSparse{Graph, Edge, Vertex} classes extend the Abstract classes for strictly directed and strictly undirected graphs; the graph and edge classes implement the DirectedGraph and DirectedEdge interfaces, respectively.

Creating and Adding

Creating a graph may be done in three ways. First, one can call the constructor for the desired type of graph:

	DirectedGraph g = new DirectedSparseGraph();

which creates a new directed sparse graph and assigns it to a variable of type DirectedGraph.

Second, one can also create a graph by reading it in from a file. Currently, JUNG can read simple Pajek and GraphML (http://graphml.graphdrawing.org/) files, and can write Pajek files.

Third, one can generate a graph algorithmically, either with a user-defined method, or with one of the classes that JUNG provides for creating random graphs.

Once you have created a graph, you can create vertices and add them to this graph:

	Vertex v1 = (Vertex) g.addVertex(new DirectedSparseVertex());
	Vertex v2 = (Vertex) g.addVertex(new DirectedSparseVertex());

and once you have vertices, you can connect them with edges:

	DirectedEdge e = (DirectedEdge) g.addEdge(new DirectedSparseEdge(v1, v2));

Note that creating vertices/edges and adding them to a graph are actually two different operations, which we combine here into a single line of code. The two-stage nature of this process makes it possible to create "orphaned" vertices/edges that are not part of a graph. This was done as a compromise between common practices in Java APIs regarding the side effects of constructors, and the semantics of graphs. However, the behavior of the JUNG edge and vertex methods, with the exception of getGraph(), is unspecified on orphaned vertices/edges. The JUNG Project implementations will never create orphaned vertices/edges, and we strongly recommend that users follow this practice by nesting the call to the vertex/edge constructor inside the call to the graph method that adds its argument to the graph (as in the examples above).

Some constraints to keep in mind:

If any of these constraints are violated, the error will be caught at runtime, and a FatalException will be thrown. These constraints are not guaranteed to be "fail-fast" (that is, violations may not be reported immediately), although several of them are fail-fast.

Copying and Equivalency

You can make a copy of a graph, or copy a vertex or edge from one graph (the original graph) to another graph (the target graph).

Copying a vertex or edge does three things:

Copying a graph does three things:

The following code creates a graph, creates two vertices and an edge and adds them to this graph, then copies each vertex and edge from the original graph to a new target graph.

	Graph original = new DirectedSparseGraph();
	Vertex v1_orig = original.addVertex(new DirectedSparseVertex());
	Vertex v2_orig = original.addVertex(new DirectedSparseVertex());
	DirectedEdge e_orig = original.addEdge(new DirectedSparseEdge(v1, v2));

	Graph target = new DirectedSparseGraph();
	Vertex v1_copy = v1.copy(target);
	Vertex v2_copy = v2.copy(target);
	DirectedEdge e_copy = e_orig.copy(target);

The vertices v1_copy and v2_copy are equivalent to the vertices v1_orig and v2_orig, respectively, and the edge e_copy is equivalent to the edge e_orig. Thus, for example, the statement

	v1_orig == v1_copy.getEquivalentVertex(original);

evaluates to true in the context of the code given above. Furthermore, as a convenience, the Java equals method has been implemented to respect this equivalence relation, so

	v1_orig.equals(v1_copy);

also evaluates to true.

There are some restrictions that govern when and where vertices and edges may be copied:

Removing Vertices and Edges

To remove a vertex or edge from a graph, call the appropriate removal method:

	g.removeEdge(e);
	g.removeVertex(v1);

Removing an edge from a graph will not affect any other part of the graph. Removing a vertex from a graph may cause the edges that are incident to that vertex to be removed if these edges would otherwise become ill-formed. (An ill-formed edge is one that is incident to the wrong number of vertices. In graphs where edges are defined to connect exactly two vertices, removing a vertex will result in the removal of all of its incident edges.)

Removing an element from a graph does not free the memory used by that object. (In fact, you can remove an element from a graph and then re-insert it in that graph or in a different graph). As with all Java programs, the Java garbage collector is responsible for freeing the memory for an object once it is no longer being used. Removing an element from a graph also does not remove it from any user data structures (discussed in the section entitled "User Data"); users are responsible for updating the user data as necessary.


User Data

Users can associate data with graphs, edges, or vertices in two ways: class extension and the built-in JUNG annotation mechanism.

Class Extension

Users can extend the classes provided so that they include the variables/properties (and methods for manipulating those fields) that the user desires. This mechanism is most appropriate for applications which are designed to operate on a specific data set, each of whose elements have known properties. For instance, a network representing a highway system might store, for each segment of highway between interchanges (i.e., edge), the length of that segment.

The ability to extend the JUNG classes is a feature of the Java language, and is not specific to JUNG. However, class extenders should note that the AbstractSparse classes use the Java Object.clone() method to copy Vertices, Edges, and Graphs; therefore, copies of such objects will be "shallow" copies, as defined by Java.

This sample code creates a class that extends DirectedSparseVertex and carries with it some data.

	class Person extends DirectedSparseVertex 
	{
    	    private String name;
    	    private List publications;     

	    public Person( String name, List publications ) 
	    {
       		this.name = name;
	        this.publications = publications;
    	    }

	    public List getPublications() { return publications; }
	}

User Data Repositories

JUNG provides a built-in mechanism, the UserData class, for annotating graph elements with data. This mechanism is most appropriate for handling data which is either temporary or idiosyncratic (i.e., data which not every graph element of that type will have or need).

Each of the JUNG graph, vertex, and edge implementations extends UserData, which provides the following operations:

(The purpose and semantics of copy actions are discussed in the section below entitled Copying User Data.)

Here is a simple example of how data may be stored, accessed, modified, and removed using the user data repositories:

	Vertex v = (Vertex) g.addVertex(new DirectedSparseVertex());
	Vertex w = (Vertex) g.addVertex(new DirectedSparseVertex());
	String name_key = "name";
	String current_address_key = "address";
	String current_student_key = "student";
	v.addUserDatum(name_key, "Carl Jung", UserData.SHARED);
	w.addUserDatum(name_key, "Sigmund Freud", UserData.SHARED);
	v.addUserDatum(current_address_key, "Vienna, Austria", UserData.SHARED);
	v.addUserDatum(current_student_key, w, UserData.REMOVE);  // Freud is a student of Jung
	...
	String v_name = v.getUserDatum(namekey);
	v.setUserDatum(current_address_key, "Basel, Switzerland", UserData.SHARED);
	v.removeUserDatum(current_student_key);  // Freud is now no longer Jung's student

This example shows that userdata can contain any Java object, including other vertices.

Copying User Data

When a graph element a is copied (with the copy method), the newly created element b calls importUserData(a), which attempts to copy each of the objects in a's user data repository to b's user data repository. The behavior of each such copy attempt will depend on the copy action that was specified when the corresponding user data element was created.

The interface UserDataContainer contains an interface called CopyAction, which consists of a single method signature, onCopy(value, source, target). importUserData(a) retrieves the copy action (which is an implementation of CopyAction) for each element in a's user data repository. This copy action then calls onCopy(datum, a, b), and based on the result, decides what to do with the specified datum.

JUNG provides three different implementations of CopyAction: UserData.CLONE, UserData.REMOVE, and UserData.SHARED.

UserData.CLONE's version of onCopy() returns a copy of the user datum, as defined by the Java clone() method; importUserData then places this copy in the target graph element's user data repository. This clone is completely independent of the original. (If the user datum does not support the clone() method, onCopy will throw the Java CloneNotSupportedException.)

UserData.SHARED's version of onCopy() returns a reference to the original user datum; importUserData then places this reference in the target graph element's user data repository. Thus, any changes to this user datum that are made by one of the graph elements that share this user datum will be reflected in all such graph elements.

UserData.REMOVE's version of onCopy() returns null; that is, user data that is created with this copy action will not be copied by the copy() method.

Decorators, Indexers, and Labellers

JUNG includes a few classes that show how the user data repositories may be used in a structured fashion; two of these classes are Indexer and StringLabeller.

An Indexer contains methods that create a mapping between the vertices of a graph and the integers {0, 1, ... n-1} (where n is the number of vertices in the graph). It provides mechanisms to get the index of a given vertex (getIndex(v)) and to get the vertex with a specified index (getVertex(i)). Among other things, Indexer thus makes it convenient to arrange a set of vertices in an array, using each vertex's index as an index into the array.

A StringLabeller is similar to an Indexer; it provides facilities for fetching vertices given strings (labels) and vice versa. However, the labels are user-defined and thus need not follow any particular pattern. Vertices that have not been labelled simply will not be accessible by the indexer.


Filtering

The JUNG filtering mechanism removes selected vertices and edges from input graphs, and returns new graphs. These new graphs are copies of the original, containing all the same vertices and edges except for those that have been removed. A Filter takes in a Graph, and returns an UnassembledGraph.

An UnassembledGraph is a temporary storage mechanism for vertices and edges: it holds all the vertices (and at least all the edges) that will be placed into the final, filtered graph. In some circumstances, just knowing which vertices pass the filter is sufficient; this information can be accessed directly from the UnassembledGraph with the calls getUntouchedEdges() and getUntouchedVertices(), which return the set of edges that passed the filter, and the set of vertices that passed the filter, respectively. However, most of the time, one wants to access the new graph that passes the filter; this is done with the UnassembledGraph method called assemble(), which builds the new graph. assemble() copies every vertex that passed the filter into the new graph, and then copies each edge that passed the original filter into the new graph if both of its incident vertices also passed the filter (thus ensuring that the resulting graph is well-formed). Note that this means that some edges returned by getUntouchedEdges() will not be copied into the new graph.

assemble() can be slow, so it is sometimes desirable to string together several filters in a row, and not call assemble until the last Filter has been run. This is done by creating a filter that implements the EfficientFilter interface. An EfficientFilter is a type of Filter that can filter an UnassembledGraph, and return another UnassembledGraph. A filter which examines structural properties of graphs is probably not appropriate to implement as an EfficientFilter, because UnassembledGraphs may contain incorrect topology information (in particular, as noted above, the edge set may include some ill-formed edges). It is the responsibility of the user to determine whether a given filtering mechanism can be implemented as an EfficientFilter.

While a user can write a custom filter merely by implementing the interface, it is often easiest to extend one of the two provided base Filter classes, VertexAcceptFilter and EdgeAcceptFilter. Both of these require the user to write a method--boolean acceptVertex(vertex) or boolean acceptEdge(edge), respectively. By default, these are not declared to be EfficientFilters; however, users may certainly create extensions of these filters that are EfficientFilters.

The SerialFilter mechanism applies a series of filters sequentially to a specified graph, in the order in which they were added to the SerialFilter. As the filters are applied, it checks to see whether each one is an EfficientFilter, and calls assemble as necessary.

The LevelFilter interface was designed to be used in conjunction with the GraphDraw mechanism (described in the section on visualization). LevelFilters are filters that take an integer parameter, which is used to determine the operation of the filter (for instance, filtering all edges with weight less than the value of this parameter). With a LevelFilter, a slider on a visualization can be tied directly into the Filter, and thus can allow the user to control this parameter directly, and generate a dynamically changing graph.


Visualization

JUNG provides mechanisms for laying out and rendering graphs. The current renderer implementations use the Java Swing API to display graphs, but they may be implemented using other toolkits.

In general, a visualization is accomplished with

Thus, by selecting one of each of these three, it is possible to coordinate drawing. The default implementation traverses the Layout, asking it for locations of vertices, and then paints them individually with the Renderer inside the Swing component. In addition, the GraphDraw infrastructure simplifies many of these transformations by packaging the VisualizationViewer, the Renderer, and the Layout together. Users may then customize this viewer as appropriate. (Sample code is available in the GraphDraw documentation.)

JUNG also includes utilities and support classes that facilitate customization of a graph visualization. For instance, FadingVertexLayout provides a mechanism that can be used to create fading effects when vertices are filtered out and subsequently restored.


Algorithms

JUNG provides several different categories of different graph and network algorithms. A selection of them is listed here.

Clustering

A cluster is a collection of objects that are all similar to each other in some way. In a network, similarity is often based on topological properties such as connectivity, but can also be based on the properties of vertices or edges in the network. Clustering algorithms provided by JUNG include EdgeBetweennessClusterer, which computes clusters for a graph based on the betweenness property of the edges, and WeakComponentClusterer, which finds all weak components in a given graph, where a weak component is defined as a maximal weakly connected subgraph of that graph.

Topology, Paths, and Flows

These algorithms perform operations on (and calculate properties of) graphs that relate to the graph's topology (that is, the structures and substructures formed by the ways that the vertices are linked together by edges). Topological algorithms that JUNG provides include BFSDistanceLabeler, which labels each vertex in a graph with the length of the shortest unweighted path from a specified vertex in that graph; KNeighborhoodExtractor, which returns the subgraph of a graph whose vertices are separated by no more than k edges from a specified vertex; EdmondsKarpMaxFlow, which labels each edge in a directed, edge-weighted graph with the flow along that edge which is consistent with the maximum flow for the graph; and DijkstraShortestPath, which calculates the length of the shortest weighted path from a specified vertex to that of each vertex in that vertex's graph.

Importance

Network importance algorithms measure the importance of each vertex (or edge) according to a set of criteria that is usually based on the positioning of the vertex/edge relative to the rest of the graph.

Some of the provided algorithms assume that they are given a Markov network: a directed weighted graph in which the vertices represent states, the edges represent possible state transitions, and the edge weights represent transition probabilities. The stationary probability for a vertex v in such a network is the limiting probability that, given an arbitrary starting state and a large number of transitions, the current state will be that of v.

Importance-based algorithms that JUNG provides include BetweennessCentrality, which labels each vertex and edge in a graph with a value that is derived from the number of shortest paths that pass through them; PageRankWithPriors, which ranks each vertex in a modified Markov network according to its stationary probability, relative to a specified set of root vertices; HITS, which ranks each vertex in a graph according to the "hubs-and-authorities" importance measures; and KStepMarkov, which ranks each vertex according to a fast approximation of the PageRankWithPriors algorithm.

Statistics

JUNG provides several classes that analyze graphs and calculate various statistical measures on them, including DegreeDistributions and GraphStatistics.


Future Work

The first JUNG release provided many of the tools and elements that are most commonly required for writing software that manipulates, analyzes, and visualizes graphs. Future releases are expected to include the following features, several of which are currently under development. These features should significantly expand the set of available tools and enhance users' abilities to write robust code.


Acknowledgements

The authors of JUNG wish to thank their research advisers (Padhraic Smyth and Paul Dourish) for their support during this project, as it evolved from a few weeks' project in support of other research into a full open-source development effort which has lasted several months. We would also like to thank Sourceforge (http://sourceforge.net) for their hosting of this project, and IBM for providing the Eclipse (http://www.eclipse.org) IDE for Java; these free services and tools allowed us to concentrate on development rather than infrastructure. This material is based upon work that was supported in part by the National Science Foundation under Grant No. IIS-0083489 and by the Knowledge Discovery and Dissemination (KD-D) Program.


References

Vlado Batagelj and Andrej Mrvar. Pajek: Program for Large Network Analysis. http://vlado.fmf.uni-lj.si/pub/networks/pajek/

Steve Borgatti, Martin Everett and Lin Freeman. UCINET: Software for Social Network Analysis. http://www.analytictech.com/ucinet_5_description.htm

Ulrik Brandes, Markus Eiglsperger, Michael Kaufmann, and Jürgen Lerner. The GraphML File Format. http://graphml.graphdrawing.org/

Giuseppe Di Battista, Peter Eades, Roberto Tamassia, and Ioannis G. Tollis. Graph Drawing: Algorithms for the Visualization of Graphs. Prentice Hall, 1999.

Edsger W. Dijkstra. A note on two problems in connexion with graphs. Numerische Mathematik 1 (1959), 269-271.

Jack Edmonds and Richard Karp. Theoretical improvements in algorithmic efficiency for network flow problems. Journal of the ACM, 19(2):248-264, April 1972.

Robert Gentleman and Ross Ihaka. The R Project for Statistical Computing. http://www.r-project.org

Michelle Girvan and Mark Newman. Community structure in social and biological networks. http://www.santafe.edu/sfi/publications/wpabstract/200112077

IBM Corporation. GFC: Graph Foundation Classes for Java. http://www.alphaworks.ibm.com/tech/gfc

Jon Kleinberg. Authoritative sources in a hyperlinked environment. www.cs.cornell.edu/home/kleinber/auth.ps

Mark Newman. Who is the best connected scientist? A study of scientific coauthorship networks. http://www.santafe.edu/sfi/publications/wpabstract/200012064

Joshua O'Madadhain, Danyel Fisher, Scott White, and Yan-Biao Boey. JUNG: The Java Universal Network/Graph Framework. http://jung.sourceforge.net

Sun Microsystems. The Java Platform. http://java.sun.com

Scott White and Padhraic Smyth. Algorithms for discovering relative importance in graphs. Proceedings of Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington D.C., 2003