Visualize Your Site’s Link Graph With NodeXL

I’d like to share a tool I found recently that can hopefully help you analyze your site’s linking through an easier, graphical visualization.

I’m a very visual person, and when I get a list of a site’s links like this one from Google Webmaster Tools, Screaming Frog or any other crawler, I struggle to get the information I want as fast as I need it:

Screaming Frog InLinks

Don’t get me wrong: I love and happily pay for Screaming Frog, and I’m thankful to have Google Webmaster Tools. But I wish I could easily visualize the information I get from these tools. For example, I’d like to better analyze how the different pages and areas of my site are linked internally, or see which areas of my site get the most external links and how they connect to each other, using the information I can also get from tools like Open Site Explorer or Majestic SEO.

Because of this I’ve been looking for a solution for some time, and I recently found a good option to visualize a site’s linking data: NodeXL.

NodeXL (which stands for Network Overview, Discovery and Exploration for Excel) is a free, easy-to-use, open-source Excel template that generates a network graph from vertex and edge data in Excel (pages and links, in our case).

I’ve only played with it a bit (I found it just a few days ago), but I think it has great potential for our link analysis work, and I would like to share the possibilities I’ve tried. Let’s start by downloading the template from this page:

You’ll quickly see that the bad news for Mac users (like me) is that NodeXL only works under Windows XP, Vista and 7, and since it is an Excel template you’ll also need Office 2007 or 2010. Nonetheless, you can still run it from a Mac with VMWare Fusion or Parallels, or even, as commented in this post, by creating an Amazon EC2 micro instance with Windows XP and using the Microsoft Remote Desktop Client for Mac OS X.

After you have installed the template it will appear inside the “NodeXL” directory that can be found in Programs:

When you open it you’ll see the following screen, with the Edges tab open, where you need to enter Vertex 1 and Vertex 2 for each edge: in our case, the source and destination pages of each link. The right area of the Excel window is where the graph is generated:

Let’s say I want to visualize my site’s internal linking structure. To do this I go to Screaming Frog’s export option and get all of the Successful (2xx status code) URLs:

I import the CSV into Excel, using the comma delimiter to split the data into the desired columns:

Import Link Data to Excel from Screaming Frog

Then, after a bit of pruning to keep only the desired pages, I copy and paste them into the NodeXL template, putting the data from the Source and Destination columns of the CSV into the Vertex 1 and Vertex 2 columns. After that I just need to click on the “Show or Refresh Graph” option in the right area of the Excel window… and voilà! Now I have an internal link graph of my site:

NodeXL Graph
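The pruning step can also be scripted before pasting anything into the template. The following is only a minimal sketch: it assumes the export has the Source and Destination columns mentioned above, and the function name `extract_edges`, the host `example.com` and the sample rows are made up for illustration.

```python
import csv
import io
from urllib.parse import urlparse


def extract_edges(csv_text, host):
    """Return (source, destination) pairs for links that stay within `host`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    edges = []
    for row in reader:
        src = row["Source"].strip()
        dst = row["Destination"].strip()
        # Keep only internal links: both ends must be on the same host.
        if urlparse(src).netloc == host and urlparse(dst).netloc == host:
            edges.append((src, dst))
    return edges


# Hypothetical sample standing in for the crawler export.
sample = """Source,Destination
https://example.com/,https://example.com/about
https://example.com/,https://example.com/blog
https://example.com/blog,https://twitter.com/example
https://example.com/blog,https://example.com/
"""

edges = extract_edges(sample, "example.com")
for src, dst in edges:
    print(src, "->", dst)
```

Each resulting pair corresponds to one row of the Vertex 1 and Vertex 2 columns.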

As you can see, the default graph is a bit complicated to analyze, but NodeXL provides many options in the NodeXL tab that let us customize it and get a clearer view. For example, we can choose whether we want a directed or undirected graph and pick the layout of the graph, and in the “prepare data” option we can also count and merge duplicated edges:

NodeXL: Merge Duplicate Edges
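To make clear what merging and counting duplicated edges means, here is a rough Python sketch of the idea. It is not NodeXL’s own code, the function name and sample edges are hypothetical, and it treats the graph as directed, so A → B and B → A stay separate.

```python
from collections import Counter


def merge_duplicate_edges(edges):
    """Collapse repeated (source, destination) pairs into one edge with a weight."""
    counts = Counter(edges)
    # Counter keeps first-seen order, so the output follows the input order.
    return [(src, dst, weight) for (src, dst), weight in counts.items()]


# Hypothetical edge list: the home page links to /about twice.
edges = [
    ("/", "/about"),
    ("/", "/blog"),
    ("/", "/about"),
    ("/blog", "/"),
]

merged = merge_duplicate_edges(edges)
for src, dst, weight in merged:
    print(src, "->", dst, "weight", weight)
```

The weight here plays the role of the edge weight column: the number of links found between the two pages.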

We can also go to the “Groups” option to better organize the graph according to our desired criteria:

NodeXL Group

And if we go to the Vertices tab in the Excel workbook, we can also start modifying the look and feel of the vertices (color, opacity, shape, size) and even label the desired ones to easily identify them in the graph:

NodeXL Vertices

An interesting option is the one to add more graph metrics, where we can choose to show the PageRank of each vertex (in our case, each page of our site), which measures the importance of each one using the original algorithm:

NodeXL PageRank
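For readers who want to see what this metric computes, below is a small sketch of the textbook PageRank power iteration. I don’t know how NodeXL implements it internally, so the damping factor of 0.85, the number of iterations, the handling of pages without outgoing links and the sample edges are all assumptions.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration over a list of (source, destination) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by pages with no outgoing links is spread evenly over all pages.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {node: (1 - damping) / n + damping * dangling / n for node in nodes}
        for src in nodes:
            for dst in out_links[src]:
                # Each page splits its rank equally among its outgoing links.
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


# Hypothetical three-page site.
edges = [
    ("/", "/about"),
    ("/", "/blog"),
    ("/about", "/"),
    ("/blog", "/"),
    ("/blog", "/about"),
]

ranks = pagerank(edges)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this toy site the home page ends up with the highest score because both other pages link to it.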

By selecting these options we get the following colored graph, which is easier to understand. We can also “browse” it by selecting the desired vertices from the Excel list: they will be highlighted in the graph, showing how they are connected to each other. We also get their PageRank information:

NodeXL Graphic

If we only want to see the connections among a specific set of vertices, we can select them from the list, go to the “Export” option and choose to open a new NodeXL workbook with the selection:

Export NodeXL
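Conceptually, exporting a selection keeps only the edges whose two ends are both among the chosen vertices. A minimal sketch of that filtering follows. It is my reading of the feature rather than NodeXL’s actual behavior, and the names and sample data are hypothetical.

```python
def induced_subgraph(edges, selected):
    """Keep only the edges whose source and destination are both in `selected`."""
    selected = set(selected)
    return [(src, dst) for src, dst in edges if src in selected and dst in selected]


# Hypothetical edge list and selection.
edges = [
    ("/", "/about"),
    ("/", "/blog"),
    ("/blog", "/blog/post-1"),
    ("/blog/post-1", "/"),
    ("/about", "/contact"),
]

sub = induced_subgraph(edges, ["/", "/blog", "/blog/post-1"])
for src, dst in sub:
    print(src, "->", dst)
```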

By doing this we get a much clearer and smaller graph of just the areas we want to analyze, although again the default graph is a bit grayish and needs to be customized.

In this case I selected the “Autofill Columns” option, where I chose to color each edge according to its weight (remember how I merged the duplicated edges at the beginning but also selected the “count edges and store the value in the edge weight column” option?):

NodeXL Home Graph
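The idea behind coloring by weight can be sketched as a linear interpolation between two colors. This is only an illustration of the concept: the light gray and red endpoints, the function name and the sample weights are my own assumptions, not NodeXL settings.

```python
def weight_to_color(weight, min_weight, max_weight,
                    low=(200, 200, 200), high=(200, 0, 0)):
    """Map an edge weight to an RGB color between `low` and `high`."""
    if max_weight == min_weight:
        t = 1.0  # all edges share the same weight, so use the strongest color
    else:
        t = (weight - min_weight) / (max_weight - min_weight)
    return tuple(round(lo + t * (hi - lo)) for lo, hi in zip(low, high))


# Hypothetical edge weights (number of links between two pages).
weights = [1, 3, 5]
colors = [weight_to_color(w, min(weights), max(weights)) for w in weights]
for w, color in zip(weights, colors):
    print("weight", w, "->", color)
```

Heavier edges move toward red, so strongly linked pages stand out at a glance.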

This is how we get this colored graph where we can identify the strength of each edge, or in our case, the number of links between the pages of the area we want to analyze. We can also select the “Subgraph Images” option to generate a subgraph for each vertex:

NodeXL Subgraph

That will generate the following small graphs, one for each vertex, which can be exported as images (as can the main graph, of course):

NodeXL Subgraph

We can use NodeXL in this and many other ways: to analyze our site’s incoming links and identify where we need to push more, or to check our internal linking structure and see whether we have a Web architecture problem, etc. … and not only for our own site, but for our competitors’ too 😉

NodeXL also offers documentation (although it is not that complete) and a GraphGallery, where, by registering for free, we can download many templates with impressive graphs:

NodeXL Graph Gallery

Nice, isn’t it? I hope this is just the beginning, so that we soon have many more and better ways to visualize our data and make our SEO analysis work much easier 😀

