How to Conduct a Link Audit
With so many websites being stung by Google’s Penguin and Panda algorithms of late, as well as a host of others receiving manual penalties from Google’s webspam team, we thought we’d shed a little light on link audits.
Think of a link audit as spring cleaning for your link portfolio. Whilst very few of us relish the idea of donning our metaphorical marigolds and getting the mop and bucket out to clean up many years’ worth of links, it really is a necessity as long as Google keeps updating its algorithms and penalising poor quality links. Getting on top of the issue now could save you a lot of pain down the road.
Stage 1: Google Webmaster Tools
Google Webmaster Tools (GWT) will be your first port of call, as it allows you to download three reports straight into Excel or Google Docs that show you a selection of the links that Google has found for your site. These reports can be found by logging into GWT and clicking on Search Traffic > Links to your Site and then the “More” option underneath “Who links the most” (click on “Download this table” to export). This will show you root domains, ordered by how many links each one sends you. You also need to click on “Download more sample links” and “Download latest links” to export the other two link reports from GWT.
Stage 2: Other Third Party Link Reports
GWT is by no means an exhaustive source of links, so if you don’t have one already you need to sign up for a free Majestic account (which you can do for your own domain so long as you are able to verify that you own it). Majestic has a very extensive data set and will allow you to pull off two more reports, one on your historic links and one on recently found links. If you have an account with Moz, Ahrefs, or any other tool, you should pull link reports from these too.
Stage 3: Pulling it all Together
The next thing to do is to mash all the reports together into one big spreadsheet and delete the duplicates to make it manageable. I’d recommend using Excel over Google Docs but whatever you find easiest is fine. Once you’ve de-duplicated by link URL, you need to make sure you can see how many links you’ve got coming from each root domain.
I’d highly recommend working from a list of domains and not individual URLs when conducting your link audit, as you’ll find hundreds or even thousands of individual links coming from a single web site. Being able to see the number of links from each root domain is important, though, as it allows you to quickly spot site-wide links. Google doesn’t particularly like these, and they often come from spammy sites or from someone having linked to your site in the footer of every page (which can often be a web design issue).
Pulling out root domains can involve a pretty complex formula in Excel, so I’d recommend downloading the excellent URL Tools, which gives you a range of simple formulas that can quickly pull out subdomains and root domains from a list of URLs. Another method is to use Ontolo’s online tool to remove duplicate hostnames.
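If you’d rather script this stage than do it in a spreadsheet, the merge, de-duplicate and count steps described above can be sketched in a few lines of Python. This is only a rough illustration: the function names are my own, it assumes every URL includes its scheme (http:// or https://), and the root-domain logic only knows about the handful of two-part suffixes listed in the code, so a real audit would want a full public suffix list.

```python
from collections import Counter
from urllib.parse import urlsplit

# Illustrative only: a real audit needs a complete public suffix list.
TWO_PART_SUFFIXES = {"co.uk", "org.uk", "com.au"}


def root_domain(url):
    """Return a rough root domain for a URL that includes its scheme."""
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    # Keep three labels for suffixes such as .co.uk, otherwise two.
    if len(labels) >= 3 and ".".join(labels[-2:]) in TWO_PART_SUFFIXES:
        return ".".join(labels[-3:])
    return ".".join(labels[-2:])


def merge_reports(*reports):
    """Combine several lists of link URLs, dropping exact duplicates."""
    seen = set()
    merged = []
    for report in reports:
        for url in report:
            key = url.strip()
            if key and key not in seen:
                seen.add(key)
                merged.append(key)
    return merged


def links_per_domain(urls):
    """Count how many distinct linking URLs come from each root domain."""
    return Counter(root_domain(u) for u in urls)


# Hypothetical sample rows standing in for two exported reports.
gwt = ["http://blog.example.com/a", "http://example.com/b"]
majestic = ["http://example.com/b", "http://shop.example.co.uk/c"]

urls = merge_reports(gwt, majestic)
counts = links_per_domain(urls)
for domain, n in counts.most_common():
    print(domain, n)
```

Sorting by count puts the domains with the most links at the top, which is where site-wide footer links tend to show up.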
Stage 4: Link and Anchor Text Distribution
One of the first things you should be able to see when you’ve built your link report and condensed it to just the root domains is the anchor text distribution, as well as the distribution of pages on your site being linked to. With regard to anchor text distribution, you need to be looking for a good spread of keywords across a range of long and short tail keyphrases. If all your links are targeting just two or three highly commercial keywords then this could potentially be a problem.
It’s rare that the linked-to pages on your site will be evenly distributed. Whilst it’s normal to have more links to your homepage than to a blog post you wrote two years ago, you should still be looking out for excessive linking to a single page, especially if that page is an obscure inner page on your site.
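Both distributions are just frequency counts, so they are easy to produce yourself. Here is a minimal sketch in Python, assuming you have one list of anchor texts and one list of linked-to pages taken from your merged report. The function name and the sample data are made up for illustration, and there is no official cut-off for what counts as “too concentrated”, so the share column is there for you to judge by eye.

```python
from collections import Counter


def distribution(values):
    """Return (value, count, share) rows, most common first.

    Values are compared case-insensitively with surrounding spaces removed.
    """
    counts = Counter(v.strip().lower() for v in values)
    total = sum(counts.values())
    return [(value, n, n / total) for value, n in counts.most_common()]


# Hypothetical anchor texts from a merged link report.
anchors = ["cheap widgets", "Cheap Widgets", "example.com", "cheap widgets "]

rows = distribution(anchors)
for value, n, share in rows:
    print(f"{value}: {n} links ({share:.0%})")
```

Running the same function over the column of linked-to pages gives you the page distribution, which makes an obscure inner page with an outsized share of links easy to spot.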
Stage 5: Identifying Bad Links
Identifying toxic links is partly a numbers game and partly a case of using your own judgement. The first thing to do is to filter on the top-level domain (TLD) and get rid of (or at the very least scrutinise) links coming from TLDs that would be unlikely to link naturally to you, like .ru or .ck or .xxx. These are often a sign of spammy sites or scraped content. If your site or business is confined to one country or audience then you have to ask yourself why someone in another country would be linking to you (it could be completely legitimate, but often it’s a sign of a spammy link). The same goes for directory links, unless they’re from reputable sites.
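The TLD filter is simple enough to script. The sketch below splits a list of root domains into those worth a closer look and the rest. The set of suspect TLDs is just the three examples mentioned above and is an assumption you should adjust to your own market, and the function names are my own. Note that it only flags domains for scrutiny; it doesn’t decide that a link is bad.

```python
# Assumption: the three example TLDs from the text; adjust for your own site.
SUSPECT_TLDS = {"ru", "ck", "xxx"}


def tld(domain):
    """Return the last label of a domain name, lower-cased."""
    return domain.strip().lower().rstrip(".").rsplit(".", 1)[-1]


def split_by_tld(domains, suspect=SUSPECT_TLDS):
    """Split domains into (flagged for review, everything else)."""
    flagged = [d for d in domains if tld(d) in suspect]
    kept = [d for d in domains if tld(d) not in suspect]
    return flagged, kept


# Hypothetical root domains from the merged report.
domains = ["links.ru", "example.com", "Casino.XXX", "blog.co.uk"]

flagged, kept = split_by_tld(domains)
print("Review first:", flagged)
print("Everything else:", kept)
```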
The main rule when telling bad links from good is to remember that it’s not necessarily about site quality but whether the link is natural or not. On the whole, it should be quite easy to spot ‘fake’ or ‘toxic’ links as they just don’t look right, whereas you may find a lot of perfectly natural looking links on terrible sites, which you can just leave as they are. The identification of bad links deserves an article all of its own, but bear in mind that it’s very easy to over-scrutinise each link, which isn’t a good idea when you’ve got thousands and thousands to get through.
Stage 6: Removing Bad Links
Yet again, this is a subject that probably deserves an article all of its own. The removal of bad links is the final stage of your link audit and should be conducted properly by emailing the owners / webmasters of each offending site and politely asking that the link be removed. One of the important things to remember here is to keep a record of all your emails and requests, as you may need to present this to Google if you are ever hit by a future penalty. Keeping such a record is good practice in any case.
Once you’ve emailed two or three times without a reply, it’s safe to assume you won’t be getting a response, and you can then resort to using the Google Disavow Tool.
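To tie the record keeping and the disavow step together, here is a rough Python sketch. It assumes you keep your outreach log as simple (domain, outcome) rows, which is my own made-up structure, and it treats two unanswered emails as the cut-off, which you may prefer to raise to three. The output follows the disavow file format as I understand it: a plain text file with one entry per line, “domain:” in front of whole domains, and lines starting with # treated as comments.

```python
from collections import defaultdict


def disavow_candidates(log, min_attempts=2):
    """Return domains emailed at least min_attempts times with no removal.

    log is a list of (domain, outcome) rows; the outcome "removed"
    means the webmaster took the link down.
    """
    attempts = defaultdict(int)
    removed = set()
    for domain, outcome in log:
        attempts[domain] += 1
        if outcome == "removed":
            removed.add(domain)
    return sorted(
        d for d, n in attempts.items() if n >= min_attempts and d not in removed
    )


def disavow_file(domains):
    """Build the text of a disavow file listing whole domains."""
    lines = ["# Domains contacted without success; see outreach log"]
    lines += [f"domain:{d}" for d in domains]
    return "\n".join(lines) + "\n"


# Hypothetical outreach log.
log = [
    ("spam.example", "no reply"),
    ("spam.example", "no reply"),
    ("ok.example", "removed"),
    ("new.example", "no reply"),
]

candidates = disavow_candidates(log)
text = disavow_file(candidates)
print(text)
```

Domains you have only emailed once stay out of the file, so the log itself tells you who still needs a follow-up before you reach for the Disavow Tool.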
There are many articles and resources looking at link audits you can find online. One of my favourites is this one from Chuck Price writing in Search Engine Land.