Gary Angel, President of digital measurement and data analytics consulting firm Semphonic (www.semphonic.com), explains that the topographic nature of websites prevents basic statistical techniques from working well in the digital realm. He identifies methods for solving the unique problems that digital data and website structure present, and highlights techniques which provide a completely new set of opportunities for effectively measuring, analysing, and optimising your digital properties to drive better online performance.
The Problem with Statistical Analysis in Web Analytics
The practice of digital analytics is built on a few simple, largely unquestioned assumptions. The first of these assumptions is “intentionality”. When a visitor looks at a page about a topic or product, we assume they have an interest in the product or topic – that the behaviour was intentional. The second key assumption is “influence”. When a visitor views a page and then subsequently does something we consider a success, we assume that the earlier page view influenced the subsequent action. These two assumptions – intention and influence – are so deeply ingrained in web analytics practice that we hardly ever bother to think about them.
That’s dangerous, because the way visitors traverse a website is controlled, to some extent, by the options and pathways provided. Like a magician “forcing” the pick of a card, we exert significant control over where visitors go and how they get to key locations by the way we structure the website. So when we infer “intention” or “influence”, we may actually only be measuring our own little sleights of hand.
Think about it this way: websites are very much like city streets. Some pathways are big and broad, others small and narrow. Often, there’s no direct way to get from Point A to Point B. No analyst would ever be foolish enough to think that a straightforward correlation model would work for analysing city traffic. Yet, surprisingly, many have made exactly that same mistake when it comes to websites.
Basic statistical analysis techniques aren’t designed to handle data sets where the data is topographically arranged – and the structure of websites creates a deep topology to web data.
Simple correlation analysis, for example, does nothing to separate out the impact of site structure, so pages that are closely related navigationally are almost always highly correlated. That makes it impossible to interpret the true intention of users or the true influence of pages, and the analysis is therefore almost completely useless.
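To make the point concrete, here is a minimal Python sketch. The sessions and page names are invented for illustration, and the site is assumed to place "widget" directly beneath "products". It computes a plain Pearson correlation between page-view indicators and shows how a parent/child pair can look strongly related purely because of where the pages sit.

```python
from math import sqrt

# Hypothetical sessions: the set of pages each visitor viewed.
# In this made-up site, "widget" can only be reached via "products".
SESSIONS = [
    {"home", "products", "widget"},
    {"home", "products", "widget"},
    {"home", "products"},
    {"home", "support", "faq"},
    {"home", "support", "faq", "products", "widget"},
    {"home"},
]

def viewed(page, sessions):
    """Return a 0/1 indicator per session: did the visitor see the page?"""
    return [1 if page in session else 0 for session in sessions]

def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Navigational neighbours versus pages in different branches of the site.
neighbours = pearson(viewed("products", SESSIONS), viewed("widget", SESSIONS))
distant = pearson(viewed("widget", SESSIONS), viewed("faq", SESSIONS))
print(f"products vs widget: {neighbours:.2f}")
print(f"widget vs faq:      {distant:.2f}")
```

The high figure for the first pair says nothing about intention or influence. It mostly reflects the fact that one page is the doorway to the other.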
Creating a Topographical Analysis
So any real analysis of visitor behaviour will have to take account of topology before it will be possible to measure correlation and infer intentionality or influence. In effect, you have to remove your sleight of hand from the equation.
One of the easiest ways to do this is to create a logical model of the site (rather like a sitemap) and then count distances between nodes in the hierarchy. We call this a topographical design. Even better, a behavioural topology model can be built showing how users actually navigate the website. From this model, the distance between nodes can be calculated based either on the distance in the tree or on the average number of clicks users actually take between points.
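Both distance measures described above can be sketched in a few lines of Python. The site hierarchy, the sessions, and the function names `tree_distance` and `average_clicks` are all hypothetical, chosen only to illustrate the idea; a real model would be built from your own sitemap and clickstream data.

```python
from collections import deque

# Hypothetical logical model of the site: each page maps to its child pages.
SITE_TREE = {
    "home": ["products", "support"],
    "products": ["widget", "gadget"],
    "support": ["faq"],
    "widget": [],
    "gadget": [],
    "faq": [],
}

# Hypothetical sessions: the ordered pages each visitor clicked through.
SESSIONS = [
    ["home", "products", "widget"],
    ["home", "support", "faq", "home", "products", "widget"],
    ["products", "widget", "gadget"],
]

def tree_distance(tree, start, goal):
    """Number of steps between two nodes in the hierarchy (breadth-first search)."""
    # Treat every parent/child link as an edge that can be walked both ways.
    neighbours = {page: set(children) for page, children in tree.items()}
    for page, children in tree.items():
        for child in children:
            neighbours.setdefault(child, set()).add(page)
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        page, dist = queue.popleft()
        if page == goal:
            return dist
        for nxt in neighbours[page]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # the two pages are not connected in the model

def average_clicks(sessions, start, goal):
    """Average clicks from the first view of start to the next view of goal."""
    gaps = []
    for path in sessions:
        if start in path:
            i = path.index(start)
            if goal in path[i + 1:]:
                gaps.append(path.index(goal, i + 1) - i)
    return sum(gaps) / len(gaps) if gaps else None

print(tree_distance(SITE_TREE, "widget", "faq"))    # structural distance
print(average_clicks(SESSIONS, "home", "widget"))   # behavioural distance
```

The two measures can disagree: a page that is two steps away in the hierarchy may take visitors far more clicks to reach in practice, which is why the behavioural version is usually the more informative of the two.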
Creation of a behavioural topology model is truly a foundational project in digital analytics. Without it, every analysis you do of your website is likely to be deeply flawed. With a behavioural topology come numerous new analytic opportunities that few web analysts have explored. These models also open up the opportunity to use classic statistical analysis techniques more fully. By creating objective measures of distance and a true topography of the website, these models make it possible to look at the relationship between content and outcome on the website while controlling for the site’s inherent structure.
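One simple way to picture "controlling for structure" is sketched below in Python. This is an illustrative assumption, not a method the article prescribes: it fits a straight line of each page's association with conversion against its distance from the conversion point, then looks at what is left over. The page names and numbers are made up.

```python
# Hypothetical pages: (distance in clicks from the conversion page,
# observed association between viewing the page and converting).
# All values are invented for illustration only.
PAGES = {
    "cart": (1, 0.70),
    "pricing": (2, 0.50),
    "blog": (3, 0.45),
    "about": (4, 0.10),
}

def residuals_after_distance(pages):
    """Remove the part of each page's association that distance alone predicts.

    Fits an ordinary least-squares line (association ~ distance) and returns,
    per page, the observed association minus the fitted value.
    """
    distances = [d for d, _ in pages.values()]
    assocs = [a for _, a in pages.values()]
    n = len(pages)
    mean_d = sum(distances) / n
    mean_a = sum(assocs) / n
    slope = (
        sum((d - mean_d) * (a - mean_a) for d, a in zip(distances, assocs))
        / sum((d - mean_d) ** 2 for d in distances)
    )
    intercept = mean_a - slope * mean_d
    return {
        page: a - (intercept + slope * d)
        for page, (d, a) in pages.items()
    }

adjusted = residuals_after_distance(PAGES)
# Rank pages by how much they outperform what their position would predict.
for page, value in sorted(adjusted.items(), key=lambda item: -item[1]):
    print(f"{page:8s} {value:+.3f}")
```

In this toy data the raw numbers favour the page nearest to conversion, but once distance is accounted for, a page further away stands out. That is the kind of signal a raw correlation hides.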