Composites, multi-sourced files, multi-verse files: call them what you will, these data files are a hot topic in B2B marketing, despite having been around for years. But what are they, and what benefits do they offer over traditional data files?
They sound simple. A multi-sourced file comprises several data sets from different sources, brought together in one unified file held in a database. But that is where our understanding differs from what others consider a multi-sourced file to be, and creating such a file in a way that maximises the benefits is far from easy.
Tackling integration
A multi-sourced file can be more than just a file made up of data from various sources. To be truly composite, the files need to be mapped and matched against each other and fully integrated using statistical rules and a measure of experience. Simply bringing together different data sources without integrating them does not create a composite file.
Integration is key. It requires the ability to handle huge amounts of information and is pivotal to creating the coherence, quality, coverage and breadth of data that are the key benefits of a database powered by multi-sourced files.
Data integration itself throws up challenges, such as which data supplier to believe when information on a firm differs. When bringing together different data sources, it’s not unusual for the same business record from two suppliers to contain conflicting information, such as different business addresses or employee numbers. Which record do you trust? It might seem like a headache, but in reality it is an opportunity to improve the coherence and quality of the data record. The answer lies in applying specific rules, developed through long experience, on which data source to believe when there is disagreement.
The selection process
Each data supplier has particular areas of strength, and some are stronger with regard to specific fields of information. For instance, one supplier may be more likely to have the correct SIC code, so we would trust and use its data for that field. On other occasions, for certain information fields, we might go with the majority view. Or we might go with the minority view, as experience has taught us that a particular supplier tends to update the information first. In this way we also extract valuable information from cases of disagreement. The coherence this creates ensures a quality of data that a single-source file would never be able to match, and that a non-integrated multi-source file would also fall short of.
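To make the selection rules above concrete, here is a minimal sketch in Python. It is an illustration only, not a description of any production system: the supplier names, the field names, the two rule tables and the order in which the rules are applied are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical rule table: the supplier we trust outright for a given field.
TRUSTED_SOURCE = {"sic_code": "supplier_a"}

# Hypothetical rule table: a supplier that tends to update a field first,
# so its value wins even when it is in the minority.
EARLY_UPDATER = {"address": "supplier_c"}


def resolve_field(field, values_by_supplier):
    """Choose one value for a field from a {supplier: value} mapping."""
    # Ignore suppliers that hold no value for this field.
    present = {s: v for s, v in values_by_supplier.items() if v is not None}
    if not present:
        return None
    # Rule 1: a supplier known to be strong on this field.
    trusted = TRUSTED_SOURCE.get(field)
    if trusted in present:
        return present[trusted]
    # Rule 2: the minority view, where one supplier usually updates first.
    early = EARLY_UPDATER.get(field)
    if early in present:
        return present[early]
    # Rule 3: otherwise go with the majority view.
    return Counter(present.values()).most_common(1)[0][0]


def resolve_record(records_by_supplier):
    """Merge one business's records from several suppliers, field by field."""
    fields = []
    for record in records_by_supplier.values():
        for name in record:
            if name not in fields:
                fields.append(name)
    return {
        name: resolve_field(
            name, {s: r.get(name) for s, r in records_by_supplier.items()}
        )
        for name in fields
    }


# Made-up records for one business, as held by three suppliers.
records = {
    "supplier_a": {"sic_code": "62010", "employees": 40, "address": "1 High St"},
    "supplier_b": {"sic_code": "62020", "employees": 45, "address": "1 High St"},
    "supplier_c": {"sic_code": "62020", "employees": 45, "address": "9 Park Rd"},
}
print(resolve_record(records))
```

In this made-up case the SIC code comes from the trusted supplier, the employee count follows the majority, and the address follows the supplier assumed to update first. In practice the rules would be far richer and statistically grounded; the point is only that each field is resolved by its own rule rather than by trusting one supplier wholesale.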
The way in which multiple data sources are integrated is also important. We retain and use the history of data from our sources to maximise accuracy, coherence and consistency. This history also allows us to create new variables.
Another major benefit of the multi-source file is the coverage and breadth of data it creates, as well as the volume of records generated. No data supplier in the UK manages to include all live companies. So to give up on one component supplier would mean giving up on thousands of records, and would affect the coherence and cross-validation of the data.
Quality over quantity
But bigger is not always better, and there is also value in the smaller files that feed into a multi-sourced file. You might take information from one of the data owners with high coverage, but because that source is focused on generating high coverage, it cannot gather as much depth of information as a niche supplier would. This niche data can contain essential information for a particular study, market or acquisition strategy, so it is important not to be too focused on quantity.
Properly integrating data is time-consuming and expensive, but it has a real benefit: it creates an unrivalled observation platform on the business universe. Databases powered by true multi-source files give a technicolor view of the UK business landscape; with a single-source, non-integrated file you’ll just get a black and white snapshot. And data, like life, is much duller in black and white.