The world’s greatest detective is the ultimate data archetype – the science of deduction
As a member of the Sherlock Holmes Society of London and a devotee of the Victorian era, I have spent countless hours reading and re-reading Sir Arthur Conan Doyle’s books. Holmes’s first appearance in 1887, in A Study in Scarlet, was an immediate success, and the world has been obsessed ever since.
This attraction isn’t to his physical persona; there was nothing extraordinary there, beyond a thin physique that belied his hidden fighting skills. No, the attraction is to his intellect. In an era where candles, gaslight and grime were the norm and the sciences were only starting to gain momentum, Sherlock Holmes represented one of the foremost data scientists in literature.
He entered the scene in a chaotic London where ordinary life was brutal and often inexplicable crimes dominated the headlines, most famously the Whitechapel murders attributed to Jack the Ripper. Here Holmes applied his rigid schema to extract signal from all the noise.
From the original literature to old and new film adaptations, countless TV shows, spin-offs and documentaries, we arrive at the same conclusion and are drawn to the same idea: any problem may be solved if only we have the right processing power.
The strategic comparison: the positives and the negatives.
Holmes wasn’t simply the first consulting detective; he was a master of abductive reasoning and a data engineering whiz.
Holmes famously remarked (through Conan Doyle’s pen) that the little things are infinitely the most important. At the time that meant a splash of mud, footprints or cigarette ash. These are the equivalent of high-value data within a dataset: he identified the outlier variables that carry the key information needed to build a model.
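As a loose illustration (my own sketch, nothing from the canon), spotting the “little things” is not unlike flagging the outliers in a measurement before building a theory. The observations and the two-sigma threshold below are invented purely for the example.

```python
from statistics import mean, stdev

# Hypothetical "trifles": mud splashes measured at a crime scene, in cm from the kerb.
# Values are invented for illustration only.
observations = [2.1, 2.3, 1.9, 2.2, 2.0, 7.8, 2.4]

mu, sigma = mean(observations), stdev(observations)

# Flag points more than 2 standard deviations from the mean:
# the "little things" that carry a disproportionate amount of signal.
trifles = [x for x in observations if abs(x - mu) / sigma > 2]
print(trifles)  # the 7.8 splash is the detail worth building a theory around
```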
The stories often mention catalogues; the famous example, used over and over, is his catalogue of cigarette ash, 140 varieties. It is the equivalent of what we have today as reference data, libraries and dimensions.
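In modern terms, that catalogue behaves like a dimension table: the raw fact observed at the scene is joined against a curated reference. A toy sketch, with entries I have invented to stand in for the 140-variety catalogue:

```python
# A toy "dimension" of ash varieties -- entries invented for illustration,
# standing in for Holmes's 140-variety catalogue.
ash_catalogue = {
    "ASH-017": {"tobacco": "Trichinopoly cigar", "colour": "dark grey", "texture": "flaky"},
    "ASH-042": {"tobacco": "bird's-eye pipe",    "colour": "white",     "texture": "fluffy"},
    "ASH-103": {"tobacco": "Virginia cigarette", "colour": "pale grey", "texture": "fine"},
}

# A "fact" observed at the scene, keyed to the reference data.
scene_evidence = {"location": "study hearth", "ash_id": "ASH-017"}

# The join: enrich the raw observation with the curated dimension.
enriched = {**scene_evidence, **ash_catalogue[scene_evidence["ash_id"]]}
print(enriched["tobacco"])  # -> "Trichinopoly cigar"
```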
Holmes’s brain was a primitive vector database; he understood that storage is finite and prioritised his indexing around useful information, while removing what he denoted as rubbish not relevant to his system. This ensured low latency and high compression when he needed to recall facts during a case.
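Pushing the analogy a little further, the “brain attic” could be sketched as a tiny in-memory vector store that only indexes facts above a relevance threshold and recalls them by similarity. The embeddings and threshold below are invented for illustration, not a real system.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

class BrainAttic:
    """A toy vector store: index only what is useful, recall by similarity."""

    def __init__(self, relevance_threshold=0.5):
        self.threshold = relevance_threshold
        self.memory = []  # list of (fact, embedding) pairs

    def index(self, fact, embedding, relevance):
        # Like the careful workman stocking his brain-attic:
        # anything below the relevance threshold is discarded as "rubbish".
        if relevance >= self.threshold:
            self.memory.append((fact, embedding))

    def recall(self, query_embedding, k=1):
        # Low-latency lookup: rank stored facts by cosine similarity to the query.
        ranked = sorted(self.memory, key=lambda m: cosine(m[1], query_embedding), reverse=True)
        return [fact for fact, _ in ranked[:k]]

# Toy 3-dimensional "embeddings", invented for the example.
attic = BrainAttic()
attic.index("Trichinopoly ash is dark and flaky", (0.9, 0.1, 0.0), relevance=0.9)
attic.index("The Earth orbits the Sun",           (0.0, 0.1, 0.9), relevance=0.1)  # pruned
print(attic.recall((0.8, 0.2, 0.1)))  # -> ['Trichinopoly ash is dark and flaky']
```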
In fact, in the first publication, A Study in Scarlet, we clearly see Watson attempting to classify Holmes’s capabilities by creating a scorecard. It shows how deliberately Holmes adopted a structured method: anything that did not fit his business model was scrapped.
Holmes had great troubleshooting capabilities. Where other characters like Lestrade, Watson and other members of the force jumped to quick, unreasonable ideas and erroneous conclusions, Holmes systematically broke down the problem, sifted the plausible from the highly improbable, and found the better way to deal with it. Admittedly this is a fictional device, Conan Doyle heightening the perceived incompetence of the police force. Nevertheless, it shows how the character broke a problem down into small pieces and tackled them in an orderly manner.

In his stories Conan Doyle often writes Holmes as relying on intuition that lacked statistical significance. He frequently suffered from selection bias and worked with a sample size of one (n=1). His logic was deterministic, while the world was already moving, and has continued to move, towards a more probabilistic approach. It also meant that he would eventually face a dilemma as the number of permutations grew over time, since he could only tackle them sequentially.
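To make the contrast concrete, here is a minimal and entirely hypothetical sketch of the probabilistic alternative: instead of one deterministic chain of deduction, competing hypotheses are kept alive and re-weighted as each new piece of evidence arrives (a basic Bayesian update). The suspects and numbers are invented for the example.

```python
# Competing hypotheses with prior beliefs -- numbers invented for illustration.
priors = {"the butler": 0.2, "the estranged brother": 0.5, "a passing stranger": 0.3}

# P(evidence | hypothesis): how likely muddy boots in the hallway are under each theory.
likelihoods = {"the butler": 0.7, "the estranged brother": 0.2, "a passing stranger": 0.05}

# Bayes' rule: posterior is proportional to prior x likelihood, then normalise.
unnormalised = {h: priors[h] * likelihoods[h] for h in priors}
total = sum(unnormalised.values())
posteriors = {h: p / total for h, p in unnormalised.items()}

for hypothesis, p in sorted(posteriors.items(), key=lambda kv: -kv[1]):
    print(f"{hypothesis}: {p:.2f}")
# No suspect is eliminated outright; each is re-weighted as evidence accumulates.
```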
Holmes is singular, making his brain, and himself, a single high-performance processor. Whilst that was good enough at the time, we often see that when he was not around, Watson struggled to extrapolate the data into meaningful information. This is a modern scalability nightmare. A modern strategy would require data democratisation, meaning that semi-technical people like Watson are able to run the same queries and analytics without having to wait for the master.
If we replayed the data points of the Victorian era and the Holmes stories using today’s technology, the outcome would shift from a reactive approach to a predictive one. After a murder occurred, the data expert would use spatial and/or temporal analysis to identify the conditions that led to such events, continuously building patterns, transaction logs and data pipelines and, if possible, integrating IoT sensors.
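A minimal sketch of what that spatial and temporal analysis might look like, with incident records invented for the example: aggregate historical incidents by district and hour, then surface the combinations most likely to need attention next.

```python
from collections import Counter

# Hypothetical historical incident log: (district, hour of day). Invented for illustration.
incidents = [
    ("Whitechapel", 2), ("Whitechapel", 3), ("Whitechapel", 2),
    ("Lambeth", 23), ("Whitechapel", 1), ("Lambeth", 22), ("Soho", 14),
]

# Spatial/temporal aggregation: count incidents per (district, hour) cell.
hotspots = Counter(incidents)

# A crude "predictive" step: rank cells by historical frequency and flag the top
# ones for preventive patrols rather than purely reactive deduction.
for (district, hour), count in hotspots.most_common(3):
    print(f"{district} around {hour:02d}:00 -> {count} past incidents")
```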
I find that my work often feels like a Sherlockian investigation. You begin with an unstructured mess of data, unrefined and confusing. Through a series of logical transformations, you start to reveal the truth hidden within.
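In practice that refinement is a pipeline of small, composable transformations. A toy example, with witness statements invented for illustration:

```python
# Raw, messy witness statements -- invented for illustration.
raw_records = ["  SEEN near the docks at 2am ", "seen NEAR the docks at 2AM", "", "heard nothing"]

def strip_whitespace(records):
    return [r.strip() for r in records]

def drop_empty(records):
    return [r for r in records if r]

def normalise_case(records):
    return [r.lower() for r in records]

def deduplicate(records):
    return list(dict.fromkeys(records))  # preserves first-seen order

# A series of logical transformations, applied in order, gradually revealing structure.
pipeline = [strip_whitespace, drop_empty, normalise_case, deduplicate]
clean = raw_records
for step in pipeline:
    clean = step(clean)

print(clean)  # -> ['seen near the docks at 2am', 'heard nothing']
```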
Holmes taught me that it is a capital mistake to theorise before one has data. In 2026, we have the data. The challenge is having the Holmesian discipline to observe what everyone else merely sees.