A Single Punctuation Mark Has Been Skewing Our Entire System of Scientific Ranking

You might think that science and scientists would get this right, but hey, they’re only human.

One of the common ways scientific papers are ranked, and judged more or less relevant, is by how often they are cited by other scientific papers.

There are a couple of databases out there that do this job for the fields involved. Well, it turns out that there’s a fatal flaw in the mechanism by which this works. If a paper has a hyphen in its title, its citations will not get counted correctly, so it ranks as less cited than it would without the hyphen. The more hyphens it has, the worse the effect.

That such a simple flaw could be hanging around in the background, skewing whole fields, almost boggles the mind, until you remember that it’s human beings feeding the system. As we like to say in information technology: garbage in, garbage out.

When the people writing the papers go to put in the citations, do they get it right or wrong? Does that hyphen have a space before and after it or not? Is it a character in one encoding standard like ASCII while the database has it in some variant of ANSI? Or in some other character encoding scheme?
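To make that concrete, here is a minimal Python sketch. The title is made up for illustration, and reading "ANSI" as Windows-1252 (`cp1252`) is an assumption on my part. It only shows that several marks that look like a hyphen are different characters, and that the same character can be stored as different bytes.

```python
import unicodedata

# Hypothetical paper title typed three ways. The titles look alike on screen,
# but each uses a different character where the hyphen goes.
titles = [
    "Graph\u002dbased ranking",  # hyphen-minus, the key on your keyboard
    "Graph\u2010based ranking",  # Unicode hyphen
    "Graph\u2013based ranking",  # en dash
]

for title in titles:
    mark = title[5]  # the character right after "Graph"
    print(f"U+{ord(mark):04X} {unicodedata.name(mark)}")

# Exact comparison treats all three as different titles.
distinct_titles = set(titles)
print(len(distinct_titles), "distinct titles")

# The same en dash is stored as different bytes in different encodings.
# Assumption: "ANSI" here means Windows-1252 (cp1252).
en_dash_utf8 = "\u2013".encode("utf-8")
en_dash_cp1252 = "\u2013".encode("cp1252")
print(en_dash_utf8, en_dash_cp1252)
```

A system that matches citations by exact title would see three unrelated papers here.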

If you have never had to write text processing and search code for databases and data ingestion systems, you might not think that’s such a big deal, but boy is it ever. Computers do exactly what you ask them to do and nothing more, unless you have prepared them for this kind of thing.
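What "preparing" the computer might look like is sketched below in Python. This is not how any particular citation database works; `match_key` is a hypothetical helper, and the list of dash characters is my own choice, not an exhaustive one. The idea is to reduce every title to a comparison key before matching, so that dash variants, stray spaces, and capitalization stop mattering.

```python
import re
import unicodedata

# Assumed set of dash-like characters to treat as equivalent (not exhaustive):
# hyphen, non-breaking hyphen, figure dash, en dash, em dash, minus sign.
DASH_CHARS = "\u2010\u2011\u2012\u2013\u2014\u2212"

# One or more dash-like characters, plus any whitespace around them.
_DASH_RUN = re.compile(r"\s*[-" + DASH_CHARS + r"]+\s*")


def match_key(title: str) -> str:
    """Return a key for comparing titles; not meant for display."""
    text = unicodedata.normalize("NFKC", title)  # fold compatibility forms
    text = _DASH_RUN.sub("-", text)              # one plain hyphen, no spaces
    text = re.sub(r"\s+", " ", text).strip()     # collapse leftover whitespace
    return text.casefold()                       # ignore capitalization


as_published = "Graph-based ranking"
as_cited = "Graph \u2013 based  Ranking"  # en dash, extra spaces, capital R

print(as_published == as_cited)                        # False
print(match_key(as_published) == match_key(as_cited))  # True
```

The trade-off is that the key throws information away, so it should only be used for lookups while the original title is stored untouched.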

