Dr. Gerald Stokes, DTS Chair, wrote a column in Mail Business Newspape…
Press Release | 2020-11-17 11:37
We are in the midst of the information age and are awash in information. As this flood
surrounds us and envelops us, we cannot forget the importance of the primary data that
should be at the root of all information streams. It is the primary data that connects
information to the real world. Primary data are the facts. Without them our information
threads are, in the words of Shakespeare’s Macbeth, “full of sound and fury, signifying
nothing”. Concerns about fact checking have considerable merit.
This is not a new observation. A famous US Senator noted, “Everyone is entitled to his own
opinion, but not his own facts.” Others in the computer industry have stated it more directly:
“Garbage in – Garbage out”. As in many things, maintaining the integrity of data is
easier said than done. The essential observation is that it is easier to recover from a bad
analysis than from bad data.
Since the beginning of the pandemic, I, like many others, have followed news of the worldwide
spread of the virus. The ebb and flow of the disease in the nearly 200 countries of the world is
fascinating. However, as one watches the data, one very quickly becomes aware that countries
do not have common standards for reporting either who has the virus or who may have died
from it. On the one hand, this observation has made me incredibly appreciative of the open and
transparent reporting and testing process that the KCDC, now the KDCA, has sustained
throughout the pandemic. On the other, it is difficult to see how many other countries in the
rest of the world will bring things under control without reliable and relevant data.
In the best businesses the mantra is “you cannot manage what you cannot measure”. Good
management therefore comes from good data. The data must be of known and reasonable
quality and be appropriate to the problem at hand. This three-part test - known quality,
reasonable quality, relevant to the question - is essential. It implies that data must be validated
and curated before being organized and analyzed.
Validation and curation begin with the collection of the data itself. The collection of data is
subject to many potential biases and errors that are often not obvious. Knowing how
good your data is, or is not, is essential. Scientists and engineers spend a great deal of time
calibrating their instruments. The process of calibration is one of assurance. How well does the
instrument measure what it is intended to measure? Curation establishes the origin and history
of the data from its collection to its state when used. It is the pedigree of the data.
Polls and surveys are instruments as well. One test of their reasonableness is the size of the
sample and another is how the sample is selected. Selecting ten people on a street corner in
Seoul and asking what nearby restaurant they might recommend provides much more reliable
data for dining than asking them who they would vote for in the next mayoral election in an
effort to predict the outcome.
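The sample-size half of this test can be made concrete with a little arithmetic. The sketch below uses the standard textbook approximation for the 95% margin of error of a proportion. It assumes a simple random sample, which ten people on one street corner are not, so it says nothing about the second half of the test, how the sample is selected. The function name and the sample sizes are illustrative only.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error for a proportion p estimated from n responses.

    Assumes a simple random sample; z=1.96 corresponds to a 95% confidence level.
    p=0.5 is the worst case (largest margin).
    """
    if n <= 0:
        raise ValueError("sample size must be positive")
    return z * math.sqrt(p * (1 - p) / n)

# Compare a street-corner-sized sample with larger ones.
for n in (10, 100, 1000):
    print(f"n={n:>4}: +/- {margin_of_error(n) * 100:.1f} percentage points")
```

With ten respondents the margin is roughly 31 percentage points, far too wide to call a close election, and it narrows to about 3 points only at around a thousand respondents. Even then, a badly selected sample can be wrong by more than this figure suggests.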
We know all these things about data and its importance. We also know that data is deliberately
altered, fabricated, or ignored on a systematic basis all around the world. What can we do? This
is an essential question for governments and companies if they are going to serve the public
and society. This is certainly true here in Korea where both the government and corporations
see their future tied to ICT, “Big Data”, the digital economy and all the other terms we use to
capture this remarkable time of transformation.
There is a value chain for data that extends from its collection to its use. Technology enables
this process and, in some cases, emerging technologies help ensure the
integrity of parts of that chain. An excellent example is blockchain. This technology uses a
cryptographic process to ensure that data is not altered along the value chain. So, once we have
data, we can maintain its integrity. This technology is in early stages of deployment in several
sectors, most notably the financial sector, and I see it as an essential part of all big data
applications over time, including utility data and medical records.
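The core idea, that a cryptographic process makes later alteration detectable, can be illustrated with a simple hash chain. This is only a minimal sketch and not a full blockchain: it has no distributed ledger, consensus, or signatures. The record fields (`meter`, `kwh`) and function names are hypothetical, chosen to echo the utility-data example.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record

def record_hash(payload, prev_hash):
    """SHA-256 over the record contents plus the previous record's hash."""
    body = json.dumps({"payload": payload, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()

def build_chain(payloads):
    """Link each record to its predecessor through the predecessor's hash."""
    chain, prev = [], GENESIS
    for payload in payloads:
        digest = record_hash(payload, prev)
        chain.append({"payload": payload, "prev": prev, "hash": digest})
        prev = digest
    return chain

def verify_chain(chain):
    """Recompute every hash; any edited record breaks the chain."""
    prev = GENESIS
    for block in chain:
        if block["prev"] != prev:
            return False
        if block["hash"] != record_hash(block["payload"], prev):
            return False
        prev = block["hash"]
    return True

# Hypothetical meter readings, for illustration only.
readings = [{"meter": "A-17", "kwh": 12.4}, {"meter": "A-17", "kwh": 13.1}]
chain = build_chain(readings)
print(verify_chain(chain))          # chain is intact

chain[0]["payload"]["kwh"] = 2.4    # alter a record after the fact
print(verify_chain(chain))          # the alteration is detected
```

Note what the sketch does and does not show. It detects changes made after the data entered the chain, but it cannot tell whether the original reading was accurate. That is why validation at the point of collection still matters.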
At the end of the chain is a piece of information used by a government, a company or individual
to make some decision. In the domain of social media, where many actors attempt to influence the public,
instances of data alteration and fabrication are increasingly sophisticated. For example, “deep
fakes”, using very sophisticated deep learning and processing methods, are video and audio
products that have a high potential to deceive. Fact checking is growing, but this is a labor-
intensive activity that can hardly keep up with the speed of social media and the fertile
imaginations of those who seek to deceive.
Companies like Facebook attempt to eliminate things like hate speech, and some
governments have all-too-effective automated censorship, all enabled by AI. My computer’s
grammar checker also informs me whether my messages are “friendly”, “optimistic” or “direct”.
Assessments of truth and faithfulness to fact are harder to automate. At the university, we have
software that helps us identify plagiarism. I think what we ultimately need is an automated
assessment of the connection to the facts and data – collection to interpretation. This is
probably a long time away for the basic consumer, but investors and government agencies have
this need now for the evaluation of everything from financial filings to “expert” testimony.
Ultimately, these tools will not only benefit the consumer of information but also the
generators of information trying sincerely to ensure their analysis is grounded in reality.
Read the original article on Mail Business (Korean)