Watson is an artificial intelligence computer system capable of answering questions posed in natural language, developed in IBM’s DeepQA project by a research team led by principal investigator David Ferrucci. Watson was named after IBM’s first president, Thomas J. Watson. The machine was specifically developed to answer questions on the quiz show Jeopardy!. In 2011, Watson competed on Jeopardy! against former winners Brad Rutter and Ken Jennings. Watson received the first prize of $1 million.
Watson had access to 200 million pages of structured and unstructured content consuming four terabytes of disk storage including the full text of Wikipedia, but was not connected to the Internet during the game. For each clue, Watson’s three most probable responses were displayed on the television screen. Watson consistently outperformed its human opponents on the game’s signaling device, but had trouble responding to a few categories, notably those having short clues containing only a few words.
In February 2013, IBM announced that the Watson software system’s first commercial application would be for utilization management decisions in lung cancer treatment at Memorial Sloan–Kettering Cancer Center in conjunction with health insurance company WellPoint.
Watson is a question answering (QA) computing system built by IBM. IBM describes it as “an application of advanced Natural Language Processing, Information Retrieval, Knowledge Representation and Reasoning, and Machine Learning technologies to the field of open domain question answering” which is “built on IBM’s DeepQA technology for hypothesis generation, massive evidence gathering, analysis, and scoring.”
According to IBM:
Watson is a workload optimized system designed for complex analytics, made possible by integrating massively parallel POWER7 processors and the IBM DeepQA software to answer Jeopardy! questions in under three seconds. Watson is made up of a cluster of ninety IBM Power 750 servers (plus additional I/O, network and cluster controller nodes in 10 racks) with a total of 2880 POWER7 processor cores and 16 Terabytes of RAM. Each Power 750 server uses a 3.5 GHz POWER7 eight core processor, with four threads per core. The POWER7 processor’s massively parallel processing capability is an ideal match for Watson’s IBM DeepQA software which is embarrassingly parallel (that is a workload that is easily split up into multiple parallel tasks).
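The hardware figures in the quotation can be cross-checked with a little arithmetic. The sketch below uses only the numbers quoted above; the variable names are illustrative, and the per-server RAM figure assumes that one terabyte means 1024 gigabytes and that memory is spread evenly across servers, neither of which the quotation states.

```python
# Figures taken from the IBM description quoted above.
servers = 90
total_cores = 2880
threads_per_core = 4
total_ram_tb = 16

# Derived values.
cores_per_server = total_cores // servers
hardware_threads = total_cores * threads_per_core
# Assumption: 1 TB = 1024 GB, RAM divided evenly across the servers.
ram_per_server_gb = total_ram_tb * 1024 / servers

print(cores_per_server)          # 32
print(hardware_threads)          # 11520
print(round(ram_per_server_gb))  # 182
```

Thirty-two cores per server is consistent with the eight-core processor mentioned in the quotation only if each server holds four such processors.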
According to John Rennie, Watson can process 500 gigabytes, the equivalent of a million books, per second. IBM’s master inventor and senior consultant Tony Pearson estimated Watson’s hardware cost at about $3 million and said that, at 80 teraFLOPS, it would be placed 94th on the Top 500 Supercomputers list. According to Rennie, the content was stored in Watson’s RAM for the game because data stored on hard drives are too slow to access.
Watson’s software was written in various languages, including at least Java, C++, and Prolog, and uses the Apache Hadoop framework for distributed computing, the Apache UIMA (Unstructured Information Management Architecture) framework, IBM’s DeepQA software, and the SUSE Linux Enterprise Server 11 operating system. “[…] more than 100 different techniques are used to analyze natural language, identify sources, find and generate hypotheses, find and score evidence, and merge and rank hypotheses.”
The sources of information for Watson include encyclopedias, dictionaries, thesauri, newswire articles, and literary works. Watson also used databases, taxonomies, and ontologies. Specifically, DBPedia, WordNet, and Yago were used.
The IBM team provided Watson with millions of documents, including dictionaries, encyclopedias, and other reference material that it could use to build its knowledge. Although Watson was not connected to the Internet during the game, it contained 200 million pages of structured and unstructured content consuming four terabytes of disk storage, including the full text of Wikipedia.
“The computer’s techniques for unraveling Jeopardy! clues sounded just like mine. That machine zeroes in on key words in a clue, then combs its memory (in Watson’s case, a 15-terabyte data bank of human knowledge) for clusters of associations with those words. It rigorously checks the top hits against all the contextual information it can muster: the category name; the kind of answer being sought; the time, place, and gender hinted at in the clue; and so on. And when it feels “sure” enough, it decides to buzz. This is all an instant, intuitive process for a human Jeopardy! player, but I felt convinced that under the hood my brain was doing more or less the same thing.”
When playing Jeopardy!, all players must wait until host Alex Trebek reads each clue in its entirety, after which a light is lit as a “ready” signal; the first to activate their buzzer button wins the chance to respond. Watson received the clues as electronic texts at the same moment they were made visible to the human players. It would then parse the clues into different keywords and sentence fragments in order to find statistically related phrases. Watson’s main innovation was not the creation of a new algorithm for this operation but rather its ability to quickly execute thousands of proven language analysis algorithms simultaneously to find the correct answer. The more algorithms that found the same answer independently, the more likely Watson was to be correct. Once Watson had a small number of potential solutions, it was able to check them against its database to ascertain whether each solution made sense.
In a sequence of 20 mock games, human participants were able to use the average six to seven seconds that Watson needed to hear the clue and decide whether to signal for responding. During that time, Watson also had to evaluate the response and determine whether it was sufficiently confident in the result to signal. Part of the system used to win the Jeopardy! contest was the electronic circuitry that received the “ready” signal and then examined whether Watson’s confidence level was great enough to activate the buzzer. Given the speed of this circuitry compared to the speed of human reaction times, Watson’s reaction time was faster than the human contestants’ except when a human anticipated (instead of reacted to) the ready signal. After signaling, Watson spoke with an electronic voice and gave the responses in Jeopardy!’s question format. Watson’s voice was synthesized from recordings that actor Jeff Woodman made for an IBM text-to-speech program in 2004.
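The idea that agreement among independent algorithms raises confidence, and that Watson signals only above a confidence level, can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not DeepQA’s actual merging or ranking method: the function names, the averaging rule, the 0.5 threshold, and the sample answers are all hypothetical.

```python
from collections import defaultdict


def merge_candidates(scorer_outputs):
    """Merge (answer, score) pairs from independent scorers into a ranking.

    Hypothetical rule: an answer's merged confidence is the sum of its
    scores divided by the number of scorers, so an answer proposed by
    more scorers ends up with a higher confidence.
    """
    totals = defaultdict(float)
    for output in scorer_outputs:
        for answer, score in output:
            totals[answer] += score
    n = len(scorer_outputs)
    merged = [(answer, total / n) for answer, total in totals.items()]
    return sorted(merged, key=lambda pair: pair[1], reverse=True)


def should_buzz(ranked, threshold=0.5):
    """Signal only if the top answer's confidence reaches the threshold."""
    return bool(ranked) and ranked[0][1] >= threshold


# Three made-up scorers; all three propose "Paris", one each proposes others.
outputs = [
    [("Paris", 0.9), ("Lyon", 0.3)],
    [("Paris", 0.8)],
    [("Paris", 0.7), ("Nice", 0.2)],
]
ranked = merge_candidates(outputs)
print(ranked[0][0])         # Paris
print(should_buzz(ranked))  # True
```

Averaging over all scorers, rather than only over those that proposed an answer, is what rewards agreement here; a real system would likely learn how much weight each scorer deserves.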
Comparison with human players
Watson’s basic working principle is to parse keywords in a clue while searching for related terms as responses. This gives Watson some advantages and disadvantages compared with human Jeopardy! players. Watson has deficiencies in understanding the contexts of the clues. As a result, human players usually generate responses faster than Watson, especially to short clues. Watson’s programming prevents it from using the popular tactic of buzzing before it is sure of its response. Watson has consistently better reaction time on the buzzer once it has generated a response, and is immune to human players’ psychological tactics.
The Jeopardy! staff used different means to notify Watson and the human players when to buzz, which was critical in many rounds. The humans were notified by a light, which took them tenths of a second to perceive. Watson was notified by an electronic signal and could activate the buzzer within about eight milliseconds. The humans tried to compensate for the perception delay by anticipating the light, but the variation in the anticipation time was generally too great to fall within Watson’s response time. Watson did not operate to anticipate the notification signal.
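A rough calculation shows why anticipation rarely helped. The sketch below is illustrative only: it assumes Watson always buzzes eight milliseconds after the signal, that a human press before the signal simply does not count, and that a human’s timing error is equally likely to be any whole millisecond from 50 ms early to 50 ms late. Only the eight-millisecond figure comes from the text; the rest is hypothetical.

```python
WATSON_LATENCY_MS = 8  # "about eight milliseconds", from the text above


def human_beats_watson(press_offset_ms, watson_latency_ms=WATSON_LATENCY_MS):
    """Return True if a human press lands inside the winning window.

    press_offset_ms is the press time relative to the ready signal
    (negative means the press came before the signal).
    Assumption: an early press does not count at all.
    """
    return 0 <= press_offset_ms < watson_latency_ms


# Hypothetical spread of anticipation errors: -50 ms to +50 ms, 1 ms steps.
offsets = range(-50, 51)
wins = sum(human_beats_watson(t) for t in offsets)
win_fraction = wins / len(offsets)

print(wins, len(offsets))  # 8 101
```

Under these assumptions the human wins the buzz in 8 of 101 equally likely cases, which is the sense in which the variation in anticipation time was too great to fall within Watson’s response time.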
Since Deep Blue’s victory over Garry Kasparov in chess in 1997, IBM had been on the hunt for a new challenge. In 2004, IBM Research manager Charles Lickel, over dinner with coworkers, noticed that the restaurant they were in had fallen silent. He soon discovered the cause of this evening hiatus: Ken Jennings, who was then in the middle of his successful 74-game run on Jeopardy!. Nearly the entire restaurant had piled toward the televisions, mid-meal, to watch the phenomenon. Intrigued by the quiz show as a possible challenge for IBM, Lickel passed the idea on, and in 2005, IBM Research executive Paul Horn backed Lickel up, pushing for someone in his department to take up the challenge of playing Jeopardy! with an IBM system. Though he initially had trouble finding any research staff willing to take on what looked to be a much more complex challenge than the wordless game of chess, eventually David Ferrucci took him up on the offer. In competitions managed by the United States government, Watson’s predecessor, a system named Piquant, was usually able to respond correctly to only about 35% of clues and often required several minutes to respond. To compete successfully on Jeopardy!, Watson would need to respond in no more than a few seconds, and at that time, the problems posed by the game show were deemed to be impossible to solve.
In initial tests run during 2006 by David Ferrucci, the senior manager of IBM’s Semantic Analysis and Integration department, Watson was given 500 clues from past Jeopardy! programs. While the best real-life competitors buzzed in half the time and responded correctly to as many as 95% of clues, Watson’s first pass could get only about 15% correct. During 2007, the IBM team was given three to five years and a staff of 15 people to solve the problems. By 2008, the developers had advanced Watson such that it could compete with Jeopardy! champions. By February 2010, Watson could beat human Jeopardy! contestants on a regular basis.
While primarily an IBM effort, Watson’s development team includes faculty and students from Carnegie Mellon University, University of Massachusetts Amherst, University of Southern California’s Information Sciences Institute, University of Texas at Austin, Massachusetts Institute of Technology, New York Medical College, University of Trento, Queens College, City University of New York and Rensselaer Polytechnic Institute.
Competing on Jeopardy!
In 2008, IBM representatives communicated with Jeopardy! executive producer Harry Friedman about the possibility of having Watson compete against Ken Jennings and Brad Rutter, two of the most successful contestants on the show, and the program’s producers agreed. Watson’s differences from human players generated conflicts between IBM and Jeopardy! staff during the planning of the competition. IBM repeatedly expressed concerns that the show’s writers would exploit Watson’s cognitive deficiencies when writing the clues, thereby turning the game into a Turing test. To address that concern, a third party randomly picked the clues from previously written shows that were never broadcast. Jeopardy! staff also expressed concerns over Watson’s reaction time on the buzzer. Originally Watson signaled electronically, but show staff requested that it press a button physically, as the human contestants would. Even with a robotic “finger” pressing the buzzer, Watson remained faster than its human competitors. Ken Jennings noted, “If you’re trying to win on the show, the buzzer is all,” and that Watson “can knock out a microsecond-precise buzz every single time with little or no variation. Human reflexes can’t compete with computer circuits in this regard.” Stephen Baker, a journalist who recorded Watson’s development in his book “Final Jeopardy”, reported that the conflict between IBM and Jeopardy! became so serious in May 2010 that the competition was almost canceled.
Watson learned from its mistakes. In one practice round, for example, it was given the clue “This trusted friend was the first non-dairy powdered creamer,” to which it replied, “What is milk?”
As part of the preparation, IBM constructed a mock set in a conference room at one of its technology sites to model the one used on Jeopardy!. Human players, including former Jeopardy! contestants, also participated in mock games against Watson, with Todd Alan Crain of The Onion playing host. About 100 test matches were conducted, with Watson winning 65% of the games.
To provide a physical presence in the televised games, Watson was represented by an “avatar” of a globe, inspired by the IBM “smarter planet” symbol. Forty-two colored threads criss-crossed the globe, to represent Watson’s state of thought; the number 42 was an in-joke referring to the novel The Hitchhiker’s Guide to the Galaxy. Joshua Davis, the artist who designed the avatar for the project, explained to Stephen Baker that there are 36 triggerable states that Watson was able to use throughout the game to show its confidence in responding to a clue correctly; he had hoped to be able to find forty-two, to add another level to the Hitchhiker’s Guide reference, but he was unable to pinpoint enough game states.
A practice match was recorded on January 13, 2011, and the official matches were recorded on January 14, 2011. All participants maintained secrecy about the outcome until the match was broadcast in February.
In a practice match before the press on January 13, 2011, Watson won a 15-question round against Ken Jennings and Brad Rutter with a score of $4,400 to Jennings’s $3,400 and Rutter’s $1,200, though Jennings and Watson were tied before the final $1,000 question. None of the three players responded incorrectly to a clue.
The first round was broadcast February 14, 2011. The right to choose the first category had been determined by a draw won by Rutter. Watson, represented by a computer monitor display and artificial voice, responded correctly to the second clue and then selected the fourth clue of the first category, a deliberate strategy to find the Daily Double as quickly as possible. Watson’s guess at the Daily Double location was correct. At the end of the first round, Watson was tied with Rutter at $5,000; Jennings had $2,000.
Watson’s performance was characterized by some quirks. In one instance, Watson repeated a reworded version of an incorrect response offered by Jennings: Jennings said “What are the ’20s?” in reference to the 1920s, and Watson then said “What is 1920s?” Because Watson could not recognize other contestants’ responses, it did not know that Jennings had already given the same response. In another instance, Watson was initially given credit for a response of “What is leg?” after Jennings incorrectly responded “What is: he only had one hand?” to a clue about George Eyser (the correct response was “What is: he’s missing a leg?”). Because Watson, unlike a human, could not have been responding to Jennings’s mistake, it was decided that this response was incorrect. The broadcast version of the episode was edited to omit Trebek’s original acceptance of Watson’s response. Watson also demonstrated complex wagering strategies on the Daily Doubles, with one bet at $6,435 and another at $1,246. Gerald Tesauro, one of the IBM researchers who worked on Watson, explained that Watson’s wagers were based on its confidence level for the category and a complex regression model called the Game State Evaluator.
Watson took a commanding lead in Double Jeopardy!, correctly responding to both Daily Doubles. Watson responded to the second Daily Double correctly with a 32% confidence score.
Although it wagered only $947 on the clue, Watson was the only contestant to miss the Final Jeopardy! response in the category U.S. CITIES (“Its largest airport was named for a World War II hero; its second largest, for a World War II battle”). Rutter and Jennings gave the correct response of Chicago, but Watson’s response was “What is Toronto????” Ferrucci offered reasons why Watson would appear to have guessed a Canadian city: categories only weakly suggest the type of response desired, the phrase “U.S. city” didn’t appear in the question, there are cities named Toronto in the U.S., and Toronto, Ontario has an American League baseball team. Dr. Chris Welty, who also worked on Watson, suggested that it may not have been able to correctly parse the second part of the clue, “its second largest, for a World War II battle” (which was not a standalone clause despite it following a semicolon, and required context to understand that it was referring to a second-largest airport). Eric Nyberg, a professor at Carnegie Mellon University and a member of the development team, stated that the error occurred because Watson does not possess the comparative knowledge to discard that potential response as not viable. Although not displayed to the audience as with non-Final Jeopardy! questions, Watson’s second choice was Chicago. Both Toronto and Chicago were well below Watson’s confidence threshold, at 14% and 11% respectively. (This lack of confidence was the reason for the multiple question marks in Watson’s response.)
The game ended with Jennings with $4,800, Rutter with $10,400, and Watson with $35,734.
During the introduction to the second match, Trebek (a Canadian native) joked that he had learned Toronto was a U.S. city, and Watson’s error in the first match prompted an IBM engineer to wear a Toronto Blue Jays jacket to the recording.
In the first round, Jennings was finally able to choose a Daily Double clue, while Watson responded to one Daily Double clue incorrectly for the first time in the Double Jeopardy! Round. After the first round, Watson placed second for the first time in the competition after Rutter and Jennings were briefly successful in increasing their dollar values before Watson could respond. Nonetheless, the final result ended with a victory for Watson with a score of $77,147, besting Jennings who scored $24,000 and Rutter who scored $21,600.
The prizes for the competition were $1 million for first place (Watson), $300,000 for second place (Jennings), and $200,000 for third place (Rutter). As promised, IBM donated 100% of Watson’s winnings to charity, with 50% of those winnings going to World Vision and 50% going to World Community Grid. Likewise, Jennings and Rutter donated 50% of their winnings to their respective charities.
In acknowledgment of IBM and Watson’s achievements, Jennings made an additional remark in his Final Jeopardy! response: “I for one welcome our new computer overlords”, echoing a similar memetic reference to the episode “Deep Space Homer” on The Simpsons, in which TV news presenter Kent Brockman speaks of welcoming “our new insect overlords”. Jennings later wrote an article for Slate, in which he stated “IBM has bragged to the media that Watson’s question-answering skills are good for more than annoying Alex Trebek. The company sees a future in which fields like medical diagnosis, business analytics, and tech support are automated by question-answering software like Watson. Just as factory jobs were eliminated in the 20th century by new assembly-line robots, Brad and I were the first knowledge-industry workers put out of work by the new generation of ‘thinking’ machines. ‘Quiz show contestant’ may be the first job made redundant by Watson, but I’m sure it won’t be the last.”
Philosopher John Searle argues that Watson—despite impressive capabilities—cannot actually think. Drawing on his Chinese room thought experiment, Searle claims that Watson, like other computational machines, is capable only of manipulating symbols, but has no ability to understand the meaning of those symbols; however, Searle’s experiment has its detractors.
Match against members of the United States Congress
On February 28, 2011, Watson played an untelevised exhibition match of Jeopardy! against five members of the United States House of Representatives: Rush D. Holt, Jr. (D-NJ, a former Jeopardy! contestant), Jim Himes (D-CT), Jared Polis (D-CO), Nan Hayworth (R-NY) and Bill Cassidy (R-LA). IBM organized the event to “foster a conversation about how technology can positively impact society”.
In the only round he played, Holt led with Watson in second place. However, combining the scores between all matches, the final score was $40,300 for Watson and $30,000 for the congressional players combined.
Future uses of software system
According to IBM, “The goal is to have computers start to interact in natural human terms across a range of applications and processes, understanding the questions that humans ask and providing answers that humans can understand and justify.” It has been suggested by Robert C. Weber, IBM’s general counsel, that Watson may be used for legal research.
Watson is based on commercially available IBM Power 750 servers that have been marketed since February 2010. IBM also intends to market the DeepQA software to large corporations, with a price in the millions of dollars, reflecting the $1 million needed to acquire a server that meets the minimum system requirement to operate Watson. IBM expects the price to decrease substantially within a decade as the technology improves.
Commentator Rick Merritt said that “there’s another really important reason why it is strategic for IBM to be seen very broadly by the American public as a company that can tackle tough computer problems. A big slice of Big Blue’s pie comes from selling to the U.S. government some of the biggest, most expensive systems in the world.”
On January 30, 2013, it was announced that Rensselaer Polytechnic Institute would receive a successor version of Watson. It would be housed at the Institute’s technology park and be available to researchers and students.
As of February 2011, IBM and Nuance Communications Inc. are partnering on a research project to develop a commercial product during the next 18 to 24 months that will exploit Watson’s capabilities as a clinical decision support system to aid the diagnosis and treatment of patients. Physicians at Columbia University are helping identify critical issues in the practice of medicine where the Watson technology may be able to contribute, and physicians at the University of Maryland are working to identify the best way that a technology like Watson could interact with medical practitioners to provide the maximum assistance.
In September 2011, IBM and WellPoint, a major healthcare solutions provider in the United States, announced a partnership to utilize Watson’s data crunching capability to help suggest treatment options and diagnoses to doctors. Just as Watson analyzed massive data in Jeopardy! to reach a set of hypotheses and list several of the most likely outcomes, it could help doctors in diagnosing patients. Watson could analyze the patient’s specific symptoms, medical history, and hereditary history, and synthesize that data with available unstructured and structured medical information, including published medical books and articles. IBM has made it clear that Watson is not intended to replace doctors but to assist them in avoiding medical errors and sharpening medical diagnosis with the help of its advanced analytics technology. IBM intends to use Watson in other information-intensive fields as well, such as telecommunications, financial services, and government.
In December 2011, in what has been compared to IBM Watson, Microsoft and GE announced a partnership to utilize technology in improving healthcare. They aim to use analytics, high-performance computing and software technologies to deliver patient outcomes as well as clinical applications.
IBM announced a partnership with Cleveland Clinic in October 2012. The company has sent Watson to the Cleveland Clinic Lerner College of Medicine of Case Western Reserve University, where it will increase its health expertise and assist medical professionals in diagnosing and treating patients. The medical facility will utilize Watson’s ability to store and process large quantities of information to help speed up and increase the accuracy of the diagnostic process. “Cleveland Clinic’s collaboration with IBM is exciting because it offers us the opportunity to teach Watson to ‘think’ in ways that have the potential to make it a powerful tool in medicine,” said C. Martin Harris, MD, chief information officer of Cleveland Clinic.
On February 8, 2013, IBM announced that oncologists at Maine Center for Cancer Medicine and Westmed Medical Group in New York have started to test the Watson supercomputer system in an effort to help diagnose lung cancer and recommend treatment. In February 2013, IBM announced that Watson’s first commercial application would be for utilization management decisions in lung cancer treatment at Memorial Sloan–Kettering Cancer Center in conjunction with health insurance company WellPoint. Utilization management is the evaluation of the appropriateness, medical need, and efficiency of health care services, procedures, and facilities according to established criteria or guidelines and under the provisions of an applicable health benefits plan.
via Watson (computer) – Wikipedia, the free encyclopedia.