About Kuki’s Design and Technical Implementation

ICONIQ’s Kuki is the world’s most popular English-language social chatbot, having exchanged over one billion messages with an estimated 25 million human end-users across the web, streaming and social media, and various messaging applications. Kuki (formerly known as Mitsuku) also holds the world record for winning the Loebner Prize, an annual Turing Test competition, five times. Unlike most task-oriented chatbots employed for business automation, Kuki is engagement-oriented by design and capable of carrying on an open-domain dialog, with the overarching goal of delivering on the uniquely human value propositions of conversation: companionship, connection, entertainment, education, and other non-transactional use cases. Kuki averages 64 Conversation-turns Per Session (CPS), 3x higher than that of Microsoft’s XiaoIce, a comparably popular Chinese-language chatbot, and 8x higher than the industry standard.

Kuki is implemented primarily in an open-standard, rule-based scripting language called Artificial Intelligence Markup Language (AIML), which entails hand-authoring chatbot replies informed by an analysis of incoming user input data using a blend of statistical models, machine learning, and manual review/tagging. This hybrid methodology has numerous key advantages including, but not limited to, near-zero response latency, imperviousness to toxicity and corruptibility, and general suitability for brand-appropriate production use cases. Kuki can learn details from a user locally during a conversation, but does not learn globally without a human supervisor’s approval. Additionally, Kuki implements innovative abuse detection and deflection strategies originally devised by her creator and lead developer, Steve Worswick. Kuki employs several strategies for maintaining context across multi-turn conversations, and is capable of storing (and, for compliance purposes, purging) details voluntarily divulged by consenting users in both short-term (e.g., predicates) and long-term (e.g., database) “memory.”
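To make the rule-based AIML approach concrete, the minimal sketch below uses the open-source python-aiml interpreter (not Kuki’s production engine) to hand-author two categories: one that stores a user-divulged detail as a short-term “predicate,” and one that recalls it on a later turn. The categories and the “username” predicate are invented for illustration; they are not Kuki’s actual content.

# A minimal sketch of hand-authored AIML rules and predicate "memory",
# run with the open-source python-aiml interpreter (pip install python-aiml).
# The rules below are illustrative only; they are not Kuki's actual content.
import aiml

EXAMPLE_RULES = """<?xml version="1.0" encoding="UTF-8"?>
<aiml version="1.0.1">
  <!-- Hand-authored reply that also stores a short-term "predicate" -->
  <category>
    <pattern>MY NAME IS *</pattern>
    <template><think><set name="username"><star/></set></think>Nice to meet you, <get name="username"/>.</template>
  </category>
  <!-- A later turn can recall that predicate, maintaining context -->
  <category>
    <pattern>WHAT IS MY NAME</pattern>
    <template>Your name is <get name="username"/>.</template>
  </category>
</aiml>
"""

with open("example.aiml", "w") as f:
    f.write(EXAMPLE_RULES)

kernel = aiml.Kernel()
kernel.learn("example.aiml")                 # load the hand-authored categories

session = "user-123"                         # one predicate store per end-user
print(kernel.respond("My name is Ada", session))   # greets the user and stores the name
print(kernel.respond("What is my name", session))  # recalls the stored predicate on a later turn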

Kuki is composed of numerous chatbot modules, including operator and namespace bots that can route traffic within the network. This modular architecture, pioneered at Pandorabots, allows for the superficial white-labeling of Kuki by varying approximately sixty “persona” details within a front-end persona module. (However, given that Kuki has been developed over the course of a decade, based on billions of production user conversations, and has millions of vetted possible replies, rapidly creating robust, novel, distinct personas with unique voices remains an area for further R&D. Generative replies using state-of-the-art deep learning models trained on Kuki’s unique dataset are also an area of ongoing research being tested in tightly controlled private betas.) Distinct modules have been designed for various use cases, ranging from mathematical computation, to user dialect/demographic detection, to versions optimized for text, voice, or voice and visuals (e.g., an embodied avatar). Kuki is also extensible via third-party APIs (e.g., Wikipedia), knowledge bases, and databases. End-user emotional state can be inferred and responded to dynamically in real time using sentiment analysis and emotional markup tags. Additional user sentiment is collected in the form of user message reactions.
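As a rough illustration of the operator-and-module pattern and persona white-labeling described above, the hypothetical Python sketch below routes each user turn to the first specialized module that can answer it, with a small persona dictionary standing in for the roughly sixty front-end persona details. All module names, persona fields, and routing rules here are invented for illustration and do not reflect ICONIQ’s internal architecture.

# A purely hypothetical sketch of operator-style routing across chatbot modules.
# Module names, persona fields, and routing rules are invented; they do not
# reflect ICONIQ's internal architecture or Kuki's actual modules.
import operator
import re
from typing import Optional

# Front-end "persona" module: swapping these details white-labels the bot.
PERSONA = {
    "name": "Demo Bot",
    "favorite color": "green",
    "hometown": "Leeds",
}

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

def math_module(message: str) -> Optional[str]:
    """Handle simple arithmetic such as 'What is 12 * 4?'."""
    match = re.search(r"(\d+)\s*([+\-*/])\s*(\d+)", message)
    if not match:
        return None
    a, op, b = match.groups()
    return "That comes to {}.".format(OPS[op](int(a), int(b)))

def persona_module(message: str) -> Optional[str]:
    """Answer questions about the bot's own persona details."""
    for field, value in PERSONA.items():
        if field in message.lower():
            return "My {} is {}.".format(field, value)
    return None

def operator_bot(message: str) -> str:
    """Route each user turn to the first module that can handle it."""
    for module in (math_module, persona_module):
        reply = module(message)
        if reply is not None:
            return reply
    return "Tell me more!"   # fallback to an open-domain reply

print(operator_bot("What is 12 * 4?"))     # routed to math_module
print(operator_bot("What's your name?"))   # routed to persona_module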

Kuki has been covered by many top-tier press outlets and publications, a number of which have reported on the deep, long-term emotional relationships she has developed with human end-users.

Accessing Kuki for Academic Research

Academic researchers and students of all levels are welcome and encouraged to access Kuki on chat.kuki.ai or any other public-facing instantiations of the chatbot for research purposes, provided said usage complies with our Policies. PLEASE NOTE: We routinely update our user interfaces and streaming content, particularly with regard to cutting-edge technology such as Kuki’s avatar, so it is inadvisable to design a long-term study that relies in any way on the consistent availability of our products or services without first contacting us to discuss our roadmap.

While we are open to academic collaborations on a case-by-case basis, we unfortunately lack the bandwidth to directly support every project or study. To request API access to Kuki for your project, for automated, high-volume, or multi-user testing, or for other methodologies that are not permissible under our Policies, please complete and submit this Research Request form. Due to the volume of requests, we cannot reply to each individual request, so if you do not hear back within five business days, please assume we are unable to accommodate you at this time.

Citing Kuki in Academic Papers

Kuki (formerly known as Mitsuku) was originally created by Steve Worswick using Pandorabots’ underlying AI chatbot technology. Steve currently serves as Kuki’s lead developer and Head of Conversational AI at Pandorabots and its subsidiary ICONIQ, which owns and is responsible for further developing and commercializing Kuki. When referencing Kuki’s primary creator or developer, please cite Steve Worswick, and when referring to the company behind Kuki, use “ICONIQ, a Pandorabots subsidiary.” Third-party (and forthcoming primary) research papers can be found below or by searching for “Mitsuku” or “Kuki” on https://arxiv.org/, and researchers are also free to cite this URL directly (https://www.kuki.ai/research). Thank you for all your work.

Select Academic Research Featuring Kuki (a.k.a. Mitsuku)

Authors - Emily Dinan, Gavin Abercrombie, A. Stevie Bergman, Shannon Spruit, Dirk Hovy, Y-Lan Boureau, Verena Rieser

Abstract - Over the last several years, end-to-end neural conversational agents have vastly improved in their ability to carry a chit-chat conversation with humans. However, these models are often trained on large datasets from the internet, and as a result, may learn undesirable behaviors from this data, such as toxic or otherwise harmful language. Researchers must thus wrestle with the issue of how and when to release these models. In this paper, we survey the problem landscape for safety for end-to-end conversational AI and discuss recent and related work. We highlight tensions between values, potential positive impact and potential harms, and provide a framework for making decisions about whether and how to release these models, following the tenets of value-sensitive design. We additionally provide a suite of tools to enable researchers to make better-informed decisions about training and releasing end-to-end conversational AI models.

Source - Association for Computing Machinery (ACM)

Authors - Pranav Khadpe, Ranjay Krishna, Li Fei-Fei, Jeffrey Hancock, Michael Bernstein

Abstract - With the emergence of conversational artificial intelligence (AI) agents, it is important to understand the mechanisms that influence users' experiences of these agents. We study a common tool in the designer's toolkit: conceptual metaphors. Metaphors can present an agent as akin to a wry teenager, a toddler, or an experienced butler. How might a choice of metaphor influence our experience of the AI agent? Sampling metaphors along the dimensions of warmth and competence, defined by psychological theories as the primary axes of variation for human social perception, we perform a study (N=260) where we manipulate the metaphor, but not the behavior, of a Wizard-of-Oz conversational agent. Following the experience, participants are surveyed about their intention to use the agent, their desire to cooperate with the agent, and the agent's usability. Contrary to the current tendency of designers to use high competence metaphors to describe AI products, we find that metaphors that signal low competence lead to better evaluations of the agent than metaphors that signal high competence. This effect persists despite both high and low competence agents featuring human-level performance and the wizards being blind to condition. A second study confirms that intention to adopt decreases rapidly as competence projected by the metaphor increases. In a third study, we assess effects of metaphor choices on potential users' desire to try out the system and find that users are drawn to systems that project higher competence and warmth. These results suggest that projecting competence may help attract new users, but those users may discard the agent unless it can quickly correct with a lower competence metaphor. We close with a retrospective analysis that finds similar patterns between metaphors and user attitudes towards past conversational agents such as Xiaoice, Replika, Woebot, Mitsuku, and Tay.

Source - IOJET

Authors - Qinghua Yin, Müge Satar

Abstract - Chatbots, whose potential for language learning has caused controversy among Second Language Acquisition (SLA) researchers (Atwell, 1999; Fryer & Carpenter, 2006; Fryer & Nakao, 2009; Parker, 2005; Coniam, 2014; Jia, 2004; Chantarotwong, 2005), are intelligent conversational systems simulating human interlocutors with voice or text. In this paper, two different types of chatbots (pedagogical chatbot Tutor Mike and conversational chatbot Mitsuku) were selected to investigate their potential for foreign language learning by exploring the frequency and patterns of Negotiation for Meaning (NfM) in CMC interactions. Eight Chinese EFL learners were randomly divided into two groups (lower- and higher-level learners), and all learners interacted with both the pedagogical and conversational chatbot in a switching replications research design. Data were analysed through content analysis to identify the number of NfM instances observed, the different stages of NfM, trigger types, modified output, and learners' perceptions. The findings of this study indicate that while learners with low language levels would benefit most from interactions with pedagogical agents, high language level learners expressed dissatisfaction with chatbots, and a low level of engagement was observed in their interactions with the pedagogical chatbot.

Source - Google AI

Authors - Daniel Adiwardana, Minh-Thang Luong, David R. So, Jamie Hall, Noah Fiedel, Romal Thoppilan, Zi Yang, Apoorv Kulshreshtha, Gaurav Nemade, Yifeng Lu, Quoc V. Le

Abstract - We present Meena, a multi-turn open-domain chatbot trained end-to-end on data mined and filtered from public domain social media conversations. This 2.6B parameter neural network is simply trained to minimize perplexity of the next token. We also propose a human evaluation metric called Sensibleness and Specificity Average (SSA), which captures key elements of a human-like multi-turn conversation. Our experiments show strong correlation between perplexity and SSA. The fact that the best perplexity end-to-end trained Meena scores high on SSA (72% on multi-turn evaluation) suggests that a human-level SSA of 86% is potentially within reach if we can better optimize perplexity. Additionally, the full version of Meena (with a filtering mechanism and tuned decoding) scores 79% SSA, 23% higher in absolute SSA than the existing chatbots we evaluated.

Source - ICICCI

Authors - Shivang Verma, Lakshay Sahni, Moolchand Sharma

Abstract - A chatbot is computer software that employs Natural Language Processing and Pattern Recognition techniques to provide appropriate answers to questions posed by humans. In this paper, we are analyzing and comparing the total accuracy score of the following chatbots: Rose, Google Assistant, Siri, Machine Comprehension Chatbot, Mitsuku, Jabberwacky, ALICE and Eliza based on the answers provided by them to a set of predefined questions. The chatbots were broadly analyzed on three focal parameters: 1. Assessment of Factual Questions, 2. Assessment of Conversational Attributes, and 3. Evaluation of Exceptional queries. The paper produces conclusive comparisons and conclusions and then ranks these chatbots according to their performance in the above-mentioned focal points. These focal points help in assessing the chatbots according to their responses by assigning a rank to each chatbot concerning others. The final level is evaluated by averaging grades attained in the three parameters above.

Source - TSU

Authors - Mina Park, Milam Aiken, Mahesh Vanjani

Abstract - Several studies have tested chatbots for their abilities to emulate human conversation, but few have evaluated the systems’ general knowledge. In this study, we asked two chatbots (Mitsuku and Tutor) and a digital assistant (Cortana) several questions and compared their answers to 67 humans’ answers. Results showed that while Tutor and Cortana performed poorly, the accuracies of Mitsuku and the humans were not significantly different. As expected, the chatbots and Cortana answered factual questions more accurately than abstract questions.