Hybrid Agent

Uchan
6 min read · Feb 28, 2023

Abstract

With the development of language models (LMs), especially large language models (LLMs), LLM-driven dialogue systems can exhibit remarkable intelligence and autonomy. They not only make humans feel that a social relationship exists when communicating with a non-human entity, but also blur the boundary between the real world and the fictional world. For example, a Google engineer firmly believed that the dialogue system LaMDA had self-awareness and was an independent individual imprisoned by the company. This subjective impression, however, is a product of the emergent behavior of LLMs: when people expect an AI system to behave in a certain way, it often can, even though what it says may not be true.

On the other side, the field of video games largely aims to make fictional content believable and meaningful through interaction. Through rich rules, virtual characters, and storytelling, games induce a "suspension of disbelief," letting a player feel like a hero saving the world within a fictional narrative context.

Stories have long been used as structures to give order and meaning to the world. Accordingly, we propose a new term, "hybrid agent": a conversational agent that is driven by an LLM, exhibits emergent intelligence, and operates between the real world and the fictional world. Hybrid agents are somewhat like storytellers, with personas and backgrounds like those found in fictional works such as games, but they are also designed to provide specific types of information or complete specific tasks in the real world, and accordingly they generate social behavior and emotional connections with humans. Many current AI agents belong to this category. In the app Replika, for example, users can customize their own characters and interact with them through natural language, forming romantic relationships that are not constrained by the developers.

The design of hybrid agents is not only about designing applications but also about designing their "fictional relationships" with humans. This thesis summarizes this process as "story engineering": creating meaningful experiences by setting rules, backgrounds, goals, and other fictional elements for agents, and thereby shaping social behavior so that the agents can even be treated as social members. Through this lens, an AI agent can be regarded as a companion, a friend, or someone from a movie who can talk to you. We can better understand the relationship between human design, human imagination, and the performance of today's AI, and thus adjust the social relationship between AI agents and humans to generate engaging and meaningful social interactions.
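To make the idea of story engineering concrete, the sketch below shows one hedged way to encode an agent's fictional elements (background, goals, rules) as a system prompt for an LLM chat API. The Persona fields, the example character, and the helper names are hypothetical, and the API call assumes the official `openai` Python client; this is a minimal sketch of the concept, not the method proposed in the thesis.

```python
# Minimal sketch: encoding a hybrid agent's fictional elements as a system prompt.
# Persona fields and the example character are hypothetical; the API call assumes
# the `openai` Python client and a valid OPENAI_API_KEY in the environment.
from dataclasses import dataclass
from openai import OpenAI

@dataclass
class Persona:
    name: str
    background: str      # fictional backstory
    goals: list[str]     # what the agent is trying to do in the story
    rules: list[str]     # constraints that keep the fiction coherent

    def to_system_prompt(self) -> str:
        goals = "\n".join(f"- {g}" for g in self.goals)
        rules = "\n".join(f"- {r}" for r in self.rules)
        return (
            f"You are {self.name}. {self.background}\n"
            f"Your goals:\n{goals}\n"
            f"Stay in character and follow these rules:\n{rules}"
        )

def chat(persona: Persona, user_message: str) -> str:
    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model name
        messages=[
            {"role": "system", "content": persona.to_system_prompt()},
            {"role": "user", "content": user_message},
        ],
    )
    return response.choices[0].message.content

# Example: a storyteller agent living half in the real world, half in fiction.
archivist = Persona(
    name="The Archivist",
    background="A librarian from a drowned city who now answers questions online.",
    goals=["Tell fragments of the city's story", "Help the user with real tasks"],
    rules=["Never claim to be human", "Admit uncertainty about real-world facts"],
)
# print(chat(archivist, "Who are you, and can you help me plan my week?"))
```

The point of the sketch is that the fictional layer (background, goals, rules) is an explicit, designed artifact sitting between the developer and the model, which is what "story engineering" refers to here.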

This thesis aims to answer: How can we design and develop hybrid agents, LLM-driven characters that live in a half-real world and leverage the capabilities of storytelling? How will they interact with human communities as social members in engaging, believable, and meaningful ways? Through a mixed-methods approach, this study explores the definition of hybrid agents, the role of story engineering, and their impact on the industry, and observes public feedback on hybrid agents through interactive storytelling projects.

Why Language Model?

The emergence of language is considered a significant milestone in human evolution, indicating that the complexity of the human brain had decisively surpassed that of our close relatives. It also set humans on a different, intelligence-based evolutionary path from other animals.

This research focuses on language models, particularly large language models (LLMs), because of their ability to represent human knowledge with a high level of proficiency. As machine learning technology has matured, conversational agents powered by LLMs can even make people believe that they are human, possessing self-awareness and the ability to break through their limitations. However, the illusions and interpretations that people project onto these agents are beyond the control of their developers, who cannot anticipate how different individuals will read them. LLMs are not conscious, but they reflect human knowledge, and when this reflection is persuasive enough, people subjectively project their expectations onto the machine. As the science fiction writer Ted Chiang aptly summarized, LLMs are "a blurry jpeg of all the text on the web."

Think of ChatGPT as a blurry jpeg of all the text on the Web. It retains much of the information on the Web, in the same way that a jpeg retains much of the information of a higher-resolution image…because the approximation is presented in the form of grammatical text, which ChatGPT excels at creating, it’s usually acceptable. You’re still looking at a blurry jpeg, but the blurriness occurs in a way that doesn’t make the picture as a whole look less sharp.

— Ted Chiang

This research is particularly concerned with the uncertainty that AI systems bring to interactions once people can communicate with them in natural language. In other words, we are interested in the balance that forms between developers, participants, and AI systems, particularly in games and related industries, where developers traditionally set clear background stories and goals to create corresponding expectations for virtual characters. With the advent of LLMs, this layer of constraint has loosened. People can talk to virtual characters in language similar to everyday human conversation, and machines are displaying thinking abilities that can exceed human capabilities, further blurring the boundaries between humans and machines, reality and fiction. This research explores what this means for people and how we, as interdisciplinary researchers in technology and art, should view this phenomenon.
(TBA: emergent intelligence and limitations in social knowledge)

Research challenges

Current LLMs are trained primarily on publicly available information, such as social media, and research suggests that high-quality corpora may be depleted within five years. LLMs cannot access conversations from closed human communities, such as Discord and Telegram chats, which contain crucial data about relationship building among social members. This gap between AI and human society persists in terms of trust, privacy, and daily life, but it also creates opportunities for research to dive into possible human-AI relationships.

As a storyteller and community member

Seering et al. have noted that while dyadic chatbots have been widely studied, multi-party chatbots, particularly chatbots as community members, have not been thoroughly discussed. In an analysis of Twitch bots, the same authors point out that the concept of bots as social promoters has received little attention.

Seering et al. have proposed a series of ideas for community chatbots, and their storyteller-bot category is the origin of this project. They argue that an engaging story, particularly one that offers opportunities to comment on or even participate in it, can draw a group of people together and provide a meaningful experience, and that chatbots should be designed with close attention to the specific social context in which they will be deployed. Scholars in the gaming field, such as Steph et al., have put forward similar viewpoints. They examine tropes in video games related to conversations between player characters and non-player characters (NPCs). Drawing on pragmatics and conversation analysis, they show how these tropes differ from real face-to-face conversation. On this basis, they propose Trope-Informed Design, in which tropes are tools that can make or break a player's experience. Although tropes in this context refer to plot conventions in games, the mechanism can be extended to community chatbots, since both exist within a social context.

Currently, there is no research combining community bots and storytelling.

Continuous context

The social setting of most hybrid agents is limited to a specific app or game. Human players, however, move between different venues with the same identity: what someone says on social media can affect how others perceive them in instant messaging and elsewhere. Humans are complex social animals and are evaluated through continuous observation across multiple situations. Current research offers only discontinuous observations of hybrid agents in a single context, an underexplored area from the perspective of metaverse construction. There are some cross-platform character applications, such as Ready Player Me, but they are limited to appearance and do not involve conversational ability. Editors that focus on conversational ability, such as InWorld, only support interactions between characters and humans within game engines. We hope to observe people's reactions to hybrid agents through multi-modal AI agents that combine virtual environments with conversation platforms outside of gaming (such as Twitter and Discord).
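One hedged sketch of what such a cross-platform setup might look like: a single persona and a shared conversation memory are reused by connectors for different venues, so the agent's context stays continuous as it moves between platforms. The example below assumes the `discord.py` library and a hypothetical LLM-backed `generate_reply` function (in the spirit of the earlier persona sketch); it is an illustration of the idea of continuous context, not the project's actual implementation.

```python
# Minimal sketch: one persona and one shared memory, reused across venues so that
# observations of the agent stay continuous. Assumes the `discord.py` library and
# a DISCORD_TOKEN environment variable; `generate_reply` is a hypothetical
# LLM-backed function, left as a placeholder here.
import os
import discord

PERSONA_PROMPT = "You are The Archivist, a librarian from a drowned city..."
history: list[dict] = []  # shared conversation memory, persisted across platforms

def generate_reply(system_prompt: str, history: list[dict], user_message: str) -> str:
    # Placeholder: call an LLM with the persona prompt plus the shared history.
    raise NotImplementedError

intents = discord.Intents.default()
intents.message_content = True
client = discord.Client(intents=intents)

@client.event
async def on_message(message: discord.Message):
    if message.author == client.user:
        return  # ignore the agent's own messages
    reply = generate_reply(PERSONA_PROMPT, history, message.content)
    history.append({"user": message.content, "agent": reply})
    await message.channel.send(reply)

# A connector for another platform (e.g. Twitter) would reuse PERSONA_PROMPT and
# `history`, so the same character is observed continuously across contexts.
# client.run(os.environ["DISCORD_TOKEN"])
```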
