LightBerry: The Social Operating System for Robots

Lightberry is building the social operating system for robots. It aims to give humanoid robots agency, emotional intelligence, and the ability to interact naturally with humans. Social robotics emerged in the late 1990s when Cynthia Breazeal developed Kismet as part of her doctoral research at the MIT Artificial Intelligence Lab. Kismet showed that robots could engage in social exchange by perceiving and expressing human emotion. Lightberry builds on this foundational insight by developing software designed for human-robot interaction.

The team has built a multi-layered architecture that separates robotic cognition into speech, internal reasoning, and motion planning. Their system enables robots to understand emotional context and express intent across different hardware platforms. We first encountered Lightberry's work through their delightful robot photographer @beachboyrobot, which live-tweets events and conferences across the Bay Area.

Stephan Koenigstorfer, co-founder of Lightberry, studied physics at Imperial College and contributed to the ALICE experiment at CERN during his PhD at the Technical University of Munich. He is a blue-sky thinker with a deep technical understanding of the robotics landscape. 

In this conversation, Koenigstorfer shares how Lightberry translates intent into action using platform-specific components and vision-language-action (VLA) models, why he challenges the assumption that social robotics requires large amounts of interaction data, and how his team is exploring emotional expression through motion.

– 

Your robots have agency and what you describe as an “internal monologue of thoughts.” How does this system work? 

We take a lot of inspiration from how humans think. Imagine different parts of the brain: one responsible for speech, another for maintaining an internal monologue – listening to sensory inputs, deciding if we need to change direction or take action – and a third for motion planning, like avoiding collisions with people. We've split it into various parts. Some run on LLMs and other models; some work on good old deterministic logic, depending on what works best. We're very much an engineering company in the sense that we take what's already out there and put it together to create something new.
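
To make that split concrete, here is a minimal Python sketch of the structure described above: separate components for speech, an internal monologue, and motion planning, with the monologue deciding when the others act. It is purely illustrative, not Lightberry's code; the class and function names are invented, and in a real system the speech and monologue parts would call LLMs while the motion planner stays deterministic.

```python
from dataclasses import dataclass


@dataclass
class SensoryInput:
    transcript: str        # what the robot just heard
    people_in_view: int    # from a perception module
    obstacle_ahead: bool   # from depth sensing


@dataclass
class Thought:
    should_speak: bool
    should_move: bool
    note: str


class SpeechModule:
    """LLM-backed in practice: decides what to say and when (turn-taking)."""
    def respond(self, transcript: str) -> str:
        return f"(reply to: {transcript!r})"


class InternalMonologue:
    """Listens to sensory inputs and decides whether to change direction or act."""
    def reflect(self, sensed: SensoryInput) -> Thought:
        if sensed.obstacle_ahead:
            return Thought(should_speak=False, should_move=True,
                           note="someone is in my path, step aside first")
        return Thought(should_speak=bool(sensed.transcript), should_move=False,
                       note="keep the conversation going")


class MotionPlanner:
    """Good old deterministic logic, e.g. collision avoidance."""
    def plan(self, sensed: SensoryInput) -> str:
        return "sidestep_left" if sensed.obstacle_ahead else "hold_pose"


def tick(sensed: SensoryInput) -> None:
    """One cycle: the monologue reads the senses, then speech and motion act on it."""
    thought = InternalMonologue().reflect(sensed)
    if thought.should_move:
        print("motion:", MotionPlanner().plan(sensed))
    if thought.should_speak:
        print("speech:", SpeechModule().respond(sensed.transcript))


tick(SensoryInput(transcript="hi there!", people_in_view=1, obstacle_ahead=False))
```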

What led you from voice assistants to a full robotics platform?

We had developed our own system for turn-taking – when the machine speaks versus the person – and we thought it was a small part of our stack. Then a robotics customer asked us to come with them to CVPR in Nashville earlier this year. We demoed it there, and all the robotics companies came to talk to us. That was the initial spark. They were all saying, "We need this."

When you move from a web-based voice assistant to an actual hardware stack, things get a lot more difficult; many things aren't taken care of for you anymore. The hardware is in a really good space, but there's a lack of software. The more we looked into robotics as a market, the more excited we got. We visited a lot of companies, many in China, to see where the hardware and software were at. We brought on a roboticist with about ten years of experience to round out the team, and that's how we got here.

What really got us excited, and keeps us excited, is realizing it's not just about optimizing voice technology for robots. Things become very different when it's not a faceless voice assistant but a physical embodiment; people suddenly expect you to see, to move, to tell different people apart, to understand what they're saying and doing. That's natural, because that's everything we expect from another person. The form factor fundamentally changes the requirements.

You're deploying across different hardware platforms. How do you ensure consistent behavior across different kinematics, sensors, and motors?

When we actually start accessing the robot in our software, we hand over to models specifically trained for that output, particularly for motion. The part of the brain responsible for how the robot should move will say something like, "I would like to shake my hand." It's not going to say, "Move the shoulder actuator there" – it's not at that level. Instead, it expresses intent, and that goes to a platform-specific component that already has a policy for handshakes, or a VLA model that can translate that intent into action based on the individual hardware.
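
As a rough, hedged sketch of that hand-off (with invented names, not Lightberry's API): the cognition layer emits a high-level intent such as shaking hands, and a per-platform adapter resolves it either to a pre-trained policy for that specific hardware or, if none exists, to a VLA model that translates the intent into low-level action.

```python
from typing import Callable, Dict


class PlatformAdapter:
    """Per-robot layer: maps high-level intents onto that specific hardware."""

    def __init__(self, name: str):
        self.name = name
        self.policies: Dict[str, Callable[[], None]] = {}

    def register_policy(self, intent: str, policy: Callable[[], None]) -> None:
        """Attach a pre-trained, hardware-specific policy for a known intent."""
        self.policies[intent] = policy

    def execute(self, intent: str) -> None:
        if intent in self.policies:
            self.policies[intent]()   # e.g. a handshake policy tuned to this robot
        else:
            self._run_vla(intent)     # otherwise let a VLA model translate the intent

    def _run_vla(self, intent: str) -> None:
        # placeholder for a vision-language-action model conditioned on this platform
        print(f"[{self.name}] VLA translating intent: {intent}")


# The cognition layer never says "move the shoulder actuator there";
# it only expresses intent and lets the platform layer work out the joints.
g1 = PlatformAdapter("unitree_g1")
g1.register_policy("shake_hand", lambda: print("[unitree_g1] running handshake policy"))

g1.execute("shake_hand")   # resolved by the pre-trained policy
g1.execute("wave")         # no policy registered, falls through to the VLA
```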

Social robotics seems to need a lot of interaction data to improve. How do you gather, structure, and learn from real-world human interactions at scale?

We actually challenge the presumption that we need vastly more interaction data. It's really about how emotive you can be, not just responding quickly or syncing everything up. It's about understanding emotional context and all the unspoken things in human communication. If you give models that context, they're actually pretty good right now. Not perfect, but pretty good.

You could train an entirely new model to do everything, but doing that on a robot is difficult. Where does the compute live? How does latency affect things when you're streaming data up and down? And you'd need to train a separate model for each robot, because how you express emotions through body language changes with each platform. 

So how do we actually collect data? By deploying robots out there. People are very eager to interact with them, even now, because they can be quite charming. We're at the point where we're not perfect, but we're good enough to start deploying. It shouldn't keep living in a lab. It should be on the streets, amongst people.

You've mentioned that customers can program robots by talking to them with no coding involved. How does that work in practice?

When you receive the robot or reset it, it boots up in what we call the Setup Assistant, which is a personality that asks, "What would you like me to do?" Think about how a small business owner would instruct an employee – it's basically the same.

You tell the robot: "Today you're going to be event staff at this conference. I want you to entertain guests. I want you to know everything about the event – please go look it up. Be funny and charming." Alternatively, "be a little reserved so you don't rush into people; be more proactive than reactive." You tell it what to do in plain language, the same way you would a person. It keeps asking questions until it knows enough, then reboots into this new personality and hooks up whatever skills it needs.
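
As a toy illustration of that flow, not Lightberry's implementation: a setup loop could keep collecting instructions, check them against a small checklist of things it still needs to know, and only "reboot" into the new personality once the checklist is covered. The checklist fields and the keyword matcher below are invented stand-ins for what an LLM would actually do.

```python
REQUIRED_FIELDS = ["role", "venue", "tone"]   # an assumed checklist, for illustration only


def extract_fields(instructions: list[str]) -> dict:
    """Toy stand-in for an LLM pass that pulls structured fields out of free text."""
    text = " ".join(instructions).lower()
    fields = {}
    if "event staff" in text:
        fields["role"] = "event staff"
    elif "photographer" in text:
        fields["role"] = "photographer"
    if "conference" in text:
        fields["venue"] = "conference"
    if "funny" in text:
        fields["tone"] = "funny and charming"
    elif "reserved" in text:
        fields["tone"] = "reserved, more proactive than reactive"
    return fields


def setup_assistant() -> None:
    instructions: list[str] = []
    print("What would you like me to do?")
    while True:
        instructions.append(input("> "))
        known = extract_fields(instructions)
        missing = [f for f in REQUIRED_FIELDS if f not in known]
        if not missing:
            break
        print(f"Got it. Tell me a bit more about: {', '.join(missing)}.")
    print("Rebooting into the new personality and hooking up skills:", known)


if __name__ == "__main__":
    setup_assistant()
```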

Right now, if you ask it to be a bartender, it'll tell you it can't because fine motor skills are still very difficult. If you ask it to be a photographer, it can do it, because it has eyes that work as a camera. It can take photos, post-process them, email them to people, even post them on Twitter. You can check out our robot photographer @beachboyrobot, which live-tweets events.

If you ever have to interact with the robot in a way that's materially different from how you'd interact with a person, we've failed as a company. If you can't call them, can't Slack them, can't email them – if you have to open a dashboard – we've failed.

You're deploying in homes, offices, shops, and conferences. Which use cases are gaining the most traction right now?

Right now, we see the most traction in anything that is attention-related – effectively marketing – which means conferences and events specifically. The reason for this is that marketing is typically an interactive task. If you have a robot in the home that can't do anything physical, it's not super useful yet. But a robot at your conference booth chatting to people about your company while you're grabbing coffee or having a deeper conversation with one customer attracts attention, and the software is already good enough for that. We're stopping traffic every time we take it out on the street. People stop their cars in the middle of the road to ask us what the hell is going on.

What future use cases are you most excited to explore in the next 12 to 18 months?

The next use case I’m very excited to explore is shop assistants. Think about a clothing store or department store where shop assistants do a variety of tasks; part of it is organizing things, which is still difficult for robots. But part of it is interacting with customers, advising them, giving feedback and opinions.

What's fascinating is that the robot's taste becomes an emergent property of how you prompt the models. And it puts people into actual repeated contact with robots, not just a one-time event. That moves away from attention being the main selling point toward robots actually doing a subset of real tasks.

What we've seen at events is that robots never replace human team members. Instead, the humans say things like, "Hey, this is something I don't really like doing – can you do it?" How those dynamics play out when a robot is a team member for weeks, not just a day or two, is very exciting to me.

One use case that is slightly more long-term is companionship for elderly care. If you look at Japan and China, elderly care is a very heavy focus of robotics. In the US and Europe, it will probably become a heavy focus pretty quickly. You have an aging population and an epidemic of loneliness among older people; companionship robots are one way to combat that at home.

Do you think robots can eventually emulate true human companionship through emotional intelligence, stored memory, and repeated conversation?

Yes, I think so. Memory and understanding context aren't really the bottlenecks anymore. Even if you just keep a transcript of every conversation, you have tools that can search over it when something comes up. With an internal monologue, you can decide to run related queries and inject relevant context into the conversation.
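
Here is a minimal sketch of that idea, assuming a plain keyword match as a stand-in for whatever retrieval a production system would use (embeddings, a vector index): every conversation line goes into a transcript, and the internal monologue can pull the most relevant lines back into context when something related comes up. All names are illustrative.

```python
class TranscriptMemory:
    """Keep every conversation line and search over it later."""

    def __init__(self) -> None:
        self.lines: list[str] = []

    def append(self, speaker: str, text: str) -> None:
        self.lines.append(f"{speaker}: {text}")

    def search(self, query: str, k: int = 3) -> list[str]:
        """Rank past lines by word overlap with the query (a toy stand-in for embeddings)."""
        query_words = set(query.lower().split())
        ranked = sorted(self.lines,
                        key=lambda line: len(query_words & set(line.lower().split())),
                        reverse=True)
        return ranked[:k]


memory = TranscriptMemory()
memory.append("guest", "My daughter is visiting from Berlin next week.")
memory.append("robot", "That sounds lovely, what are you planning to do together?")

# Later, the internal monologue decides a new remark is related, runs a query,
# and injects the retrieved lines into the model's context before replying.
print(memory.search("How did the visit from your daughter go?"))
```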

The bigger challenge is that models are trained on text data and aren't very good at understanding everything we don't say. Most of our emotional communication happens in what we don't say – tonality, body language. That's the real bottleneck, and that's what we're actively working to solve. It's still early days, but to me, that's the frontier.

You have a lean team right now. How are you thinking about building out your team over the next three to six months?

We have three full-time people: myself, my co-founder Ali, and Steph, who we brought onto the founding team for his robotics experience. We have a few people on contract for CAD design and brand design. We're looking to hire two to five more people over the next three to five months.

Specifically, we're looking for robotics experience – people who've trained reinforcement learning models for locomotion and specific policies. We also need one or two backend software engineers to help optimize latency and scale our systems for production, supporting more robots across different geographies.

Here's how we think about it: look at robots in movies. You're seeing fifty years of market research into what kind of life with robots we can imagine, and which versions we actually like. There's a huge difference between how Arnold Schwarzenegger's Terminator acts versus R2-D2, C-3PO, WALL-E, or Baymax. Science fiction isn't just fiction that gets you excited – it's a guideline for where we should be heading.

So we're also interested in anyone with experience in animation or emotive interfaces. There are great companies out there working on robots that can fold laundry or play ping pong. What we're looking for is: how can we express different emotions through motion, not just speech? People interested in that, in whatever shape or form, are people we want to talk to.

How are you thinking about monetization? Is it a software licensing model, a SaaS subscription, or something else?

It's effectively a software licensing model per robot. We have a distribution deal directly with Unitree, whose G1 is by far the best-selling humanoid robot in the world. So we monetize on both sides: selling to end customers on a license basis, and getting a significant cut of each Unitree robot sold with our software.

Let’s zoom out to the next ten years: do you envision a world where most robots have a social layer, and Lightberry is the operating system powering it?

Exactly. The way we're going to interface with robots, particularly humanoids and robots in people-facing functions, is going to be voice. That's the primary interface we use with each other.

Every GUI you've ever used, every move from command lines to touchscreens – all of this exists because there was no good verbal interface to interact with machines. But when you talk with another person, you talk. It's how we're trained to communicate from very early on.

So yes, I think we're going to communicate with robots the same way we do with people. We're doing this cross-platform, and Lightberry can be the OS for that – the same way Windows or macOS was for the PC.