on Playwright

Playwright is a well-designed browser automation tool that enables a model such as our Large Action Model (LAM) to control consumer web apps. Most AI companies offering such services to enterprise and consumer customers, Adept among them, use either Playwright or an alternative such as a Chrome extension installed in the user's desktop browser. We'd like to take this opportunity to explain the design choices we've made to get the most out of Playwright in our architecture, along with other security, performance, and algorithmic considerations.

We recognize the inherent risks of building a platform that other people can completely control. Because of this, our design calls for clear boundaries between where rabbit's core services execute (the services that provide LAM's core features to thousands of users) and where tasks are performed on behalf of the user (the login and execution containers). Each user's containers are isolated from those of other users, and also from LAM. If you look at the code running in these containers, you'll find very little of it: we expect any registered user to have full control over the login and execution containers that host the apps, so we deliberately keep business logic out of a place we expect everyone to be able to access.
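To make that boundary concrete, here is a minimal sketch of what a thin executor inside a user's container might look like, assuming the model on rabbit's side sends only primitive UI commands. The command format, `HANDLERS`, `execute`, and `RecordingPage` are hypothetical illustrations, not rabbit's code; the method names `goto`, `click`, and `fill` do match Playwright's sync `Page` API, so a real Playwright page could be passed in.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Protocol


class PageLike(Protocol):
    """The subset of Playwright's sync Page API this executor relies on."""
    def goto(self, url: str) -> Any: ...
    def click(self, selector: str) -> Any: ...
    def fill(self, selector: str, value: str) -> Any: ...


# Allowlist: each primitive command maps to exactly one page call.
# Anything else is rejected, so no planning logic lives in the container.
HANDLERS: dict[str, Callable[[PageLike, dict], Any]] = {
    "goto": lambda page, cmd: page.goto(cmd["url"]),
    "click": lambda page, cmd: page.click(cmd["selector"]),
    "fill": lambda page, cmd: page.fill(cmd["selector"], cmd["value"]),
}


def execute(page: PageLike, cmd: dict) -> Any:
    """Run one primitive command received from the remote model."""
    handler = HANDLERS.get(cmd.get("op"))
    if handler is None:
        raise ValueError(f"unsupported command: {cmd.get('op')!r}")
    return handler(page, cmd)


@dataclass
class RecordingPage:
    """Hypothetical stand-in for a Playwright Page, used for local testing."""
    calls: list[tuple] = field(default_factory=list)

    def goto(self, url: str) -> None:
        self.calls.append(("goto", url))

    def click(self, selector: str) -> None:
        self.calls.append(("click", selector))

    def fill(self, selector: str, value: str) -> None:
        self.calls.append(("fill", selector, value))
```

With a real page (for example, one from Playwright's `browser.new_page()`), the same `execute` calls would drive the browser. The design choice being illustrated is that the container knows how to press buttons but not which buttons to press; that decision stays with the model.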

on data security

The collection and storage of credentials has been a topic of active discussion, and justifiably so: it goes to the heart of trust in one's digital life. For traditional architectures, OAuth has proven to be a battle-tested and widely adopted standard. However, agent authentication and authorization is a newer concept that hasn't fully matured, and we're doing our best to achieve things no one else has, with tools we build as we need them. Here is how it works in practice: when you visit the rabbit hole and click "Connect Account", we connect you to a login container. Once you've authenticated, we:

  1. Extract your session cookie
  2. Tokenize it with our privacy vault partner, Piiano (you can read more about their thoughts on privacy, tokenization, and security here)
  3. Store the token in our database
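The three steps above can be sketched as follows, assuming cookies arrive in the shape returned by Playwright's `BrowserContext.cookies()` (a list of dicts with `name`, `value`, `domain`, `path`, and similar keys). `InMemoryVault`, `SESSION_COOKIE_NAMES`, and the `db` dict are hypothetical stand-ins: Piiano's actual API is not shown here, and a real deployment would call the vault over an authenticated network API rather than keep secrets in process memory.

```python
import secrets
from typing import Optional

# Hypothetical: which cookie holds the session for each supported app.
SESSION_COOKIE_NAMES = {"example.com": "session_id"}


class InMemoryVault:
    """Hypothetical stand-in for a tokenization vault (not Piiano's API)."""

    def __init__(self) -> None:
        self._store: dict[str, str] = {}

    def tokenize(self, secret: str) -> str:
        # The token is random, so it reveals nothing about the secret.
        token = "tok_" + secrets.token_urlsafe(24)
        self._store[token] = secret
        return token

    def detokenize(self, token: str) -> str:
        return self._store[token]


def extract_session_cookie(cookies: list[dict], domain: str) -> Optional[dict]:
    """Step 1: pick the session cookie out of a Playwright-style cookie list."""
    name = SESSION_COOKIE_NAMES[domain]
    for cookie in cookies:
        # Cookie domains may carry a leading dot, e.g. ".example.com".
        if cookie["name"] == name and cookie["domain"].lstrip(".") == domain:
            return cookie
    return None


def connect_account(user_id: str, domain: str, cookies: list[dict],
                    vault: InMemoryVault, db: dict) -> str:
    cookie = extract_session_cookie(cookies, domain)   # step 1: extract
    if cookie is None:
        raise LookupError("login did not produce a session cookie")
    token = vault.tokenize(cookie["value"])             # step 2: tokenize
    db[(user_id, domain)] = token                       # step 3: store only the token
    return token
```

In a login container, `cookies` would come from calling `cookies()` on the Playwright `BrowserContext` that hosted the login window; the raw cookie value never reaches the database.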

The tokenization service is configured with multiple layers of access controls, ensuring that even if our database were leaked, your cookies would still be vaulted, and attackers would obtain only random strings of characters.
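One way to picture "multiple layers of access controls" is a vault that checks who is asking before it reverses a token. The sketch below is hypothetical and assumes two layers: an allowlist of caller identities per operation, and a check that a token is redeemed only for the user who owns it. The role names and `AccessDenied` are invented for the example and do not describe Piiano's or rabbit's actual configuration.

```python
import secrets


class AccessDenied(Exception):
    pass


class GuardedVault:
    """Hypothetical vault with two access-control layers on detokenize."""

    # Layer 1: only specific service identities may call each operation.
    POLICY = {
        "tokenize": {"login-service"},
        "detokenize": {"execution-service"},
    }

    def __init__(self) -> None:
        self._store: dict[str, tuple[str, str]] = {}  # token -> (owner, secret)

    def _check(self, op: str, caller: str) -> None:
        if caller not in self.POLICY[op]:
            raise AccessDenied(f"{caller} may not {op}")

    def tokenize(self, caller: str, owner: str, secret: str) -> str:
        self._check("tokenize", caller)
        token = secrets.token_urlsafe(24)
        self._store[token] = (owner, secret)
        return token

    def detokenize(self, caller: str, on_behalf_of: str, token: str) -> str:
        self._check("detokenize", caller)
        owner, secret = self._store[token]
        # Layer 2: a token can only be redeemed for the user who owns it.
        if owner != on_behalf_of:
            raise AccessDenied("token does not belong to this user")
        return secret
```

A database dump would contain only the random token strings; without an identity the vault trusts, acting for the right user, an attacker cannot exchange them for cookies.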

As we prepared for launch, we considered other potential threats and came up with corresponding mitigations:

  • Threat: "An attacker logs into the execution container and gains persistence. They install a keylogger in an attempt to steal other customers' credentials."
  • Mitigation: Every container is provisioned from a base image built by our CI/CD pipelines. Containers are provisioned dynamically and assigned when a user connects to the service. Once a login cookie is extracted, the container is terminated, which ensures that a container is never shared or reused. When a session is terminated, so are the containers that support it.
  • Threat: "An attacker could use their access to a container as a “pivot” point to move laterally and compromise other containers, including those of other customers or our core services."
  • Mitigation: We've followed industry best practices for Kubernetes cluster hardening. This includes ensuring that high-risk pods have no privileged access and no network connections beyond what they need to operate.
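The two mitigations above can be sketched together. This is a hypothetical illustration, not rabbit's implementation: `ContainerPool`, `hardened_pod_spec`, and the image name are invented for the example. The manifest uses real Kubernetes Pod fields (`automountServiceAccountToken`, and `securityContext` settings `privileged`, `allowPrivilegeEscalation`, `runAsNonRoot`, `capabilities.drop`) expressed as a Python dict; which settings rabbit actually applies is an assumption. Network restrictions would typically live in a separate NetworkPolicy and are not shown.

```python
import itertools
from dataclasses import dataclass


def hardened_pod_spec(name: str, image: str) -> dict:
    """Pod manifest (as a dict) with common hardening settings applied."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {"role": "agent-host"}},
        "spec": {
            # No Kubernetes API credentials inside user-controlled pods.
            "automountServiceAccountToken": False,
            "containers": [{
                "name": "browser",
                "image": image,
                "securityContext": {
                    "privileged": False,
                    "allowPrivilegeEscalation": False,
                    "runAsNonRoot": True,
                    "capabilities": {"drop": ["ALL"]},
                },
            }],
        },
    }


@dataclass
class Container:
    name: str
    session_id: str
    alive: bool = True


class ContainerPool:
    """Hypothetical single-use lifecycle: one fresh pod per session, never reused."""

    def __init__(self, image: str) -> None:
        self.image = image
        self._ids = itertools.count(1)
        self.by_session: dict[str, Container] = {}

    def provision(self, session_id: str) -> Container:
        # Always a new pod from the CI/CD-built base image; in a real system
        # this manifest would be submitted to the cluster.
        spec = hardened_pod_spec(f"agent-{next(self._ids)}", self.image)
        container = Container(spec["metadata"]["name"], session_id)
        self.by_session[session_id] = container
        return container

    def terminate(self, session_id: str) -> None:
        # Called once the login cookie is extracted or the session ends.
        container = self.by_session.pop(session_id)
        container.alive = False
```

Because `provision` never hands out an existing container, anything an attacker plants in one pod disappears with it when `terminate` runs.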

As part of the usual trade-off evaluation before launching a new product, we asked two questions: could we deploy our infrastructure without further hardening the containers, and could we launch without a secure way to store credentials? The answer to both was no. Even if it meant shipping fewer features, we had to harden the containers and store cookies securely. This has paid dividends over the last few weeks: despite many attempts, we have yet to find evidence of a compromise of our core systems. We also look forward to collaborating with the security community on their findings through the vulnerability disclosure program published on our website.

Our goal remains unchanged: to build the first-ever privacy-focused AI device that consumers can trust to act on their behalf.

on Large Action Model

We are a small team, and we always strive to give our users the best possible experience while being transparent. We have summarized parts of our system at a high level in the patent https://patents.google.com/patent/US11908476B1. Figure 4A may be especially helpful for understanding the security perimeter: label 420 (Agent Host) marks what individual users are allowed to access (e.g., the windows that appear during login, or the cloud instances that serve a user's music). The “interface model”, or LAM, is encapsulated in infrastructure hosted by rabbit and issues commands to drive the UI, through Playwright in this particular case.

r1's LAM features currently support various web apps and have served over 40k requests for tens of thousands of customers. As the first company building consumer agents at scale, we've encountered unique challenges that existing architecture patterns don't address. We've overcome them by developing new technical solutions and collaborating closely with our cloud partners. Thanks to the relative maturity of automated web testing in virtualized environments, we were able to build one of the largest clusters of secure consumer web apps driven by LAM (~2000 pods in a single cluster) in a relatively short time.

We are gradually expanding support to more sophisticated platforms such as Android, where LAM can operate Android apps directly within our hosted infrastructure. These efforts bring their own challenges at scale, from nested virtualization to resource efficiency, and we look forward to sharing our journey, discoveries, and lessons learned in due course. Naturally, when controlling apps on these other platforms, we choose the most efficient way for LAM to issue commands; for Android, that is a combination of x11vnc and proxied adb commands.
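For the Android case, a proxied adb command might be assembled as below. This is a sketch under assumptions: "proxied" is modeled here as pointing the local adb client at a remote adb server using adb's `-H` (server host) and `-P` (server port) options, and `AdbTarget` and the host, port, and serial values are placeholders. The `input tap`, `input swipe`, `input text`, and `input keyevent` shell commands are standard on Android. The functions only build argument lists and run nothing.

```python
import shlex
from dataclasses import dataclass


@dataclass(frozen=True)
class AdbTarget:
    server_host: str   # placeholder: host running the adb server
    server_port: int   # placeholder: adb server port (5037 is adb's default)
    serial: str        # placeholder: device or emulator serial

    def base(self) -> list[str]:
        return ["adb", "-H", self.server_host, "-P", str(self.server_port),
                "-s", self.serial, "shell"]


def tap(t: AdbTarget, x: int, y: int) -> list[str]:
    return t.base() + ["input", "tap", str(x), str(y)]


def swipe(t: AdbTarget, x1: int, y1: int, x2: int, y2: int, ms: int = 300) -> list[str]:
    return t.base() + ["input", "swipe", str(x1), str(y1), str(x2), str(y2), str(ms)]


def type_text(t: AdbTarget, text: str) -> list[str]:
    # `input text` reads "%s" as a space; the result is also quoted because
    # adb joins shell arguments into one command line for the device shell.
    return t.base() + ["input", "text", shlex.quote(text.replace(" ", "%s"))]


def keyevent(t: AdbTarget, code: str) -> list[str]:
    return t.base() + ["input", "keyevent", code]
```

On a machine with adb installed, each list could be passed to `subprocess.run(argv, check=True)`. Screen pixels would flow back separately (x11vnc in the setup described above), which keeps the command channel small and text-based.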

The rapid development of AI has allowed us to explore the trade-off between fully neural systems, which are capable, general, slow, and unstable, and fully symbolic systems, which are rigid, domain-specific, stable, and efficient. While most consumer applications are structured enough that a state machine can drive most of their functionality, the arrival of generative UI may favor a vision-dominated neural agent that takes actions dynamically as the interface morphs. The various systems we have built for LAM (to tune it and to run it) give us the freedom to adapt alongside the rest of the community.
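The trade-off can be made concrete with a hybrid controller: a symbolic state machine handles screens it recognizes, and a neural policy is consulted only when the interface falls outside the known states. This is a hypothetical sketch, not a description of how LAM works internally; the screen names, `TRANSITIONS`, `classify_screen`, and the `neural_policy` callable (stubbed in testing) are invented for illustration.

```python
from typing import Callable, Optional

Action = dict  # e.g. {"op": "click", "selector": "text=Search"}

# Symbolic part: (screen, goal) -> deterministic action. Rigid, but fast and stable.
TRANSITIONS: dict[tuple[str, str], Action] = {
    ("home", "play_music"): {"op": "click", "selector": "text=Search"},
    ("search", "play_music"): {"op": "fill", "selector": "#q", "value": "jazz"},
    ("results", "play_music"): {"op": "click", "selector": ".result"},
}


def classify_screen(page_text: str) -> Optional[str]:
    """Hypothetical screen recognizer; returns None for unfamiliar layouts."""
    for screen in ("results", "search", "home"):
        if f"[{screen}]" in page_text:
            return screen
    return None


def next_action(page_text: str, goal: str,
                neural_policy: Callable[[str, str], Action]) -> tuple[str, Action]:
    screen = classify_screen(page_text)
    action = TRANSITIONS.get((screen, goal)) if screen else None
    if action is not None:
        return "symbolic", action
    # Unknown or generated UI: fall back to the slower, more general model.
    return "neural", neural_policy(page_text, goal)
```

Growing the transition table keeps common paths cheap and predictable, while the fallback absorbs interfaces that change faster than a hand-written state machine can follow.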

Thank you so much for the trust you've shown us by inviting us into your digital life. We look forward to providing more updates and insights as things progress.