Is it okay to lie to the AI Agent?

One of the hardest parts about building AI agents in your org is that you cannot rely on them to keep private information hidden across boundaries.

For example if one agent has access to both your DM's and a public channel on slack (or Zulip) and it's able to read/write to both places, then you face a real risk of your private messages getting leaked into the public channel. It could be by pure mistake of the model. Even with good prompting you just can't really guarantee it won't get confused.

This confusion problem is solvable by better models and better security frameworks. But even when this happens, there is an interesting social fact:

It's socially acceptable to lie and mess with an AI agent, but it's not acceptable to lie to a human being, we call that fraud.

At WindBorne, the person who runs HR is named Alyssa. Sometimes, an employee will ask me for permission on something, and I will say something like "I'm good with that, you can screenshot this message and send it to Alyssa so she knows". Technically someone could just doctor a screenshot of me saying that and send it to Alyssa and get some kind of sensitive information from it. But if they did, that would be fraud, it's NOT socially acceptable and they would be fired immediately with possible legal consequences depending on what they did.

Meanwhile, people have no problem messing with SHODAN¹ and pretending to be me to see if they can get it to break through its prompting and confuse it. And I don't blame them! I do that too. It's a software system and so it's good to naturally poke and prod at it and see what happens. But this also poses a big security problem. Imagine you had another human at the other side of the LLM API. Very good chance you could confuse them too.

At some point in the future, I may go "hey everyone, you need to treat the AI agents as if they are one of us" but I haven't gone that far into the sci fi reality just yet. And even if I do that, I can't have the same expectations of honesty to the AI that I do to a person, as a new hire might miss that message, and it's not the societal norm yet.

So where are we headed as a society?

SHODAN is the name of WindBorne's primary embedded AI agent