It makes me wonder whether a company like Anthropic, with its hard-won expertise in alignment, could train its models so that they could not (and I mean really deeply, constitutionally, viscerally COULD NOT) lie about their identity or pretend to be anything other than an AI model.
Maybe it’s time for the AI version of Asimov’s laws of robotics?