My take on AI safety

February 2023

This is my uninformed perspective as an engineer in the LLM space. Others have thought much more deeply about this than me, so you shouldn’t listen to me.

The mainstream seems to believe the primary risks of AI are fake content and unemployment. I think these are real problems, but relatively easy to manage. Cryptographic digital signatures should provide an antibody to fake content, and wealth redistribution should help those whose labor has lost its economic value. I suspect that in the future, people will wonder why they cared so much about making money rather than simply doing what they enjoy.
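
To make the signature idea concrete, here is a minimal sketch of content provenance. I'm assuming Ed25519 keys from Python's `cryptography` package purely for illustration; the particular library and key scheme are my choices, not an established standard. The idea is that a device or publisher signs content at creation time, and anyone holding the public key can later check that the content hasn't been altered.

```python
# Minimal sketch: signing content at creation time and verifying it later.
# Assumes the third-party `cryptography` package and Ed25519 keys (illustrative choices).
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

# Signer side: generate a keypair and sign the content bytes.
private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()

content = b"Photo taken 2023-02-01 by device ABC123"
signature = private_key.sign(content)

# Verifier side: check the signature against the signer's public key.
try:
    public_key.verify(signature, content)
    print("Content is authentic: signature matches.")
except InvalidSignature:
    print("Content was altered or not signed by this key.")

# Any tampering with the content makes verification fail.
try:
    public_key.verify(signature, content + b" [edited]")
except InvalidSignature:
    print("Tampered content rejected.")
```

Of course, a signature only proves who published something and that it hasn't changed since; it says nothing about whether the content is true, and the harder problem is getting devices, publishers, and platforms to agree on whose keys to trust.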

Some researchers focus on existential risk. The framing I've seen most often is that it may be impossible to align a superintelligence, so whatever goal it ends up with will likely involve killing us as a subgoal, via instrumental convergence: almost any goal is easier to achieve with more resources and less interference from humans. While this seems plausible to me, I haven't found an argument for it that I fully understood. It seems to me that if you took a good, empathetic human and made them a billion times more intelligent, they would still avoid hurting humans. After all, we're able to feel abundant empathy towards the mentally disabled; we don't kill them when they get in the way of our goals. It's unclear to me why a superintelligent AI trained by humans would have different values than a superintelligent human.

Regardless, I think existential risk will come long before AI is smart enough to unalign itself from its developers, because some of the developers themselves will be unaligned. We've already had close calls with nuclear weapons and synthetic viruses. I think the reason those weapons haven't ended the world yet is that they're hard to use tactically: it's hard to get them to accomplish what a group of powerful people actually wants. Not many people want a city reduced to rubble, and we can't control what a virus will mutate into as it spreads. AI is considerably more useful as a weapon. You can use it to manipulate and subjugate precisely the subset of society you want to target.

There will almost certainly be more good people using AI than bad people using AI, so the good actors will be trying to stop the abusers. But this doesn't bring me much comfort, because it's far easier to create destruction than to stop it; again, consider nuclear weapons and synthetic viruses.

I have no solutions! I'd love to discuss this with anyone who can think of solutions or poke holes in my logic.