I really like seeing more conversation about this kind of misalignment. Good vs. good at is one kind; actually true vs. plausible/engaging is another dimension where tradeoffs are being made that are not in the direction of improving the world.
One of the properties of Goodness is that it is holistic--an ecosystem of values where the relationships between the parts are as important as, if not more important than, the parts themselves. Any attempt to define goodness, while perhaps useful as an approximation (a simulation of the ecosystem), will necessarily leave parts out, and because of the primacy of the interconnections, those omissions will ripple back and eliminate the capacity of even the included parts to matter, like a trophic cascade.
There is a saying in computer science that any problem can be solved by adding another layer of abstraction. Perhaps that idea can be applied to AI. Impossible to list all the good and bad behaviors? Go up a level of abstraction and instead list the values that generate those behaviors. Impossible to list all the values? Go up a level of abstraction and search for the underlying generator of what makes anything valuable (IRL & CIRL are examples of trying to do this). But there is a trade-off here between concision and vagueness, which is to be expected since the one problem abstraction cannot solve is too much abstraction.
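To make the IRL example a bit more concrete, here is a toy sketch of what "searching for the generator" looks like at the lowest rung: instead of enumerating good behaviors, infer reward weights under which the observed behavior would be optimal. Everything here (the feature counts, the max-margin-style update, the numbers) is my own illustrative assumption, not a real IRL/CIRL implementation:

```python
import numpy as np

# Toy illustration: recover "values" (reward weights) from observed behavior,
# rather than listing the behaviors themselves. Purely a sketch with made-up
# numbers, not a real IRL/CIRL implementation.

rng = np.random.default_rng(0)
n_features = 4

true_w = np.array([1.0, -0.5, 0.3, 0.0])       # hidden values driving the expert
policy_features = rng.random((8, n_features))   # candidate behaviors, as feature counts

# The expert picks the behavior that scores best under its hidden values;
# the learner only observes which behavior was chosen.
expert_idx = int(np.argmax(policy_features @ true_w))
expert_features = policy_features[expert_idx]
others = np.delete(policy_features, expert_idx, axis=0)

# Max-margin-style updates: push w until the expert's behavior outscores
# every alternative by a small margin.
w = np.zeros(n_features)
for _ in range(1000):
    rival = others[np.argmax(others @ w)]
    if expert_features @ w > rival @ w + 0.1:
        break  # the observed behavior now looks optimal under w
    w += 0.05 * (expert_features - rival)

print("hidden values:    ", true_w)
print("recovered values: ", np.round(w, 2))
```

Of course this only recovers *some* weights that rationalize the behavior, not the true ones--which is exactly the concision-vs-vagueness trade-off again: the higher the level of abstraction, the more underdetermined the answer.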
Yeah, I like this line of thinking. To my mind the obvious next question to ask is "what's the structure of this failure mode for human organizations?", because that seems like the root dysfunction that's making the AI situation so hard to deal with. Here's an angle, albeit an unfortunately bleak one:
When it comes to human organizations, the simple goals that appeal to strong human tendencies are the ones that are easiest to coordinate around (e.g. wealth, power), but unfortunately those simple goals are inhuman and often inhumane.
Making an organization directed towards the Good would require its members to regulate each other's behavior in ways that keep them aligned with the Good. But this is a really tough equilibrium to maintain, even when people's conceptions of the Good are similar! For example, maybe people aren't quite going to see eye to eye on whether person X was acting out of line that one time, and a big political mess can ensue.
This sort of coordination failure seems to happen to non-profits much more than it happens to for-profit corporations. "Make money" is a *way* simpler and less subjective goal than "be good". Sure, at a for-profit, there might be some subjectivity about whether, say, the CEO's latest gambit is a genius move that will eventually make the company money or whether he's lost the plot. But at least that cashes out to a disagreement about an empirical prediction, and the CEO knows he's eventually going to get fired if he loses a lot of money.
In fact, organizations that start off being Good-focused (at least nominally) can often collapse into chasing simpler convergent goals like money or power. At the risk of picking an overly politicized example (at least in our part of Twitter), Sam Altman's victory over the old OpenAI board seems to me like a prime example of this sort of collapse.
I don't know what, if anything, can be done about this Good-collapse problem, and I'd be curious what you think!
I agree with the other commenter that this is good. I want very much to quibble, but I see where you're going: getting AI to be good is very hard indeed. For most of the sweep of human (pre?)history, a good man was remembered as such if he was good at helping his team win. I do not want AI that is 'good' in this way, for obvious reasons.
This is good