Open Source AI: Opportunities and Challenges
Source: Irving Wladawsky-Berger, CogWorld Think Tank member
“Since the 1980s, open source has grown from a grassroots movement to a vital driver of technological and societal innovation,” said “Standing Together on Shared Challenges,” a report published by Linux Foundation Research in December of 2023. “The idea of making software source code freely available for anyone to view, modify, and distribute comprehensively transformed the global software industry. But it also served as a powerful new model for collaboration and innovation in other domains.”
The report is based on the Open Source Congress that LF Research convened in Geneva in July 2023 to discuss the key challenges facing the open source community, such as safeguarding critical open source infrastructure and ensuring that new regulations are compatible with open source principles and practices. In addition, the Open Source Congress explored opportunities for open source collaboration on important new topics, in particular artificial intelligence. Given my interest in “How the Original Promises of the Open Source Movement Now Apply to AI Systems,” I’ll focus my discussion below on the report’s section on AI.
“Does AI Change Everything? What is Open?” is the theme of the report’s AI module. “For the OSS community, AI presents an array of opportunities and challenges. The community leaders assembled in Geneva discussed the need to align on a definition of open AI, as well as on the challenges created by AI-enabled code generators for licensing, security, and intellectual property. Finally, Congress participants reflected on the broader societal impacts of AI and the role of the open source community in addressing issues such as bias, privacy, and existential threats to humanity.”
Let me summarize some of the key points raised in the report’s AI module.
Openness in AI entails more than just access to the source code
The definition of open-source software (OSS) has been generally accepted for about 25 years: “OSS is software with source code that anyone can inspect, modify, and enhance.” The Open Source Definition points out that beyond access to the source code, OSS must comply with a few additional criteria, including free redistribution, derived works, and distribution of license.
But open AI is a different story, because AI systems don’t behave like traditional software. Open AI systems require definitions, protocols, and development processes distinct from those used in open source software. While software played the central role in the evolution of IT systems over their first few decades, data has played the central role in the advances of AI over the past 20 years. Data is not merely the fuel for AI systems; it is the determining factor in a system’s overall quality.
One of the key challenges with AI systems is their lack of transparency. It’s very difficult to explain the results of an AI decision in human terms. AI systems have huge numbers of parameters within their complex neural networks, making it nearly impossible to assess the contribution of individual nodes to the system’s overall statistical decision in terms a human can understand. “Merely looking at the source code in AI does not necessarily explain or shed light on why AI systems generate the outputs they do. Even AI developers concede that they cannot readily explain the outputs of AI systems they are developing.”
“Explainability is important because it increases the trustworthiness, safety, and accountability of the systems that increasingly shape life-changing decisions such as diagnosing disease or deciding who gets access to credit.” But, said the Geneva Congress participants, this is difficult because it requires “understanding the model architecture, including the weights applied to different variables in the models and the types of data used to train it. Unfortunately, the more sophisticated an AI system becomes, the harder it is to pinpoint exactly how it derived a particular insight.”
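To make the point concrete, here is a minimal illustrative sketch, not from the report: even with full “open” access to a toy network’s weights, the individual parameters say little about why a particular prediction was made, and one falls back on approximate, post-hoc attribution techniques. The model, input, and finite-difference sensitivity method below are all hypothetical stand-ins for the far larger systems the report describes.

```python
import numpy as np

# A toy two-layer network with fixed random weights; a stand-in for a model
# whose parameters are fully "open" yet individually uninterpretable.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 8)), rng.normal(size=8)
W2, b2 = rng.normal(size=(8, 1)), rng.normal(size=1)

def predict(x):
    """Forward pass: ReLU hidden layer followed by a sigmoid output."""
    h = np.maximum(x @ W1 + b1, 0)
    return 1 / (1 + np.exp(-(h @ W2 + b2)))

x = np.array([0.5, -1.2, 3.0, 0.1])   # one hypothetical input
print("prediction:", float(predict(x)))

# Inspecting the raw weights is possible, but explains little on its own.
print("first-layer weights:\n", W1)

# A crude post-hoc attribution: perturb each input feature and measure the
# change in the output (a finite-difference sensitivity estimate).
eps = 1e-3
for i in range(len(x)):
    x_plus = x.copy()
    x_plus[i] += eps
    sensitivity = (predict(x_plus) - predict(x)) / eps
    print(f"feature {i} sensitivity: {float(sensitivity):+.4f}")
```

Even this crude sensitivity analysis only says which inputs nudged the output, not why the model weights encode that relationship, which is the gap the Congress participants were pointing to.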
AI-generated code will create challenges in open source licensing, security, and regulation
A recent McKinsey study, “The Economic Potential of Generative AI,” estimated that generative AI could deliver productivity gains in software development equivalent to 20% to 45% of current annual spending on the function. GitHub estimated that such AI-based productivity improvements could increase global GDP by $1.5 trillion by 2030, as coding tasks that might have taken hours could now be completed in seconds.
“AI code generators have been trained on massive datasets, including the vast open source code libraries hosted on GitHub and other platforms,” noted the LF Research report. “The upside is that open source repositories include a wealth of diverse code written by developers around the globe. These repositories encompass a vast array of programming languages, paradigms, and application domains, rendering them a rich and exhaustive wellspring of real-world code for training AI models.”
But, there is a serious downside, as the Geneva Congress participants pointed out. “The growing use of AI code generators creates a series of challenges related to licensing, security, and regulatory compliance. These challenges stem from the lack of provenance associated with the code generated by AI models. … As a result, it can be challenging to ascertain whether the generated code is proprietary, is open source, or falls under some other licensing scheme. This opacity creates a risk of inadvertent misuse of proprietary or licensed code, leading to potential infringement issues.”
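One way to picture the provenance problem is a simplified fingerprinting check: hash normalized lines of AI-generated code and compare them against an index built from known open source files, flagging matches for license review. Real scanners (for example, SPDX-based tooling) are far more sophisticated; the index, snippets, and matching rule below are purely illustrative assumptions.

```python
import hashlib

# Hypothetical fingerprint index: hashes of normalized lines from known open
# source files, mapped to their license identifiers (illustrative only).
known_fingerprints = {
    hashlib.sha256(b"for i in range(len(items)):").hexdigest(): "GPL-3.0",
    hashlib.sha256(b"return sorted(items, key=lambda x: x[1])").hexdigest(): "MIT",
}

def normalize(line: str) -> str:
    """Collapse whitespace so trivial formatting changes don't hide a match."""
    return " ".join(line.split())

def flag_possible_provenance(generated_code: str):
    """Report lines of AI-generated code that match known licensed snippets."""
    hits = []
    for lineno, line in enumerate(generated_code.splitlines(), start=1):
        digest = hashlib.sha256(normalize(line).encode()).hexdigest()
        if digest in known_fingerprints:
            hits.append((lineno, known_fingerprints[digest]))
    return hits

generated = """def top_items(items):
    for i in range(len(items)):
        pass
    return sorted(items, key=lambda x: x[1])
"""
for lineno, license_id in flag_possible_provenance(generated):
    print(f"line {lineno}: resembles {license_id}-licensed code, review needed")
```

The sketch also shows why the problem is hard: line-level fingerprints are easy to evade and say nothing about whether a match is substantial enough to trigger a license obligation, which is exactly the opacity the report warns about.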
Systemic risks from AI require an urgent, open source response
In addition, “Congress participants warned that the growing influence of AI has given rise to new risks and ethical considerations related to bias, transparency, privacy, job displacement, and existential threats to humanity.” Let me briefly discuss each of their key warnings.
Bias and discrimination. Stanford’s 2022 AI Index Report found that while LLMs are more capable than ever, they’re also more prone to reflect the toxicity and bias present in the huge amounts of data they’re trained on, and to amplify society’s prejudices and power structures. “Artificial intelligence systems trained on historical loan data, for example, could perpetuate discriminatory lending practices, resulting in unequal access to credit or loans for marginalized groups. Predictive policing systems that use AI algorithms to identify crime hotspots and allocate police resources have been criticized for disproportionately targeting minority communities.”
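As a purely illustrative sketch with synthetic data (not from the report or the AI Index), the snippet below fits a plain logistic regression to “historical” loan decisions that encode a group disparity. The group label is never used as a feature, yet the model reproduces the disparity because a correlated proxy feature carries the signal. All variable names, coefficients, and thresholds are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5000

# Synthetic "historical" loan data. Group membership is NOT a model feature,
# but a proxy (e.g., a neighborhood indicator) is correlated with it.
group = rng.integers(0, 2, size=n)                  # 0 = majority, 1 = marginalized
income = rng.normal(50 - 5 * group, 10, size=n)     # structural income gap
proxy = group + rng.normal(0, 0.3, size=n)          # proxy correlated with group

# Historical approvals were biased against group 1 beyond what income explains.
hist_score = 0.08 * income - 1.5 * group + rng.normal(0, 1, size=n)
approved = (hist_score > 3.0).astype(float)

# Standardize features (no explicit group column) and add an intercept.
feats = np.column_stack([income, proxy])
feats = (feats - feats.mean(axis=0)) / feats.std(axis=0)
X = np.column_stack([np.ones(n), feats])

# Fit logistic regression by gradient descent.
w = np.zeros(X.shape[1])
for _ in range(5000):
    p = 1 / (1 + np.exp(-X @ w))
    w -= 0.1 * X.T @ (p - approved) / n

pred = 1 / (1 + np.exp(-X @ w)) > 0.5
for g in (0, 1):
    print(f"predicted approval rate, group {g}: {pred[group == g].mean():.2%}")
```

Running it shows markedly lower predicted approval rates for the marginalized group, illustrating how a model trained on biased historical data perpetuates that bias even when the sensitive attribute is formally excluded.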
Intentional misuse of AI’s capabilities for malicious purposes. It’s not at all surprising that a technology as powerful as AI is already being misused for all kinds of malicious purposes. As the 2023 AI Index Report noted: “The number of newly reported AI incidents and controversies in the AIAAIC database was 26 times greater in 2021 than in 2012. The rise in reported incidents is likely evidence of both the increasing degree to which AI is becoming intermeshed in the real world and a growing awareness of the ways in which AI can be ethically misused. The dramatic increase also raises an important point: As awareness has grown, tracking of incidents and harms has also improved — suggesting that older incidents may be underreported.”
Existential threats to humanity. In terms of its likely long-term evolution, AI may well be in a class by itself because of the serious concerns that have been raised about the potential impact of machines that may equal or surpass human levels of intelligence. Such a prospect is accompanied by fears that an increasingly powerful, out-of-control AI could eventually become an existential threat to humanity, although the potential timeline for such superintelligence is a subject of considerable speculation and debate.
“What are the chances of an AI Apocalypse?” asked The Economist in an article published in July 2023. To quantify these fears, the article compared the predictions of superforecasters (“general-purpose prognosticators with a record of making accurate predictions on all sorts of topics, from election results to the outbreak of wars”) with the predictions of domain experts in a number of existential risk domains, including AI, nuclear war, and pandemics.
“The most striking conclusion of the study was that the domain experts, who tend to dominate public conversations around existential risks, appear to be gloomier about the future than the superforecasters,” said The Economist. “The median superforecaster reckoned there was a 2.1% chance of an AI-caused catastrophe, and a 0.38% chance of an AI-caused extinction, by the end of the century. AI experts, by contrast, assigned the two events a 12% and 3% chance, respectively.”
Fears that a super-intelligent AI could pose an existential threat to humanity should not be our main AI concern at this time, and possibly ever. Our AI concerns should instead be focused on the regulatory, legal, and geopolitical arenas, where we’re already seeing a number of here-and-now AI issues.
In conclusion, “the consensus emerging from discussions in Geneva is that openness in AI provides a better pathway to addressing AI’s weaknesses and challenges. … As the underlying technologies mature, AI will perform an ever-increasing array of tasks that today require human intelligence, such as learning, reasoning, problem-solving, perception, and decision-making. Groundbreaking applications in domains such as healthcare, transportation, public administration, finance, education, and entertainment will give rise to significant social and economic benefits but also considerable risks.”
“As companies race to deploy and monetize a new generation of AI technologies, it would be prudent for all stakeholders involved in AI development to commit to ethical guidelines or principles for AI development that promote transparency, accountability, fairness, and the responsible use of AI technology, ensuring that AI systems align with human values and societal well-being. Above all, a commitment to open source approaches would ensure that AI is deployed in a manner that aligns with human values, safeguards human rights, and promotes the overall well-being of society.”
Irving Wladawsky-Berger is a Research Affiliate at MIT's Sloan School of Management and at Cybersecurity at MIT Sloan (CAMS) and Fellow of the Initiative on the Digital Economy, of MIT Connection Science, and of the Stanford Digital Economy Lab.