AI + Open Source: A Philosophy of Changing Expectations
Slides
Speaking notes
Slide 1
The things I do at Google include Open Source and AI DevRel, and I actually founded the TensorFlow Developer Relations team nine years ago, so I’ve been passionate about this space for a long time. I doubt I’m the most expert person in the room on AI or open source, but I know enough to get myself into some trouble.
Some of y’all are the experts in this particular subject of AI and Open Source, but I suspect many of you haven’t jumped in the deep end just yet. I’m here today to get more Open Source experts, like you, involved in this conversation, because we need you to help land it well and land it quickly. And it’s tricky, because open source concepts can’t always be directly applied to AI systems. I’m gonna go into a bit more detail, and I hope when I’m done, you’ll be excited to start playing more with open models and also to join the OSI community-driven workstream, the “Deep Dive in AI”.
Slide 2
For all of you, I barely need to spend time on this slide. We all know the power of open source licenses: they give users creative autonomy, which drives creativity and collaboration, and the software can be modified to fit custom, unique use cases.
Slide 3
Google believes in the power of open technology. But what does it mean to be open? Well, we know this from the open source definition. Open source should allow derivative works AND innovation from first principles. This latter point is where it gets challenging with AI. I personally assumed at the start that for open source AI, we would need to have a full dataset and full access to all training code to permit innovating from first principles (not just derivative works). But I’m pretty sure I was wrong about that. After examining things in more detail and having conversations with folks more expert in open source and AI than I am, I’ve found the reality is a bit more complicated.
Slide 4
For these open source models, the areas you’d want to consider are the source data, the code used to train the model, and the resulting weights. Because these models use a tremendous amount of data, that data may not be easy to package. Also, the training code is sometimes tuned for proprietary systems, so it’s not particularly useful more broadly. And then, of course, training on the full dataset could require resources well beyond the reach of most open source developers. So, even if the data and code were shared, that wouldn’t actually have the desired effect of allowing people to innovate from first principles.
If you were to ask me today, perhaps more interesting would be a definition of the data with samples and an example implementation of the training code that runs on OSS frameworks. I’m not sure if that’s exactly it, but I think that’s where our conversation is today. Oh, and a couple more nuances… When it comes to AI, we’re also seeing rapidly evolving policy and case law around copyright that makes a difference.
Slide 5
This is a selfie… No, it’s not a selfie of me. But it is a selfie… This is Naruto, a macaque who took their own picture with a camera left unattended by a photographer. It was held that the monkey did not possess legal standing to sue for copyright infringement, and the case is now cited in reference to works by AI. We haven’t reached the end of this story, but it’s another example of the complexities at play when we consider licenses.
Slide 6
And, of course, while open source offers many benefits, it also presents some novel challenges around safety and misuse.
Slide 7
Why does this matter now? AI is the most accessible, available, ubiquitous, and approachable it has ever been. AI is in most of the things you already know and love, like traditional models running on your phone’s camera, generative AI usage across the business spectrum, and so much more. But perhaps more importantly, this is also true for developers. I’ll give you an example. You might think it’s crazy to throw in a demo at the end of a 10-minute talk, but it proves my point nicely.
Slide 8
As for next steps, well, we keep working together. Collaboration is the key to developing safe and responsible AI. That means investing in research, developing safety tools tailored for openly available AI, collaborating with policymakers, and engaging with the cybersecurity community.
Slide 9
We believe that by sharing Gemma models and fostering a diverse community, we can collectively advance the field of AI. These Gemma models are released as open models, providing free access to model weights while ensuring responsible usage through specific terms of use. And, by working together as a broader Open Source community on the definition of Open Source for AI, we’ll find the right way to ensure this field continues to drive innovation, creativity, and collaboration.
Slide 10
Thank you.