AI + Open Source: A Philosophy of Changing Expectations
Slides
Speaking notes
Slide 1
The things I do at Google include Open Source and AI DevRel, and I founded the TensorFlow Developer Relations team nine years ago, so I’ve been passionate about this space for a long time. I doubt I’m the most expert person in the room on AI or open source, but I know enough to get myself into some trouble.
Some of y’all are the experts in this particular subject of AI and open source, but I suspect many of you haven’t jumped in the deep end just yet. I’m here today to get more open source experts, like you, involved in this conversation, because we need you to help land it well and land it quickly. And it’s tricky, because open source concepts can’t always be directly applied to AI systems. I’m gonna go into a bit more detail, and I hope that when I’m done you’ll be excited to start playing more with open models and to join the OSI community-driven workstream, the “Deep Dive in AI”.
Slide 2
For all of you, I barely need to spend time on this slide. We all know the power of open source licenses: they give users the creative autonomy that drives creativity and collaboration, and they let software be modified to fit custom, unique use cases.
Slide 3
Google believes in the power of open technology. But what does it mean to be open? Well, we know this from the open source definition. Open source should allow derivative works AND innovation from first principles. This latter point is where it gets challenging with AI. I personally assumed at the start that for open source AI, we would need to have a full dataset and full access to all training code to permit innovating from first principles (not just derivative works). But I’m pretty sure I was wrong about that. After examining things in more detail and having conversations with folks more expert in open source and AI than I… the reality is a bit more complicated.
Slide 4
For these open source models, the areas you’d want to consider are the source data, the code used to train the model, and the resulting weights. Because these models use a tremendous amount of data, that data may not be easily packaged. Also, the code is sometimes tuned for proprietary systems, so it’s not particularly useful more broadly. And then, of course, training on the full dataset could require resources well beyond the reach of most open source developers. So, even if this code were shared, it wouldn’t actually have the desired effect of allowing people to innovate from first principles.
If you were to ask me today, perhaps more interesting would be a definition of the data with samples and an example implementation of the training code that runs on OSS frameworks. I’m not sure if that’s exactly it, but I think that’s where our conversation is today. Oh, and a couple more nuances… When it comes to AI, we’re also seeing rapidly evolving policy and case law around copyright that makes a difference.
Slide 5
This is a selfie… No, it’s not a selfie of me. But it is a selfie… This is Naruto, a macaque who took their own picture with a camera left unattended by a photographer. It was held that the monkey did not possess legal standing to sue for copyright infringement, and the case is now cited in reference to works by AI. We haven’t reached the end of this story, but it’s another example of the complexities at play when we consider licenses.
Slide 6
And, of course, while open source offers many benefits, it also presents some novel challenges around safety and misuse.
Slide 7
Why does this matter now? AI is the most accessible, available, ubiquitous, and approachable it has ever been. AI is in most of the things you already know and love, like traditional models running on your phone’s camera, Generative AI usage across the business spectrum, and so much more. But perhaps more importantly, this is also true for developers. I’ll give you an example. You might think it’s crazy to throw in a demo at the end of a 10-minute talk, but it proves my point nicely.
Slide 8
As for next steps, well, we keep working together. Collaboration is the key to developing safe and responsible AI: investing in research, developing safety tools tailored for openly available AI, collaborating with policymakers, and engaging with the cybersecurity community.
Slide 9
We believe that by sharing Gemma models and fostering a diverse community, we can collectively advance the field of AI. These Gemma models are released as open models, providing free access to model weights while ensuring responsible usage through specific terms of use. And, by working together as a broader Open Source community on the definition of Open Source for AI, we’ll find the right way to ensure this field continues to drive innovation, creativity, and collaboration.
Slide 10
Thank you.
Enable Two-Factor Authentication for SSH on Ubuntu
tl;dr
- Open an ssh connection to your server and keep it open until you’ve verified the installation is working properly.
- Install the Google PAM module
- Configure for your non-root login user via `google-authenticator`.
- Update `/etc/pam.d/common-auth` with `auth required pam_google_authenticator.so nullok`.
- Update `/etc/ssh/sshd_config` with `ChallengeResponseAuthentication yes`. You will likely just need to change an existing line from `no` to `yes`.
- Restart the ssh daemon with `sudo service sshd restart`.
- Test in another terminal window so you don’t lock yourself out if it doesn’t work for some reason.
Background
I’m a huge fan of Multi-factor authentication (MFA) because I simply don’t think passwords provide enough security. Perhaps we can live by the rule of ‘no one is likely to target my out-of-the-way account’ for a little while. But eventually you’re gonna have an account that’s worth targeting. As such, you might as well make a habit of using stronger security practices across the board. Of course, this means spending a bunch of time adding additional factors of authentication to all your accounts, systems, and data. This is a non-trivial amount of time and effort–most services provide the ability nowadays, but it’s up to you to hunt the setting down and enable it. As for setting it up on your own systems, that’s another story…
In this article we’ll look at adding MFA to your Linux system. Up until recently, adding two-factor on Linux was a pain and if done wrong could leave you without remote access. It’s no fun making a simple mistake and then having to get on the road to the data center to log in locally.
The Google PAM module is now in the Ubuntu package repository and the config files are straightforward to update. Installation should only take a few moments and should be easy enough to roll back if it fails for some reason.
Installation
Assumptions
- You have your own Linux server with root access
- You are able to log into your ssh server and maintain a connection across ssh daemon restarts
- The following steps were tested on Ubuntu 18.04 and should be similar on most recent Ubuntu installations.
Google Authenticator app
Install this on your phone (if you don’t already have it) from the Google Play Store or the Apple App Store.
Google PAM module
The first step is to install and configure the Google PAM module.
# Login to your server
$ ssh username@example.com
# Update and upgrade your packages for good measure
$ sudo apt-get update && sudo apt-get upgrade
# Install the PAM module
$ sudo apt-get install libpam-google-authenticator
Assuming you’re not way behind on your updates, this should take only a few minutes with little prompting. Next, let’s configure the PAM module for the current user (make sure you were using `sudo`, not `su`, so that you are still logged in as your own user rather than root).
$ google-authenticator
You will then be prompted to set tokens as time-based or not. Choose yes. They are more secure.
Do you want authentication tokens to be time-based (y/n) y
You’ll then get a few things printed to your terminal:
- QR Code - Scan this with the Google Authenticator app on your phone right now to add the token generator for this server to your list.
- Secret Key - As an alternative to the QR Code, you can instead type this secret key into the app.
- Verification code - The first verification code that should be generated. Compare it to the first code you get from the app. They should be the same.
- Backup codes - Save these somewhere encrypted. You’ll use them if you lose your phone.
Time is still not a perfectly tracked thing. Although this slightly reduces your security, go ahead and allow a little fuzziness.
By default, tokens are good for 30 seconds and in order to compensate for
possible time-skew between the client and the server, we allow an extra
token before and after the current time. If you experience problems with poor
time synchronization, you can increase the window from its default
size of 1:30min to about 4min. Do you want to do so (y/n) y
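To make the window in that prompt concrete, here is a minimal sketch of the arithmetic. It assumes 30-second time steps (as the prompt states) and a window of N codes centered on the current step. The function name `accepted_steps` and the sizes 3 (default) and 17 (after answering yes) are my assumptions for illustration, not something printed by the tool above.

```bash
#!/usr/bin/env bash
# Sketch: list the 30-second time steps whose codes would be accepted
# for a given window size. Assumption: the window is centered on "now".

STEP_SECONDS=30

# Usage: accepted_steps EPOCH_SECONDS WINDOW_SIZE
accepted_steps() {
  local now=$1 window=$2
  local current=$(( now / STEP_SECONDS ))   # index of the current time step
  local back=$(( (window - 1) / 2 ))        # steps accepted before "now"
  local ahead=$(( window / 2 ))             # steps accepted after "now"
  local i
  for (( i = -back; i <= ahead; i++ )); do
    echo $(( current + i ))
  done
}

# Assumed default window of 3 codes, evaluated at the current time:
# one step before, the current step, and one step after.
accepted_steps "$(date +%s)" 3
```

With a window of 3 the server accepts three consecutive codes, which covers 1:30min in total. With the assumed larger window of 17, it accepts codes up to 8 steps (4 minutes) either side of the current one.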
And finally, let’s rate limit login attempts to 3 times every 30s.
If the computer that you are logging into isn't hardened against brute-force
login attempts, you can enable rate-limiting for the authentication module.
By default, this limits attackers to no more than 3 login attempts every 30s.
Do you want to enable rate-limiting (y/n) y
Configure your system for MFA
You’ll need to update your auth system to require Google Authenticator for those users who have configured it. To do so, edit `/etc/pam.d/common-auth` and include the following line at the end:
auth required pam_google_authenticator.so nullok
A couple of notes here:
- The `nullok` option is what allows users who haven’t yet configured Google Authenticator to log in without it.
- If you want to require MFA for a window manager on a desktop machine, you’ll need to enable it in additional files such as `/etc/pam.d/gdm` or `/etc/pam.d/common-session`.
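If you’d rather script the PAM edit than open an editor, something like the following could work. It’s a rough sketch, not a tested deployment script: `ensure_line` is a hypothetical helper name, and it only appends the line when an identical line isn’t already present, keeping a `.bak` copy first.

```bash
#!/usr/bin/env bash
# Sketch: append a line to a config file only if it is not already there.
# ensure_line is a hypothetical helper, not part of any package.

# Usage: ensure_line FILE LINE
ensure_line() {
  local file=$1 line=$2
  # -q quiet, -x whole-line match, -F fixed string (no regex)
  if grep -qxF -- "$line" "$file"; then
    return 0                        # already present, nothing to do
  fi
  cp -p -- "$file" "$file.bak"      # keep a copy for easy rollback
  # Add a newline first if the file does not end with one
  if [ -n "$(tail -c 1 "$file")" ]; then
    echo >> "$file"
  fi
  printf '%s\n' "$line" >> "$file"
}

# Real use needs a root shell, for example:
#   ensure_line /etc/pam.d/common-auth 'auth required pam_google_authenticator.so nullok'
```

Running it twice leaves a single copy of the line, so it’s safe to re-run. To roll back, copy the `.bak` file over the original.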
Next up, you’ll need to enable your SSH daemon to ask for the second factor. That’ll be in `/etc/ssh/sshd_config`. Look for this line:
ChallengeResponseAuthentication no
…and update it (or add it):
ChallengeResponseAuthentication yes
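The same “update it or add it” step can be sketched as a small shell function. `set_sshd_option` is a hypothetical name, and the sketch assumes simple keys and values (no `|` characters) and that the option isn’t meant to live inside a `Match` block. It only rewrites uncommented lines and keeps a `.bak` copy.

```bash
#!/usr/bin/env bash
# Sketch: set "KEY VALUE" in an sshd_config-style file.
# set_sshd_option is a hypothetical helper, not part of OpenSSH.

# Usage: set_sshd_option FILE KEY VALUE
set_sshd_option() {
  local file=$1 key=$2 value=$3
  cp -p -- "$file" "$file.bak"      # keep a copy for easy rollback
  if grep -qE "^[[:space:]]*${key}[[:space:]]" "$file"; then
    # An uncommented line exists: rewrite it in place
    sed -E "s|^[[:space:]]*${key}[[:space:]].*|${key} ${value}|" \
      "$file.bak" > "$file"
  else
    # No uncommented line: add one at the end
    printf '%s %s\n' "$key" "$value" >> "$file"
  fi
}

# Real use needs a root shell, for example:
#   set_sshd_option /etc/ssh/sshd_config ChallengeResponseAuthentication yes
```

After changing the real file, `sudo sshd -t` asks the daemon to check the config for syntax errors without restarting anything, which is a cheap sanity check before the restart.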
Activate and test
Okay, so here’s the moment of truth. Are you ready? If you answered yes, then that means you are keeping a logged-in SSH session open in one window and you have another ready to go for the actual test. You’ll want to do this in case something is wrong and you’re not able to log in.
With that somewhat silly, but very important, disclaimer out of the way, let’s restart the SSH service:
$ sudo service sshd restart
Then, in your other, fresh, terminal window, log back in.
```bash
$ ssh username@example.com
Password:
Verification code:
```
Recovery
Let’s say there’s a failure of some kind. Here are some recovery steps you can take.
Lost your phone
But you saved your recovery codes, right? Log back in with one of them, then reconfigure Google Authenticator:
$ rm ~/.google_authenticator
$ google-authenticator
Got a new phone
But you saved your QR Code or secret key, right? Just fire up the Google Authenticator app on your new phone and add your server to the list with that.
Lost all the things
Really? You lost your phone, recovery codes, secret key, and QR code? This is really not your day. *hugs*
Local machine
- Boot into recovery mode
- Reconfigure Google Authenticator as per “Lost your phone” above
Remote machine
- Log in with root or a user with `sudo` privileges that isn’t set up for MFA
- Remove the target user’s `.google_authenticator` file
- Log back into that user’s account and set up Google Authenticator
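The middle step can be sketched as a tiny helper. `reset_mfa` is a hypothetical name, and instead of deleting the file outright this version moves it aside, which is my own cautious variation on the step above. It relies on the `nullok` option from earlier: once the file is gone, that user can log in without a code and run `google-authenticator` again.

```bash
#!/usr/bin/env bash
# Sketch: disable a user's MFA config by moving the secret file aside.
# reset_mfa is a hypothetical helper, not part of any package.

# Usage: reset_mfa HOME_DIRECTORY
reset_mfa() {
  local home_dir=$1
  local secret="$home_dir/.google_authenticator"
  if [ ! -f "$secret" ]; then
    echo "no MFA config found in $home_dir" >&2
    return 1
  fi
  mv -- "$secret" "$secret.disabled"
}

# Real use needs a root shell, for example (the path is a placeholder):
#   reset_mfa /home/username
```

Once the user has set up a new token, delete the `.disabled` file so the old secret isn’t left lying around.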
Remote machine, but everyone is set up for MFA
- Drive to the data center
- See “Local machine”
Conclusion
Congratulations! Your data and the world are safer because of your choice to use MFA. Be proud and share this article so others can be as security-cool as you. =]