Notes for July 1 2021
A monthly summary of news and notes for MLC@Home
Summary
Happy first birthday to MLC@Home! This project went live on July 1, 2020, and caught on pretty quickly in the BOINC community. We've remained focused on our goal, which is breaking open the black box of neural networks to explain why they make the choices they do. This is so important as machine learning permeates more and more of our everyday life, from autonomous cars to banking decisions and medical diagnoses. We need research to understand how to keep bias out of these systems.
We are also the first, and to date only, public machine-learning-focused BOINC project. This means that while we could leverage the BOINC framework for job management, we have had to build most of the ML client infrastructure from the ground up. This hasn't always been smooth, but we've accomplished so much in the past year regardless.
In the past year, we have:
- Received contributions from 2500+ volunteers and 9200+ hosts
- Processed over 3.4 million BOINC workunits
- Trained over 1.1 million neural networks for analysis across 3 different datasets, the largest datasets of their kind
- Generated over 4.3TB of data for analysis
- Published one academic paper (more coming...)
- Presented at the 2021 BOINC Workshop
- Released 47 client versions targeting 3 different CPU architectures, 2 GPU architectures, and multiple versions of Windows and Linux
- Outgrew the initial server within the first few months!
I'm overwhelmed by our community and what we've accomplished together. We've already shown that networks trained with the same data cluster together in weight space, despite the randomness associated with neural network training. We've also shown we can use this clustering to detect networks trained with poisoned data versus clean data, a significant finding in the field.
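As a rough illustration of the clustering idea described above, here is a toy sketch in Python. It is not the project's actual analysis code: the "networks" are synthetic vectors standing in for flattened weights, the two cluster centers are made up, and nearest-centroid matching is just one simple way to use clustering for detection. All function and variable names are hypothetical.

```python
import math
import random


def centroid(vectors):
    """Mean of a list of equal-length weight vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def distance(a, b):
    """Euclidean distance between two weight vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_label(vector, centroids):
    """Return the label whose centroid is closest to `vector`."""
    return min(centroids, key=lambda label: distance(vector, centroids[label]))


def fake_network(center, rng, noise=0.1):
    """Synthetic stand-in for one trained network's flattened weights.

    Assumption: training randomness is modeled as Gaussian noise around
    a center determined by the training data.
    """
    return [c + rng.gauss(0.0, noise) for c in center]


rng = random.Random(0)
dim = 16
clean_center = [0.0] * dim      # made-up center for networks trained on clean data
poisoned_center = [1.0] * dim   # made-up center for networks trained on poisoned data

clean = [fake_network(clean_center, rng) for _ in range(50)]
poisoned = [fake_network(poisoned_center, rng) for _ in range(50)]

centroids = {"clean": centroid(clean), "poisoned": centroid(poisoned)}

# Classify a network of unknown origin by its nearest cluster centroid.
unknown = fake_network(poisoned_center, rng)
predicted = nearest_label(unknown, centroids)
print(predicted)
```

In this toy setup the clusters are far apart by construction, so the lookup is trivially correct; real weight spaces are far higher-dimensional and the separation has to be established empirically.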
But there's still so much more to do! So while we want to acknowledge and celebrate what we've jointly accomplished so far, let's also look forward and set some loose goals for the next year of MLC@Home:
- MLDS will continue near term!
DS4 is (almost) ready and expands the dataset to include CNN network types as well as the RNNs used in DS1-3. DS5 will likely vary the shape and size of each network slightly to see if clustering still happens when the shape varies. Future MLDS work beyond DS5 is TBD, but we expect there to be plenty of DS4/DS5 WUs for many months to come. We expect to update the paper with the latest runs over the next month.
- We'd like to expand beyond MLDS!
We are the first project to do ML on a BOINC-sized scale. We would like to expand to support other areas of research, and want to commit to bringing at least one other ML project online within the next year. Please contact us if you are a researcher who is interested in working with the platform!
- We need to improve the technical side of the project
From the client supporting AMD GPUs and OSX to optimizing utilization of graphics cards to a better validation process for WUs, there's a laundry list of technical issues we'd like to address, and have not done so effectively in the past three months. We're also hitting some corner-cases of the BOINC software stack that are tricky to work around. If you are a developer and want to help, we'd welcome the support.
- We'd like to improve outreach
To get more people involved, we'd like to produce a few short videos about the project, what we've found and how others can help. These should be short, easily accessible, and easy to share. We'd like to produce at least one of these within the next 6 months.
These are loose goals but should give you an idea of where we're concentrating our efforts for the next year. If you have further insights, please share them below or on Discord.
Thanks again for supporting MLC@Home, and here's to many more years of successful research in an important field.
Other News
- DS3 is all but complete (just the last 130+ WUs trickling in!). I consider DS3 to be the most important dataset and can't wait to run our analysis on the whole thing!
- From now on we'll be blasting DS1 (then DS2) WUs into both the GPU and CPU queues until that completes and/or until DS4 is ready. We'll try to get those over the hump ASAP.
- Some fun news! MLC Discord user Tankbuster has updated our banner graphic! See the updated banner on the project and home pages!
- Even more exciting, Tankbuster built a prototype graphics app for MLC@Home! You can see mockups and videos and follow the discussion on the MLC Discord server (link at the bottom).
- Reminder: the MLC client is open source, and has an issues list on GitLab. If you're a programmer or data scientist and want to help, feel free to look over the issues and submit a pull request.
Project status snapshot:
(note these numbers are approximations)
Last month's TMIM Notes: Jun 8 2021
Thanks again to all our volunteers!
-- The MLC@Home Admin(s)
Homepage: https://www.mlcathome.org/
Discord invite: https://discord.gg/BdE4PGpX2y
Twitter: @MLCHome2