I really like this MLOpsy research vision (h/t to @AlexTamkin for sharing with me a while back) so I will take the opportunity to go on my tangential versioning rant: while versioning modeling code & param values (i.e., checkpoints) is a great step for prod ML, it's also important
to have semantically meaningful diffs in versions during code review. No one can look at a byte diff of a list of 2 float vectors and actually know what the diff is. A natural solution to this is refining continuous integration (CI) for ML software, which, at mature companies,
ends up being a dynamic set of examples, an "evaluation set" that grows when engineers witness new failures or important data points in the wild (can't cite anything rn, just observational from my interview study). Then in PRs, the main diff assessed is eval set performance.
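A minimal sketch of that PR gate, just to make it concrete (all names here are hypothetical, and it assumes models are plain callables and the eval set lives as a JSONL file under version control, so adding a new failure case is itself a reviewable diff):

```python
import json

def load_eval_set(path):
    """Load a version-controlled JSONL eval set.
    Each record: {"features": [...], "label": ...}."""
    with open(path) as f:
        return [json.loads(line) for line in f]

def accuracy(model, eval_set):
    """Fraction of eval-set examples the model gets right."""
    correct = sum(model(ex["features"]) == ex["label"] for ex in eval_set)
    return correct / len(eval_set)

def ci_gate(candidate, baseline, eval_set, tolerance=0.0):
    """The 'diff' reviewed in a PR: candidate vs. baseline on the eval
    set, failing the check if the candidate regresses beyond tolerance."""
    cand_acc = accuracy(candidate, eval_set)
    base_acc = accuracy(baseline, eval_set)
    return cand_acc >= base_acc - tolerance, cand_acc, base_acc
```

The point is that the reviewable artifact is a human-meaningful number (eval set performance), not a byte diff of weights.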
What does "meaningful" model versioning unlock? I think we'll be able to iterate on the model / ML parts of codebases a lot more. The data-centric AI movement emerged simply because we currently get bigger gains iterating on the training data rather than model architecture, but
this makes a lot of sense rn bc we have a decades-old history of data eng & management (it is very important to acknowledge this in my opinion!). In the interview study we're noticing that companies with more mature model versioning practices
(e.g., dynamic eval set, tying ML PRs to business outcomes) are actively iterating on the model! For example, one team at a Fortune 50 co is making their ML system as deep learning-based as possible to remove individual components and
get rid of the need to test at a bunch of "transition" points in the pipeline. This made me 🤯 bc the current narrative in many prod ML cases is, "don't do DL in prod bc XGBoost is better," but really
the narrative should be "you can only change your system as fast as your infra supports human-understandable / meaningful versioning for all its dynamic parts (data, code, models, hardware)." So I really think once we've nailed these versioning
challenges (and other MLOps issues), ML software development is gonna look really different, less volatile / more stable, we'll be iterating on any model/code/data, and yeah maybe I'll be an MLE again?? ✨