KDD20 Tutorial: Practical Automated Machine Learning with Tabular, Text, and Image Data

Automated machine learning (AutoML) offers the promise of translating raw data into accurate predictions with just a few lines of code. Rather than relying on human time/effort and manual experimentation, models can be improved by simply letting the AutoML system run for more time.

In this hands-on tutorial, we demonstrate fundamental techniques that enable powerful AutoML. We consider standard supervised learning tasks on various types of data including tables, text, and images. Rather than technical descriptions of individual ML models, we emphasize how to best use models within an overall ML pipeline that takes in raw training data and outputs predictions for test data. A major focus of our tutorial is on automating deep learning, a class of powerful techniques that are cumbersome to manage manually. Each topic covered in the tutorial is accompanied by a hands-on Jupyter notebook that implements best practices.

Most of this code is adapted from AutoGluon, a recent AutoML toolkit that makes it easy to translate your data into highly accurate models: autogluon.mxnet.io
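To give a sense of what "a few lines of code" can look like for tabular data, here is a minimal sketch. It assumes the `TabularPrediction` interface as it existed around AutoGluon 0.0.13; later releases changed this API. The function name `quickstart` and its arguments are hypothetical and used only for illustration.

```python
def quickstart(train_csv, test_csv, label):
    """Fit AutoGluon on one CSV table and predict `label` for another.

    Sketch only: assumes the AutoGluon 0.0.13-era TabularPrediction API.
    """
    # Imported lazily so this file can be loaded without AutoGluon installed.
    from autogluon import TabularPrediction as task

    # task.Dataset is a pandas DataFrame subclass that loads a CSV file.
    train_data = task.Dataset(file_path=train_csv)
    test_data = task.Dataset(file_path=test_csv)

    # fit() handles preprocessing, model training, and ensembling.
    predictor = task.fit(train_data=train_data, label=label)

    # Drop the label column from the test table (if present) before predicting.
    features = test_data.drop(columns=[label], errors="ignore")
    return predictor.predict(features)
```

A call might look like `quickstart("train.csv", "test.csv", label="class")`, where the file names and the label column are placeholders for your own data.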

Information

Tutors: Jonas Mueller, Xingjian Shi, Alex Smola (Amazon Web Services)

Contact: Jonas Mueller

Live Q&A: August 24, 2020: 1-4pm (PST)

Video presentations: https://www.youtube.com/playlist?list=PLlGlURKFtW6jfdjxBoZyYrr1Lm0QzeS7h

Setup Instructions

Before running the hands-on tutorials on your own machine, please install AutoGluon and then confirm that you have version 0.0.13. You’ll also need to install MXNet by following this guide. Tutorial #7 additionally requires PyTorch and torchvision.

A Linux machine with GPU is recommended, although you should be able to easily run the tabular data tutorials (#1-4) on a Mac laptop as well. All tutorials should be run in either Python 3.6 or 3.7.
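The requirements above can be checked with a short script before opening the notebooks. This is only a sketch: the helper names (`python_ok`, `installed_version`, `check_environment`) are hypothetical, and it assumes AutoGluon was installed under the distribution name `autogluon`.

```python
import sys

REQUIRED_AUTOGLUON = "0.0.13"        # version named in the setup instructions
SUPPORTED_PYTHONS = {(3, 6), (3, 7)}  # Python versions named in the setup instructions


def python_ok(version_info=sys.version_info):
    """Return True if the (major, minor) Python version is supported."""
    return (version_info[0], version_info[1]) in SUPPORTED_PYTHONS


def installed_version(package):
    """Return the installed version string of `package`, or None if unavailable."""
    try:
        from importlib.metadata import version  # available on Python 3.8+
    except ImportError:
        try:
            import pkg_resources  # fallback for Python 3.6/3.7 (ships with setuptools)
            return pkg_resources.get_distribution(package).version
        except Exception:
            return None
    try:
        return version(package)
    except Exception:
        return None


def check_environment():
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if not python_ok():
        problems.append(
            "Python %d.%d found; the tutorials expect 3.6 or 3.7" % sys.version_info[:2]
        )
    found = installed_version("autogluon")
    if found != REQUIRED_AUTOGLUON:
        problems.append(
            "autogluon %s found; the tutorials expect %s" % (found, REQUIRED_AUTOGLUON)
        )
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(problem)
```

The script prints nothing when both checks pass. It does not check for MXNet, PyTorch, or a GPU.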

See here for setup instructions on a SageMaker instance.

Hands-on Tutorials

1. AutoML with Tabular data - Using AutoGluon

2. AutoML with Tabular data - Data Processing

3. AutoML with Tabular data - Training Models & Ensembling

4. AutoML with Tabular data - Inference

5. AutoML with Image data - Using AutoGluon

6. AutoML with Image data - Hyperparameter Optimization

7. Tuning your own models

8. AutoML with Text data - Using AutoGluon

9. AutoML with Text data - Customize Search Space and HPO

10. AutoML with Text data - Mixed Data Types