Hasan's Post

19 July 2021

Week1 classnotes

by Hasan

These are my class notes for the Coursera specialization on Machine Learning in Production. This is the second course of the specialization, and it is divided into four weeks. These are the notes for week 1.

1. Introduction to Machine Learning Engineering in Production

Overview

Production ML is a combination of machine learning development and modern software development.

Managing the entire life cycle of data

Modern software development

Of course, the previous points are important, but you are putting a product into production, so you need to take all the precautions that any software deployment requires. In addition, the following things need to be considered.

Production machine learning system

  1. Scoping

    • Define project and goal
    • Resources
  2. Data

    • Define data and establish baseline
    • Label and organize data
    • Sometimes this also requires measuring human-level performance and defining a baseline.
  3. Modelling

    • Select and train model
    • Perform error analysis
  4. Deployment

    • Deploy in production
    • Monitor and maintain system performance

We live in a world where things can change dramatically, so at some point your model's performance may degrade. Continuously monitoring model performance helps you detect this deterioration. When it happens, you need to go back to the modelling stage.
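
The monitoring idea can be sketched in a few lines. This is only an illustration under stated assumptions, not the course's implementation: the class name `PerformanceMonitor`, the window size, and the tolerance are all hypothetical choices. It keeps a rolling window of prediction outcomes and flags when accuracy falls below a baseline by more than a tolerance, which would be the signal to return to modelling.

```python
from collections import deque


class PerformanceMonitor:
    """Track rolling accuracy and flag degradation against a baseline.

    Hypothetical helper for illustration; names and thresholds are assumptions.
    """

    def __init__(self, baseline_accuracy, window_size=100, tolerance=0.05):
        self.baseline_accuracy = baseline_accuracy
        self.tolerance = tolerance
        # Only the most recent `window_size` outcomes are kept.
        self.outcomes = deque(maxlen=window_size)

    def record(self, prediction, label):
        """Store whether a single prediction matched its ground-truth label."""
        self.outcomes.append(prediction == label)

    def rolling_accuracy(self):
        """Return accuracy over the current window, or None if it is empty."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def is_degraded(self):
        """True when rolling accuracy falls below baseline minus tolerance."""
        accuracy = self.rolling_accuracy()
        if accuracy is None:
            return False
        return accuracy < self.baseline_accuracy - self.tolerance
```

In practice the ground-truth labels usually arrive later than the predictions, so a monitor like this would be fed once labels become available.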

Challenges in production grade ML

ML Pipelines

Directed acyclic graphs

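As a rough sketch of why a pipeline is modelled as a directed acyclic graph: each step lists the steps it depends on, and a topological sort gives an order in which every step runs only after its dependencies. The step names below are illustrative assumptions, not taken from the course, and the example uses Python's standard `graphlib` module rather than any orchestration framework.

```python
from graphlib import CycleError, TopologicalSorter  # standard library, Python 3.9+

# Each key is a pipeline step; its value is the set of steps it depends on.
# The step names are hypothetical and chosen only for this example.
pipeline = {
    "ingest_data": set(),
    "validate_data": {"ingest_data"},
    "transform_data": {"validate_data"},
    "train_model": {"transform_data"},
    "evaluate_model": {"train_model"},
    "deploy_model": {"evaluate_model"},
}


def execution_order(dag):
    """Return one valid run order; raises CycleError if the graph has a cycle."""
    return list(TopologicalSorter(dag).static_order())


order = execution_order(pipeline)
print(order)
```

The "acyclic" part matters: if two steps depended on each other, no valid order would exist, and `execution_order` would raise `CycleError`.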

Pipeline orchestration frameworks

TensorFlow Extended (TFX)

TFX Hello World

2. Collecting Data

Importance of Data

ML: Data is a first class citizen

Everything starts with data

Data Pipeline

Data Collection and Monitoring

Key points

Example Application: Suggesting Runs

Key considerations

Get to know your data

Dataset issues

Measure data effectiveness

Translate user needs into data needs

Key points

Responsible Data: Security, Privacy, & Fairness

Outline

Data security and privacy

User privacy

ML systems can fail users

Commit to fairness

Reducing bias: Design fair labeling systems

Types of human raters

Key points

3. Labelling data

Case Study: Degraded Model Performance

Case study: taking action

What causes problems?

Gradual Problems

  1. Data changes
  2. World changes

Sudden Problems

Why “Understand” the model

Data and Concept Change in Production ML

Outline

Detecting problems with deployed models

Easy problems

Harder problems

Really Hard problems

Key points

Process Feedback and Human Labeling

Data Labeling

Why is labeling important in production ML?

Direct labelling: continuous creation of training dataset

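One way to picture direct labelling from process feedback: predictions are logged when they are served, the outcome (for example, whether the user clicked) is logged later, and joining the two logs produces new labeled training examples continuously. The sketch below assumes a hypothetical record layout (`request_id`, `features`, `clicked`); none of these field names come from the course.

```python
def label_from_feedback(prediction_log, feedback_log):
    """Join logged predictions with observed outcomes to build labeled examples.

    prediction_log: list of dicts with "request_id" and "features".
    feedback_log: list of dicts with "request_id" and "clicked" (bool).
    Both record layouts are assumptions made for this sketch.
    """
    # Index feedback by request id for constant-time lookup.
    outcome_by_id = {f["request_id"]: f["clicked"] for f in feedback_log}

    labeled = []
    for record in prediction_log:
        request_id = record["request_id"]
        # Predictions with no feedback yet stay unlabeled and are skipped.
        if request_id in outcome_by_id:
            labeled.append(
                {"features": record["features"], "label": int(outcome_by_id[request_id])}
            )
    return labeled


predictions = [
    {"request_id": "a1", "features": [0.2, 0.7]},
    {"request_id": "a2", "features": [0.9, 0.1]},
    {"request_id": "a3", "features": [0.5, 0.5]},
]
feedback = [
    {"request_id": "a1", "clicked": True},
    {"request_id": "a3", "clicked": False},
]
new_training_examples = label_from_feedback(predictions, feedback)
```

Here request `a2` has no feedback yet, so it does not become a training example; in a real system the join would typically be done by a log analysis tool rather than in memory.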

Process feedback - advantages

Process feedback - disadvantages

Process feedback - open-source log analysis tools

Process feedback - cloud log analytics

Human Labeling

Human Labeling Methodology

Human labeling - advantages

Disadvantages

Key points

4. Validating Data

Detecting Data issues

Outline

Drift and skew

Detecting data issues

Detecting distribution skew

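One common way to quantify distribution skew for a categorical feature is the L-infinity (Chebyshev) distance between the relative value frequencies seen in training data and in serving data. The sketch below is a plain-Python illustration and does not call TFDV; the function names, the example feature values, and the 0.1 threshold are all assumptions.

```python
from collections import Counter


def value_frequencies(values):
    """Return the relative frequency of each distinct value."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}


def linf_distance(training_values, serving_values):
    """Largest absolute difference in relative frequency over all values.

    At least one of the two inputs must be non-empty.
    """
    train_freq = value_frequencies(training_values)
    serve_freq = value_frequencies(serving_values)
    all_values = set(train_freq) | set(serve_freq)
    return max(
        abs(train_freq.get(v, 0.0) - serve_freq.get(v, 0.0)) for v in all_values
    )


def has_skew(training_values, serving_values, threshold=0.1):
    """Flag skew when the distance exceeds the (assumed) threshold."""
    return linf_distance(training_values, serving_values) > threshold


# Hypothetical payment-method feature: mostly "card" in training, mostly "cash" when serving.
training = ["card"] * 70 + ["cash"] * 30
serving = ["card"] * 40 + ["cash"] * 60
print(has_skew(training, serving))
```

The threshold is a judgment call per feature: too low and normal sampling noise triggers alerts, too high and real skew goes unnoticed.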

Skew detection workflow

TensorFlow Data Validation (TFDV)

Please see this tutorial
