Hasan's Post

19 December 2022

Stable Diffusion fastai course lesson 9

by Hasan

First part

Guidance scale

Negative prompts

Image to image

Fine-tuning

Textual Inversion

DreamBooth

Second part -> Details of machine learning


```mermaid
flowchart LR
    A[Image of 3] --> B[function]
    B --> C[0.9]
```

```mermaid
flowchart LR
    A[Image of 3 + noise] --> B[function]
    B --> C[0.6]
```

```mermaid
flowchart LR
    A[Image of noise] --> B[function]
    B --> C[0.02]
```

Generate an image from a magical function
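If such a function existed, we could start from pure noise and repeatedly nudge every pixel in the direction that raises the score. A toy sketch of that idea, where the "magical function" is a made-up distance-to-target score rather than a real neural network:

```python
import random

# Hypothetical "magical function": scores how much a tiny 2x2 image looks
# like our target. TARGET is a made-up stand-in for "an image of a 3".
TARGET = [0.9, 0.1, 0.8, 0.2]

def score(img):
    # 1.0 for a perfect match, lower the further img is from TARGET.
    return 1.0 - sum((p - t) ** 2 for p, t in zip(img, TARGET)) / len(img)

def generate(steps=200, lr=0.1, eps=1e-4):
    img = [random.random() for _ in range(len(TARGET))]  # start from noise
    for _ in range(steps):
        for i in range(len(img)):
            # Finite-difference gradient of the score w.r.t. pixel i.
            bumped = img.copy()
            bumped[i] += eps
            grad = (score(bumped) - score(img)) / eps
            img[i] += lr * grad        # nudge the pixel to raise the score
    return img

random.seed(0)
img = generate()
print(round(score(img), 3))            # → 1.0
```

With a real score function the gradients would come from backpropagation rather than finite differences, but the generation idea is the same.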

Where to get such a function

Training our neural network (magical function)


```mermaid
flowchart LR
    A[Image of 3 from training data] --> B[Image of 3 + our noise]
```

```mermaid
flowchart LR
    A[inputs] --> B[Neural Net]
    B --> C[outputs]
    C --> D[Loss]
    D --> E[Update weights]
    D --> A
```
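The loop above in miniature: a "network" with a single weight, a squared-error loss, and gradient-descent weight updates. Everything here is a toy example, not the lesson's actual code:

```python
# The data says "output = 3 x input"; the weight should learn 3.
inputs  = [1.0, 2.0, 3.0, 4.0]
targets = [3.0, 6.0, 9.0, 12.0]

w = 0.0                                # the whole "neural net" is one weight
for _ in range(100):
    for x, y in zip(inputs, targets):
        pred = w * x                   # inputs -> Neural Net -> outputs
        loss = (pred - y) ** 2         # outputs -> Loss
        grad = 2 * (pred - y) * x      # gradient of the loss w.r.t. w
        w -= 0.01 * grad               # Loss -> Update weights
print(round(w, 2))                     # → 3.0
```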

```mermaid
flowchart LR
    A[noisy image] --> B[Neural Net]
    B --> C[predicted noise]
```

  1. Subtract the predicted noise from the noisy image.
  2. In the next step, predict the noise again with our neural network.
```mermaid
flowchart LR
    A[noisy image] --> B[Neural Net = U-net]
    B --> C[predicted noise]
```

```mermaid
flowchart LR
    A[noisy image] --> B[Neural Net = U-net]
    B --> C[predicted noise]
    C --> D[Subtract]
    A --> D
    D --> E[Actual image]
```
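Those two steps can be sketched as a loop. The U-net is faked here by a function that happens to know the true noise, which lets the loop demonstrate the subtract-a-bit-then-predict-again idea without a trained model:

```python
import random

random.seed(0)
clean = [0.0, 1.0, 1.0, 0.0]                       # stand-in for the actual image
noise = [random.gauss(0, 1) for _ in clean]
noisy = [c + n for c, n in zip(clean, noise)]      # the noisy image we start from

def predict_noise(img):
    # Stand-in for the trained U-net: it "predicts" the true noise exactly.
    return [p - c for p, c in zip(img, clean)]

# Step 1: subtract a fraction of the predicted noise.
# Step 2: predict again from the partly-denoised image. Repeat.
for _ in range(50):
    pred = predict_noise(noisy)
    noisy = [p - 0.2 * n for p, n in zip(noisy, pred)]

err = max(abs(p - c) for p, c in zip(noisy, clean))
print(err < 1e-3)                                  # → True
```

Subtracting only a fraction of the prediction each step is what makes this an iterative process rather than a single jump.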

Variational Autoencoder


```mermaid
flowchart LR
    A[Input image of size 512x512x3] --> B[Convolution with stride 2, result = 256x256x6]
    B --> C[Convolution with stride 2, result = 128x128x12]
    C --> D[Convolution with stride 2, result = 64x64x24]
    D --> E[ResNet block, result = 64x64x4]
```
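A quick sanity check of the shapes above: each stride-2 convolution halves the height and width, while the channel counts (6, 12, 24, then 4 after the ResNet block) follow the lesson's example:

```python
# Walk the encoder shapes: stride-2 convs halve H and W at each step.
h = w = 512
c = 3
shapes = [(h, w, c)]
for out_c in (6, 12, 24):           # channels after each stride-2 conv
    h, w = h // 2, w // 2
    shapes.append((h, w, out_c))
shapes.append((64, 64, 4))          # ResNet block squeezes 24 channels to 4
print(shapes)
```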

```mermaid
flowchart TB
    subgraph Encoder
    512x512x3 --> conv_stride2
    conv_stride2 --> 256x256x6
    256x256x6 --> conv2_stride2
    conv2_stride2 --> 128x128x12
    128x128x12 --> conv3_stride2
    conv3_stride2 --> 64x64x24
    64x64x24 --> resnet_block
    resnet_block --> 64x64x4
    end

    subgraph Decoder
    64x64x4_i --> inverse_conv_1
    inverse_conv_1 --> 128x128x12_i
    128x128x12_i --> inverse_conv_2
    inverse_conv_2 --> 256x256x6_i
    256x256x6_i --> inverse_conv_3
    inverse_conv_3 --> 512x512x3_i
    end

    64x64x4 --> 64x64x4_i
    512x512x3_i --> output
```
```mermaid
flowchart TB
    subgraph Autoencoder
    Encoder --> Decoder
    end
```

Latents
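The payoff of latents is size: the encoder's 64x64x4 output holds 48 times fewer values than the 512x512x3 input image, so the U-net gets to work on far less data per image:

```python
# Image size vs latent size: the reason the U-net trains on latents.
pixels = 512 * 512 * 3        # values in the input image
latents = 64 * 64 * 4         # values in the encoder's output
print(pixels // latents)      # → 48
```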

Training the U-net

```mermaid
flowchart LR
    A[noisy image] --> D[Autoencoder Encoder] --> E[Noisy Latents] --> B[Neural Net = U-net]
    B --> C[predicted noise]
    E --> F[Subtract]
    C --> F
    F --> G[Actual latents] --> H[Autoencoder Decoder] --> I[Actual image]
```

```mermaid
flowchart LR
    A[noisy image] --> D[Variational Autoencoder Encoder] --> E[Noisy Latents] --> B[Neural Net = U-net]
    B --> C[predicted noise]
    E --> F[Subtract]
    C --> F
    F --> G[Actual latents] --> H[Variational Autoencoder Decoder] --> I[Actual image]
```
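The same pipeline order in code, with every box replaced by a trivial stand-in function. None of these are the real VAE or U-net; they only demonstrate the order data flows through:

```python
# Stand-in functions for each box in the diagram above.
def vae_encode(image):
    return [p * 0.5 for p in image]          # image -> latents

def vae_decode(latents):
    return [l / 0.5 for l in latents]        # latents -> image

def unet(noisy_latents):
    return noise                             # pretend it predicts the noise perfectly

image = [0.2, 0.8, 0.4]
noise = [0.1, -0.1, 0.05]

latents = vae_encode(image)                                      # VAE Encoder
noisy_latents = [l + n for l, n in zip(latents, noise)]          # Noisy Latents
pred = unet(noisy_latents)                                       # U-net predicts noise
actual_latents = [nl - p for nl, p in zip(noisy_latents, pred)]  # Subtract
out = vae_decode(actual_latents)                                 # VAE Decoder
print([round(p, 2) for p in out])                                # → [0.2, 0.8, 0.4]
```

The key point the flow shows: the noise is added, predicted, and subtracted entirely in latent space; the decoder only runs once at the end.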

Text insertion


```mermaid
flowchart LR
    A[Text, e.g. a cute teddy] --> B[Vector of numbers representing a teddy]
```

CLIP and contrastive loss

```mermaid
flowchart LR
    A[Text] --> B[Neural network model] --> C[Text embedding vector]
```

```mermaid
flowchart LR
    A[Image] --> B[Neural network model] --> C[Image embedding vector]
```
Each cell in the grid below is the dot product of the row's image embedding with the column's text embedding:

| image \ text | a graceful swan | a cute teddy | jeremy howard |
| --- | --- | --- | --- |
| image of swan | swan image · swan text | swan image · cute teddy text | swan image · jeremy howard text |
| image of cute teddy | teddy image · graceful swan text | teddy image · cute teddy text | teddy image · jeremy howard text |
| image of jeremy howard | jeremy howard image · graceful swan text | jeremy howard image · cute teddy text | jeremy howard image · jeremy howard text |

For matching image/text pairs (the diagonal) we want the dot product to be big, and small everywhere else:

| image \ text | a graceful swan | a cute teddy | jeremy howard |
| --- | --- | --- | --- |
| image of swan | big | small | small |
| image of cute teddy | small | big | small |
| image of jeremy howard | small | small | big |
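A toy version of the grid with hand-made 3-dimensional embeddings (CLIP's are learned and far larger). The contrastive loss trains the two encoders so that matched pairs point the same way, making the diagonal dot products come out big:

```python
# Hand-made embeddings: each image vector roughly lines up with its caption.
image_emb = {"swan":   [1.0, 0.0, 0.0],
             "teddy":  [0.0, 1.0, 0.0],
             "jeremy": [0.0, 0.0, 1.0]}
text_emb  = {"a graceful swan": [0.9, 0.1, 0.0],
             "a cute teddy":    [0.1, 0.9, 0.1],
             "jeremy howard":   [0.0, 0.1, 0.9]}

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Build the image-by-text grid of dot products.
grid = {img: {txt: dot(iv, tv) for txt, tv in text_emb.items()}
        for img, iv in image_emb.items()}

# Each image's biggest dot product should be with its own caption.
print(max(grid["swan"], key=grid["swan"].get))    # → a graceful swan
```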

```mermaid
flowchart LR
    A[Text, e.g. a cute teddy] --> B[Vector of numbers representing a teddy] --> C[Neural network]
    D[Some noisy image] --> C
```

Time steps


```mermaid
flowchart LR
    A[Images from the training set] --> B[Create mini-batch]
    B --> C[Pick beta / sample a time step to select the noise amount]
    C --> D[Add noise to the mini-batch]
    D --> E[Train the model]
```
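The steps above as a sketch, with a made-up linear beta schedule. The real DDPM schedule compounds the betas before noising, so this only illustrates the sample-a-time-step-per-batch idea:

```python
import random

random.seed(0)

# Made-up linear beta schedule: later time steps get more noise.
T = 1000
betas = [0.0001 + (0.02 - 0.0001) * t / (T - 1) for t in range(T)]

dataset = [[0.1, 0.9, 0.5, 0.3]] * 8         # stand-in training images

def training_step(batch):
    t = random.randrange(T)                  # sample a time step for this batch
    amount = betas[t]                        # noise amount from the schedule
    noised = [[p + random.gauss(0, amount) for p in img] for img in batch]
    return t, noised                         # next: feed the noised batch to the model

batch = random.sample(dataset, 4)            # create a mini-batch
t, noised = training_step(batch)
print(0 <= t < T and len(noised) == 4)       # → True
```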