plan.md (948B)


# Plan for the presentation

## Introduction
    * Setup
    * Method
    * Main focus -> learning rate
    * Conjecture -> a large step size learns sparse features & generalizes better
    * But to do this we need to dive deeper into the dynamics of an SGD iteration
    * SGD is GD with specific label noise

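The "SGD is GD with specific label noise" bullet can be checked numerically for the squared loss on a linear model: a mini-batch SGD step coincides with a full-batch GD step on suitably perturbed labels. The noise construction below is one standard way to realize this equivalence; the data and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 8, 3
X = rng.normal(size=(n, d))
y = rng.normal(size=n)
w = rng.normal(size=d)
eta = 0.1

# One SGD step on a random mini-batch of the loss
# L(w) = 1/(2n) * sum_i (x_i @ w - y_i)^2.
batch = rng.choice(n, size=2, replace=False)
residual = X @ w - y
sgd_grad = X[batch].T @ residual[batch] / len(batch)
w_sgd = w - eta * sgd_grad

# The same step as full-batch GD on perturbed labels y + eps, with
# eps_i = residual_i * (1 - (n/|B|) * 1[i in batch]), chosen so that
# the full-batch gradient on the noisy labels equals the batch gradient.
eps = residual.copy()
eps[batch] -= (n / len(batch)) * residual[batch]
gd_grad = X.T @ (X @ w - (y + eps)) / n
w_gd = w - eta * gd_grad

assert np.allclose(w_sgd, w_gd)
```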
## Loss stabilization for quadratic loss
    * Setup
    * Iterates
    * Proposition for loss stabilization
    * Proof
    * Explanation


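A minimal 1D toy of the stabilization effect (an assumption of mine, not the talk's actual setting): on a quadratic loss with fresh label noise each step, SGD settles at a loss floor that grows with the step size instead of converging to zero.

```python
import numpy as np

rng = np.random.default_rng(0)
y, sigma = 1.0, 1.0  # target value and label-noise scale

def stabilized_loss(eta, steps=20000):
    """Iterate w <- w - eta * (w - (y + noise)); average the late-phase loss."""
    w, losses = 0.0, []
    for t in range(steps):
        noisy_y = y + sigma * rng.normal()
        w -= eta * (w - noisy_y)
        if t >= steps // 2:
            losses.append(0.5 * (w - y) ** 2)
    return float(np.mean(losses))

small, large = stabilized_loss(0.1), stabilized_loss(1.5)
assert 0 < small < large  # larger step size -> loss stabilizes at a higher level
```

For this toy iteration the stationary loss level can be computed in closed form and is increasing in the step size, which is what the assertion checks empirically.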
## SGD Dynamics
    * Stochastic differential equations
    * What this means for SGD
    * Application to loss stabilization
    * Measurement of sparse feature learning
    * Feature sparsity coefficient


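The plan doesn't spell out the definition of the feature sparsity coefficient; as a placeholder, a Hoyer-style measure (my assumption, not necessarily the one used in the talk) maps a vector to [0, 1], with 1 for a one-hot vector and 0 for a flat one.

```python
import numpy as np

def sparsity(v):
    """Hoyer-style sparsity in [0, 1]: 1 for one-hot, 0 for a constant vector."""
    d = v.size
    l1 = np.abs(v).sum()
    l2 = np.sqrt((v ** 2).sum())
    return (np.sqrt(d) - l1 / l2) / (np.sqrt(d) - 1)

one_hot = np.array([0.0, 0.0, 3.0, 0.0])
flat = np.ones(4)
assert np.isclose(sparsity(one_hot), 1.0)
assert np.isclose(sparsity(flat), 0.0)
```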
## Diagonal Networks
    * Setup
    * Measuring
    * Results


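A diagonal linear network parameterizes the effective weights as the elementwise product u * v. A minimal forward/gradient sketch for the squared loss, with a finite-difference check (names and sizes are illustrative):

```python
import numpy as np

def forward(u, v, x):
    # Diagonal linear network: effective weight vector is u * v.
    return (u * v) @ x

def grads(u, v, x, y):
    # Gradients of L = (f(x) - y)^2 / 2 with respect to u and v.
    r = forward(u, v, x) - y
    return r * v * x, r * u * x

rng = np.random.default_rng(2)
u, v, x = rng.normal(size=3), rng.normal(size=3), rng.normal(size=3)
y = 0.5
gu, gv = grads(u, v, x, y)

# Finite-difference check of the first coordinate of the u-gradient.
h = 1e-6
u2 = u.copy(); u2[0] += h
num = ((forward(u2, v, x) - y) ** 2 / 2 - (forward(u, v, x) - y) ** 2 / 2) / h
assert np.isclose(gu[0], num, atol=1e-4)
```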
# SGD and GD have different implicit biases


## ReLU Networks
    * Setup
    * Measuring
    * Results

## Outlook to more complex cases
    * Setup
    * Datasets
    * Warmup step size
    * Results

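The "warmup step size" bullet presumably refers to ramping the learning rate up at the start of training; a common linear-warmup schedule looks like this (the constants are hypothetical):

```python
def warmup_schedule(step, base_lr=0.1, warmup_steps=1000):
    """Linear warmup to base_lr over warmup_steps, then constant."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    return base_lr

assert warmup_schedule(0) == 0.1 * 1 / 1000
assert warmup_schedule(499) == 0.1 * 500 / 1000
assert warmup_schedule(5000) == 0.1
```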
* That's it