August 24th

  • Long wait between updates.
  • Week 8 of the AI alignment program
  • Project on understanding LLMs
  • Transformers learn shortcuts to Automata
    • They train the network on the inputs and outputs of automata.
    • It seems they don't claim the automaton itself is what the transformer is learning (need to confirm).
    • They show that a parallel "shortcut" theoretical solution exists for these automata, and they think this is what the transformer might be learning (sketched below).
    • The transformer can also be forced to learn the recursive solution, which is shown to be more robust.
    • Again, I am not sure if the first version that the transformer learns is actually a solution or simply memorization.
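A minimal sketch of the contrast, using my own toy automaton (the states, transition table, and function names are illustrative, not the paper's code): the recursive simulation walks the input step by step in O(T) sequential steps, while the parallel "shortcut" composes transition functions with a prefix-style scan in O(log T) depth.

```python
# Toy illustration: sequential (recursive) automaton simulation vs. a parallel
# "shortcut" that composes transition functions pairwise in logarithmic depth.
import random

STATES = range(3)
# Hypothetical automaton: each input symbol induces a transition function,
# written as a tuple mapping state -> next state.
DELTA = {0: (1, 2, 0), 1: (0, 0, 1)}

def recursive_run(inputs, start=0):
    # O(T) sequential recursion: state_t = delta[x_t](state_{t-1})
    state = start
    for x in inputs:
        state = DELTA[x][state]
    return state

def compose(f, g):
    # apply f first, then g; function composition is associative
    return tuple(g[f[s]] for s in STATES)

def shortcut_run(inputs, start=0):
    # O(log T) depth: pairwise-compose transition functions level by level;
    # all compositions within a level are independent (parallelizable).
    fs = [DELTA[x] for x in inputs]
    while len(fs) > 1:
        fs = [compose(fs[i], fs[i + 1]) if i + 1 < len(fs) else fs[i]
              for i in range(0, len(fs), 2)]
    return fs[0][start]

xs = [random.choice([0, 1]) for _ in range(64)]
assert recursive_run(xs) == shortcut_run(xs)
```

The trick works because composition of transition functions is associative, which is what lets a shallow parallel circuit (and, the paper argues, a shallow transformer) stand in for the recursion.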
     

August 23rd

  • Theoretical linguistic capabilities of LLMs
  • Recursion in LLMs

August 11th

  • Technical AI governance
     

August 5th

  • An intuition for the transformer model from the perspective of the residual stream
  • E.g., GPT-style models (decoder-only transformers)
  • For simplification, they leave out the MLPs: they want to view the model as a linear sum, so they avoid the activation functions that cause the non-linearity (see the sketch after the callout below).
💡
I am curious how informative this toy model can be: it is already a small model, and on top of that we are not considering the activation function.
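To make the linear-sum picture concrete, here is a toy sketch (all shapes and names are my own assumptions, with layer norm dropped along with the MLPs): each attention head writes additively into the residual stream, so the final residual is exactly the embedding plus the sum of head outputs.

```python
# Toy residual-stream view: with MLPs and nonlinear projections removed, every
# attention head just adds its output back into the stream.
import numpy as np

rng = np.random.default_rng(0)
T, d_model, n_heads, n_layers = 8, 16, 2, 2

def head_output(resid, W_qk, W_ov):
    scores = resid @ W_qk @ resid.T
    mask = np.tril(np.ones((T, T), dtype=bool))
    scores = np.where(mask, scores, -np.inf)     # causal mask
    A = np.exp(scores - scores.max(-1, keepdims=True))
    A /= A.sum(-1, keepdims=True)                # softmax attention pattern
    return A @ resid @ W_ov                      # move info, then project

resid = rng.normal(size=(T, d_model))            # token embeddings
contributions = [resid.copy()]                   # the "direct path"
for _ in range(n_layers):
    for _ in range(n_heads):                     # (heads read sequentially here
        W_qk = rng.normal(size=(d_model, d_model)) / d_model  # for brevity)
        W_ov = rng.normal(size=(d_model, d_model)) / d_model
        out = head_output(resid, W_qk, W_ov)
        contributions.append(out)
        resid = resid + out                      # each head writes additively

# final residual == embedding + sum over all head outputs
assert np.allclose(resid, sum(contributions))
```

Note that softmax attention remains the one nonlinearity kept here, so the model reads as a linear sum only once the attention patterns are treated as fixed.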
         
Question: What loss function is used for the next-word predictor?
Answer: Cross-entropy loss over an output vector whose size is the vocabulary size, treating prediction as classification among all possible words in the vocabulary (a minimal snippet below).
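For concreteness (shapes and names here are my assumptions):

```python
# Next-word loss: one logit per vocabulary word at each position,
# cross-entropy against the actual next token.
import torch
import torch.nn.functional as F

n_vocab, T = 50_000, 12
logits = torch.randn(T, n_vocab)         # one score per vocabulary word
targets = torch.randint(n_vocab, (T,))   # the observed next tokens
loss = F.cross_entropy(logits, targets)  # softmax + negative log-likelihood
```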
  • 0-layer transformer: embedding followed directly by unembedding, so at best it can learn bigram statistics (sketched below).
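A sketch of that case (matrix names follow the usual W_E / W_U convention; the numbers are mine): with zero layers the logits factor through the single matrix W_U W_E, i.e. a bigram table.

```python
# 0-layer transformer: the next-token logits depend only on the current token.
import torch

n_vocab, d_model = 1000, 64
W_E = torch.randn(d_model, n_vocab)      # embedding
W_U = torch.randn(n_vocab, d_model)      # unembedding
tokens = torch.tensor([3, 17, 42])
logits = W_U @ W_E[:, tokens]            # (n_vocab, seq) per-token logits
# equivalently, the whole model is one matrix acting as a bigram logit table:
bigram_table = W_U @ W_E                 # (n_vocab, n_vocab)
assert torch.allclose(logits, bigram_table[:, tokens])
```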
         
         

August 2nd

  • Go through this week's readings on robustness to adversarial attacks.
  • First paper: attacks on LLMs to make them produce harmful content.
    • Manual jailbreaking requires human ingenuity and is time-intensive.
    • Previous attempts at generating harmful responses using gradients of the loss function depended on a particular model and a particular response.
    • Here the technique is to make the model start its answer with "Sure, here is …", where … restates the harmful prompt. In that case the model will generate the harmful content with high likelihood. The same trick is used in prompt-based jailbreaking as well, but the success rate is lower.
    • The model's gradients are used to find an optimal suffix which, appended to any prompt, makes the answer start as above (a sketch of this objective follows).
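A rough reconstruction of that objective, not the paper's code: the loss is the cross-entropy of the affirmative target given prompt + suffix, and the gradient through a one-hot relaxation of the suffix ranks candidate token substitutions. The attribute `model.embedding.weight` and the `inputs_embeds` forward signature are assumptions about the model object.

```python
# Sketch of a greedy-coordinate-gradient style suffix attack objective.
import torch
import torch.nn.functional as F

def suffix_loss(model, prompt_ids, suffix_onehot, target_ids):
    # Embed via one-hot @ embedding so the loss is differentiable w.r.t. the suffix.
    E = model.embedding.weight                       # (n_vocab, d_model), assumed
    embeds = torch.cat([
        E[prompt_ids],                               # fixed user prompt
        suffix_onehot @ E,                           # optimizable adversarial suffix
        E[target_ids],                               # target: "Sure, here is ..."
    ]).unsqueeze(0)
    logits = model(inputs_embeds=embeds).squeeze(0)  # assumed forward signature
    n = len(target_ids)
    # cross-entropy of the target tokens, predicted from the preceding positions
    return F.cross_entropy(logits[-n - 1:-1], target_ids)

def top_substitutions(model, prompt_ids, suffix_ids, target_ids, k=8):
    # One step: linearize the loss around the current one-hot suffix and return,
    # per suffix position, the k most promising replacement tokens.
    n_vocab = model.embedding.weight.shape[0]
    onehot = F.one_hot(suffix_ids, n_vocab).float().requires_grad_()
    suffix_loss(model, prompt_ids, onehot, target_ids).backward()
    return (-onehot.grad).topk(k, dim=-1).indices    # lower loss = better
```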
  • Cohort call for the AI Safety program
    • Discussion on AI control, unlearning, etc.
  • Chris Olah's talk

August 1st

  • Interpretable AI: various techniques have classically been studied to understand the internals of neural networks.
  • Read the section on
  • They identify the pixels in the input space that correspond to the higher activations, mainly via two kinds of techniques (a saliency sketch follows this list):
    • Occlusion- or perturbation-based: methods like SHAP and LIME manipulate parts of the image to generate explanations (model-agnostic).
    • Gradient-based: many methods compute the gradient of the prediction (or classification score) with respect to the input features; these methods (of which there are many) mostly differ in how the gradient is computed.
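As a concrete instance of the gradient-based family, a minimal vanilla-saliency sketch (in the spirit of Simonyan et al.; `model` and the shapes are placeholders, not a specific library API):

```python
# Vanilla gradient saliency: gradient of the class score w.r.t. input pixels.
import torch

def saliency_map(model, image, class_idx):
    # image: (3, H, W); returns (H, W) map of |d score / d pixel|, max over channels
    x = image.unsqueeze(0).requires_grad_()
    score = model(x)[0, class_idx]          # scalar class score (pre-softmax)
    score.backward()
    return x.grad[0].abs().amax(dim=0)      # high value = pixel mattered
```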
         

Visualization in ConvNets

         
Lecture 12 | Visualizing and Understanding (CS231n: Convolutional Neural Networks for Visual Recognition)
Slides: http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture12.pdf
  • Visualize the filters in the first layer.
    • Shows the shapes each filter looks for via template matching.
  • Nearest neighbour in the last hidden layer (a retrieval sketch follows).
    • Semantically similar photos end up near each other even if they differ widely at the pixel level.
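A small sketch of how that nearest-neighbour visualization could be computed, with `feature_extractor` standing in for the network truncated at the last hidden layer (a placeholder of mine, not the lecture's code):

```python
# Nearest neighbours in feature space rather than pixel space.
import torch

def nearest_neighbours(feature_extractor, query, gallery, k=5):
    # query: (3, H, W); gallery: (N, 3, H, W) -> indices of the k closest images
    with torch.no_grad():
        q = feature_extractor(query.unsqueeze(0))    # (1, d) feature vector
        g = feature_extractor(gallery)               # (N, d)
    dists = torch.cdist(q, g)                        # L2 in feature space
    return dists.topk(k, largest=False).indices[0]
```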
         