On the Robustness of Self-Supervised Representations for Multi-view Object Classification

In this post, I’ll talk about a paper we recently published on the robustness of self-supervised representations with respect to viewpoint variation - one of the core tenets of any capable vision system. At this point, it is known that vision models pretrained using self-supervised objectives outperform standard supervised pretraining on a set of standard benchmark datasets such as ImageNet, CIFAR10, COCO, and Birdsnap. However, these datasets all evaluate these models on a very narrow aspect - simple object classification performance.

ssl   self-supervised learning   representation learning  

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

As is commonly known at this point, transformers have transformed the field of NLP, and sequence modelling in general. However, computer vision has thus far remained dominated by the CNN. Its inductive biases result in unparalleled efficiency in terms of data and parameters for modelling data with a grid-like topology - most often images or video.

transformers   vision transformers   inductive biases  

pydags - A lightweight DAG framework for Python

I recently released a pre-alpha version of a Python library I’ve been working on. It’s still in the very early stages of development, but this tutorial aims to give an introduction to the library and its purpose.

python   dag   directed acyclic graph   kubeflow   airflow   python3  

A Simple Framework for Contrastive Learning of Visual Representations

A popular and useful framework for contrastive self-supervised learning known as SimCLR was introduced by Chen et al. The framework simplifies previous contrastive approaches to self-supervised learning, and at the time was state of the art in unsupervised image representation learning. The main simplification lies in the fact that SimCLR requires no specialised modules or additions to the architecture such as memory banks.

self-supervised learning   computer vision   deep learning  

A Foundation of Mathematics - The Peano Axioms

In the past, mathematicians wished to create a foundation for all of mathematics. The number system can be constructed hierarchically from the set of natural numbers \(\mathbb{N}\). From \(\mathbb{N}\), we can construct the integers \(\mathbb{Z}\), rationals \(\mathbb{Q}\), reals \(\mathbb{R}\), complex numbers \(\mathbb{C}\), and more. However, it is desirable to be able to construct the naturals (\(\mathbb{N}\)) from more basic ingredients, since there is no reason \(\mathbb{N}\) should itself be fundamental.

mathematics   peano axioms  

Reducing the dimensionality of data with neural networks

Reducing the dimensionality of data has many valuable potential uses. The low-dimensional version of the data can be used for visualisation, or for further processing in a modelling pipeline. The low-dimensional version should capture only the salient features of the data, and can indeed be seen as a form of compression. Many techniques for dimensionality reduction exist, including PCA (and its kernelized variant Kernel PCA), Locally Linear Embedding, ISOMAP, UMAP, Linear Discriminant Analysis, and t-SNE. Some of these are linear methods, while others are non-linear. Many of the non-linear methods fall into a class of algorithms known as manifold learning algorithms.

deep learning   autoencoder   dimensionality reduction  

FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence

Labelled data is often either expensive or hard to obtain. As such, there has been a plethora of work on making better use of unlabelled data in machine learning, with paradigms such as unsupervised learning, semi-supervised learning, and more recently, self-supervised learning. FixMatch is an approach to semi-supervised learning (SSL) that combines two common approaches to SSL: 1. consistency regularisation and 2. pseudo-labelling.
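To make the combination of pseudo-labelling and consistency regularisation a little more concrete, here is a minimal pure-Python sketch of an unlabelled-data loss in the style of FixMatch. It assumes we already have class probabilities for a weakly augmented and a strongly augmented view of each unlabelled example; the function name, the argument names and the example numbers are hypothetical, and the threshold value is just an illustrative default.

```python
import math


def fixmatch_unlabelled_loss(weak_probs, strong_probs, threshold=0.95):
    """Average cross-entropy between pseudo-labels and strong-view predictions.

    weak_probs / strong_probs: lists of per-class probability lists, one per
    unlabelled example. Examples whose weak-view confidence falls below the
    threshold contribute zero, but still count in the batch average.
    """
    total = 0.0
    for q, p in zip(weak_probs, strong_probs):
        confidence = max(q)
        if confidence >= threshold:
            pseudo_label = q.index(confidence)   # pseudo-labelling step
            total += -math.log(p[pseudo_label])  # consistency on the strong view
    return total / len(weak_probs)


# Illustrative numbers only: the first example is confident, the second is not.
weak = [[0.97, 0.02, 0.01], [0.50, 0.30, 0.20]]
strong = [[0.80, 0.10, 0.10], [0.20, 0.70, 0.10]]
print(fixmatch_unlabelled_loss(weak, strong))
```

Only the first example passes the confidence check, so the loss is its cross-entropy term divided by the batch size of two.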

semi-supervised learning   computer vision   deep learning  

All About Convex Hulls

The convex hull is a very important concept in geometry, and has many applications in fields such as computer vision, mathematics, statistics, and economics. Essentially, the convex hull of a shape or set of points is the smallest convex set that contains that shape or set of points. Many algorithms exist to compute a convex hull. Many of these algorithms focus on the 2D or 3D case; however, the general \(d\)-dimensional case is of great interest in many applications.
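As one example for the 2D case, below is a short sketch of Andrew's monotone chain algorithm, chosen here purely for illustration (the post itself may cover different algorithms). The function names `cross` and `convex_hull` are assumptions of this sketch; points are `(x, y)` tuples and the hull is returned in counter-clockwise order.

```python
def cross(o, a, b):
    """Z-component of the cross product of vectors OA and OB.
    Positive means O -> A -> B makes a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return the hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))  # sort by x, then y, and drop duplicates
    if len(pts) <= 2:
        return pts

    lower = []
    for p in pts:
        # Pop while the last two points and p do not make a left turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


square_with_extras = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1), (1, 0)]
print(convex_hull(square_with_extras))  # [(0, 0), (2, 0), (2, 2), (0, 2)]
```

The interior point `(1, 1)` and the collinear edge point `(1, 0)` are both discarded, leaving only the four corners.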

computational geometry   geometry   computer vision  

Representation Learning (1)

For a while I’ve been interested in representation learning in the context of deep learning, whether through self-supervised learning, unsupervised representation learning using GANs or VAEs, or simply through vanilla supervised learning of some neural network architecture. Upon reading the literature, I had an idea that serves as a nice integration of two very interesting and useful models / techniques - the Fisher vector (which I’ve previously posted about in my blog here), and the variational autoencoder (which I’ve been meaning to write a blog post about!). This blog post just serves to flesh out the idea, should I choose to pursue or revisit it at some point.

representation learning   fisher vectors   deep learning  

Human Action Recognition

In this post we will discuss the problem of human action recognition - an application of video analysis / recognition. The task is simply to identify a single action from a video. The typical setting is a dataset consisting of \(N\) action classes, where each class has a set of videos associated with it relating to that action. We will first look at the approaches typically taken in early action recognition research, and then turn to the current state-of-the-art approaches. There is a recurring theme in action recognition of extending conventional two-dimensional algorithms into three dimensions to accommodate the extra (temporal) dimension when dealing with videos instead of images.

cnn   deep learning   action recognition  

Dimensionality Reduction

In machine learning, we often work with very high-dimensional data. For example, we might be working in a genome prediction context, in which case our feature vectors would contain thousands of dimensions, or perhaps we’re dealing with another context where the dimensions reach hundreds of thousands or possibly millions. In such a context, one common way to get a handle on the data - to understand it better - is to visualise the data by reducing its dimensions. This can be done using conventional dimensionality reduction techniques such as PCA and LDA, or using manifold learning techniques such as t-SNE and LLE.

dimensionality reduction   pca   t-SNE   machine learning   manifold learning  

Optical Flow

Optical flow is a method for motion analysis and image registration that aims to compute the displacement of intensity patterns. Optical flow is used in many different settings in the computer vision realm, such as video recognition and video compression. The key assumption behind many optical flow algorithms is known as the brightness constancy constraint, and is defined as:
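For reference, the brightness constancy constraint is commonly written as follows. The notation is assumed here rather than taken from the post: \(I(x, y, t)\) is the image intensity at pixel \((x, y)\) and time \(t\), and \((\Delta x, \Delta y)\) is the displacement of that pixel over the interval \(\Delta t\).

\[
I(x, y, t) = I(x + \Delta x,\; y + \Delta y,\; t + \Delta t)
\]

In words, a point is assumed to keep the same intensity as it moves between frames.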

optical flow   lucas kanade   dense optical flow   computer vision  

Ensemble Learning

Ensemble learning is one of the most useful methods in machine learning, not least because it is essentially agnostic to the statistical learning algorithm being used. Ensemble learning techniques are a set of algorithms that define how to combine multiple classifiers to make one strong classifier. There are various ensemble learning techniques, but this post will focus on the two most popular - bagging and boosting. These two approach the same problem in very different ways.
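As a small illustration of the bagging half of that pair, the sketch below trains simple one-dimensional threshold classifiers on bootstrap samples and combines them by majority vote. Everything here is an assumption made for the example: the names `train_stump` and `bagging`, the toy dataset, and the simplification that class 1 tends to have larger feature values than class 0. Boosting is not covered by this sketch.

```python
import random
from collections import Counter


def train_stump(sample):
    """Fit a 1-D threshold classifier that predicts 1 when x > threshold.
    Assumes class 1 tends to have larger x values than class 0."""
    zeros = [x for x, y in sample if y == 0]
    ones = [x for x, y in sample if y == 1]
    if not zeros or not ones:
        # Degenerate bootstrap sample with a single class: predict that class.
        constant = 0 if zeros else 1
        return lambda x: constant
    # Threshold halfway between the two class means.
    threshold = (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2
    return lambda x: int(x > threshold)


def bagging(data, n_estimators=11, seed=0):
    """Train one stump per bootstrap sample and return a majority-vote predictor."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_estimators):
        # Bootstrap: draw len(data) examples with replacement.
        sample = [rng.choice(data) for _ in data]
        models.append(train_stump(sample))

    def predict(x):
        votes = Counter(model(x) for model in models)
        return votes.most_common(1)[0][0]

    return predict


# Toy data: x in 0..9 belongs to class 0, x in 10..19 belongs to class 1.
data = [(x, 0) for x in range(10)] + [(x, 1) for x in range(10, 20)]
predict = bagging(data)
print(predict(2), predict(17))
```

Using an odd number of estimators avoids tied votes in the two-class case.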

ensemble learning   boosting   bagging   machine learning  

Autoencoders

Autoencoders fall under the unsupervised learning category, and are a special case of neural networks that map the inputs (in the input layer) back to the inputs (in the final layer). This can be seen mathematically as \(f : \mathbb{R}^m \to \mathbb{R}^m\). Autoencoders were originally introduced to address dimensionality reduction. In the original paper, Hinton compares them with PCA, another dimensionality reduction algorithm. He showed that autoencoders outperform PCA when non-linear mappings are needed to represent the data. They are able to learn a more realistic low-dimensional manifold than linear methods due to their non-linear nature.

autoencoder   auto-encoders   neural networks   deep learning   dimensionality reduction  

Face Recognition: Eigenfaces

The main idea behind eigenfaces is that we want to learn a low-dimensional space - known as the eigenface subspace - on which we assume the faces intrinsically lie. From there, we can then compare faces within this low-dimensional space in order to perform facial recognition. It’s a relatively simple approach to facial recognition, but one of the most famous and effective of the early approaches. It still works well in simple, controlled scenarios.

face recognition   eigenfaces   pca  

Support Vector Machines - Why and How

Support vector machines (SVMs) are one of the most popular supervised learning algorithms in use today, even with the rise of deep learning and neural networks. They have remained popular because of their reliability across a wide variety of problem domains and datasets. They often have great generalisation performance, and this is almost solely due to the clever way in which they work - that is, how they approach the problem of supervised learning and how they formulate the optimisation problem they solve.

svm   support vector machine   machine learning   kernel machines  

Local Feature Encoding and Quantisation

In this post, I will describe local feature encoding and quantisation - why it is useful, where it is used, and some of the popular techniques used to perform it.

fisher vector   vlad   feature vectors   feature encoding   computer vision