Self-Supervised Masked Graph Modeling for Generating Graphs with Variational Autoencoders ("Constructive Learning"), Part 1

YOUNESS-ELBRAG
11 min read · Feb 21, 2024


1. Introduction

Generating Graphs from Thin Air: A New Self-Learning Technique for AI

Recent years have seen exciting advancements in self-supervised learning, where AI systems can learn on their own without needing labeled data. This approach has been particularly successful in areas like natural language processing, but for graphs (complex networks representing relationships between objects), progress has been slower.

This article introduces a new self-learning technique called Masked Graph Autoencoder (MaskGAE), specifically designed for graphs. Unlike traditional approaches that focus on reconstructing entire graphs, MaskGAE plays a clever game: it hides parts of the graph and then tries to guess the missing pieces based on the remaining information. This seemingly simple trick, called masked graph modeling, unlocks powerful learning capabilities, as we’ll explore further.

We’ll delve into both the theoretical and practical aspects of MaskGAE, showing why it works and how it outperforms other cutting-edge methods on tasks like predicting missing connections and classifying different types of nodes within a graph. By the end, you’ll gain insights into how AI can learn intricate relationships and even generate entirely new graphs on its own — a fascinating glimpse into the future of AI for graph-related tasks.

Before diving into graphs, let us first explain three different kinds of masked modeling:

1. Masked language modeling (MLM): This is a type of modeling that uses masks to represent words in a sentence. For example, the sentence “The quick brown fox jumps over the lazy dog” could be masked as “The [MASK] [MASK] fox jumps over the [MASK] dog”. The MLM model would then try to predict the masked words, based on the context of the surrounding words.

Masked language modeling is the objective BERT was pre-trained with. It has been shown that continuing MLM on your own data can improve performance (see "Don't Stop Pretraining: Adapt Language Models to Domains and Tasks"). The TSDAE paper also shows that MLM is a powerful pre-training strategy for learning sentence embeddings, especially when you work in a specialized domain.

2. Masked image modeling (MIM): This is a type of modeling that uses masks to represent patches in an image. For example, an image of a cat could be masked by covering up half of the cat’s face. The MIM model would then try to predict the missing pixels, based on the information in the unmasked parts of the image.

3. Masked graph modeling (MGM): This is a type of modeling that uses masks to represent edges in a graph. For example, a graph of social relationships could be masked by removing some of the edges between people. The MGM model would then try to predict the missing edges, based on the information in the unmasked edges and the nodes of the graph.
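To make the three flavors of masking concrete, here is a minimal, self-contained sketch in plain NumPy (the sentence, the toy image, and the toy edge list are all made up for illustration): tokens are replaced with a [MASK] symbol, image patches are zeroed out, and a fraction of graph edges is withheld for the model to predict.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1) Masked language modeling: hide ~30% of the tokens in a sentence.
tokens = "the quick brown fox jumps over the lazy dog".split()
mlm_mask = rng.random(len(tokens)) < 0.3
masked_tokens = ["[MASK]" if m else t for t, m in zip(tokens, mlm_mask)]

# 2) Masked image modeling: zero out random 8x8 patches of a toy image.
image = rng.random((32, 32, 3))
patch = 8
mim_mask = rng.random((32 // patch, 32 // patch)) < 0.5
masked_image = image.copy()
for i, j in zip(*np.nonzero(mim_mask)):
    masked_image[i * patch:(i + 1) * patch, j * patch:(j + 1) * patch] = 0.0

# 3) Masked graph modeling: withhold ~40% of the edges of a toy graph.
edges = np.array([[0, 1], [1, 2], [2, 3], [3, 0], [1, 3]])
mgm_mask = rng.random(len(edges)) < 0.4
visible_edges, masked_edges = edges[~mgm_mask], edges[mgm_mask]

print(masked_tokens)                                 # what the MLM model must recover
print(int(mim_mask.sum()), "patches hidden")         # what the MIM model must inpaint
print(len(masked_edges), "edges to predict from", len(visible_edges), "visible edges")
```

In all three cases the supervision comes for free: the hidden content itself is the training target.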

Graph Representation of Real-World Data

Graph representation is a fundamental concept in computer science and mathematics used to model relationships between objects. A graph consists of a set of vertices (also known as nodes) and a set of edges that connect pairs of vertices. Graphs can be used to represent a wide range of real-world systems, such as social networks, transportation networks, and molecular structures.

There are several common ways to represent a graph computationally:

  1. Adjacency Matrix: An adjacency matrix is a 2D array where the rows and columns represent vertices, and the presence of an edge between two vertices is indicated by a non-zero value in the corresponding cell. This representation is efficient for dense graphs but can be memory-intensive for sparse graphs.
  2. Adjacency List: An adjacency list is a collection of lists or arrays, where each vertex maintains a list of its adjacent vertices. This representation is more memory-efficient for sparse graphs and allows for efficient traversal of the graph.
  3. Edge List: An edge list is a simple list of pairs of vertices that are connected by edges. While straightforward, this representation may require additional processing for certain graph algorithms.

Choosing the appropriate graph representation depends on factors such as the size and density of the graph, memory constraints, and the specific operations to be performed on the graph. Each representation has its advantages and trade-offs in terms of space complexity, time complexity, and ease of implementation.
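As a quick illustration of these trade-offs, here is a small sketch (the 4-node example graph is arbitrary) that builds all three representations for the same undirected graph:

```python
import numpy as np

num_nodes = 4
edge_list = [(0, 1), (1, 2), (2, 3), (0, 2)]            # 3. edge list

# 1. Adjacency matrix: a dense num_nodes x num_nodes array.
adj_matrix = np.zeros((num_nodes, num_nodes), dtype=int)
for u, v in edge_list:
    adj_matrix[u, v] = adj_matrix[v, u] = 1             # undirected graph -> symmetric

# 2. Adjacency list: each node keeps a list of its neighbours.
adj_list = {node: [] for node in range(num_nodes)}
for u, v in edge_list:
    adj_list[u].append(v)
    adj_list[v].append(u)

print(adj_matrix)
print(adj_list)    # {0: [1, 2], 1: [0, 2], 2: [1, 3, 0], 3: [2]}
```

Note how the matrix stores a value for every possible pair of nodes, while the adjacency list and edge list only store the pairs that are actually connected.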

Graphs for Diverse Types of Data

1. Unveiling the Hidden Connections: Exploring the World of Graphs

While you may readily recognize graphs in the form of social networks, their versatility extends far beyond that. They offer a powerful and adaptable way to represent a vast array of data, often in ways you wouldn’t initially expect. This article delves into the intriguing world of graphs, uncovering hidden connections in images, text, and other types of data.

2. Beyond the Grid: Graphs as a New Lens for Images

We often view images as pixel grids, represented by numerical arrays. But there’s another perspective: we can visualize them as graphs! Imagine each pixel as a node, connected to its neighbors like a mini social network. This approach captures the spatial relationships between pixels, revealing valuable information about the image’s structure and symmetries.

Take a simple 5x5 image of a smiley face. Each pixel is a node, connected to up to eight neighboring pixels. The information stored at each node is a 3-dimensional vector holding the RGB color value. This representation allows us to create an adjacency matrix that records the connections between nodes; change a pixel, and the feature vector of the corresponding node changes with it.
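Here is a rough sketch of that idea, assuming a toy 5x5 RGB image and 8-neighbor connectivity (border pixels naturally end up with fewer neighbors); the variable names are mine, not from any particular library:

```python
import numpy as np

h, w = 5, 5
image = np.random.rand(h, w, 3)                 # toy 5x5 RGB image

def node_id(row, col):
    return row * w + col

node_features = image.reshape(h * w, 3)         # one 3-d RGB vector per node
adjacency = np.zeros((h * w, h * w), dtype=int)

for r in range(h):
    for c in range(w):
        for dr in (-1, 0, 1):                   # visit the (up to) 8 neighbours
            for dc in (-1, 0, 1):
                if dr == dc == 0:
                    continue
                rr, cc = r + dr, c + dc
                if 0 <= rr < h and 0 <= cc < w:
                    adjacency[node_id(r, c), node_id(rr, cc)] = 1

print(node_features.shape)     # (25, 3)
print(int(adjacency.sum()))    # twice the number of undirected neighbour pairs
```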

3. From Words to Connections: Graphing the Flow of Text

Text, too, can be seen as a graph. By assigning unique identifiers to characters, words, or tokens, we create a sequence of connected nodes. Just like in the smiley face example, words are nodes, connected to the words before and after them. This creates a directed graph, capturing the flow and structure of the text.
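A minimal sketch of this chain-graph view (the sentence and the token-ID scheme are purely illustrative): each token becomes a node, and a directed edge points from each token to the one that follows it.

```python
sentence = "graphs are everywhere once you look"
tokens = sentence.split()

# Assign a unique integer ID to each distinct token.
vocab = {tok: i for i, tok in enumerate(dict.fromkeys(tokens))}
node_ids = [vocab[tok] for tok in tokens]

# Directed edges follow the reading order: token i -> token i + 1.
edges = [(node_ids[i], node_ids[i + 1]) for i in range(len(node_ids) - 1)]

print(vocab)
print(edges)
```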

Interestingly, this isn’t the only way to view text as a graph. Models like Transformers consider text as a fully connected graph, where relationships between any two tokens are learned. This offers deeper insights into the meaning and connections within the text.

While these graph representations might seem unusual, they provide valuable perspectives on images and text. They go beyond the limitations of grids and sequential structures, revealing hidden connections and patterns that traditional methods might miss.

4. Beyond the Familiar: Graphing the Diverse World of Data

Graphs aren’t limited to images and text. They excel at representing complex, heterogeneous data where nodes have varying numbers of connections. Take molecules, the building blocks of matter. Each atom is a node, connected by covalent bonds (edges) to other atoms. This graph representation captures the intricate relationships between atoms in 3D space, providing a powerful tool for studying and manipulating molecules.

Similarly, graphs can represent social networks, transportation systems, protein structures, and much more. Their flexibility allows them to adapt to diverse data, revealing hidden connections and patterns that would be difficult to discover otherwise.

By exploring the world of graphs, we unlock a new way of understanding and manipulating data. From the familiar grids of images to the intricate structures of molecules, graphs offer a lens for uncovering the hidden connections that bind our world together.

Introduction to Generative Graph Modeling

Generative graph modeling has emerged as a promising approach for self-supervised learning, aiming to learn meaningful representations from unlabeled data in a task-agnostic manner. This paradigm has gained traction, particularly within the domain of graph neural networks (GNNs), due to its ability to effectively capture complex relationships and structural patterns present in graph-structured data.

Graph Autoencoders and Self-Supervised Learning

One prominent line of research in generative graph modeling focuses on graph autoencoders (GAEs), which are self-supervised learning models designed to reconstruct the input graph itself. Unlike contrastive methods that rely on handcrafted pretext tasks, GAEs leverage graph reconstruction as a natural pretext task, making them simple to implement and integrate with existing frameworks. However, the literature has shown that traditional GAEs may overemphasize proximity information at the expense of structural details, limiting their applicability to tasks beyond link prediction.

Masked Autoencoding for Graph Data

Recent advancements in self-supervised learning, particularly in the fields of language and vision research, have highlighted the effectiveness of masked autoencoding. Masked autoencoding involves masking a portion of input signals and predicting the hidden contents, similar to fill-in-the-blank tasks in natural language processing. While this approach has shown promise in text and image data, its applicability to graph data has been relatively underexplored until recently.

To bridge this gap, masked graph modeling (MGM) has been proposed as a principled pretext task for graph-structured data. MGM involves removing a portion of the input graph and learning to predict the removed content, such as edges. This approach leverages the idea of masking and predicting through node and edge-level reconstruction, offering a novel way to perform self-supervised learning on graphs.
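A hedged sketch of how such a pretext task can be set up (the helper function and the sampling scheme are mine, not taken from any specific paper): the masked edges become positive targets, randomly sampled non-edges become negatives, and the model only ever sees the visible edges.

```python
import numpy as np

rng = np.random.default_rng(0)

def build_mgm_targets(edges, num_nodes, mask_ratio=0.5):
    """Split edges into visible/masked sets and sample non-edges as negatives."""
    edges = np.asarray(edges)
    mask = rng.random(len(edges)) < mask_ratio
    visible, masked = edges[~mask], edges[mask]

    existing = {tuple(e) for e in edges} | {tuple(e[::-1]) for e in edges}
    negatives = []
    while len(negatives) < len(masked):                  # one negative per positive
        u, v = rng.integers(num_nodes, size=2)
        if u != v and (int(u), int(v)) not in existing:
            negatives.append((int(u), int(v)))
    return visible, masked, np.array(negatives)

edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (1, 3)]
visible, positives, negatives = build_mgm_targets(edges, num_nodes=5)
print(len(visible), "visible edges,", len(positives), "positives,", len(negatives), "negatives")
```

A model trained this way learns to score node pairs so that masked edges receive high scores and non-edges receive low ones, which is exactly a self-supervised link prediction objective.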

Generative Learning vs. Contrastive Learning

Self-supervised learning mines its own supervisory signal from large amounts of unlabeled data. Compared to supervised learning, it uses information from the dataset itself to construct pseudo-labels. Self-supervised learning has great potential to complement supervised learning for representation learning [9].

Since the introduction of MoCo and SimCLR, contrastive learning has dominated the field of graph self-supervised learning, and its performance on tasks such as node classification and graph classification has far exceeded that of generative self-supervised approaches. However, the success of contrastive learning often relies on several factors.

  1. High-quality data augmentation. GraphCL [5] explored the effectiveness of several data augmentation methods such as attribute masking, subgraph sampling, and randomly adding and dropping edges. However, it found that effective data augmentation on graphs tends to rely on domain knowledge; for example, randomly adding and removing edges is beneficial for training on social networks, while it can be harmful for molecular graphs. In short, there is still no universally effective data augmentation for graph contrastive learning.
  2. Complicated strategies to stabilize training. Contrastive methods prevent the model from collapsing to a trivial solution via various training tricks. Methods such as GRACE [8], DGI [7], and GraphCL [5] use negative sampling during training, while BGRL [6] relies on an asymmetric network structure and an exponential moving average (EMA) strategy.

Generative self-supervised pre-training has been very successful in NLP and CV (BERT and MAE).

On the contrary, generative self-supervised learning avoids these dependencies. It aims to reconstruct the features and information of the data itself: in NLP (natural language processing), BERT [3] recovers masked words; in CV (computer vision), MAE [2] recovers masked pixels of images.

For graphs, a graph autoencoder (GAE) reconstructs the structural information or node features of the graph. Most existing graph autoencoders focus on link prediction and graph clustering objectives, and therefore usually choose to reconstruct the structural information of the graph, i.e., the adjacency matrix A. However, recent advances in graph autoencoders have lagged far behind contrastive learning, and their performance on tasks such as classification has not been satisfactory: the state of the art for node classification and graph classification has been based on contrastive approaches.
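The classic (variational) graph autoencoder of Kipf and Welling reconstructs the adjacency matrix with a simple inner-product decoder, A_hat = sigmoid(Z Z^T). A minimal NumPy sketch, with random embeddings standing in for the output of a trained GNN encoder:

```python
import numpy as np

num_nodes, dim = 6, 16
Z = np.random.randn(num_nodes, dim)            # node embeddings (placeholder encoder output)

A_hat = 1.0 / (1.0 + np.exp(-Z @ Z.T))         # reconstructed edge probabilities
A_hat = np.clip(A_hat, 1e-7, 1 - 1e-7)         # numerical safety for the log below

A = np.zeros((num_nodes, num_nodes))           # ground-truth adjacency matrix
for u, v in [(0, 1), (1, 2), (2, 3), (4, 5)]:
    A[u, v] = A[v, u] = 1

# Binary cross-entropy between A and A_hat, ignoring self-loops on the diagonal.
off_diag = ~np.eye(num_nodes, dtype=bool)
bce = -(A * np.log(A_hat) + (1 - A) * np.log(1 - A_hat))[off_diag].mean()
print(round(float(bce), 3))
```

Training the encoder to minimize this reconstruction loss is exactly the structural objective described above.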

Unlike previous graph autoencoders, GraphMAE finds that:

simply reconstructing node features corrupted by masking allows graph autoencoders to surpass contrastive learning

GraphMAE’s crucial designs lie in the following aspects:

  1. Node feature reconstruction with masking. Existing graph autoencoders usually use edges as the reconstruction target, but this tends to perform poorly on downstream classification tasks.
  2. Decoding with re-masking and a GNN as the decoder. Existing graph autoencoders usually choose an MLP as the decoder; since most graph node features are continuous vectors, an MLP is not expressive enough to reconstruct node features from the encoded representation.
  3. The MSE loss is replaced by the scaled cosine error (sketched below).
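For reference, the scaled cosine error described in the GraphMAE paper raises the per-node cosine error to a power gamma >= 1, which down-weights nodes that are already reconstructed well. A small sketch (random arrays stand in for the original and reconstructed features of the masked nodes):

```python
import numpy as np

def scaled_cosine_error(x, x_hat, gamma=2.0, eps=1e-8):
    """Mean over nodes of (1 - cosine_similarity(x_i, x_hat_i)) ** gamma."""
    cos = (x * x_hat).sum(-1) / (
        np.linalg.norm(x, axis=-1) * np.linalg.norm(x_hat, axis=-1) + eps
    )
    return float(((1.0 - cos) ** gamma).mean())

x = np.random.randn(10, 32)        # original features of the masked nodes
x_hat = np.random.randn(10, 32)    # decoder reconstructions
print(scaled_cosine_error(x, x_hat))
```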

Understanding Masked Graph Modeling for Graph Autoencoders

In recent years, the field of machine learning has witnessed a surge of interest in self-supervised learning, a paradigm that enables models to learn representations from unlabeled data in a task-agnostic manner. Among the various domains where self-supervised learning has made significant strides, graph neural networks (GNNs) stand out as particularly promising due to their ability to capture intricate relationships and structural patterns present in graph-structured data. This section introduces MaskGAE, a self-supervised learning framework for graphs that leverages the concept of masked graph modeling (MGM) as a principled pretext task.

The Foundation: Masked Graph Modeling (MGM)

At the heart of MaskGAE lies the concept of masked graph modeling (MGM), which forms the cornerstone of the self-supervised learning approach. MGM involves removing a portion of the input graph and tasking the model with predicting the removed content, such as edges. This intuitive pretext task provides the model with supervisory signals derived directly from the data, facilitating the learning of meaningful representations without the need for explicit labels.

Theoretical Underpinnings

The approach to self-supervised learning with MaskGAE is not only empirically grounded but also firmly rooted in theory. Insights from contrastive learning are drawn upon, highlighting the parallels between graph autoencoders (GAEs) and contrastive learning models. By framing GAEs as models that maximize mutual information between paired subgraph views associated with linked edges, a theoretical foundation is provided for understanding their effectiveness in self-supervised learning tasks. Moreover, it is demonstrated how MGM can enhance mutual information maximization by reducing overlap (redundancy) between subgraph views. This critical insight underscores the importance of effective masking strategies in self-supervised learning tasks and forms the basis for the design of MaskGAE.

Key Features of MaskGAE

MaskGAE integrates efficient encoding with robust decoding mechanisms to learn meaningful representations from graph-structured data. The key features of MaskGAE include:

Encoder: A graph convolutional network (GCN), a well-established GNN architecture renowned for its effectiveness in modeling graph data, is employed as the encoder. Unlike traditional GAEs, the encoder in MaskGAE processes only the visible (unmasked) portion of the edges during training, ensuring computational efficiency and scalability.

Decoder: MaskGAE incorporates both structure and degree decoders to facilitate the reconstruction of graph structure and approximation of node degrees in a masked graph, respectively. These decoding mechanisms balance proximity and structural information, enhancing the model’s representation learning capabilities.

Learning Objective: The loss function for MaskGAE comprises reconstruction and regression losses, which measure the model’s proficiency in reconstructing the masked graph and approximating node degrees, respectively. This dual-loss framework serves as a regularizer for the encoder, encouraging the learning of more generalized representations.
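To make the dual-loss idea concrete, here is a hedged sketch of the two terms (the inner-product edge scorer, the placeholder degree predictor, and the weight alpha are simplifications of mine; see the MaskGAE paper for the exact decoders): a binary cross-entropy term on masked edges versus sampled non-edges, plus a regression term on node degrees in the masked graph.

```python
import numpy as np

rng = np.random.default_rng(0)
num_nodes, dim = 8, 16
Z = rng.normal(size=(num_nodes, dim))              # encoder output (placeholder)

masked_edges = np.array([(0, 1), (2, 3), (4, 5)])  # positives: edges that were hidden
neg_edges = np.array([(0, 7), (1, 6), (3, 5)])     # sampled non-edges as negatives

def edge_prob(Z, pairs):
    """Simple inner-product 'structure decoder': sigmoid of the embedding dot product."""
    scores = (Z[pairs[:, 0]] * Z[pairs[:, 1]]).sum(-1)
    return np.clip(1.0 / (1.0 + np.exp(-scores)), 1e-7, 1 - 1e-7)

# Reconstruction loss: binary cross-entropy on masked edges vs. sampled negatives.
pos_p, neg_p = edge_prob(Z, masked_edges), edge_prob(Z, neg_edges)
recon_loss = -(np.log(pos_p).mean() + np.log(1.0 - neg_p).mean())

# Degree regression loss: approximate each node's degree in the masked graph.
target_deg = np.bincount(masked_edges.ravel(), minlength=num_nodes).astype(float)
pred_deg = np.abs(Z @ rng.normal(size=dim))        # placeholder "degree decoder"
deg_loss = ((pred_deg - target_deg) ** 2).mean()

alpha = 0.001                                      # balancing weight (hyperparameter)
total_loss = recon_loss + alpha * deg_loss
print(round(float(recon_loss), 3), round(float(deg_loss), 3), round(float(total_loss), 3))
```

In the actual framework both decoders are learned networks applied to the encoder output, and the degree term acts as the regularizer described above.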

Conclusion

In conclusion, MaskGAE represents a significant advancement in the realm of self-supervised learning for graphs. By leveraging the concept of masked graph modeling and drawing upon theoretical insights from contrastive learning, MaskGAE offers a principled approach to learning meaningful representations from unlabeled graph data. With its robust architecture and empirical validation, MaskGAE holds promise for a wide range of applications in various domains, heralding a new era of self-supervised learning for graph-structured data.

Reference

GraphMAE: Self-Supervised Masked Graph Autoencoders

What’s Behind the Mask: Understanding Masked Graph Modeling for Graph Autoencoders

Variational Graph Auto-Encoders


YOUNESS-ELBRAG

Machine Learning Engineer || AI Architect @AIGOT. I explore advanced topics in AI, especially geometric deep learning.