MSc. Thesis Defense: Sefa Yıldız
VERTEX COLORING BY SUBGRAPH EXPANSION IN UNSUPERVISED GRAPH NEURAL NETWORKS: CONSTRUCTING A CURRICULUM BY ITERATIVE GROWTH OF SUBGRAPHS OF AN INPUT GRAPH
Sefa Yıldız
Data Science, MSc. Thesis, 2025
Thesis Jury
Prof. Dr. Can Akkan (Thesis Advisor)
Prof. Dr. Enes Eryarsoy
Assoc. Prof. Ayla Gülcü
Date & Time: July 18th, 2025 – 10:30 AM
Place: SBS 1127
Abstract
Vertex coloring is a classic combinatorial optimization problem in which each vertex of a graph must be assigned a color so that no two adjacent vertices share the same color. Deciding whether a graph can be colored with k colors is NP-complete, and finding the minimum number of colors that admits a conflict-free coloring is NP-hard (Garey & Johnson, 1990). The problem has important applications in scheduling, register allocation, timetabling, and other domains. In recent years, Graph Neural Networks (GNNs) have emerged as powerful models for learning on graph-structured data, and they can tackle vertex coloring by framing it as an unsupervised node-classification task. In particular, Schuetz, Brubaker, Zhu & Katzgraber (2022) proposed a physics-inspired approach that optimizes a Potts-based loss to encourage valid colorings; this GNN-based method achieves performance on par with, or better than, classical solvers, even scaling to graphs with millions of vertices.
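For illustration, a Potts-based loss can be written as a differentiable penalty on the soft color assignments produced by the network. The following is a minimal PyTorch sketch in the spirit of Schuetz et al. (2022); the function name and tensor layout are illustrative assumptions, not the exact implementation evaluated in the thesis.

```python
import torch

def potts_loss(probs: torch.Tensor, edge_index: torch.Tensor) -> torch.Tensor:
    """Differentiable Potts-style coloring loss (illustrative sketch).

    probs:      (num_nodes, num_colors) softmax color probabilities per node.
    edge_index: (2, num_edges) long tensor listing each undirected edge once.

    For an edge (i, j), (probs[i] * probs[j]).sum() is the probability that
    i and j receive the same color; the loss therefore approaches zero as
    adjacent nodes concentrate their mass on different colors.
    """
    src, dst = edge_index
    return (probs[src] * probs[dst]).sum()
```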
In this thesis, I propose a novel curriculum-inspired training strategy for vertex coloring with unsupervised GNNs. The method begins by training on a small subgraph and incrementally expands the training graph, through layer-by-layer breadth-first-search (BFS) expansion, degree-first BFS expansion, or random-walk expansion, until it covers the entire input graph. This curriculum-like incremental training lets the network learn in stages, reusing the embeddings and model weights pre-trained on the subgraph of the previous stage. Two GNN architectures (a Graph Convolutional model and a GraphSAGE model) were implemented and trained with a continuous Potts-based loss. I evaluated the approach on a subset of the COLOR benchmark dataset (including Mycielski graphs, n-queen graphs, and a social network graph). The experiments show that the subgraph expansion methods, especially the degree-first BFS variant, reduce total training time by 30–35% while maintaining statistically comparable conflict counts. These results suggest that curriculum-like incremental training is a promising direction for applying GNNs to combinatorial graph tasks.
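As a concrete illustration of one expansion strategy, the sketch below grows BFS layers while visiting high-degree neighbors first, yielding one induced subgraph per curriculum stage. It uses NetworkX for clarity; the function name and staging policy are my assumptions rather than the thesis's exact procedure, and plain BFS or random-walk expansion could be substituted.

```python
import networkx as nx

def degree_first_bfs_stages(G: nx.Graph, seed):
    """Yield a growing sequence of induced subgraphs for curriculum stages.

    Each stage expands the current frontier by one BFS layer, enqueueing
    unvisited neighbors in descending-degree order (the "degree-first"
    variant), so dense regions of the graph enter the curriculum early.
    """
    visited = {seed}
    frontier = [seed]
    while frontier:
        yield G.subgraph(visited).copy()  # training graph for this stage
        next_frontier = []
        for u in frontier:
            # Visit high-degree neighbors first.
            for v in sorted(G.neighbors(u), key=G.degree, reverse=True):
                if v not in visited:
                    visited.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
```

In a curriculum of this kind, the GNN would be trained on each yielded subgraph in turn, initializing every stage from the weights learned in the previous one.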