Generative Models and Bootstrapping for Titanium Alloy Microstructure Analysis

Cat P. Le · Matthew LaRosa · Vahid Tarokh

arXiv:2305.11400 Machine Learning (cs.LG) Preprint • v1 • Nov 6, 2025

Abstract

The electron backscatter diffraction (EBSD) process for titanium alloys is complex, time-consuming, and expensive. Furthermore, up to 30% of scan data points can be uncertain due to factors such as material surface properties, holding positions, or scanner limitations. These uncertainties present significant challenges in understanding the material and its microstructure distribution. This paper explores the application of machine learning generative models, including generative adversarial networks (GANs) and normalizing flows, to capture the underlying representations of EBSD datasets with missing or uncertain points. The proposed models are capable of reconstructing and generating synthetic EBSD scans of titanium alloys, offering valuable potential for advanced applications such as fatigue and crack detection.

TL;DR: Electron backscatter diffraction (EBSD) for titanium alloys is complex and costly, and up to 30% of scan points can be uncertain. This study investigates machine learning generative models, such as GANs and normalizing flows, that learn representations from incomplete EBSD datasets in order to reconstruct and generate scans.

Introduction

EBSD is a critical technique for analyzing material microstructures, especially in aerospace applications. However, it is costly, time-consuming, and often results in incomplete data due to scanning limitations. This paper proposes using generative models to reconstruct and generate EBSD scans, improving data utility and enabling better material analysis.

Challenges

High cost and complexity of EBSD scanning.
Limited availability of high-quality EBSD datasets (typically 5–10 scans).
Data heterogeneity across different scans prevents dataset merging.
Up to 30% of scan data may be missing or noisy due to detector and preparation issues.

Method in Brief

Three EBSD datasets of titanium alloys were used, sourced from AFRL and RTX, each treated independently due to heterogeneity.
Scans were divided into 250×250 samples, and data augmentation (flipping, rotation, cropping) was applied to expand the training set.
GAN-based models (DCGAN, WGAN-GP, MisGAN) were implemented to generate and reconstruct EBSD scans, but showed limitations due to small datasets and mode collapse.
Flow-based models (PixelCNN, PixelCNN++) were employed for pixel-by-pixel generation and missing data imputation, showing superior performance and realism.
A Conditional PixelCNN++ model was developed to learn microstructural variations across different scan angles and material cuts.
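
The pixel-by-pixel imputation mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: `impute_raster_order` and `toy_sampler` are hypothetical names, and the toy sampler only stands in for a trained PixelCNN-style model. The sketch assumes a 2D integer scan and a boolean mask marking uncertain points, and it conditions each filled pixel only on pixels that come earlier in raster order.

```python
import numpy as np

def impute_raster_order(scan, missing, sample_pixel, rng):
    """Fill missing pixels one at a time in raster (row-major) order.

    scan:         2D integer array of pixel values
    missing:      2D boolean array, True where the value is unknown
    sample_pixel: hypothetical callable (filled, row, col, rng) -> value,
                  standing in for a trained autoregressive model
    """
    filled = scan.copy()  # leave the input scan untouched
    # np.nonzero returns indices in row-major order, so every pixel above
    # or to the left is either known or already filled when we reach (r, c).
    for r, c in zip(*np.nonzero(missing)):
        filled[r, c] = sample_pixel(filled, r, c, rng)
    return filled

def toy_sampler(filled, r, c, rng):
    """Placeholder model (an assumption): copy the left or upper neighbour."""
    candidates = []
    if c > 0:
        candidates.append(filled[r, c - 1])
    if r > 0:
        candidates.append(filled[r - 1, c])
    if not candidates:  # top-left corner has no context
        return 0
    return candidates[rng.integers(len(candidates))]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    scan = np.arange(16).reshape(4, 4)
    missing = np.zeros_like(scan, dtype=bool)
    missing[1, 1] = missing[2, 3] = True
    print(impute_raster_order(scan, missing, toy_sampler, rng))
```

In practice `sample_pixel` would draw from the model's predicted distribution for that pixel; the loop structure stays the same.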

Model Architectures

GAN-based Models

DCGAN and WGAN-GP were used to generate EBSD scans but suffered from artifacts and mode collapse.
MisGAN attempted to learn missing data masks but overestimated missing regions and failed to reconstruct meaningful patterns.
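
For reference, the critic objective of WGAN-GP in its standard published form is reproduced below. The source does not state the penalty weight or critic settings used in this work, so the symbols are generic: D is the critic, p_r and p_g are the real and generated distributions, and \hat{x} is sampled on straight lines between real and generated samples.

```latex
L_{\text{critic}}
  = \mathbb{E}_{\tilde{x} \sim p_g}\big[ D(\tilde{x}) \big]
  - \mathbb{E}_{x \sim p_r}\big[ D(x) \big]
  + \lambda \, \mathbb{E}_{\hat{x} \sim p_{\hat{x}}}
    \Big[ \big( \lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1 \big)^2 \Big],
\qquad
\hat{x} = \epsilon x + (1 - \epsilon)\tilde{x}, \quad \epsilon \sim U[0, 1]
```

The last term is the gradient penalty that distinguishes WGAN-GP from the original weight-clipped WGAN.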

Flow-based Models

PixelCNN and PixelCNN++ generated more realistic and structured scans.
PixelCNN++ improved over PixelCNN by reducing artifacts and modeling full pixel distributions.
Conditional PixelCNN++ captured variations across different material cuts (bore, transition, rim).
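
PixelCNN is usually described as an autoregressive model: each pixel is predicted only from pixels above it and to its left. In the original PixelCNN this ordering is enforced by masking convolution kernels. The sketch below builds such a mask for a single-channel image; `causal_mask` is a hypothetical helper name, the per-channel ordering of the original formulation is omitted, and nothing here is taken from the authors' code.

```python
import numpy as np

def causal_mask(kernel_size, mask_type):
    """Build a PixelCNN-style mask for a square, odd-sized kernel.

    Type "A" (first layer) hides the centre pixel and everything after it
    in raster order; type "B" (later layers) additionally exposes the centre.
    """
    if mask_type not in ("A", "B"):
        raise ValueError("mask_type must be 'A' or 'B'")
    if kernel_size % 2 == 0:
        raise ValueError("kernel_size must be odd")
    centre = kernel_size // 2
    mask = np.zeros((kernel_size, kernel_size), dtype=np.float32)
    mask[:centre, :] = 1.0        # all rows strictly above the centre
    mask[centre, :centre] = 1.0   # pixels to the left in the centre row
    if mask_type == "B":
        mask[centre, centre] = 1.0
    return mask

if __name__ == "__main__":
    print(causal_mask(3, "A"))
    print(causal_mask(3, "B"))
```

Multiplying a kernel's weights elementwise by this mask before each convolution keeps the network from seeing the pixel it is predicting or any pixel that follows it.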

Image Gallery

Synthetic EBSD scans generated by the DCGAN model, which suffered from mode collapse.

Synthetic EBSD scans generated by the WGAN-GP model. Like DCGAN, this model suffered from mode collapse.

Synthetic EBSD scans generated by the PixelCNN++ model. This flow-based model showed superior performance and realism.

Datasets

Three datasets from AFRL and RTX (Pratt & Whitney) were used.
Each dataset was processed independently due to heterogeneity.
Scans were divided into 250×250 samples, yielding ~1,000 samples per dataset after augmentation.
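
The tiling and augmentation steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names are hypothetical, tiles are non-overlapping, and the scan is treated as a plain image array of shape (H, W) or (H, W, C). The source does not specify the stride, the rotation angles, or how cropping was done.

```python
import numpy as np

def tile_scan(scan, size=250):
    """Cut a scan into non-overlapping size x size tiles (edge remainders dropped)."""
    h, w = scan.shape[:2]
    return [scan[r:r + size, c:c + size]
            for r in range(0, h - size + 1, size)
            for c in range(0, w - size + 1, size)]

def augment(tile):
    """Return the 8 variants of a square tile under 90-degree rotations and flips."""
    variants = []
    for k in range(4):
        rotated = np.rot90(tile, k)          # rotates the first two axes
        variants.append(rotated)
        variants.append(np.fliplr(rotated))  # mirror left-right
    return variants

def random_crop(scan, size, rng):
    """Take one size x size crop at a uniformly random position."""
    h, w = scan.shape[:2]
    r = rng.integers(0, h - size + 1)
    c = rng.integers(0, w - size + 1)
    return scan[r:r + size, c:c + size]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    scan = rng.integers(0, 256, size=(1000, 750))  # stand-in for a real scan
    tiles = tile_scan(scan)
    samples = [v for t in tiles for v in augment(t)]
    print(len(tiles), len(samples))
```

Because the sketch handles scans purely as images, it does not adjust the crystal orientation values when a tile is rotated or flipped; whether that matters depends on how the scans are encoded.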

Results

GANs produced low-quality samples that contained artifacts and lacked diversity.
MisGAN failed to learn accurate missing data masks.
PixelCNN++ generated high-quality, realistic EBSD scans with fewer artifacts.
Conditional PixelCNN++ successfully modeled distinct microstructures across different scan angles.

Key Contributions

Introduced generative machine learning models (GANs and flow-based models) to reconstruct and synthesize EBSD scans of titanium alloys.
Demonstrated that flow-based models (PixelCNN, PixelCNN++) outperform GANs in handling missing data and generating realistic microstructures, especially with limited datasets.
Developed a Conditional PixelCNN++ model to capture microstructural variations across different scan angles and material cuts.
Applied data augmentation techniques (cropping, flipping, rotation) to expand small EBSD datasets while preserving microstructural integrity.
Provided a comparative analysis of GAN-based and flow-based models, highlighting the limitations of GANs in EBSD modeling due to mode collapse and data scarcity.
Showed practical implications for aerospace materials analysis, including potential applications in fatigue detection and material classification.

Citation

          @article{le2023modeaware,
            title={Generative Models and Bootstrapping for Titanium Alloy Microstructure Analysis},
            author={Le, Cat P. and LaRosa, Matthew and Tarokh, Vahid},
            journal={arXiv preprint arXiv:2305.11400},
            year={2025}
          }

Contact

Questions about this work? Reach out: calvine.le@gmail.com