Radiant Diagnostics Buddy
Engineering a Multi-Modal AI System for Thoracic Pathology Detection
Problem Statement
Radiant Diagnostics Buddy is a multi‑modal deep learning system developed to assist radiologists in detecting thoracic pathologies from chest X‑ray images while incorporating essential clinical metadata. The project addresses a real‑world medical artificial intelligence challenge: multi‑label disease classification under extreme class imbalance, solved using a Late Fusion architecture that integrates convolutional neural networks with structured clinical data modeling. Beyond model performance, the project emphasizes data integrity, clinical realism, and evaluation rigor, demonstrating a deep understanding of advanced machine learning principles as applied to healthcare.
1. Introduction: Motivation, Context, and Problem Definition
Chest X‑ray imaging is the most frequently used diagnostic imaging technique worldwide due to its low cost, speed, and accessibility. In large hospitals and emergency departments, radiologists may be required to interpret hundreds of X‑ray images daily. Under such conditions, fatigue and time pressure can significantly increase the likelihood of delayed interpretation or missed subtle findings. Early‑stage pneumonia, small nodules, or mild effusions can easily escape detection, particularly in under‑resourced healthcare systems.
Radiant Diagnostics Buddy was conceived as a clinical decision‑support tool, not as a replacement for medical professionals. Its goal is to provide consistent, fatigue‑free preliminary screening that can assist clinicians in prioritizing cases, reducing diagnostic delays, and improving overall workflow efficiency. By flagging potentially abnormal scans, the system supports faster intervention and improved patient outcomes.
From a technical standpoint, this project tackles a complex and realistic machine learning problem. Chest X‑ray diagnosis is inherently multi‑label, meaning that a single image can exhibit multiple co‑existing pathologies simultaneously. Additionally, the dataset is severely imbalanced: the most common findings appear more than ten thousand times, while the rarest occur only a few hundred times. Finally, medical diagnosis is multi‑modal by nature: clinicians do not rely on images alone, but also interpret findings in the context of patient age, gender, and imaging view.
Formally, the objective is to learn a function that maps an image and its associated metadata to a vector of independent disease probabilities. To address this, the project implements a Late Fusion deep learning architecture that mirrors clinical reasoning by combining visual and contextual information at a high semantic level.
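One way to write this objective down is sketched here. The symbols are notation assumed for illustration and do not come from the project itself: x is the image, m is the metadata vector of dimension d, K is the number of pathology labels, and θ denotes the trainable parameters.

```latex
f_\theta : \mathbb{R}^{H \times W \times C} \times \mathbb{R}^{d} \to [0, 1]^{K},
\qquad
\hat{y}_k = f_\theta(x, m)_k \approx P(y_k = 1 \mid x, m), \quad k = 1, \dots, K
```

Each output is a separate probability, so the K values are not required to sum to one, which is what distinguishes this multi‑label setting from ordinary multi‑class classification.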
2. Dataset Description and Exploratory Analysis
2.1 The NIH ChestX‑ray14 Dataset
The project uses the NIH ChestX‑ray14 dataset, a large‑scale public medical imaging dataset that has become a benchmark in thoracic pathology research. The dataset contains 112,120 frontal chest X‑ray images collected from 30,805 unique patients. Each image is annotated with one or more of 14 disease labels, or with “No Finding” when none apply, giving 15 possible label values in total. In addition to the images, the dataset includes valuable metadata such as patient age, gender, and view position (PA or AP).
This dataset was chosen deliberately because it reflects the complexity and imperfections of real clinical data, making it ideal for demonstrating applied advanced machine learning rather than controlled toy examples.
2.2 Pathology Distribution and Class Imbalance
A detailed exploratory analysis revealed a highly skewed distribution of disease labels. More than half of the images are labeled as “No Finding,” while certain conditions such as Hernia and Pneumonia appear extremely infrequently.

[INSERT VISUAL 1 HERE: A bar chart showing disease frequency distribution, clearly illustrating the severity of class imbalance.]
This imbalance poses a significant challenge for machine learning models. Without correction, a model could achieve deceptively high accuracy by consistently predicting the majority class. To prevent this, the project incorporates Weighted Binary Cross‑Entropy Loss, which penalizes errors on rare diseases more heavily, forcing the model to learn discriminative features even for under‑represented conditions.
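The report does not state how the per‑class weights were chosen. A common choice, assumed here purely for illustration, is the ratio of negative to positive examples for each label, so that rarer labels receive larger weights. The function name and the toy label matrix are hypothetical.

```python
import numpy as np

def positive_class_weights(labels):
    """Return one weight per class: (# negatives) / (# positives).

    `labels` is a binary matrix of shape (n_samples, n_classes).
    The neg/pos ratio is an assumed weighting scheme, not one confirmed by the report.
    """
    labels = np.asarray(labels, dtype=np.float64)
    n_pos = labels.sum(axis=0)
    n_neg = labels.shape[0] - n_pos
    # Guard against division by zero for classes with no positive example.
    return n_neg / np.maximum(n_pos, 1.0)

# Toy label matrix: column 0 is a common finding, column 1 is a rare one.
toy_labels = np.array([[1, 0], [1, 0], [0, 0], [1, 1],
                       [0, 0], [1, 0], [0, 0], [1, 0]])
weights = positive_class_weights(toy_labels)
```

In this toy example the rare label ends up with a weight more than ten times that of the common one. Weights of this kind would normally be computed on the training split only.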
3. Data Cleaning and Preprocessing
Data preprocessing was treated as a critical engineering phase rather than a routine step. In medical AI, poor data handling can lead to misleading results and unsafe models.
First, strict data integrity checks were performed to ensure that every record in the metadata CSV file had a corresponding image on disk. Entries referencing missing or corrupted images were removed to prevent runtime failures and silent training errors.
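A minimal sketch of such an integrity check is given here. It assumes the file name column is called "Image Index" and treats an empty file as corrupted; both are assumptions, and the helper name is hypothetical. A stricter check would also try to decode each image.

```python
import tempfile
from pathlib import Path

import pandas as pd

def drop_rows_without_image(meta, image_dir, filename_col="Image Index"):
    """Keep only metadata rows whose image file exists and is non-empty."""
    def is_usable(name):
        path = Path(image_dir) / name
        return path.is_file() and path.stat().st_size > 0

    mask = meta[filename_col].map(is_usable).astype(bool)
    return meta[mask].reset_index(drop=True)

# Demo on a throwaway directory: one usable file, one empty file, one missing file.
with tempfile.TemporaryDirectory() as tmp:
    image_dir = Path(tmp)
    (image_dir / "a.png").write_bytes(b"placeholder bytes, not a real PNG")
    (image_dir / "b.png").write_bytes(b"")
    meta = pd.DataFrame({"Image Index": ["a.png", "b.png", "c.png"]})
    cleaned = drop_rows_without_image(meta, image_dir)
```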
Patient age required special attention. The raw dataset stores age as strings containing non‑numeric characters. Regular expressions were used to extract the numeric values, after which biologically implausible ages were removed. The cleaned age values were then standardized using Z‑score normalization, which keeps the feature on a scale comparable to the other inputs and prevents age from disproportionately influencing gradient updates.
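The age cleaning steps can be sketched as follows. The plausibility bounds of 0 to 100 years, the example strings, and the function name are assumptions made for illustration. The sketch also treats every extracted number as years.

```python
import pandas as pd

def clean_age(raw, min_age=0, max_age=100):
    """Extract the digits from each age string, drop implausible values, z-score the rest."""
    digits = raw.astype(str).str.extract(r"(\d+)", expand=False)
    years = pd.to_numeric(digits, errors="coerce").astype("float64")
    # Rows with no digits become NaN and fail both comparisons, so they are dropped too.
    years = years[(years >= min_age) & (years <= max_age)]
    return (years - years.mean()) / years.std(ddof=0)

raw_ages = pd.Series(["058Y", "41", "412Y", "n/a", "030Y"])
z_ages = clean_age(raw_ages)
```

In a real pipeline the mean and standard deviation would be computed on the training split and reused for the validation and test splits, so that no statistics leak across splits.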
Disease labels were originally provided as pipe‑separated strings representing multiple diagnoses per image. These were transformed using a MultiLabelBinarizer, converting each label set into a fixed‑length binary vector. This representation allows the model to treat each disease as an independent Bernoulli classification task, which is essential for multi‑label learning.
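The label transformation can be illustrated with scikit‑learn's MultiLabelBinarizer on a few made‑up label strings. Whether "No Finding" is kept as its own column or dropped is not stated in the report. This sketch keeps it.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Example pipe-separated label strings (illustrative values).
raw_labels = ["Cardiomegaly|Effusion", "No Finding", "Effusion"]

# Split each string into a list of labels, then binarize.
label_sets = [s.split("|") for s in raw_labels]
mlb = MultiLabelBinarizer()
y = mlb.fit_transform(label_sets)

class_names = list(mlb.classes_)  # columns are ordered alphabetically
```

Each row of y is a fixed‑length binary vector, and class_names records which pathology each column refers to.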
Finally, categorical metadata such as gender and view position were encoded numerically. Explicitly encoding view position was particularly important, as AP and PA views introduce systematic anatomical distortions that can otherwise lead to false positives, especially for cardiac conditions.
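One simple encoding, assumed here because the report does not specify the scheme, maps each binary category to a 0/1 indicator. The column names and output feature names are illustrative.

```python
import pandas as pd

meta = pd.DataFrame({
    "Patient Gender": ["M", "F", "F"],
    "View Position": ["PA", "AP", "PA"],
})

# Binary indicators: 1.0 for male / AP, 0.0 for female / PA.
encoded = pd.DataFrame({
    "is_male": (meta["Patient Gender"] == "M").astype("float32"),
    "is_ap": (meta["View Position"] == "AP").astype("float32"),
})
```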

4. Train‑Validation‑Test Strategy and Data Leakage Prevention
One of the most important methodological decisions in this project was the use of patient‑wise data splitting. Many patients in the dataset have multiple X‑ray images taken over time. If images from the same patient appear in both training and testing sets, the model may learn patient‑specific anatomical features rather than true disease patterns, leading to overly optimistic evaluation results.
To prevent this form of data leakage, all images belonging to a single patient were assigned exclusively to one dataset split. This ensures that model performance reflects true generalization to unseen patients, aligning the evaluation protocol with real‑world clinical deployment scenarios.
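A patient‑wise split can be sketched with scikit‑learn's GroupShuffleSplit, using the patient ID as the group key. The split ratio, the seed, and the function name are assumptions. The report does not say which tool was actually used.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

def patient_wise_split(patient_ids, test_size=0.2, seed=42):
    """Return (train_idx, test_idx) such that no patient ID appears on both sides."""
    patient_ids = np.asarray(patient_ids)
    splitter = GroupShuffleSplit(n_splits=1, test_size=test_size, random_state=seed)
    # The feature array is only used for its length, so a placeholder is enough.
    placeholder = np.zeros(len(patient_ids))
    train_idx, test_idx = next(splitter.split(placeholder, groups=patient_ids))
    return train_idx, test_idx

# Ten images from six patients; several patients have more than one image.
patient_ids = np.array([1, 1, 2, 3, 3, 3, 4, 5, 5, 6])
train_idx, test_idx = patient_wise_split(patient_ids)
```

A three‑way train, validation, and test split follows by applying the same function a second time to the training portion.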
5. Model Architecture: A Multi‑Modal Late Fusion Design
The architectural design of Radiant Diagnostics Buddy is inspired by how clinicians diagnose disease. Radiologists interpret visual patterns in X‑rays while simultaneously considering patient demographics and imaging context. To replicate this reasoning process, the model is divided into two specialized branches.
[INSERT VISUAL 2 HERE: Architecture diagram showing the Vision Branch, Metadata Branch, and Late Fusion integration.]
5.1 Vision Branch: DenseNet‑121
The vision branch is built upon DenseNet‑121, a convolutional neural network known for its dense connectivity pattern. Unlike traditional deep networks, DenseNet connects each layer to every subsequent layer within a dense block, enabling feature reuse and preserving fine‑grained spatial information. This property is particularly valuable in medical imaging, where subtle texture variations may indicate pathology.
The network was initialized with ImageNet pre‑trained weights, allowing the model to leverage generic visual features such as edges and shapes learned from millions of natural images. Fine‑tuning these weights enables the network to adapt to domain‑specific radiographic patterns while significantly reducing training time and improving convergence stability.
5.2 Clinical Metadata Branch
The second branch processes structured clinical data using a Multi‑Layer Perceptron (MLP). This network learns non‑linear relationships between patient age, gender, and view position. By incorporating this information, the model gains contextual awareness, allowing it to adjust visual interpretations based on demographic and acquisition factors that influence radiographic appearance.
5.3 Late Fusion Integration
Rather than combining modalities at the input level, this project employs Late Fusion, merging high‑level representations from both branches. This approach allows each modality to learn its own representation before integration, reducing noise interference and improving interpretability. The fused feature vector is passed through fully connected layers, and a final sigmoid output layer produces an independent probability estimate for each pathology.
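The overall architecture can be sketched in Keras as follows. This is an illustration under stated assumptions, not the project's actual model: the layer widths, dropout rate, number of output classes (14), number of metadata features (3), and the function name are all chosen for the example.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_late_fusion_model(num_classes=14, num_meta_features=3,
                            image_size=224, weights="imagenet"):
    """Two-branch model: DenseNet-121 for images, a small MLP for metadata."""
    image_in = layers.Input(shape=(image_size, image_size, 3), name="image")
    meta_in = layers.Input(shape=(num_meta_features,), name="metadata")

    # Vision branch: DenseNet-121 backbone with global average pooling.
    backbone = tf.keras.applications.DenseNet121(
        include_top=False, weights=weights,
        input_shape=(image_size, image_size, 3), pooling="avg")
    image_features = backbone(image_in)

    # Metadata branch: small MLP (layer sizes are illustrative).
    meta_features = layers.Dense(16, activation="relu")(meta_in)
    meta_features = layers.Dense(16, activation="relu")(meta_features)

    # Late fusion: concatenate the two high-level representations.
    fused = layers.Concatenate()([image_features, meta_features])
    fused = layers.Dense(256, activation="relu")(fused)
    fused = layers.Dropout(0.5)(fused)

    # One sigmoid unit per pathology gives independent probabilities.
    outputs = layers.Dense(num_classes, activation="sigmoid", name="pathologies")(fused)
    return tf.keras.Model(inputs=[image_in, meta_in], outputs=outputs)

# Random initialization and a small input keep this demo offline and fast;
# the defaults (ImageNet weights, 224x224 input) are the more realistic setting.
model = build_late_fusion_model(weights=None, image_size=64)
```

Because the two branches meet only at the concatenation step, either one can be swapped or ablated without touching the other, which is the practical appeal of fusing late.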
6. Training Strategy and Optimization
The model was trained using Weighted Binary Cross‑Entropy Loss, which treats each pathology as an independent classification problem while compensating for class imbalance. Optimization was performed using the Adam optimizer, chosen for its adaptive learning rate properties and robustness in deep networks.
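A NumPy reference sketch of a weighted binary cross‑entropy is given here to make the loss concrete. The exact weighting used in the project is not stated, so the form shown (a per‑class weight applied to the positive term only) is an assumption. In a Keras training loop the same expression would be written with tensor operations.

```python
import numpy as np

def weighted_bce(y_true, y_prob, pos_weight, eps=1e-7):
    """Mean weighted binary cross-entropy over all samples and classes.

    `pos_weight` has one entry per class and scales the loss on positive labels.
    """
    y_true = np.asarray(y_true, dtype=np.float64)
    # Clip probabilities so the logarithms stay finite.
    y_prob = np.clip(np.asarray(y_prob, dtype=np.float64), eps, 1.0 - eps)
    per_label = -(pos_weight * y_true * np.log(y_prob)
                  + (1.0 - y_true) * np.log(1.0 - y_prob))
    return float(per_label.mean())

y_true = np.array([[1.0, 0.0], [0.0, 1.0]])
y_prob = np.array([[0.9, 0.1], [0.2, 0.3]])

plain = weighted_bce(y_true, y_prob, pos_weight=np.array([1.0, 1.0]))
weighted = weighted_bce(y_true, y_prob, pos_weight=np.array([1.0, 5.0]))
```

With unit weights the function reduces to ordinary binary cross‑entropy. Raising the weight of the second class makes the poorly predicted positive in that column dominate the loss.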
Regularization techniques, including dropout and data augmentation, were applied to reduce overfitting. Image augmentations such as rotation, scaling, and horizontal flipping simulate real‑world variability in patient positioning and imaging conditions, improving generalization.
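The augmentations named above map onto Keras preprocessing layers roughly as follows. The magnitudes are assumptions picked for illustration, not the project's settings.

```python
import tensorflow as tf
from tensorflow.keras import layers

augment = tf.keras.Sequential([
    layers.RandomFlip("horizontal"),
    # The factor is a fraction of a full turn: 0.02 * 360 degrees is about 7 degrees.
    layers.RandomRotation(0.02),
    # Zoom in or out by up to 10 percent.
    layers.RandomZoom(0.1),
], name="augmentation")

# Random batch standing in for four preprocessed X-ray images.
batch = tf.random.uniform((4, 64, 64, 3))
augmented = augment(batch, training=True)
```

These layers are active only when called with training=True, so validation and test images pass through unchanged.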
7. Model Evaluation and Metrics
In medical diagnosis, accuracy alone is insufficient. False negatives can delay treatment and have severe consequences. Therefore, the primary evaluation metric used in this project is the Area Under the Receiver Operating Characteristic Curve (AUC‑ROC).
AUC‑ROC measures the model’s ability to rank diseased cases above healthy ones across all decision thresholds. Because it does not depend on a single threshold or on class prevalence, it is far less misleading than accuracy on imbalanced datasets, although for very rare pathologies it can still look optimistic and is best read alongside precision and recall.
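Because the task is multi‑label, AUC‑ROC is usually reported separately for each pathology. A minimal sketch using scikit‑learn's roc_auc_score follows. The helper name, class names, and toy scores are illustrative.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def per_class_auc(y_true, y_prob, class_names):
    """Compute AUC-ROC separately for each label column."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob)
    scores = {}
    for k, name in enumerate(class_names):
        # AUC is undefined when a column contains only one label value.
        if len(np.unique(y_true[:, k])) < 2:
            scores[name] = float("nan")
        else:
            scores[name] = float(roc_auc_score(y_true[:, k], y_prob[:, k]))
    return scores

y_true = np.array([[1, 0], [0, 0], [1, 1], [0, 0]])
y_prob = np.array([[0.9, 0.2], [0.1, 0.8], [0.8, 0.7], [0.3, 0.1]])
aucs = per_class_auc(y_true, y_prob, ["Effusion", "Hernia"])
```

In the toy data every Effusion positive is scored above every negative, so that AUC is 1.0. The single Hernia positive is outranked by one of its three negatives, giving 2/3.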
[INSERT VISUAL 3 HERE: ROC curves for selected pathologies.]
8. Tools and Technologies
The project was implemented using a professional‑grade machine learning stack. Python served as the core programming language, with TensorFlow and Keras used for deep learning model construction. Pandas and NumPy supported data manipulation and numerical computation, while Scikit‑Learn provided preprocessing utilities. Visualization was performed using Matplotlib and Seaborn, and development was carried out in a Jupyter Notebook environment.
9. Impact and Applications
Radiant Diagnostics Buddy demonstrates strong potential for real‑world impact. Clinically, it can assist radiologists by prioritizing urgent cases and reducing workload. From a societal perspective, the system could be deployed in underserved or rural areas where access to specialist radiologists is limited, providing preliminary diagnostic support and improving healthcare equity.
10. Conclusion
This project demonstrates that effective medical AI systems require more than high predictive performance. Through rigorous data cleaning, patient‑wise evaluation, class‑balanced loss design, and a clinically inspired multi‑modal architecture, Radiant Diagnostics Buddy embodies responsible and advanced machine learning practice.
The system serves as both a strong academic capstone project and a foundation for future research in trustworthy, deployable medical artificial intelligence.