Towards Scalable and Policy-Compliant Sensor Placement for Large-Scale Air Quality Monitoring


Vinayak Rana

M.Tech in Artificial Intelligence
Indian Institute of Technology Gandhinagar


Advisor: Prof. Nipun Batra

DSC Members: Prof. Manoj Gupta, Prof. Sameer Patel

Thesis Defense

Air Pollution: A Global Health Crisis


  • 7 million premature deaths annually (WHO)
  • More than malaria, HIV, and road accidents combined
  • 91% of deaths in low/middle-income countries

Current Air Quality Monitoring in India

| Metric | Value |
|---|---:|
| CPCB stations | ~600 |
| Population | 1.4 billion |
| People per station | 2.3 million |
| Area per station | 5,400 km² |

Takeaway: Severely under-monitored for a country of this size and pollution levels

Why Not Just Add More Sensors?

Full CPCB CAAQMS (BAM + gases + shelter): ₹2-3 crore+ per station

  • Large-scale deployment → extremely costly

Current CPCB network: ~600 stations


Smart placement is critical — every sensor must maximize value!

The Core Question

Given a limited budget of new sensors:

Where should we place them?

Goal: Maximize information about the entire region

Problem Formulation

$S$ — Existing sensors

$C$ — Candidate locations ($n$ points)

$T$ — Target region to predict

$k$ — Budget for new sensors

Goal: Select $k$ locations from $C$ to maximize info about $T$

Optimal Sensor Placement (OSP)

1. Surrogate Model — Predicts values + uncertainty; must be differentiable
2. Acquisition Function — Scores candidates; guides placement

Acquisition 1: Random

Pick locations uniformly at random from candidates

In plain English:

"Close your eyes and point at the map $k$ times"

  • No optimization — pure luck
  • Ignores data completely
  • Baseline for comparison

Acquisition 2: Maximum Variance (MaxVar)

Greedily pick the location with highest uncertainty

In plain English:

"Where are we most uncertain? Put a sensor there."

Repeat $k$ times, each time adding the selected sensor to the context.

Pros: Fast, intuitive

Cons: Doesn't maximize information gain with minimum sensors
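The greedy MaxVar loop above can be sketched in a few lines. The `posterior_variance` toy below is a hypothetical stand-in for the surrogate's predictive uncertainty (variance shrinks near placed sensors), not the paper's model:

```python
import math
import random

random.seed(0)
candidates = [(random.random(), random.random()) for _ in range(200)]  # toy (lat, lon) sites

def posterior_variance(selected, candidates):
    """Toy stand-in for the surrogate's predictive variance:
    uncertainty shrinks near already-placed sensors.
    A real run would query the TNP-D surrogate instead."""
    out = []
    for c in candidates:
        if not selected:
            out.append(1.0)  # no sensors yet: maximally uncertain everywhere
            continue
        d = min(math.dist(c, candidates[i]) for i in selected)
        out.append(1.0 - math.exp(-d))
    return out

def greedy_maxvar(candidates, k):
    """Repeat k times: score all candidates, pick the most uncertain one."""
    selected = []
    for _ in range(k):
        var = posterior_variance(selected, candidates)
        for i in selected:  # never pick the same site twice
            var[i] = -1.0
        selected.append(max(range(len(var)), key=var.__getitem__))
    return selected

picks = greedy_maxvar(candidates, k=5)
```

Note that each pick conditions on the previous ones, which is what spreads the sensors out rather than stacking them on one hotspot.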

The MI Objective: What We Want to Maximize

Mutual Information: how much does knowing $X_A$ (the new sensors) reduce uncertainty about $X_T$ (the target region)?

$$I(X_A; X_T \mid X_S) = H(X_T \mid X_S) - H(X_T \mid X_S, X_A)$$

Reading this: "Given existing sensors $S$, how much does adding new sensors $A$ reduce our uncertainty about the target region $T$?"

MI vs MaxVar: What's the Difference?

For Gaussian outputs (GP/Neural Process), entropy is a function of variance → both use variance

  • MaxVar: "where am I most uncertain?"
  • MI: "what reduces uncertainty everywhere?"

Problem: Exact MI requires searching $\binom{n}{k}$ subsets — combinatorially explosive!

Acquisition 3: Greedy MI (Standard Approach)

Since exact MI is intractable, use greedy approximation: select one sensor at a time

Algorithm: For each of $k$ rounds, evaluate all $n$ candidates → pick the best

Complexity: $O(n \cdot k)$ evaluations

| $n$ (candidates) | $k$ (budget) | Evaluations |
|---:|---:|---:|
| 1,000 | 50 | 50,000 |
| 20,000 | 100 | 2,000,000 |

For India ($n \approx 20{,}000$): computationally infeasible!
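A quick way to see the gap between exact and greedy MI — illustrative arithmetic only, using the numbers from the table above:

```python
from math import comb

n, k = 1_000, 50
exact_subsets = comb(n, k)  # subsets exact MI would have to search
greedy_evals = n * k        # greedy approximation: score every candidate, k rounds

print(f"exact MI subsets: {exact_subsets:.3e}")
print(f"greedy MI evals:  {greedy_evals}")
```

Even at the smaller scale ($n = 1{,}000$), the exact search space exceeds $10^{80}$ subsets, while greedy needs only 50,000 surrogate evaluations; at India scale the greedy count itself (2,000,000) becomes the bottleneck.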

The Challenge

Greedy MI gives best quality but doesn't scale

MaxVar scales but gives poor quality

Can we get both?

GD-MI: Gradient-Based Mutual Information Maximization

Scalable sensor placement via continuous optimization




Publication (AAAI 2026 — AI for Social Impact)

Scalable Air-Quality Sensor Placement via Gradient-Based Mutual Information Maximization

Zeel B Patel, Vinayak Rana, Nipun Batra

Our Solution: GD-MI

Key insight: Don't search discrete candidates — optimize coordinates directly!

| | Greedy MI | GD-MI (Ours) |
|---|---|---|
| Search space | Discrete ($n$ candidates) | Continuous (coordinates) |
| Optimization | Enumerate all | Gradient descent |
| Complexity | $O(n \cdot k)$ | $O(I)$ iterations |
| Scalability | ❌ | ✓ Any $n$ |


Greedy MI: Search grid


GD-MI: Follow gradient

GD-MI: How It Works

What we optimize: $k$ sensor locations $A = \{a_1, \dots, a_k\}$, each a (lat, lon) pair

Objective: Minimize average variance over the target region $T$:

$$\mathcal{L}(A) = \frac{1}{|T|} \sum_{t \in T} \sigma^2\big(t \mid S, A, \hat{y}_A\big)$$

where $\hat{y}_A$ — predicted values at the proposed locations $A$

Algorithm: Initialize $A$ → Forward (get $\mathcal{L}$) → Backward ($\nabla_A \mathcal{L}$) → Update $A$ → Repeat

Key: Model is frozen — only the coordinates are optimized
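A minimal sketch of this loop, assuming a toy differentiable variance in place of the frozen TNP-D surrogate and finite differences in place of autodiff; the names `avg_variance` and `gd_mi` are illustrative, not from the paper:

```python
import math

# 10x10 grid over the unit square as the target region T (toy stand-in)
targets = [(i / 9, j / 9) for i in range(10) for j in range(10)]

def avg_variance(locs):
    """Average predictive variance over T, decaying with distance to the
    nearest sensor -- a smooth stand-in for the frozen surrogate."""
    total = 0.0
    for t in targets:
        d = min(math.dist(t, s) for s in locs)
        total += 1.0 - math.exp(-d)
    return total / len(targets)

def gd_mi(locs, steps=150, lr=0.1, eps=1e-4):
    """Gradient descent on the (lat, lon) coordinates only; the model
    (here, avg_variance) stays frozen throughout."""
    locs = [list(s) for s in locs]
    for _ in range(steps):
        base = avg_variance(locs)
        grads = [[0.0, 0.0] for _ in locs]
        for i, s in enumerate(locs):
            for d in range(2):  # forward finite difference per coordinate
                s[d] += eps
                grads[i][d] = (avg_variance(locs) - base) / eps
                s[d] -= eps
        for s, g in zip(locs, grads):
            s[0] -= lr * g[0]
            s[1] -= lr * g[1]
    return locs

init = [[0.45, 0.45], [0.5, 0.5], [0.55, 0.55], [0.5, 0.45]]  # clustered start
final = gd_mi(init)
```

Starting from a deliberately clustered initialization, the gradient pushes the sensors apart to cover more of the target grid, lowering the average variance.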

Surrogate Model: Why Neural Processes?

GD-MI needs a model that provides:

Requirement Why?
Predictions To impute values at proposed locations
Calibrated uncertainty To compute information gain
Differentiable To backpropagate through
Fast inference To iterate quickly

TNP-D (Transformer Neural Process) satisfies all!

Surrogate Model: Transformer Neural Process (TNP-D)

  • Predicts PM₂.₅ and uncertainty
  • Fully differentiable
  • Fast parallel inference
| Model | RMSE ↓ | NLL ↓ |
|---|---:|---:|
| Gaussian Process | 5.16 | -0.19 |
| Convolutional GNP | 5.31 | -0.30 |
| Transformer NP | 4.90 | -0.44 |

Experiments

Does GD-MI actually work?

Experiment 1: Regional Validation

Madhya Pradesh — a central Indian state (~the size of Germany, $n = 308$ candidates)

Results (k=9 sensors):

| Method | RMSE ↓ |
|---|---:|
| Random | 7.2 |
| MaxVar | 6.3 |
| GD-MI | 5.8 |
| Greedy MI | 5.6 |

GD-MI within 0.2 RMSE of Greedy MI (gold standard)

Experiment 2: India-Scale (The Real Test)

Greedy MI infeasible ($n \approx 20{,}000$) — GD-MI shines here

Key findings:

  • GD-MI 4% better than MaxVar
  • Gap grows with more sensors
  • Consistent across budgets

Why Does GD-MI Win? Qualitative Analysis

MaxVar (blue dots)

  • Optimizes for single-point uncertainty
  • Doesn't consider target region

GD-MI (red dots)

  • Optimizes for entire target region
  • Maximizes information gain

MI objective guides placement for maximum coverage

Scalability: GD-MI vs Greedy MI


  • Greedy MI: runtime grows with pool size $n$
  • GD-MI: constant time regardless of $n$
  • At $n = 20{,}000$: GD-MI is ~100× faster

GD-MI: Limitations

  • Optimizes in continuous space
    → cannot enforce real-world deployment constraints

  • May select invalid / impractical locations
    (e.g., inaccessible terrain)

  • Assumes uniform importance
    → ignores population / exposure differences

  • Non-convex optimization
    → solutions depend on initialization

GSM: Gumbel-Softmax for Constrained Sensor Placement

Differentiable discrete selection under regional constraints




Publication (Under Review)

Large-Scale Air-Quality Sensor Placement via Joint Optimization under Regional Constraints

Vinayak Rana, Neerja Kasture, Anura Mantri, Nipun Batra

Regional Budget Constraints


Sensors are allocated per state

  • $k_s$ — budget per state $s$
  • Select from the state's candidate pool

Constraint:
exactly $k_s$ sensors per state $s$

GD-MI cannot enforce: "exactly $k_s$ sensors in state $s$"

From Constraints to Selection

Each sensor must choose one location from its state's candidate pool



From Constraints to Selection

Represent this as a probability distribution over candidates



From Constraints to Selection

argmax picks the highest probability location



From Constraints to Selection

But argmax is not differentiable — gradients cannot flow through it



How do we make sampling differentiable?

Gumbel-Softmax: Making Sampling Differentiable


GSM: How It Works


$$P = \operatorname{softmax}\!\left(\frac{W + M + G}{\tau}\right)$$

Don't worry about the complexity — let's go term by term

GSM: How It Works


$$P = \operatorname{softmax}\!\left(\frac{W + M + G}{\tau}\right)$$

$W$ — learnable logits, one row per sensor, one column per candidate location

The learnable logit matrix W


GSM: How It Works


$$P = \operatorname{softmax}\!\left(\frac{W + M + G}{\tau}\right)$$

$M$ — regional mask: sets out-of-state entries to $-\infty$ → probability = 0

Enforcing regional budgets via masking


GSM: How It Works


$$P = \operatorname{softmax}\!\left(\frac{W + M + G}{\tau}\right)$$

$G$ — Gumbel noise: treated as a constant in the backward pass — this is the reparameterisation trick

GSM: How It Works


$$P = \operatorname{softmax}\!\left(\frac{W + M + G}{\tau}\right)$$

$\tau$ — temperature: high $\tau$ → explore broadly, low $\tau$ → commit to one location

GSM: How It Works


$$P = \operatorname{softmax}\!\left(\frac{W + M + G}{\tau}\right)$$

  • $W$ — learnable logits
  • $M$ — regional mask
  • $G$ — Gumbel noise
  • $\tau$ — temperature
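A toy sketch of the masked Gumbel-Softmax above. The two-state setup, candidate counts, and all values are hypothetical; the point is that the mask zeroes out-of-state probabilities while sampling stays differentiable in $W$:

```python
import math
import random

random.seed(0)

n_candidates = 6
state_of = [0, 0, 0, 1, 1, 1]   # which state each candidate belongs to
sensor_state = [0, 1]           # sensor 0 must land in state 0, sensor 1 in state 1

# W: learnable logits, one row per sensor, one column per candidate
W = [[random.gauss(0, 1) for _ in range(n_candidates)] for _ in range(2)]
# M: regional mask, -inf outside the sensor's own state
M = [[0.0 if state_of[j] == sensor_state[i] else -math.inf
      for j in range(n_candidates)] for i in range(2)]

def gumbel_softmax(W, M, tau):
    """softmax((W + M + G) / tau) with G ~ Gumbel(0, 1) per entry."""
    probs = []
    for w_row, m_row in zip(W, M):
        g_row = [-math.log(-math.log(random.random())) for _ in w_row]
        logits = [(w + m + g) / tau for w, m, g in zip(w_row, m_row, g_row)]
        mx = max(logits)                          # numerical stability
        exps = [math.exp(l - mx) for l in logits]  # exp(-inf) = 0 for masked
        z = sum(exps)
        probs.append([e / z for e in exps])
    return probs

probs = gumbel_softmax(W, M, tau=0.5)
```

With a low temperature the rows concentrate on one in-state candidate; masked (out-of-state) entries are exactly zero, so each sensor can only select within its own state.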

Does It Converge? Training Dynamics


(a) Objective + temperature

  • Predictive variance falls steadily
  • High $\tau$ → exploration; low $\tau$ → exploitation

(b) Location update count

  • Sensors move frequently early on
  • Updates taper to zero as $\tau \to 0$

No post-hoc projection needed — the relaxation self-stabilizes into a valid discrete solution
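The annealing behaviour can be sketched with a simple exponential schedule; the endpoints `tau_start` and `tau_end` here are illustrative assumptions, not the paper's settings:

```python
def tau_schedule(step, total_steps, tau_start=5.0, tau_end=0.1):
    """Exponential annealing from exploration (high tau) to exploitation
    (low tau). Endpoint values are illustrative only."""
    r = step / max(total_steps - 1, 1)          # progress in [0, 1]
    return tau_start * (tau_end / tau_start) ** r

taus = [tau_schedule(s, 100) for s in range(100)]
```

Early steps keep the relaxed selection soft (sensors can still move between candidates); as $\tau$ decays, each row of the Gumbel-Softmax hardens onto a single location, matching the update count tapering to zero.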

Experiments

Does GSM actually work?

Regional Validation: GSM vs Baselines


| Method | RMSE ↓ |
|---|---:|
| Random | 6.50 |
| MaxVar | 6.41 |
| GSM (Ours) | 6.12 |
| Greedy MI | 5.74 |

GSM closes most of the gap to Greedy MI — at far lower cost

Continental Results: Unconstrained Deployment

No state budget constraints — pure joint optimization

  • GSM lowest at every budget
  • Gap widens with more sensors
  • Greedy methods pick redundant locations

Main Result: Constrained Deployment

| Strategy | All-India RMSE ↓ |
|---|---:|
| Existing network | 9.25 ± 0.00 |
| Random | 7.69 ± 0.15 |
| Greedy MaxVar | 7.46 ± 0.00 |
| GSM Indep (ours) | 7.42 ± 0.09 |
| GSM Joint (ours) | 7.27 ± 0.03 |
  • 21.4% reduction in national prediction error over existing network

  • 13/32 states where GSM Joint achieves lowest RMSE

  • Joint > Indep — cross-border gradients capture transboundary pollution structure

Why does GSM Joint perform better?

Qualitative Comparison of Deployments


GSM: Limitations

  • Uses a continuous relaxation
    → introduces approximation error

  • Sensitive to temperature (τ) scheduling
    → affects convergence stability

  • Non-convex optimization over logits
    → sensitive to initialization; multiple local optima

Future Work

  • Equity-aware placement
    → weight by population, pollution, vulnerability

  • Dynamic sensing
    → mobile sensors, adaptive placement

  • Multi-pollutant optimization
    → PM₂.₅, NO₂, O₃ jointly

  • Improved discrete relaxations
    → Top-K, optimal transport (Sinkhorn)

Publications

Scalable Air-Quality Sensor Placement via Gradient-Based Mutual Information Maximization (AAAI 2026 — AI for Social Impact)
Zeel B Patel, Vinayak Rana, Nipun Batra


Large-Scale Air-Quality Sensor Placement via Joint Optimization under Regional Constraints (Under Review)
Vinayak Rana, Neerja Kasture, Anura Mantri, Nipun Batra

# The Coverage Gap

<div class="cols">
<div class="col">
<img src="assets/images/india_urban_only_coverage.png" style="max-height:340px;" />
</div>
<div class="col">

**Urban-only coverage**

- Sensors in Delhi, Mumbai, Chennai, Bangalore, Kolkata
- Rural India remains **invisible**

**Hundreds of millions** unmonitored

No data → No policy → No protection

</div>
</div>

---

# Global Comparison: Room for Growth

| Country | Stations ↑ | People/Station ↓ | Stations/1000 km² ↑ |
|---------|----------:|--------------:|-----------------:|
| USA | 4,800 | 69K | 0.49 |
| China | 5,000 | 280K | 0.52 |
| Germany | 500 | 166K | 1.40 |
| UK | 300 | 223K | 1.23 |
| **India** | **611** | **2,290K** | **0.19** |

India has **33× more people per station** than USA — significant opportunity for expansion

---
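A quick sanity check on the People/Station column; the populations below are round-number assumptions (331M USA, 1.4B India), not figures from the slide:

```python
# Assumed populations; station counts taken from the comparison table.
population = {"USA": 331_000_000, "India": 1_400_000_000}
stations = {"USA": 4_800, "India": 611}

people_per_station = {c: population[c] / stations[c] for c in population}
ratio = people_per_station["India"] / people_per_station["USA"]
```

Under these assumptions, India comes out near 2.29M people per station versus roughly 69K for the USA, a ratio of about 33×, consistent with the table.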

# From Prediction to Placement

> Not all locations are equally informative

- Some regions are **well-understood**
- Some regions are **uncertain**

> Place sensors where they reduce uncertainty the most

---

| | MaxVar | MI |
|:--|:------:|:--:|
| Optimizes for | Single point | <span class="blue">Entire target region</span> |
| Variance | **at candidate** | **over target region** |

---

![bg contain](assets/images/sustainability_lab_prompt_subtitles_generated_20251217_162815.png)

---

# Main Takeaways

1. **GD-MI** = First gradient-based MI maximization for sensor placement
2. **Scalability breakthrough:** $O(I)$ vs $O(n \cdot k)$
   - Enables **continental-scale** optimization
3. **Quality preserved:** Matches Greedy MI where tractable, **4% better** than MaxVar at scale
4. **Real-world ready:** Deployed framework for India air quality monitoring

---

# Thank You!

<div style="display:flex; justify-content:center; gap:3rem; margin:1.5rem 0;">
<div class="qr" style="text-align:center;">
<img src="assets/images/qr_paper.png" />
<div style="font-size:0.7em; margin-top:0.3rem;">Paper</div>
</div>
<div class="qr" style="text-align:center;">
<img src="assets/images/qr_lab.png" />
<div style="font-size:0.7em; margin-top:0.3rem;">Lab</div>
</div>
</div>

**Sustainability Lab @ IIT Gandhinagar**

Positions: PhD · Postdoc · RA · Intern

{patel_zeel, vinayak.rana, nipun.batra}@iitgn.ac.in

---

_paginate: false

# Backup Slides

---

# Impact: Same Budget, Better Outcomes

**The cascade effect:**

Better placement (GD-MI)
↓
Better predictions (lower RMSE)
↓
Better pollution maps (policy-ready)
↓
Better health outcomes (targeted interventions)

> **4% RMSE improvement** at national scale = **millions** of people better served

---

# Dataset: WUSTL PM₂.₅

- **Source:** Washington University in St. Louis
- **Resolution:** 0.1° × 0.1° (~11 km)
- **Period:** 1998-2018 (21 years, monthly)
- **Split:** Train 98-08 | Val 09-10 | Test 11-18

---

# Full Model Benchmark

| Model | NLL ↓ | RMSE ↓ |
|-------|:-----:|:------:|
| CNP | 0.48 | 11.46 |
| Random Forest | -0.11 | 6.55 |
| GP | -0.19 | 5.16 |
| ConvCNP | -0.27 | 5.28 |
| ConvGNP | -0.30 | 5.31 |
| TabPFN | -0.37 | 5.09 |
| **TNP-D** | **-0.44** | **4.90** |

---

# Constraint: Keep Sensors on Land

$$\mathcal{L}_{\text{OOR}} = \sum_i \exp\Big(\text{dist}(\color{#00a651}{x_i}, \text{land}) - \delta\Big) - 1$$

Soft penalty grows exponentially as <span class="green">sensors</span> drift toward ocean

---

# Scalability Analysis

![width:600px center](assets/images/scalability.png)

- GD-MI runtime **independent of pool size**
- At 20K candidates: GD-MI is **100× faster** than Greedy MI
- Enables continental-scale optimization

---

# More Results: k = 50

![width:700px center](assets/images/quality_50_india.png)

---

# More Results: k = 100

![width:700px center](assets/images/quality_100_india.png)

> Same Madhya Pradesh setup ($n=308$, $k=8$)