The largest indoor activity and air pollution dataset with Distributed Air quaLiTy mONitors

Key Features


Uniqueness: This dataset offers extensive indoor air quality data from 30 indoor sites in developing Indian communities, annotated with daily activities and real-time pollution dynamics, filling a gap in large-scale datasets for developing nations.
  • Multi-device: Contains pollution measurements from multiple devices per site, with up to six devices in residential households, allowing pollutant spread to be observed across rooms.
  • Indoor types: Captures data from five types of locations (residential households, studio apartments, food canteens, classrooms, research labs) across 30 sites.
  • Frequent pollutants: Includes readings for indoor temperature, humidity, and eight pollutants (CO2, VOC, PM1, PM2.5, PM10, NO2, C2H5OH, CO).
  • Human annotations: Real-time activity labels collected via a speech-to-text app, providing necessary context for interpreting pollution readings.
  • Multi-city deployment: Data from four regions in India, covering rural, suburban, and urban populations.
  • Dataset duration: Data collected over six months (Summer and Winter), capturing seasonal pollution dynamics and behavioral variations.
Potential Applications: The dataset supports the following applications.
  • Pollution Source Identification and Activity Monitoring: Pairs recorded pollution patterns with specific activities, aiding source and activity classification.
  • Analysis of Spreading and Accumulation Patterns in Different Floor Plans: Useful for studying pollutant spread and accumulation in varied room structures.
  • Healthy Home Characterization and Indoor Design Improvement: Helps identify features to mitigate pollution spread, supporting healthier indoor designs.
  • Smart Device Control: Enables design of control policies for ACs, exhausts, air purifiers, and other ventilation devices.
Benchmarking & ML Applications: Enables research in pollution source detection, activity classification, and pollutant spread in varied floor plans.
Dataset Size: Includes 89.1 million samples over six months, with 13,646 hours of air quality data and 3,957 activity annotations.
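
Since the activity annotations are what give the pollution readings their context, a common first step is to attach each reading to the most recent annotation. The following is only a minimal sketch of that alignment, not the dataset's actual loader: the tuple layout, the device names, the function name label_readings, and the 600-second cutoff are all hypothetical, and the values are synthetic.

```python
from bisect import bisect_right


def label_readings(readings, annotations, max_gap_s=600):
    """Attach to each reading the latest annotation at or before it.

    readings:    list of (timestamp_s, device_id, value) tuples (assumed layout)
    annotations: list of (timestamp_s, label) tuples (assumed layout)
    max_gap_s:   annotations older than this are ignored (hypothetical default)
    """
    annotations = sorted(annotations)
    ann_times = [t for t, _ in annotations]
    labeled = []
    for ts, device, value in readings:
        # Index of the latest annotation with timestamp <= ts, or -1 if none.
        i = bisect_right(ann_times, ts) - 1
        label = None
        if i >= 0 and ts - ann_times[i] <= max_gap_s:
            label = annotations[i][1]
        labeled.append((ts, device, value, label))
    return labeled


# Synthetic example values, not taken from the dataset.
readings = [
    (0, "dev1", 12.0),
    (60, "dev1", 48.5),
    (120, "dev2", 30.1),
    (900, "dev1", 14.2),
]
annotations = [(50, "cooking")]

labeled = label_readings(readings, annotations)
for row in labeled:
    print(row)
```

Readings taken before the first annotation, or long after the last one, stay unlabeled (None). The cutoff is a design choice that would need tuning per activity, since some pollutants linger well after the activity that produced them has ended.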

Contributors

Prasenjit Karmakar

IIT Kharagpur, India

Swadhin Pradhan

Cisco Systems, US

Sandip Chakraborty

IIT Kharagpur, India

Publications


  1. Karmakar, P., Pradhan, S. and Chakraborty, S., Indoor Air Quality Dataset with Activities of Daily Living in Low to Middle-income Communities. In Thirty-eighth Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2024.
  2. Karmakar, P., Pradhan, S. and Chakraborty, S., Exploiting Air Quality Monitors to Perform Indoor Surveillance: Academic Setting. In Adjunct Proceedings of the 26th International Conference on Mobile Human-Computer Interaction, 2024.
  3. Karmakar, P., Pradhan, S. and Chakraborty, S., Exploring Indoor Air Quality Dynamics in Developing Nations: A Perspective from India. In ACM Journal on Computing and Sustainable Societies, 2(3), 2024.

Funding and Support



For questions and general feedback, contact Prasenjit Karmakar.