Environmental Factor – November 2020: Data scientists in Africa tackle
Using advanced data science tools to support environmental health research in Africa was the focus of a Sept. 23 seminar, part of a series on the state of data science. The series is sponsored by Harnessing Data Science for Health Discovery and Innovation in Africa (DS-I Africa), a program of the National Institutes of Health (NIH) Common Fund (see sidebar).
Integrating diverse data sources
Collman emphasized how complex it is to document a person’s exposures. Researchers draw on sources such as those listed below and combine them with geospatial data from geographic information systems (GIS) or with temporal satellite data, she explained.
- Sampling air, water, and soil either directly or with sensors.
- Personal biomonitoring.
- Molecular-level markers of exposure in biosamples.
“One of the major challenges of our field is integrating all these streams of data [so] that we can usefully depict exposure over a person’s lifetime, a field known as exposomics,” Collman said.
Three panelists shared their experiences.
Overcoming human limitations
Exposome data are collected in Africa, but gaps remain in both their content and access to them. Computational methods and tools are needed to make these complex data useful to health professionals and policymakers.
Berhane discussed machine learning, particularly predictive models. Because more data are not necessarily better data, researchers can use human-supervised machine learning to weed out messy or incomplete information. For example, hospital data, which are available in electronic form in much of the developed world, are collected manually in African hospitals.
“Africa is already facing multiple challenges, such as a wide range of exposures combined with rapid urbanization and industrialization,” said Berhane. “In the face of all this, there’s a lack of high-quality data and limited human capacity in these areas.”
End-to-end data systems are being developed to address pressing environmental health problems such as air pollution. Spanning everything from data-collection hardware to machine learning and other data science methods, such systems can generate spatial and temporal patterns of air quality across a city.
In Kampala, Uganda’s capital, Bainomugisha applies computational methods and tools to the city’s environmental health challenges. He is the project lead for AirQo (see sidebar), which builds and deploys custom internet-connected devices that measure air quality, mounting them, for example, on roofs or motor scooters. Policymakers can use the data to develop regulations that protect health, he explained.
“When experts in fields like computer science work on environmental health issues, we can get new innovations,” Collman observed. “For example, the boda boda scooters are one of the main sources of transportation around Kampala. The novelty here is that they take readings every 90 seconds and create very large, geo-tagged, real-time datasets that can map air pollution levels around the city.”
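To make the idea of turning frequent, geo-tagged scooter readings into a city-wide pollution map more concrete, here is a minimal sketch in Python. It is not AirQo’s actual pipeline. It assumes each reading carries a latitude, longitude, timestamp, and a PM2.5 value, and the `Reading` schema, the 0.01-degree grid, and the hourly time windows are all hypothetical choices made for illustration. The readings themselves are made-up values, spaced 90 seconds apart to match the interval mentioned above.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Reading:
    """One geo-tagged sensor reading (hypothetical schema)."""
    lat: float      # degrees latitude
    lon: float      # degrees longitude
    time: datetime  # when the reading was taken
    pm25: float     # assumed PM2.5 concentration, µg/m³


def grid_cell(lat: float, lon: float, cell_deg: float = 0.01) -> tuple[int, int]:
    """Snap a coordinate to a square grid cell cell_deg degrees wide
    (0.01 degrees is roughly 1.1 km near the equator)."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def hourly_grid_means(readings, cell_deg: float = 0.01):
    """Average readings per (grid cell, hour) to build a space-time map."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for r in readings:
        # Truncate the timestamp to the start of its hour.
        hour = r.time.replace(minute=0, second=0, microsecond=0)
        key = (grid_cell(r.lat, r.lon, cell_deg), hour)
        sums[key] += r.pm25
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Usage: one scooter moving slowly north near central Kampala,
# logging a reading every 90 seconds (made-up values).
start = datetime(2020, 9, 23, 8, 0)
readings = [
    Reading(lat=0.3476 + 0.0001 * i, lon=32.5825,
            time=start + timedelta(seconds=90 * i), pm25=40.0 + i)
    for i in range(4)
]
city_map = hourly_grid_means(readings)
```

Averaging per cell and per hour is only one simple design choice. A production system would likely also need sensor calibration, handling of gaps in coverage, and interpolation between cells, none of which this sketch attempts.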
Taking data back to the people
In South Africa, Wright merges GIS, meteorological, socioeconomic, qualitative,…