We are an interdisciplinary research lab at the Spatial Sciences Institute, University of Southern California. Our research focus lies at the intersection of computer science and spatial sciences, where we build intelligent algorithms and applications.
Here are some keywords for the research topics we are working on:
Since 2013, more than 80 students and 6 postdoctoral researchers have worked in the lab, including one local high school student, a number of visiting international students, and USC undergraduate and graduate students in GeoDesign, electrical engineering, spatial informatics, computer science, and data informatics. One-third of the 50+ research students are women in science and engineering.
#spark #scala #python #postgres #postgis #qgis #ai #information_integration #data_mining #machine_learning #semantic_web_technologies #knowledge_graph #giscience
JonSnow: Mining Large Online Datasets for Air Quality Prediction
SANSA: Building a Knowledge Graph to Support Event Prediction
KarmaCAD: Automatically Inferring User Intent of CAD Models
Karma: A Data Integration Project
#opencv #tesseract #qgis #postgres #postgis #optical_character_recognition #deep_learning #cnn #image_processing #computer_vision #graphics_recognition #pattern_recognition #machine_learning #semantic_web_technologies #giscience
Context-based Map Processing (ContextMP): Fully Automatic Recognition System for Processing Big Map Archives
STRABO: Text Recognition in Maps
#qgis #arcgis #web_map #information_integration #giscience #gis
JonSnow: Mining public datasets for modeling intra-city PM2.5 concentrations at a fine spatial resolution.
Air quality models are important for studying the impact of air pollutants on health conditions. Existing work typically relies on area-specific, expert-selected attributes of pollution emissions (e.g., transportation) and dispersion (e.g., meteorology), building a separate model for each combination of study area, pollutant type, and spatiotemporal scale.
In this project, we are building a data mining approach, JonSnow, which utilizes publicly available OpenStreetMap (OSM) data to automatically generate air quality models for the concentrations of any pollutant type at various temporal scales. Our approach utilizes the PRISMS-DSCIC infrastructure (image below) developed at the USC Information Sciences Institute as the data collection, manipulation, and analysis platform.
The PRISMS-DSCIC (Pediatric Research using Integrated Sensor Monitoring Systems - Data and Software Coordination and Integration Center) is an NIH-NIBIB (National Institutes of Health - National Institute of Biomedical Imaging and Bioengineering) funded initiative to address pediatric asthma as a chronic disease of childhood. PRISMS-DSCIC is responsible for collecting, storing, integrating, and analyzing real-time environmental, physiological, and behavioral data obtained from heterogeneous sensors and traditional data sources, helping researchers predict and prevent asthma attacks.
JonSnow automatically generates (domain-) expert-free models for accurate PM2.5 concentration prediction, which can be used to improve air quality models that traditionally rely on expert-selected input.
The image below shows the PM2.5 AQI predictions from JonSnow (left) and the traditional spatial interpolation method, IDW (right), for Dec 2016 (top) and Jan 2017 (bottom). As expected, IDW could not generate fine-scale predictions, while JonSnow successfully identified intra-city areas where the air quality is typically poor (e.g., the southern part of the city near the Port of San Pedro and downtown Los Angeles).
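For context, the IDW baseline above can be sketched in a few lines: each known station contributes to the estimate with a weight inversely proportional to a power of its distance. The station coordinates and values below are made-up illustrations, not actual sensor data.

```python
import math

def idw(stations, values, query, power=2):
    """Inverse-distance-weighted estimate at `query` from known stations.

    stations: list of (x, y) sensor locations
    values:   measured concentrations at those locations
    query:    (x, y) location to estimate
    """
    weight_sum, weighted_values = 0.0, 0.0
    for (x, y), v in zip(stations, values):
        d = math.hypot(query[0] - x, query[1] - y)
        if d == 0:                      # query sits exactly on a station
            return v
        w = 1.0 / d ** power
        weight_sum += w
        weighted_values += w * v
    return weighted_values / weight_sum

# A query point equidistant from two stations gets the mean of their values.
print(idw([(0, 0), (2, 0)], [10.0, 20.0], (1, 0)))  # 15.0
```

Because the weights depend only on distance to a handful of stations, IDW smooths over intra-city variation, which is exactly the limitation the learned model addresses.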
KarmaCAD: Automatically Inferring User Intent of CAD Models
The difficulties in CAD data interoperability arise from the need to use heterogeneous CAD systems and the lack of a proper notation for describing CAD designs. Existing CAD systems each have their own data formats, and the information recorded in these formats varies significantly. Some systems can record design histories to represent design rationales, but others do not.
In this project, we study the CAD interoperability problem, focusing on developing a semi-automatic approach that allows CAD users to record their design intent efficiently. We first created a JSON format to represent the fixed and dimension relations of CAD designs. Then we built a SolidWorks plugin called RelationFixer.
Given a few examples of the desired and undesired variations of a CAD design, RelationFixer learns the design rationales automatically and selects a set of fixed and dimension relations that best represent the design rationales. We recreated the ambiguous CAD designs described in Raghothama and Shapiro’s previous work (Raghothama and Shapiro, 2002) and tested RelationFixer with these CAD designs. In the experiment, RelationFixer successfully learned the design rationales from a few examples of design variations and generated constraints between sketches in the CAD models to prevent possible ambiguities.
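The selection step can be sketched as follows: given candidate relations (expressed here as hypothetical predicates over a design's parameters), keep the ones that hold for every desired variation and rule out at least one undesired variation. The parameter names and predicates below are illustrative, not RelationFixer's actual representation.

```python
def select_relations(candidates, desired, undesired):
    """Keep candidate relations that are consistent with all desired
    variations and exclude at least one undesired variation.

    candidates: dict of name -> predicate over a design (a parameter dict)
    desired / undesired: lists of example designs
    """
    selected = {}
    for name, holds in candidates.items():
        if all(holds(d) for d in desired) and any(not holds(u) for u in undesired):
            selected[name] = holds
    return selected

# Hypothetical candidate relations over a rectangular sketch.
candidates = {
    "width_fixed": lambda d: d["width"] == 40,
    "square":      lambda d: d["width"] == d["height"],
    "tall":        lambda d: d["height"] > d["width"],
}
desired   = [{"width": 40, "height": 20}, {"width": 40, "height": 30}]
undesired = [{"width": 50, "height": 20}]

print(sorted(select_relations(candidates, desired, undesired)))  # ['width_fixed']
```

Only the relation that separates the desired variations from the undesired one survives; the others are either violated by a desired example or useless for ruling anything out.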
SANSA: Building a Knowledge Graph to Support Event Prediction
A domain expert can process heterogeneous data to make meaningful interpretations or predictions from the data. For example, by looking at research papers and patent records, an expert can determine the maturity of an emerging technology and predict the geographic location(s) and time (e.g., in a certain year) where and when the technology will be a success. However, this is an expert-intensive, manual task.
In this project, we are building an end-to-end system, Sansa, that leverages data collected from public sources to predict the (geographic) center(s) of a type of technology and when the center(s) will emerge. In our pilot study, we used Sansa to predict the future (geographic) center(s) for fuel cell technologies. Sansa extracts and cleanses data from public sources including research papers and patent records. After data extraction and cleansing, Sansa uses an ontology-based data integration method to generate knowledge graphs in the RDF (Resource Description Framework) format and enables users to switch quickly between machine learning models for predictive analytic tasks. Here's a demonstration of Sansa:
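The ontology-based integration step can be illustrated with a tiny mapping from extracted records to RDF-style triples keyed by a subject URI. The property names, base URI, and record fields below are hypothetical stand-ins; the actual Sansa ontology is richer.

```python
def records_to_triples(records, base="http://example.org/sansa/"):
    """Map extracted paper/patent records (dicts) to RDF-style
    (subject, predicate, object) triples via a fixed ontology mapping."""
    mapping = {  # record field -> RDF property (hypothetical names)
        "title":    base + "hasTitle",
        "year":     base + "publishedInYear",
        "location": base + "hasLocation",
    }
    triples = []
    for rec in records:
        subject = base + rec["id"]
        for field, prop in mapping.items():
            if field in rec:
                triples.append((subject, prop, rec[field]))
    return triples

patents = [{"id": "patent/US123", "title": "Fuel cell stack",
            "year": 2016, "location": "Osaka"}]
for triple in records_to_triples(patents):
    print(triple)
```

Once records from papers and patents share subjects and properties in one graph, downstream machine learning models can consume the same integrated representation regardless of which source a fact came from.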
Our work on SANSA won the USGIF and NVIDIA GPU Essay Challenge: “If you were given dedicated access to an NVIDIA GPU-powered supercomputer, what problems could you solve?”
The Visual History Archive (VHA) in the USC Shoah Foundation contains a large digital life story collection of survivors before, during, and after the Holocaust and other genocides. Currently, location information (e.g., place names) mentioned in the VHA is indexed by keywords. For example, using “Poland” as the keyword for place search on the VHA Online returns 5,325 indexing terms, in which the indexing terms (place names) with verified locations are displayed in a Google Maps web interface. Since place names and administrative boundaries can change significantly over time, displaying search results on a current map does not provide the best visualization tool for navigating the VHA digital collection through space and time. In addition, a number of places mentioned (indexed) in the testimony could not be located due to the lack of historical sources for verifying the location information of these places. This limits the opportunity for researchers, educators, and the general public to access valuable VHA materials and prevents the VHA collection from being indexed and searched by advanced spatial queries (e.g., finding testimonies mentioning cities or towns in Poland between 1930 and 1945).
Historical maps are a great source of detailed place information in the past. For example, during World War II (WWII), the US Army Map Service (AMS) created around 40,000 maps covering a significant amount of area. Other sources provide detailed pre- and post-WWII historical maps; for example, the Polish mapping company Centrum Kartografii offers pre-WWII maps of Poland with a comprehensive list of place names including towns, manufacturing plants, monuments, etc. These historical maps can be found in either paper or scanned (digital) format in map archives such as the David Rumsey Map Collection or libraries including the USC Libraries, UCLA Map Libraries, Western Michigan Libraries, and the Library of Congress. The problem we are addressing here is how to systematically and effectively link places mentioned in the VHA collection to relevant historical maps and other historical materials.
See the poster presented at the 28th International Cartographic Conference for a project overview.
Context-based Map Processing (ContextMP): Automatically Training Deep Learning Models for Image Recognition
Detailed data on the states and changes of landscapes in the past is essential for understanding the causes and consequences of environmental change, which supports a variety of studies, such as cancer and environmental epidemiology. However, existing data sources (e.g., online mapping services) typically contain only contemporary information. Historical maps are a great source of geographic information in the past and are often the only source that provides professionally surveyed historical data. In the U.S., the U.S. Geological Survey (USGS) has created over 200,000 topographic maps since 1884. According to the USGS, in the United States these topographic maps “portray both natural and manmade features. They show and name works of nature including mountains, valleys, plains, lakes, rivers, and vegetation. They also identify the principal works of man, such as roads, boundaries, transmission lines, and major buildings.”
In this project, we are developing a novel map processing system for automated recognition of geographic information from the 200,000-plus scanned maps in the USGS Historical Topographic Map Series. Our system, Arya, uses “contextual data” to determine likely locations of geographic features of interest in a map and uses the locations to automatically collect training data for building convolutional neural networks (CNN). Contextual data are contemporary spatial layers (e.g., current railroad locations); at many of these locations, railroad pixels can also be found in historical maps of the same area. Also see our NSF project website here.
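The training-data collection idea can be sketched as: take coordinates from a contemporary vector layer, convert them to pixel positions in the georeferenced map scan via its affine geotransform, and cut fixed-size patches as candidate CNN training examples. The geotransform tuple below follows the common GDAL-style convention (origin_x, pixel_width, 0, origin_y, 0, -pixel_height); everything else is a toy illustration.

```python
def world_to_pixel(gt, x, y):
    """Map world coordinates to (row, col) under a GDAL-style geotransform
    gt = (origin_x, pixel_w, 0, origin_y, 0, -pixel_h)."""
    col = int((x - gt[0]) / gt[1])
    row = int((y - gt[3]) / gt[5])
    return row, col

def sample_patches(scan, gt, coords, size=3):
    """Cut size x size patches from the scanned map (list of pixel rows)
    around each contextual-layer coordinate; skip off-image points."""
    half = size // 2
    h, w = len(scan), len(scan[0])
    patches = []
    for x, y in coords:
        r, c = world_to_pixel(gt, x, y)
        if half <= r < h - half and half <= c < w - half:
            patches.append([row[c - half:c + half + 1]
                            for row in scan[r - half:r + half + 1]])
    return patches

# Toy 8x8 "scan" with one bright feature; origin (100, 200), 1-unit pixels.
scan = [[0] * 8 for _ in range(8)]
scan[4][4] = 1  # a "railroad pixel" in the historical scan
gt = (100.0, 1.0, 0.0, 200.0, 0.0, -1.0)
patches = sample_patches(scan, gt, [(104.5, 195.5)])  # falls on pixel (4, 4)
print(len(patches), patches[0][1][1])  # 1 1
```

Because roads and railroads shift over time, not every contextual location will contain the feature in the historical scan; the collected patches are noisy positives, which is why they feed a CNN rather than a rule-based matcher.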
ArcKarma: Efficient cleaning and transformation of geospatial data attributes (an Esri ArcGIS plugin)
A significant challenge in handling geographic datasets is that the datasets can come from heterogeneous sources with varying data quality and formats. Before these datasets can be used in a Geographic Information System (GIS) for spatial analysis or to create maps, a typical task is to clean the attribute data and transform the data into a uniform format. However, conventional GIS products focus on manipulating the spatial component of geographic features and only offer basic tools for editing the attribute data (e.g., one row at a time). This limits the capability for handling large datasets in a GIS, since manually editing and transforming attribute data between different formats is not practical for thousands of geographic features. We present ArcKarma, which is built on our previous work on data transformation, to efficiently clean and transform data attributes in a GIS. ArcKarma generates transformation programs from a few user-provided examples and applies these programs to transform individual attribute columns into the desired formats. We show that ArcKarma produces accurate results and eliminates the need for laborious manual data cleaning and scripting tasks. See also: USC Karma.
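A toy version of example-driven transformation: learn a token permutation and output separator from one input/output pair, then apply it to the rest of a column. Real Karma/ArcKarma transformation programs are far more general; this sketch only handles reorderings of delimiter-separated tokens.

```python
import re

def learn_transform(example_in, example_out):
    """Learn a token reordering plus output separator from one example pair.
    Only works when the output tokens are a permutation of the input tokens."""
    tokens_in = re.split(r"\W+", example_in)
    tokens_out = re.split(r"\W+", example_out)
    seps = re.findall(r"\W+", example_out)
    sep = seps[0] if seps else ""
    order = [tokens_in.index(t) for t in tokens_out]

    def transform(value):
        tokens = re.split(r"\W+", value)
        return sep.join(tokens[i] for i in order)

    return transform

# Learn "MM/DD/YYYY" -> "YYYY-MM-DD" from a single example...
to_iso = learn_transform("01/31/2015", "2015-01-31")
# ...and apply it to the rest of the attribute column.
print([to_iso(v) for v in ["12/25/2016", "07/04/2017"]])
# ['2016-12-25', '2017-07-04']
```

The point of the programming-by-example design is that the user never writes the transformation program; a couple of examples on the attribute column are enough to generalize to thousands of rows.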
System and Method for Fusing Geospatial Data, Chen, C.-C., Knoblock, C. A., Shahabi, C., and Chiang, Y.-Y. US Patent No. 7660441.
We are always looking for talented and motivated students and summer interns to work on exciting problems in spatial science, data science, and computer science.
Before you send us an email, please talk to at least one PhD student in the group. We receive many applications every semester and have limited positions. In your email, please include 1) your resume, 2) a short description of your research interests, 3) the PhD student whom you have talked to, and 4) the research projects that you'd like to participate in.
Directed research students (Computer Science, Data Informatics, or GIST students): we ask you to commit at least 15 hours a week so that you will have enough time to finish a cool project.
We love to work with undergraduate students. Join us and gain experience in research and build some awesome applications!
We welcome visitors and students from other schools. We had great experiences with international summer interns in the past. Come work with us in Los Angeles and enjoy the nice weather!
We take PhD students from both USC Ph.D. programs in Computer Science and in Population, Health, and Place. If you want to work with our faculty members for your Ph.D. study, you are welcome to send us an email describing your research interests before you apply. This way we will be able to find your application after the admission committee has reviewed your profile. However, in most cases, we will not be able to tell you your chances of getting into the program.