“Deep Web Search” May Help Scientists
When you do a simple web search on a topic, the results that appear aren’t the whole story. The Internet contains a vast mine of information – sometimes referred to as the “deep web” – that is not indexed by search engines: information that would be useful in tracking down criminals, terrorist activity, sex trafficking and the spread of disease. Scientists could also use it to search for images and data from spacecraft.
The Defense Advanced Research Projects Agency (DARPA) has developed tools as part of its Memex program that access and catalog this mysterious online world. Researchers at NASA’s Jet Propulsion Laboratory in Pasadena, Calif., have joined the Memex effort to harness the benefits of deep web search for science. Memex could, for example, help catalog the vast amounts of data returned daily by NASA spacecraft.
“We are developing next-generation search technologies that include people, places, objects and the connections between them,” said Chris Mattmann, principal investigator for JPL’s work on Memex.
Memex checks not only standard online text content but also images, videos, pop-up ads, forms, scripts and other ways of storing information to see how they relate to each other.
“We are augmenting web crawlers to behave like browsers, in other words, running scripts and reading advertisements the way you would when you usually go online. This information is not normally cataloged by search engines,” Mattmann said.
In addition, a standard web search extracts little information from images and videos, but Memex can recognize what is in that content and pair it with searches on the same topics. The search tool could identify the same object across multiple frames of a video or even in different videos.
Memex’s video and image search capabilities could one day benefit space missions that take photos, videos and other types of imaging data with instruments such as spectrometers. Finding visual information about a particular planetary body could greatly help scientists analyze geological features. Scientists analyzing imagery data from Earth missions that monitor phenomena such as snowfall and soil moisture could also benefit.
Memex would also improve the search for published scientific data, so that scientists are better informed about what has been published and analyzed on their subjects. The technology could be applied to large NASA data centers such as the Physical Oceanography Distributed Active Archive Center, which makes NASA ocean and climate data accessible and meaningful. Memex would make PDF documents more easily searchable and help users reach the information they are looking for. Knowledge of existing publications also helps program managers assess the impact of spacecraft data.
All code written for Memex is open source. JPL is one of 17 teams working on it as part of the DARPA initiative.
Memex is related to an earlier DARPA big data initiative called XDATA, managed by DARPA Program Manager Wade Shen. That research effort also aims to process and analyze large amounts of data, with military, government and civilian applications. JPL was one of 24 groups involved.
“We develop open source, free and mature products, then enhance them with the help of DARPA investments and easily transfer them through our roles to the scientific community,” said Mattmann.
Citation: “Deep Web Search” May Help Scientists (2015, May 25), retrieved September 30, 2021 from https://phys.org/news/2015-05-deep-web-scientists.html