SETI
The search for intelligent life, crunching data courtesy of the IBM Cloud.
The SETI@IBMCloud project
The SETI Institute observes radio signals from outer space, monitoring for signs of intelligent life. They do this 24/7, using an array of 42 satellite dishes located in northern California.
Why does this partnership matter to IBM? It presents unique data processing and machine learning problems that the IBM Cloud can help handle.
Data problems, at scale
The SETI Institute and its collaborators have moved more than 16 TBs of data onto IBM Object Storage and Db2 Warehouse on Cloud, and have analyzed that data using the IBM Spark framework. Already, the project has produced new machine learning models that are remarkably accurate. (Run the two winning models [1, 2] on the IBM Data Science Experience and reproduce their work.)
ibmos2spark
Working with IBM Object Storage so closely meant uncovering some warts. The WDP developer advocacy team, in conjunction with DSX engineering, developed the ibmos2spark package to help set Spark Hadoop configurations for connecting to object storage. It helped SETI collaborators spend less time on data ingestion, allowing them to focus on their machine learning models.
Previously, configuration information would have to be copy and pasted each time users wanted to pull data in object storage down into a DataFrame for computation in a Jupyter notebook. With the ibmos2spark package, users need to configure connections once, like so:
ibmos2spark comes pre-installed on the IBM Data Science Experience, with a UI feature for inserting credentials to code for faster data ingestion. The package works for these languages:
- Python
- Scala
- R, sparklyR
- R, sparkR
Projects
-
A.I. for E.T.
Medium | GitHubUsing Watson Data Platform, we've built a citizen scientist project to apply deep learning to improve the state of the art in the search for extraterrestrial intelligence (SETI) research.
-
Defensive IBM Object Storage Containers
MediumHow to prevent accidental writes when collaborating on data analysis.
-
Simulating E.T. Radio Signals
MediumHow to insert individual files into object storage from within a map function in Apache Spark.
-
SETI@IBMCloud: SETI data, publicly available, from IBM
GitHub | dataset | developerWorksRaw data from the SETI Institute for you to crunch in the search for extraterrestrial life.