SETI

The search for intelligent life, crunching data courtesy of the IBM Cloud.

G. Adam Cox

Physicist, Data Scientist and Developer Advocate

The SETI@IBMCloud project

The SETI Institute observes radio signals from outer space, monitoring for signs of intelligent life. They do this 24/7, using an array of 42 satellite dishes located in northern California.

Why does this partnership matter to IBM? It presents unique data processing and machine learning problems that the IBM Cloud can help handle.

Data problems, at scale

The SETI Institute and its collaborators have moved more than 16 TBs of data onto IBM Object Storage and Db2 Warehouse on Cloud, and have analyzed that data using the IBM Spark framework. Already, the project has produced new machine learning models that are remarkably accurate. (Run the two winning models [1, 2] on the IBM Data Science Experience and reproduce their work.)

A simulation of a narrow band radio signal, as observed from outer space. — A classic narrow band signal, simulated as part of the model-training process.

ibmos2spark

Working with IBM Object Storage so closely meant uncovering some warts. The WDP developer advocacy team, in conjunction with DSX engineering, developed the ibmos2spark package to help set Spark Hadoop configurations for connecting to object storage. It helped SETI collaborators spend less time on data ingestion, allowing them to focus on their machine learning models.

Previously, configuration information would have to be copy and pasted each time users wanted to pull data in object storage down into a DataFrame for computation in a Jupyter notebook. With the ibmos2spark package, users need to configure connections once, like so:

ibmos2spark comes pre-installed on the IBM Data Science Experience, with a UI feature for inserting credentials to code for faster data ingestion. The package works for these languages:

Python
Scala
R, sparklyR
R, sparkR

Projects

A.I. for E.T.
Medium | GitHub

Using Watson Data Platform, we've built a citizen scientist project to apply deep learning to improve the state of the art in the search for extraterrestrial intelligence (SETI) research.
- Spark
- Object Storage
- Machine Learning
Defensive IBM Object Storage Containers
Medium

How to prevent accidental writes when collaborating on data analysis.
- Object Storage
Simulating E.T. Radio Signals
Medium

How to insert individual files into object storage from within a map function in Apache Spark.
- Object Storage
- Spark
SETI@IBMCloud: SETI data, publicly available, from IBM
GitHub | dataset | developerWorks

Raw data from the SETI Institute for you to crunch in the search for extraterrestrial life.
- Spark
- Object Storage
- Jupyter Notebook
- Python

SETI

The SETI@IBMCloud project

Data problems, at scale

ibmos2spark

Projects

A.I. for E.T. Medium | GitHub

Defensive IBM Object Storage Containers Medium

Simulating E.T. Radio Signals Medium

SETI@IBMCloud: SETI data, publicly available, from IBM GitHub | dataset | developerWorks

A.I. for E.T.
Medium | GitHub

Defensive IBM Object Storage Containers
Medium

Simulating E.T. Radio Signals
Medium

SETI@IBMCloud: SETI data, publicly available, from IBM
GitHub | dataset | developerWorks