The Data Lab menu
  • Topicsarrow_drop_down
  • Projects
  • Aboutarrow_drop_down
  • Get Connected
  • Machine Learning for Developers
  • Notebooks for Developers
  • Offline First
  • Partners + Data
  • Serverless + Data
  • Working with JSON
  • Team
  • Get Involved
  • Code of Conduct
  • Event Support
  • Projects
  • Topicsarrow_drop_down
  • Aboutarrow_drop_down
  • Get Connected
  • Machine Learning for Developers
  • Notebooks for Developers
  • Offline First
  • Partners + Data
  • Serverless + Data
  • Working with JSON
  • Team
  • Get Involved
  • Code of Conduct
  • Event Support
Partners + Data / Collection

SETI

The search for intelligent life, crunching data courtesy of the IBM Cloud.

G. Adam Cox
Physicist, Data Scientist and Developer Advocate
More by G. Adam Cox

The SETI@IBMCloud project

The SETI Institute observes radio signals from outer space, monitoring for signs of intelligent life. They do this 24/7, using an array of 42 satellite dishes located in northern California.

Why does this partnership matter to IBM? It presents unique data processing and machine learning problems that the IBM Cloud can help handle.

Data problems, at scale

The SETI Institute and its collaborators have moved more than 16 TBs of data onto IBM Object Storage and Db2 Warehouse on Cloud, and have analyzed that data using the IBM Spark framework. Already, the project has produced new machine learning models that are remarkably accurate. (Run the two winning models [1, 2] on the IBM Data Science Experience and reproduce their work.)

A simulation of a narrow band radio signal, as observed from outer space.
A classic narrow band signal, simulated as part of the model-training process.

ibmos2spark

Working with IBM Object Storage so closely meant uncovering some warts. The WDP developer advocacy team, in conjunction with DSX engineering, developed the ibmos2spark package to help set Spark Hadoop configurations for connecting to object storage. It helped SETI collaborators spend less time on data ingestion, allowing them to focus on their machine learning models.

Previously, configuration information would have to be copy and pasted each time users wanted to pull data in object storage down into a DataFrame for computation in a Jupyter notebook. With the ibmos2spark package, users need to configure connections once, like so:

ibmos2spark comes pre-installed on the IBM Data Science Experience, with a UI feature for inserting credentials to code for faster data ingestion. The package works for these languages:

  • Python
  • Scala
  • R, sparklyR
  • R, sparkR
↓ View projects in this collection

Projects

  • A.I. for E.T.
    Medium | GitHub

    Using Watson Data Platform, we've built a citizen scientist project to apply deep learning to improve the state of the art in the search for extraterrestrial intelligence (SETI) research.

    • Spark
    • Object Storage
    • Machine Learning
  • Defensive IBM Object Storage Containers
    Medium

    How to prevent accidental writes when collaborating on data analysis.

    • Object Storage
  • Simulating E.T. Radio Signals
    Medium

    How to insert individual files into object storage from within a map function in Apache Spark.

    • Object Storage
    • Spark
  • SETI@IBMCloud: SETI data, publicly available, from IBM
    GitHub | dataset | developerWorks

    Raw data from the SETI Institute for you to crunch in the search for extraterrestrial life.

    • Spark
    • Object Storage
    • Jupyter Notebook
    • Python
© 2017 IBM Watson Data Lab