Machine Learning... or drown in data

Roger E. Moore
2019-Jan-22, 04:27 PM
The author argues that the only way forward for astronomy is to develop strategies for dealing with planet-loads of data that would give a supercomputer a stroke. Reaching out to non-astronomers and citizen scientists is strongly encouraged. The payoff is guaranteed to be orders of magnitude greater than we know, and what we discover may well include "unknown unknowns".



Pushing the Technical Frontier: From Overwhelmingly Large Data Sets to Machine Learning

Viviana Acquaviva (Submitted on 17 Jan 2019)

This paper summarizes my thoughts, given in an invited review at the IAU symposium 341 "Challenges in Panchromatic Galaxy Modelling with Next Generation Facilities", about how machine learning methods can help us solve some of the big data problems associated with current and upcoming large galaxy surveys.


[[Roger]] Interested to hear other people's thoughts on the paper, and on the use of non-astronomers to assist with Very Big Data.

2019-Jan-22, 10:37 PM
Understandably, this is an issue the Galaxy Zoo team wrestles with as we look to the enormous data sets expected from LSST, Euclid... For our favorite kinds of tasks (galaxy morphology, finding New Weird Things), the only workable approach we can see at this point is to let machine analysis do what it can (galaxies that are very well fit by simple models, galaxies where the first few classifiers all agree very closely), with humans providing training and consistency sets and looking "lightly" over as much of the data as possible to identify things the algorithms do not yet recognize. Humans also stay in the loop because they want to be (volunteers have stressed this over and over).

Team members have implemented several versions of this approach, and parallel runs show how much various approaches could speed up classifications of large samples (i.e. reduce the number of human views needed for a given confidence).

An example by Melanie Beck et al: Integrating human and machine intelligence in galaxy morphology classification tasks (http://adsabs.harvard.edu/abs/2018MNRAS.476.5516B)

Roger E. Moore
2019-Mar-20, 01:09 PM
Speaking of the data flood that's already upon us...


Modeling with the Crowd: Optimizing the Human-Machine Partnership with Zooniverse

Hugh Dickinson, Lucy Fortson, Claudia Scarlata, Melanie Beck, Mike Walmsley (Submitted on 19 Mar 2019)

LSST and Euclid must address the daunting challenge of analyzing the unprecedented volumes of imaging and spectroscopic data that these next-generation instruments will generate. A promising approach to overcoming this challenge involves rapid, automatic image processing using appropriately trained Deep Learning (DL) algorithms. However, reliable application of DL requires large, accurately labeled samples of training data. Galaxy Zoo Express (GZX) is a recent experiment that simulated using Bayesian inference to dynamically aggregate binary responses provided by citizen scientists via the Zooniverse crowd-sourcing platform in real time. The GZX approach enables collaboration between human and machine classifiers and provides rapidly generated, reliably labeled datasets, thereby enabling online training of accurate machine classifiers. We present selected results from GZX and show how the Bayesian aggregation engine it uses can be extended to efficiently provide object-localization and bounding-box annotations of two-dimensional data with quantified reliability. DL algorithms that are trained using these annotations will facilitate numerous panchromatic data modeling tasks including morphological classification and substructure detection in direct imaging, as well as decontamination and emission line identification for slitless spectroscopy. Effectively combining the speed of modern computational analyses with the human capacity to extrapolate from few examples will be critical if the potential of forthcoming large-scale surveys is to be realized.
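
For anyone wondering what "using Bayesian inference to dynamically aggregate binary responses" might look like in practice, here is a minimal sketch. It is not the GZX code: the `Volunteer` class, its two skill probabilities, and the 0.99 / 0.01 retirement thresholds are all assumptions made for illustration. Each yes/no vote updates the probability that a subject really shows the feature, weighted by how reliable that volunteer is assumed to be, and the subject stops collecting votes once the probability is decisive.

```python
from dataclasses import dataclass


@dataclass
class Volunteer:
    # Hypothetical skill estimates (not taken from the paper):
    # chance of voting "yes" when the feature is really there,
    # and chance of voting "no" when it is really absent.
    p_yes_given_true: float
    p_no_given_false: float


def update_posterior(prior, vote, volunteer):
    """Bayes update of P(feature present) after one binary vote."""
    if vote:
        like_true = volunteer.p_yes_given_true
        like_false = 1.0 - volunteer.p_no_given_false
    else:
        like_true = 1.0 - volunteer.p_yes_given_true
        like_false = volunteer.p_no_given_false
    numerator = like_true * prior
    return numerator / (numerator + like_false * (1.0 - prior))


def aggregate(votes, prior=0.5, accept=0.99, reject=0.01):
    """Process (vote, volunteer) pairs in arrival order.

    Stops as soon as the posterior crosses either threshold and
    returns the posterior together with the number of votes used.
    """
    p = prior
    used = 0
    for vote, volunteer in votes:
        p = update_posterior(p, vote, volunteer)
        used += 1
        if p >= accept or p <= reject:
            break
    return p, used


if __name__ == "__main__":
    reliable = Volunteer(p_yes_given_true=0.9, p_no_given_false=0.9)
    posterior, n_votes = aggregate([(True, reliable)] * 5)
    print(f"posterior={posterior:.4f} after {n_votes} votes")
```

With these assumed numbers, consistent votes from reliable volunteers settle a subject after a few views instead of a fixed large number, which is the kind of saving in human effort the abstract is describing.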

Jean Tate
2019-Mar-20, 09:22 PM
This is quite an interesting paper (preprint, actually)! :)

Another Zooniverse one on a similar topic, "Radio Galaxy Zoo: ClaRAN - A Deep Learning Classifier for Radio Morphologies" (arXiv:1805.12008 (https://arxiv.org/abs/1805.12008)):

The upcoming next-generation large area radio continuum surveys can expect tens of millions of radio sources, rendering the traditional method for radio morphology classification through visual inspection unfeasible. We present ClaRAN - Classifying Radio sources Automatically with Neural networks - a proof-of-concept radio source morphology classifier based upon the Faster Region-based Convolutional Neural Networks (Faster R-CNN) method. Specifically, we train and test ClaRAN on the FIRST and WISE images from the Radio Galaxy Zoo Data Release 1 catalogue. ClaRAN provides end users with automated identification of radio source morphology classifications from a simple input of a radio image and a counterpart infrared image of the same region. ClaRAN is the first open-source, end-to-end radio source morphology classifier that is capable of locating and associating discrete and extended components of radio sources in a fast (< 200 milliseconds per image) and accurate (>= 90 %) fashion. Future work will improve ClaRAN's relatively lower success rates in dealing with multi-source fields and will enable ClaRAN to identify sources on much larger fields without loss in classification accuracy.

Yes, it's very much a concept/early test result, and clearly there's a very long way to go, but the SKA will generate data in amounts and rates comparable to the LSST. And radio morphologies are considerably less well understood than optical (galaxy) ones (plug: go to Radio Galaxy Zoo (https://radio.galaxyzoo.org/#/classify) to do your part in catching up).

Roger E. Moore
2019-Apr-17, 01:10 PM
More like a short handbook on the topic: 37 pages, 18 figures


Machine Learning in Astronomy: a practical overview

Dalya Baron (Submitted on 15 Apr 2019)

Astronomy is experiencing a rapid growth in data size and complexity. This change fosters the development of data-driven science as a useful companion to the common model-driven data analysis paradigm, where astronomers develop automatic tools to mine datasets and extract novel information from them. In recent years, machine learning algorithms have become increasingly popular among astronomers, and are now used for a wide variety of tasks. In light of these developments, and the promise and challenges associated with them, the IAC Winter School 2018 focused on big data in Astronomy, with a particular emphasis on machine learning and deep learning techniques. This document summarizes the topics of supervised and unsupervised learning algorithms presented during the school, and provides practical information on the application of such tools to astronomical datasets. In this document I cover basic topics in supervised machine learning, including selection and preprocessing of the input dataset, evaluation methods, and three popular supervised learning algorithms, Support Vector Machines, Random Forests, and shallow Artificial Neural Networks. My main focus is on unsupervised machine learning algorithms, that are used to perform cluster analysis, dimensionality reduction, visualization, and outlier detection. Unsupervised learning algorithms are of particular importance to scientific research, since they can be used to extract new knowledge from existing datasets, and can facilitate new discoveries.
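
As a taste of the unsupervised outlier detection the abstract mentions, here is a minimal sketch of one simple distance-based approach: score each object by its mean distance to its k nearest neighbours in feature space, so that isolated objects get high scores. This is not code from the paper, and the function name `knn_outlier_scores`, the toy feature values, and the choice of k are assumptions for illustration only.

```python
import math


def knn_outlier_scores(points, k=2):
    """Score each point by its mean distance to its k nearest neighbours.

    Larger scores mean the point sits further from the rest of the sample.
    """
    if not 0 < k < len(points):
        raise ValueError("k must be between 1 and len(points) - 1")
    scores = []
    for i, p in enumerate(points):
        # Distances from this point to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


if __name__ == "__main__":
    # Hypothetical two-feature catalogue (e.g. two colour indices):
    # four objects clustered together and one far from the rest.
    catalogue = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
    scores = knn_outlier_scores(catalogue, k=2)
    weirdest = max(range(len(catalogue)), key=scores.__getitem__)
    print("most isolated object:", catalogue[weirdest])
```

The brute-force loop compares every pair of objects, so it only suits small samples; survey-scale catalogues would need a tree-based or approximate neighbour search.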