Thread: Machine Learning... or drown in data

  1. #1
    Join Date
    Sep 2004
    Location
    South Carolina
    Posts
    2,571

    Machine Learning... or drown in data

    The author argues that the only way forward for astronomy is to develop strategies for handling planet-loads of data that would give a supercomputer a stroke. Reaching out to non-astronomers and citizen scientists is strongly encouraged. The payoff is expected to be orders of magnitude greater than we can foresee, and some of what we discover may turn out to be "unknown unknowns".

    ===

    https://arxiv.org/abs/1901.05978

    Pushing the Technical Frontier: From Overwhelmingly Large Data Sets to Machine Learning

    Viviana Acquaviva (Submitted on 17 Jan 2019)

    This paper summarizes my thoughts, given in an invited review at the IAU symposium 341 "Challenges in Panchromatic Galaxy Modelling with Next Generation Facilities", about how machine learning methods can help us solve some of the big data problems associated with current and upcoming large galaxy surveys.

    ===

    [[Roger]] Interested to hear other people's thoughts on the paper, and on the use of non-astronomers to assist with Very Big Data.
    There is something fascinating about science. One gets such wholesale returns of conjecture out of such a trifling investment of fact.
    Mark Twain, Life on the Mississippi (1883)

  2. #2
    Join Date
    Mar 2004
    Posts
    3,154
    Understandably, this is an issue the Galaxy Zoo team wrestles with as we look to the enormous data sets expected from the LSST, Euclid... For our favorite kinds of tasks (galaxy morphology, finding New Weird Things), the only workable approach we can see at this point is machine analysis to do what it can (galaxies that are very well fit by simple models, galaxies where the first few classifiers all agree very closely), with humans providing training and consistency sets and looking over as much as possible of the data "lightly" to identify things not recognized by the algorithms at that point. And because they want to. (This has been stressed by volunteers over and over).

    Team members have implemented several versions of this approach, and parallel runs show how much various approaches could speed up classifications of large samples (i.e. reduce the number of human views needed for a given confidence).

    An example by Melanie Beck et al.: "Integrating human and machine intelligence in galaxy morphology classification tasks"
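
    The triage described above (machine handles the easy cases, early agreement among the first few classifiers settles others, everything else stays in the human queue) can be sketched roughly as follows. This is only an illustration, not the Galaxy Zoo pipeline: the function name `triage` and all thresholds (`machine_cut`, `min_votes`, `agree_cut`) are made-up assumptions.

```python
from collections import Counter


def triage(machine_confidence, human_votes,
           machine_cut=0.95, min_votes=3, agree_cut=0.9):
    """Decide how a galaxy's label gets settled.

    All thresholds are illustrative assumptions, not values from any
    real project. Returns one of 'machine', 'humans-agree',
    'needs-more-views'.
    """
    # Case 1: the machine classifier is confident enough on its own
    # (e.g. the galaxy is very well fit by a simple model).
    if machine_confidence >= machine_cut:
        return "machine"

    # Case 2: the first few human classifiers agree very closely,
    # so no further human views are spent on this galaxy.
    if len(human_votes) >= min_votes:
        top_label, top_count = Counter(human_votes).most_common(1)[0]
        if top_count / len(human_votes) >= agree_cut:
            return "humans-agree"

    # Case 3: keep showing the galaxy to volunteers.
    return "needs-more-views"


if __name__ == "__main__":
    print(triage(0.99, []))
    print(triage(0.60, ["spiral", "spiral", "spiral"]))
    print(triage(0.60, ["spiral", "smooth", "spiral"]))
```

    The saving in human views comes from the first two branches: every galaxy that exits there never needs the full complement of classifications.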

  3. #3
    Join Date
    Sep 2004
    Location
    South Carolina
    Posts
    2,571
    Speaking of the data flood that's already upon us...

    https://arxiv.org/abs/1903.07776

    Modeling with the Crowd: Optimizing the Human-Machine Partnership with Zooniverse

    Hugh Dickinson, Lucy Fortson, Claudia Scarlata, Melanie Beck, Mike Walmsley (Submitted on 19 Mar 2019)

    LSST and Euclid must address the daunting challenge of analyzing the unprecedented volumes of imaging and spectroscopic data that these next-generation instruments will generate. A promising approach to overcoming this challenge involves rapid, automatic image processing using appropriately trained Deep Learning (DL) algorithms. However, reliable application of DL requires large, accurately labeled samples of training data. Galaxy Zoo Express (GZX) is a recent experiment that simulated using Bayesian inference to dynamically aggregate binary responses provided by citizen scientists via the Zooniverse crowd-sourcing platform in real time. The GZX approach enables collaboration between human and machine classifiers and provides rapidly generated, reliably labeled datasets, thereby enabling online training of accurate machine classifiers. We present selected results from GZX and show how the Bayesian aggregation engine it uses can be extended to efficiently provide object-localization and bounding-box annotations of two-dimensional data with quantified reliability. DL algorithms that are trained using these annotations will facilitate numerous panchromatic data modeling tasks including morphological classification and substructure detection in direct imaging, as well as decontamination and emission line identification for slitless spectroscopy. Effectively combining the speed of modern computational analyses with the human capacity to extrapolate from few examples will be critical if the potential of forthcoming large-scale surveys is to be realized.
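
    For anyone wondering what "Bayesian inference to dynamically aggregate binary responses" might look like in practice, here is a minimal sketch. It is not the GZX code. It assumes each volunteer is described by two hypothetical skill numbers (the probability of answering "yes" when the true answer is yes, and when it is no), and the names `update_posterior`, `aggregate` and the thresholds are invented for illustration.

```python
def update_posterior(prior, vote, p_yes_given_true, p_yes_given_false):
    """One Bayes update of P(subject is a true 'yes') after a binary vote.

    p_yes_given_true / p_yes_given_false describe the volunteer's
    assumed skill; both are hypothetical inputs for this sketch.
    """
    if vote:
        like_true, like_false = p_yes_given_true, p_yes_given_false
    else:
        like_true, like_false = 1 - p_yes_given_true, 1 - p_yes_given_false
    numerator = like_true * prior
    return numerator / (numerator + like_false * (1 - prior))


def aggregate(votes, prior=0.5, accept=0.99, reject=0.01):
    """Fold votes in one at a time and stop as soon as the posterior
    crosses a threshold (illustrative values).

    votes: iterable of (vote, p_yes_given_true, p_yes_given_false).
    Returns (posterior, votes_used, status).
    """
    posterior = prior
    used = 0
    for vote, p_true, p_false in votes:
        posterior = update_posterior(posterior, vote, p_true, p_false)
        used += 1
        if posterior >= accept:
            return posterior, used, "accepted"
        if posterior <= reject:
            return posterior, used, "rejected"
    return posterior, used, "undecided"


if __name__ == "__main__":
    # Five skilled volunteers all say "yes"; the loop stops early.
    print(aggregate([(True, 0.9, 0.1)] * 5))
```

    Because each vote is weighted by the volunteer's assumed skill, a few reliable classifiers can settle a subject quickly, while a volunteer who answers at random (0.5, 0.5) leaves the posterior unchanged.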

  4. #4
    Join Date
    Feb 2010
    Posts
    710
    This is quite an interesting paper (a preprint, actually)!

    Another Zooniverse one on a similar topic, "Radio Galaxy Zoo: ClaRAN - A Deep Learning Classifier for Radio Morphologies" (arXiv:1805.12008):

    The upcoming next-generation large area radio continuum surveys can expect tens of millions of radio sources, rendering the traditional method for radio morphology classification through visual inspection unfeasible. We present ClaRAN - Classifying Radio sources Automatically with Neural networks - a proof-of-concept radio source morphology classifier based upon the Faster Region-based Convolutional Neural Networks (Faster R-CNN) method. Specifically, we train and test ClaRAN on the FIRST and WISE images from the Radio Galaxy Zoo Data Release 1 catalogue. ClaRAN provides end users with automated identification of radio source morphology classifications from a simple input of a radio image and a counterpart infrared image of the same region. ClaRAN is the first open-source, end-to-end radio source morphology classifier that is capable of locating and associating discrete and extended components of radio sources in a fast (< 200 milliseconds per image) and accurate (>= 90 %) fashion. Future work will improve ClaRAN's relatively lower success rates in dealing with multi-source fields and will enable ClaRAN to identify sources on much larger fields without loss in classification accuracy.
    Yes, it's very much a concept/early test result, and clearly there's a very long way to go, but the SKA will generate data in amounts and rates comparable to the LSST. And radio morphologies are considerably less well understood than optical (galaxy) ones (plug: go to Radio Galaxy Zoo to do your part in catching up).

  5. #5
    Join Date
    Sep 2004
    Location
    South Carolina
    Posts
    2,571
    More like a short handbook on the topic: 37 pages, 18 figures

    https://arxiv.org/abs/1904.07248

    Machine Learning in Astronomy: a practical overview

    Dalya Baron (Submitted on 15 Apr 2019)

    Astronomy is experiencing a rapid growth in data size and complexity. This change fosters the development of data-driven science as a useful companion to the common model-driven data analysis paradigm, where astronomers develop automatic tools to mine datasets and extract novel information from them. In recent years, machine learning algorithms have become increasingly popular among astronomers, and are now used for a wide variety of tasks. In light of these developments, and the promise and challenges associated with them, the IAC Winter School 2018 focused on big data in Astronomy, with a particular emphasis on machine learning and deep learning techniques. This document summarizes the topics of supervised and unsupervised learning algorithms presented during the school, and provides practical information on the application of such tools to astronomical datasets. In this document I cover basic topics in supervised machine learning, including selection and preprocessing of the input dataset, evaluation methods, and three popular supervised learning algorithms, Support Vector Machines, Random Forests, and shallow Artificial Neural Networks. My main focus is on unsupervised machine learning algorithms, that are used to perform cluster analysis, dimensionality reduction, visualization, and outlier detection. Unsupervised learning algorithms are of particular importance to scientific research, since they can be used to extract new knowledge from existing datasets, and can facilitate new discoveries.
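
    To give a flavour of the unsupervised outlier detection mentioned in the abstract, here is a small sketch of one simple approach: score each object by its mean distance to its k nearest neighbours in feature space, so that isolated objects score highest. This is not taken from the paper. The function name `knn_outlier_scores`, the toy feature vectors, and the choice of k are all assumptions made for illustration.

```python
import math


def knn_outlier_scores(points, k=2):
    """Mean Euclidean distance from each point to its k nearest neighbours.

    Larger score = more isolated = more likely to be a 'weird' object.
    Brute force, so only suitable for small illustrative samples.
    """
    if k < 1 or k > len(points) - 1:
        raise ValueError("k must be between 1 and len(points) - 1")
    scores = []
    for i, p in enumerate(points):
        # Distances to every other point, smallest first.
        dists = sorted(math.dist(p, q)
                       for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


if __name__ == "__main__":
    # Toy 2-D "feature vectors": four clustered objects and one oddball.
    features = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    scores = knn_outlier_scores(features, k=2)
    oddball = max(range(len(features)), key=scores.__getitem__)
    print(scores, "most isolated index:", oddball)
```

    On real survey data the features would need scaling first, and the brute-force neighbour search would have to be replaced by something that copes with millions of objects.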
