
Thread: Machine Learning... or drown in data

  1. #1
    Join Date
    Sep 2004
    Posts
    4,335

    Machine Learning... or drown in data

    The author argues that the only way forward for astronomy is to develop strategies for handling planet-loads of data that would give a supercomputer a stroke. The paper strongly encourages reaching out to non-astronomers and citizen scientists. The payoff is all but guaranteed to be orders of magnitude greater than we expect, and what we discover may well include "unknown unknowns".

    ===

    https://arxiv.org/abs/1901.05978

    Pushing the Technical Frontier: From Overwhelmingly Large Data Sets to Machine Learning

    Viviana Acquaviva (Submitted on 17 Jan 2019)

    This paper summarizes my thoughts, given in an invited review at the IAU symposium 341 "Challenges in Panchromatic Galaxy Modelling with Next Generation Facilities", about how machine learning methods can help us solve some of the big data problems associated with current and upcoming large galaxy surveys.

    ===

    [[Roger]] Interested to hear other people's thoughts on the paper, and on the use of non-astronomers to assist with Very Big Data.
    Do good work. —Virgil Ivan "Gus" Grissom

  2. #2
    Join Date
    Mar 2004
    Posts
    3,179
    Understandably, this is an issue the Galaxy Zoo team wrestles with as we look to the enormous data sets expected from the LSST, Euclid... For our favorite kinds of tasks (galaxy morphology, finding New Weird Things), the only workable approach we can see at this point is machine analysis to do what it can (galaxies that are very well fit by simple models, galaxies where the first few classifiers all agree very closely), with humans providing training and consistency sets and looking over as much as possible of the data "lightly" to identify things not recognized by the algorithms at that point. And because they want to. (This has been stressed by volunteers over and over).

    Team members have implemented several versions of this approach, and parallel runs show how much various approaches could speed up classifications of large samples (i.e. reduce the number of human views needed for a given confidence).

    An example by Melanie Beck et al: Integrating human and machine intelligence in galaxy morphology classification tasks

  3. #3
    Join Date
    Sep 2004
    Posts
    4,335
    Speaking of the data flood that's already upon us...

    https://arxiv.org/abs/1903.07776

    Modeling with the Crowd: Optimizing the Human-Machine Partnership with Zooniverse

    Hugh Dickinson, Lucy Fortson, Claudia Scarlata, Melanie Beck, Mike Walmsley (Submitted on 19 Mar 2019)

    LSST and Euclid must address the daunting challenge of analyzing the unprecedented volumes of imaging and spectroscopic data that these next-generation instruments will generate. A promising approach to overcoming this challenge involves rapid, automatic image processing using appropriately trained Deep Learning (DL) algorithms. However, reliable application of DL requires large, accurately labeled samples of training data. Galaxy Zoo Express (GZX) is a recent experiment that simulated using Bayesian inference to dynamically aggregate binary responses provided by citizen scientists via the Zooniverse crowd-sourcing platform in real time. The GZX approach enables collaboration between human and machine classifiers and provides rapidly generated, reliably labeled datasets, thereby enabling online training of accurate machine classifiers. We present selected results from GZX and show how the Bayesian aggregation engine it uses can be extended to efficiently provide object-localization and bounding-box annotations of two-dimensional data with quantified reliability. DL algorithms that are trained using these annotations will facilitate numerous panchromatic data modeling tasks including morphological classification and substructure detection in direct imaging, as well as decontamination and emission line identification for slitless spectroscopy. Effectively combining the speed of modern computational analyses with the human capacity to extrapolate from few examples will be critical if the potential of forthcoming large-scale surveys is to be realized.
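
    For anyone wondering what "Bayesian inference to dynamically aggregate binary responses" might look like in practice, here is a minimal sketch. It is not the GZX code: the function names, the per-volunteer skill numbers, and the 0.99 / 0.01 retirement thresholds are all assumptions made up for illustration. Each yes/no vote updates the probability that a galaxy has the feature in question, and the galaxy stops being shown once that probability is decisive.

```python
def update_posterior(prior, vote, p_yes_given_yes, p_no_given_no):
    """Bayes update of P(label is 'yes') after one binary vote.

    vote             -- True if the volunteer answered 'yes'
    p_yes_given_yes  -- assumed P(volunteer says yes | true label is yes)
    p_no_given_no    -- assumed P(volunteer says no  | true label is no)
    """
    if vote:
        like_yes = p_yes_given_yes
        like_no = 1.0 - p_no_given_no
    else:
        like_yes = 1.0 - p_yes_given_yes
        like_no = p_no_given_no
    numerator = like_yes * prior
    return numerator / (numerator + like_no * (1.0 - prior))


def aggregate(votes, prior=0.5, accept=0.99, reject=0.01):
    """Apply votes in arrival order and stop once a threshold is crossed.

    votes -- iterable of (vote, p_yes_given_yes, p_no_given_no) tuples
    Returns the final probability and the number of votes actually used.
    The thresholds are hypothetical, not values taken from the paper.
    """
    probability = prior
    used = 0
    for vote, p_yes, p_no in votes:
        probability = update_posterior(probability, vote, p_yes, p_no)
        used += 1
        if probability >= accept or probability <= reject:
            break
    return probability, used


# Five 'yes' votes from volunteers who are assumed to be right 90% of the time.
probability, used = aggregate([(True, 0.9, 0.9)] * 5)
print(f"P(yes) = {probability:.4f} after {used} votes")
```

    The point of the early stop is the speed-up discussed above: once a few reliable volunteers agree, further human views of that galaxy add little, so they can be spent elsewhere.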

  4. #4
    Join Date
    Feb 2010
    Posts
    772
    This is a quite interesting paper (preprint, actually)!

    Another Zooniverse one on a similar topic, "Radio Galaxy Zoo: ClaRAN - A Deep Learning Classifier for Radio Morphologies" (arXiv:1805.12008):

    The upcoming next-generation large area radio continuum surveys can expect tens of millions of radio sources, rendering the traditional method for radio morphology classification through visual inspection unfeasible. We present ClaRAN - Classifying Radio sources Automatically with Neural networks - a proof-of-concept radio source morphology classifier based upon the Faster Region-based Convolutional Neural Networks (Faster R-CNN) method. Specifically, we train and test ClaRAN on the FIRST and WISE images from the Radio Galaxy Zoo Data Release 1 catalogue. ClaRAN provides end users with automated identification of radio source morphology classifications from a simple input of a radio image and a counterpart infrared image of the same region. ClaRAN is the first open-source, end-to-end radio source morphology classifier that is capable of locating and associating discrete and extended components of radio sources in a fast (< 200 milliseconds per image) and accurate (>= 90 %) fashion. Future work will improve ClaRAN's relatively lower success rates in dealing with multi-source fields and will enable ClaRAN to identify sources on much larger fields without loss in classification accuracy.
    Yes, it's very much a concept/early test result, and clearly there's a very long way to go, but the SKA will generate data in amounts and rates comparable to the LSST. And radio morphologies are considerably less well understood than optical (galaxy) ones (plug: go to Radio Galaxy Zoo to do your part in catching up).

  5. #5
    Join Date
    Sep 2004
    Posts
    4,335
    More like a short handbook on the topic: 37 pages, 18 figures

    https://arxiv.org/abs/1904.07248

    Machine Learning in Astronomy: a practical overview

    Dalya Baron (Submitted on 15 Apr 2019)

    Astronomy is experiencing a rapid growth in data size and complexity. This change fosters the development of data-driven science as a useful companion to the common model-driven data analysis paradigm, where astronomers develop automatic tools to mine datasets and extract novel information from them. In recent years, machine learning algorithms have become increasingly popular among astronomers, and are now used for a wide variety of tasks. In light of these developments, and the promise and challenges associated with them, the IAC Winter School 2018 focused on big data in Astronomy, with a particular emphasis on machine learning and deep learning techniques. This document summarizes the topics of supervised and unsupervised learning algorithms presented during the school, and provides practical information on the application of such tools to astronomical datasets. In this document I cover basic topics in supervised machine learning, including selection and preprocessing of the input dataset, evaluation methods, and three popular supervised learning algorithms, Support Vector Machines, Random Forests, and shallow Artificial Neural Networks. My main focus is on unsupervised machine learning algorithms, that are used to perform cluster analysis, dimensionality reduction, visualization, and outlier detection. Unsupervised learning algorithms are of particular importance to scientific research, since they can be used to extract new knowledge from existing datasets, and can facilitate new discoveries.

  6. #6
    Join Date
    Sep 2004
    Posts
    4,335
    Good news for arXiv readers! If you have an account, get your news sorted by AI at this website: IArxiv.org. No kidding!

    https://arxiv.org/abs/2002.02460

    Intelligent Arxiv: Sort daily papers by learning users topics preference
    Ezequiel Alvarez (ICAS), Federico Lamagna (CAB), Cesar Miquel (Easytech), Manuel Szewc (ICAS)
    (Submitted on 6 Feb 2020)

    Current daily paper releases are becoming increasingly large and areas of research are growing in diversity. This makes it harder for scientists to keep up to date with current state of the art and identify relevant work within their lines of interest. The goal of this article is to address this problem using Machine Learning techniques. We model a scientific paper to be built as a combination of different scientific knowledge from diverse topics into a new problem. In light of this, we implement the unsupervised Machine Learning technique of Latent Dirichlet Allocation (LDA) on the corpus of papers in a given field to: i) define and extract underlying topics in the corpus; ii) get the topics weight vector for each paper in the corpus; and iii) get the topics weight vector for new papers. By registering papers preferred by a user, we build a user vector of weights using the information of the vectors of the selected papers. Hence, by performing an inner product between the user vector and each paper in the daily Arxiv release, we can sort the papers according to the user preference on the underlying topics.

    We have created the website IArxiv.org where users can read sorted daily Arxiv releases (and more) while the algorithm learns each users preference, yielding a more accurate sorting every day. Current IArxiv.org version runs on Arxiv categories astro-ph, gr-qc, hep-ph and hep-th and we plan to extend to others. We propose several new useful and relevant implementations to be additionally developed as well as new Machine Learning techniques beyond LDA to further improve the accuracy of this new tool.
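
    The sorting step in that abstract (an inner product between a user vector and each paper's topic-weight vector) can be sketched in a few lines. This is not the IArxiv code. The topic vectors below are invented numbers standing in for LDA output, and building the user vector as a plain average of liked papers is my assumption; the abstract only says it is built from the vectors of the selected papers.

```python
def user_vector(liked_topic_vectors):
    """Average the topic-weight vectors of the papers a user has liked.

    Plain averaging is an assumption made for this sketch.
    """
    count = len(liked_topic_vectors)
    n_topics = len(liked_topic_vectors[0])
    return [sum(vec[k] for vec in liked_topic_vectors) / count
            for k in range(n_topics)]


def rank_papers(user_vec, daily_papers):
    """Sort (paper_id, topic_vector) pairs by inner product, best match first."""
    def score(paper):
        _, topic_vec = paper
        return sum(u * w for u, w in zip(user_vec, topic_vec))
    return sorted(daily_papers, key=score, reverse=True)


# Hypothetical three-topic weights, as an LDA model might assign them.
liked = [[0.8, 0.1, 0.1], [0.6, 0.3, 0.1]]
todays_release = [
    ("paper_a", [0.1, 0.1, 0.8]),
    ("paper_b", [0.7, 0.2, 0.1]),
    ("paper_c", [0.3, 0.6, 0.1]),
]

preferences = user_vector(liked)
for paper_id, _ in rank_papers(preferences, todays_release):
    print(paper_id)
```

    Each new "like" shifts the user vector, which is presumably why the sorting is described as getting more accurate every day.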


    LATER: Having some password trouble myself, will keep trying.
    LATER LATER: Got in now, getting to learn the ropes.
    Last edited by Roger E. Moore; 2020-Feb-10 at 02:42 AM.

  7. #7
    Join Date
    Sep 2004
    Posts
    4,335
    Originally posted by Roger E. Moore:
    Good news for arXiv readers! If you have an account, get your news sorted by AI at this website: IArxiv.org. No kidding!

    https://arxiv.org/abs/2002.02460

    Intelligent Arxiv: Sort daily papers by learning users topics preference
    There are still bugs in the system: if you adjust the date range at the top, the site freezes. However, it does seem to learn what your preferences are.
