
Astronomical imaging: The theory of everything



parejkoj
2008-Oct-22, 04:55 AM
I wasn't sure where to put this, since it touches on so many different parts of the forum, and I can't cross-post, so I figured I'd go for the area with the most traffic (I hope!). Well, most traffic outside of OTB, anyway...

Astronomical imaging: The theory of everything (http://arxiv.org/abs/0810.3851)
David W. Hogg (NYU), Dustin Lang (Toronto)

When Hogg talks, I listen. Carefully.

I heartily recommend this paper to anyone even remotely interested in how astronomical research is done now, and how it should be done in the future. For those of you not familiar with statistics or some basic astronomical terms, it may be a bit challenging, but keep your dictionary (or google) at hand, and you should get through it. Feel free to ask any questions here.

Once you've gotten through the paper, just take a few moments to think about the implications, once someone actually implements this. Then, once you've done that, think bigger...

Yeah, that big.

Of course, the "implement" part is the really hard part. But if anyone can do it, Hogg and company can (http://astrometry.net).

I'm going to sleep on this, and then talk about it at the astro-ph lunch session tomorrow. I suspect there will be a lot of astronomers doing exactly the same thing over the course of this week. Some of them will likely not be happy about the idea (I predict that the older and most established observational astronomers will make up the majority of that group... Hmm... Hey, ngc3314: I hate to blow your cover, but now I'm really curious about your opinion!), but I'm quite convinced they're in the wrong here.

StupendousMan
2008-Oct-22, 03:32 PM
I wasn't sure where to put this, since it touches on so many different parts of the forum, and I can't cross-post, so I figured I'd go for the area with the most traffic (I hope!). Well, most traffic outside of OTB, anyway...

Astronomical imaging: The theory of everything (http://arxiv.org/abs/0810.3851)
David W. Hogg (NYU), Dustin Lang (Toronto)


Of course, the "implement" part is the really hard part. But if anyone can do it, Hogg and company can (http://astrometry.net).


Yes, implementation is hard. There are a lot of other "hard" aspects which were not addressed in detail in the paper, such as the proper way to characterize all the data. Which filter was used in that image? And at what airmass? And which sort of detector? What, you didn't write all this down?



I'm going to sleep on this, and then talk about it at the astro-ph lunch session tomorrow. I suspect there will be a lot of astronomers doing exactly the same thing over the course of this week. Some of them will likely not be happy about the idea (I predict that the older and most established observational astronomers will make up the majority of that group... Hmm... Hey, ngc3314: I hate to blow your cover, but now I'm really curious about your opinion!), but I'm quite convinced they're in the wrong here.

Some people have advocated digitizing the great plate archives of the world, for years. Not much progress, because it costs money. The astronomical community, as a whole, prefers to build new instruments and use them, rather than to go back and inspect old data.

Hogg's proposal might be easy, in some ways, for existing archives of digital information ... which means, for the most part, space missions, with a few large ground-based surveys thrown in (SDSS, 2MASS, FIRST). I don't see any realistic chance that NSF or NASA or individual scientists will provide the $$ required to incorporate ALL images EVER TAKEN of ANY OBJECT by ANYONE.

So, a more realistic question is, can one put together a scheme which uses only a small subset of relatively homogeneous and internally consistent data from a small number of surveys to create this model of the universe? The answer is probably yes, since similar things have already been done on a smaller scale; see papers by Cohen et al. on a model of the infrared sky, for example, or the work of the Besancon group to model the properties of the Milky Way. Not exactly the same thing, but heading in the same direction.

The principle of GIGO remains: if the inputs to the model are not properly characterized, the outputs of the model will be incorrect.

Is it a good idea? If you're both an astronomer and a computer scientist, or part of a group which contains members of these groups, then, sure, you should apply for money from the granting agencies to do it.

Will it change the way we do astronomy, as a community? If it works in a big way, it might. But on a smaller scale, people are already putting together meta-catalogs of information. I suspect that the changes made by the smaller projects will add up to be more significant over the next 10 or 20 years than the changes due to the proposed super-model.

Keep in mind that the net result will be to remove scientists even further from the process of acquiring, reducing and analyzing data. We can expect (whether this particular proposal moves forward or not) a larger and larger number of papers reporting completely bogus results, because the authors simply queried a big database, unaware of the errors and limitations of the data within. Sigh.

(Oh, and get off my lawn, too!)

John Mendenhall
2008-Oct-22, 05:22 PM
Because of the rate of technological and information processing improvements, it might be better to use digital information pouring in from current and future projects than to digitize existing data.

I think.

Unless you could distribute the work the way the Stardust people did.

At any rate, standards for digital data are something to think about for future projects, if it hasn't been done already.

astromark
2008-Oct-22, 06:31 PM
It does make sense to compile a database of information, a technical advancement tool. It's happening already. Saying the obvious is easy; doing it costs time, and we all know time is money. Someone has got to pay. Educational institutes could use this sort of compiled information, the complete works of humanity. It's not going to happen overnight. Astronomy is an enormous subject with many specialist areas. Defining any part of it in absolute detail will be a work ongoing for eternity.
This thing called the INTERNET has done just this.

ngc3314
2008-Oct-22, 06:35 PM
(I predict that the older and most established observational astronomers will make up the majority of that group... Hmm... Hey, ngc3314: I hate to blow your cover, but now I'm really curious about your opinion!)

Older, yeah - more established, thanks! The only substantial difference I have with a lot of Hogg's ideas is the trumpeting of "the best possible" this and that. His "catalog" would obviously have a level of detail (i.e. errors in source position, proper motion, spectral shape, flux, structure) which varies wildly around the sky depending on the available data. This has, to some extent, been done on very limited data sets in small areas of sky, albeit in a cruder way than he proposes (I've got to get more fluent in working in Bayesian ways). Now, I would differ from him in one of the dicta from his research wiki about no significant discovery in recent astronomy arising from visual morphological inspection (cough, Hanny's Voorwerp, cough) - arguing that our understanding is already that complete seems to me to rise to serious sophistry.

I am seriously impressed with the progress Hogg and crew have made in software which can tell what piece of sky an image shows with no additional data (although a rough image scale speeds things up), and I am struck that they can generally work out when the image was taken (to within a few years) from proper motions. The point of trying to work backwards is clearly to include time-variable things that happened in the century between the introduction of astrophotography and the era of digital images with automatic metadata (which means there were over 20 years of CCD imagery with very spotty metadata).

As he lays it out, this is clearly an enormous effort - work out how the PSF varies across a field, come up with a standard to store this information, use external information to calibrate photographic plates for intensity information in an automated way, store what for galaxies is a complex and wavelength-dependent image structure as your model rather than a source list... I'm sort of surprised that he thinks years rather than decades. But if he's being provocative again, it works!

Nereid
2008-Oct-22, 08:30 PM
Ironic that this Hogg (et al.) paper comes out within a week of Lessons Learned from Sloan Digital Sky Survey Operations (http://arxiv.org/abs/0810.2811), whose authors include a certain J.E. Gunn.

Interesting, provocative, with just a hint of hubris perhaps?

It'd be interesting to see how well this approach would work with archival HST or IRAS data, and what it would do when examined before and after the various 'physical modelling' efforts ...

I wonder too how much more broadly Hogg et al. feel their approach would be applicable, than the UV/visual/NIR wavebands that their paper seems to focus on (think COMPTON, the various x-ray observatories, COBE, the multitude of radio telescopes; LOFAR?), and the few types of imaging devices they seem to be talking about (think HIPPARCOS, spectroscopy, CHARA, MAD, VLBI, LOFAR). How about recovering the 'point sources' from WMAP data - let's feed their monster data processor with raw WMAP data (plus relevant calibration data) and see how well it can recover the ~400 point sources in WMAP5! :)

In light of the paper which has J.E. Gunn as an author, and the actual, historical timeline of SDSS, inputting "We are confident that - without substantial new technology and in a timescale of years (not decades)—the entirety of preserved, digitized or digital astronomical imaging can be assembled and calibrated," into my 'Nereid model' (which is neither frequentist nor Bayesian) produces both "hugely overconfident" and "in a timescale of decades (at least)" as output.

parejkoj
2008-Oct-23, 03:28 PM
Yes, implementation is hard. There are a lot of other "hard" aspects which were not addressed in detail in the paper, such as the proper way to characterize all the data. Which filter was used in that image? And at what airmass? And which sort of detector? What, you didn't write all this down?

Well, if someone didn't write that information down, they shouldn't have published a paper using it. If the images didn't come from a publication, but rather someone's archives, some of that information can be derived from a system like astrometry.net. If it cannot, the data just may not be useable.

However, see my later comments. I don't think you're thinking big enough here...


Some people have advocated digitizing the great plate archives of the world, for years. Not much progress, because it costs money. The astronomical community, as a whole, prefers to build new instruments and use them, rather than to go back and inspect old data.

Hogg's proposal might be easy, in some ways, for existing archives of digital information ... which means, for the most part, space missions, with a few large ground-based surveys thrown in (SDSS, 2MASS, FIRST). I don't see any realistic chance that NSF or NASA or individual scientists will provide the $$ required to incorporate ALL images EVER TAKEN of ANY OBJECT by ANYONE.

Although your points are well taken, some of the movements to digitize plates have begun to make progress, and there has been more and more pressure from some quarters (Mike Brown's planet searches, for example) to get this done. Whether that translates into real cash money, I don't know. As to applying it to data already in digital form, that was already part of the proposal, but as I read it--and extrapolating from what I've seen of how Hogg thinks--the goal is to apply the system to all data taken in the future.

Think about some of the big "general" problems in astronomical imaging: deblending sources (deep Spitzer images have a big problem with this, and I suspect LSST will as well), PSF determination (this can have a time-varying component), faint source identification (do you pick 4-sigma sources, or 5-sigma? What if there is a known source at that location in another bandpass or a previous image?), and flat-field or filter non-linearities (if you have lots of images to work with, you can do a much better job of constraining these: see the ubercal of SDSS). These are all problems that are typically part of the "art" of astronomy (or, as Hogg calls it, the 'observer "folklore"'). How does one determine which method of dealing with each of these is the "best?" One generally asks "the experts," but such experts may disagree on the optimal approach, and can rarely prove that their method really is best in any general, objective sense.
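To make the faint-source example concrete, here is a toy sketch of the Bayesian version of the "4-sigma or 5-sigma?" question. Everything in it (the assumed source population, the priors, all the numbers) is my own illustration, not anything from the paper; the point is just that a detection in another bandpass raises the prior that a marginal peak is real.

from scipy.stats import norm

def posterior_prob_real(snr, prior_real):
    # Probability that a peak of this S/N is a real source, under a crude
    # two-hypothesis model: pure noise fluctuation vs. a genuine source
    p_noise = norm.pdf(snr, loc=0.0, scale=1.0)    # noise-only hypothesis
    p_source = norm.pdf(snr, loc=6.0, scale=2.0)   # assumed "typical source" S/N distribution
    num = prior_real * p_source
    return num / (num + (1.0 - prior_real) * p_noise)

# An isolated 4-sigma peak, where the prior that any given spot holds a source is tiny:
print(posterior_prob_real(4.0, prior_real=0.001))
# The same peak at a position where another bandpass already reports a source:
print(posterior_prob_real(4.0, prior_real=0.5))

With those made-up numbers, the isolated peak is roughly a coin flip, while the same peak with the cross-band prior is essentially certain. That extra information is exactly what a global model would carry around for free.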

My hope (and I suspect Hogg & Lang's too) is that this isn't something that someone has to provide money for, but rather something that everyone just does as a part of their analysis. If your data reduction can incorporate all the "folklore" of anyone else who has used that particular system, plus all previous images taken with that system and their reductions, and all of this done in a consistent manner, that would be a compelling reason for people to just use it. I should hope! Instead of loading up some obscure IRAF tool and messing with parameters until you "get it right" (whatever that may mean), you leverage all the previous data to find a globally optimal reduction of the data. I'm sure you can probably come up with examples from your own observations where such a tool would have been useful.
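As a deliberately tiny example of what "leverage all the previous data" can mean, here is a toy version of the relative-calibration idea behind the SDSS ubercal: if many overlapping images observe the same stars, you can solve for every image's zero point and every star's magnitude in one go, instead of calibrating each frame by hand. The observation list and the least-squares setup below are mine, not Hogg & Lang's:

import numpy as np

# Observations as (star_id, image_id, instrumental_magnitude); made-up numbers
obs = [(0, 0, 14.2), (0, 1, 14.5), (1, 0, 15.1), (1, 1, 15.4), (1, 2, 15.0), (2, 2, 16.3)]
n_stars, n_images = 3, 3

# Model: m_instrumental = m_true(star) + zeropoint(image).  Build one linear
# system over all observations; pin one zero point to break the degeneracy.
A = np.zeros((len(obs) + 1, n_stars + n_images))
b = np.zeros(len(obs) + 1)
for row, (s, i, m) in enumerate(obs):
    A[row, s] = 1.0
    A[row, n_stars + i] = 1.0
    b[row] = m
A[-1, n_stars] = 1.0   # constraint: zero point of image 0 is defined to be 0

solution, *_ = np.linalg.lstsq(A, b, rcond=None)
print("star magnitudes:", solution[:n_stars])
print("image zero points:", solution[n_stars:])

Every image and every star constrains every other one, which is the whole point: the "globally optimal reduction" falls out of a single solve rather than out of per-frame tinkering.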

As Hogg & Lang say, "We seek to remove astronomers from the calibration step." That's why I think "classical" observational astronomers would be most unhappy about it, since it is turning an "art" into a "science."



The principle of GIGO remains: if the inputs to the model are not properly characterized, the outputs of the model will be incorrect.

But that's true of all astronomy results. If you don't understand your own calibration, then your results are going to be suspect. And part of understanding your calibration involves being able to get someone else to produce the same results, preferably without any hand-holding. So I see this as just quantifying what astronomers are already trying to do, and then applying it universally and uniformly.



Will it change the way we do astronomy, as a community? If it works in a big way, it might. But on a smaller scale, people are already putting together meta-catalogs of information. I suspect that the changes made by the smaller projects will add up to be more significant over the next 10 or 20 years than the changes due to the proposed super-model.

But that's the point: this isn't just a meta-catalog of information. As Hogg & Lang say, catalogs are frozen. This is a "living document" where adding new information improves the system as a whole (by providing better constraints and tweaking the fitting parameters).



Keep in mind that the net result will be to remove scientists even further from the process of acquiring, reducing and analyzing data. We can expect (whether this particular proposal moves forward or not) a larger and larger number of papers reporting completely bogus results, because the authors simply queried a big database, unaware of the errors and limitations of the data within. Sigh.

But how many results are already bogus because someone messed up their calibration? And again, the goal here is to make the "errors and limitations" a transparent part of your "query a big database," since there won't be a static catalog. Think bigger!



(Oh, and get off my lawn, too!)

Sorry, sir! Won't happen again. (at least until you turn around... Hey, that dog has a poofy tail! *sneak* *sneak*)


The only substantial difference I have with a lot of Hogg's ideas is the trumpeting of "the best possible" this and that. His "catalog" would obviously have a level of detail (i.e. errors in source position, proper motion, spectral shape, flux, structure) which varies wildly around the sky depending on the available data.

But that's the point: what defines "the best possible?" Give two astronomers the same imaging data, and they will likely produce two different catalogs. What if we could sample from the whole parameter space of possible catalogs and use that knowledge to approximate the "best possible" result? I too need to get more familiar with Bayesian statistics, but I think this is the general idea.
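Here is roughly what I have in mind, as a toy sketch. The code, the Gaussian PSF, and all the numbers are my own invention (and certainly not how Hogg & Lang would implement it); the idea is just that a catalog entry can be treated as parameters to sample, given the pixels, rather than as a single number to report.

import numpy as np

rng = np.random.default_rng(42)
size, sigma_psf, noise = 15, 1.5, 5.0
yy, xx = np.mgrid[0:size, 0:size]

def render(x, y, flux):
    # One star with a circular Gaussian PSF on the pixel grid
    return flux * np.exp(-((xx - x) ** 2 + (yy - y) ** 2) / (2 * sigma_psf ** 2))

# Fake "data": one star plus Gaussian pixel noise
image = render(7.3, 6.8, 200.0) + rng.normal(0.0, noise, (size, size))

def log_like(params):
    return -0.5 * np.sum((image - render(*params)) ** 2) / noise ** 2

# Simple Metropolis sampler over (x, y, flux); every accepted state is, in
# effect, one plausible catalog entry for this little patch of sky
current = np.array([7.0, 7.0, 150.0])
current_ll = log_like(current)
samples = []
for _ in range(5000):
    proposal = current + rng.normal(0.0, [0.05, 0.05, 3.0])
    proposal_ll = log_like(proposal)
    if np.log(rng.uniform()) < proposal_ll - current_ll:
        current, current_ll = proposal, proposal_ll
    samples.append(current.copy())

samples = np.array(samples[1000:])   # drop burn-in
print("catalog values:", samples.mean(axis=0))
print("and their spread:", samples.std(axis=0))

The spread of the samples is part of the answer: any one catalog is just a single draw from that distribution, which is why "the best possible catalog" is a slippery notion.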


Now, I would differ from him in one of the dicta from his research wiki about no significant discovery in recent astronomy arising from visual morphological inspection ...

Where did he say that? I only recently became aware of hoggresearch.blogspot.com, so I may have missed it.

And remember, everyone, this isn't just for digitized plates! One of the take-away lessons from the 2008 SDSS meeting in Chicago was three simple words from Jim Gunn:

"Do it right!"

I think that applies here: proper calibration (of all astronomical data!) should not be viewed as a local problem, but rather as a global minimization. So, in answer to your question, Nereid: I'm quite sure that Hogg wants this to apply to all wavelengths. The paper is titled "Astronomical Imaging," though, so spectroscopy isn't included in this iteration...

Hmm... maybe there is something left for me to write...


In light of the paper which has J.E. Gunn as an author, and the actual, historical timeline of SDSS, inputting "We are confident that - without substantial new technology and in a timescale of years (not decades)—the entirety of preserved, digitized or digital astronomical imaging can be assembled and calibrated," into my 'Nereid model' (which is neither frequentist nor Bayesian) produces both "hugely overconfident" and "in a timescale of decades (at least)" as output.

"neither frequentist nor Bayesian" - Heh, heh...

On the other hand, those lessons are now out in the open. Future collaborations will ignore them at their own peril. I should hope that Hogg & Lang are aware of, and incorporate, those lessons into their vision. To do otherwise would be utterly foolish.

StupendousMan
2008-Oct-23, 04:17 PM
Instead of loading up some obscure IRAF tool and messing with parameters until you "get it right" (whatever that may mean), you leverage all the previous data to find a globally optimal reduction of the data.

Okay, let's try moving the discussion to a more concrete level.

I make some measurements of the sky. I know how to proceed with the first approach: run IRAF and mess with parameters until I end up with -- say -- a list of positions and magnitudes for the stars in my field.

Please explain what I should do with my images if I want to follow the second approach. Exactly what does it mean to "leverage all the previous data?" Where do I find that data? What tool do I use to apply leverage? Where are the platforms on which it runs? Where's the documentation? What results does the leveraging tool provide to tell me when I've found the globally optimal solution?

If the answers to some of these questions are, "Well, you have to wait 1 (or 3, or 5) years for someone to create the tools", that's fine. It just means that I'll forget about this notion for the next 1 (or 3, or 5) years.

parejkoj
2008-Oct-23, 05:05 PM
Please explain what I should do with my images if I want to follow the second approach. [series of excellent questions removed for brevity]

If the answers to some of these questions are, "Well, you have to wait 1 (or 3, or 5) years for someone to create the tools", that's fine. It just means that I'll forget about this notion for the next 1 (or 3, or 5) years.

Your point is well taken. For the most part, these tools don't yet exist; the exception is astrometry.net, which will likely be a part of the whole. But I think it is worth keeping this whole scheme in mind for the future, or even contributing a piece of it. Do you have some instrument "folklore" ensconced in your code that others would benefit from?

Let me sidestep your questions and ask one back. If such a scheme as Hogg & Lang are proposing becomes available a bit at a time, in a cross-platform or web-based format (like python, the code behind astrometry.net), would you begin to make use of it, or would you wait until the whole thing is available? I realize the answer is probably "it depends on how easy it is for me to transition my workflow," and I know that is important, but still...

Astronomers often see the "code" that is used in reduction and analysis as "just a tool." But it is, of course, more than that: it is an integral part of the results themselves. I think it is worthwhile considering how to develop our "tools" so that they can be combined to produce a better whole.

StupendousMan
2008-Oct-23, 09:20 PM
Let me sidestep your questions and ask one back. If such a scheme as Hogg & Lang are proposing becomes available a bit at a time, in a cross-platform or web-based format (like python, the code behind astrometry.net), would you begin to make use of it, or would you wait until the whole thing is available? I realize the answer is probably "it depends on how easy it is for me to transition my workflow," and I know that is important, but still...


Let me make this even more concrete. I take a picture of some region of the sky with an optical telescope and CCD, with a u-band filter in place. Now, suppose that Hogg et al. have a tool of some sort which includes a model of the sky in my region. That model would presumably contain a list of all the astronomical sources in that region, at all wavelengths, with positions, proper motions, spectra, etc.

So, the idea is that I somehow compare my images to the model, right? Okay. If all the objects I detect in my images are matched with items in the model (correctly), then I have just discovered that my observation was redundant. Hmmm. That's not very exciting. I guess I should have used the tool first to answer my question.

If the object(s) of interest in my observations aren't in the model, then it's likely that the model didn't help me. I can see a few cases in which the model might still help -- for example, if my targets are really faint, and could be blended with brighter objects which are in the model, so that the model helps me to deblend the targets in my observations -- but my guess is that that is an unusual situation. More likely, I'm interested in, say, some variable stars, and the model doesn't include their variability. Thus, the model doesn't help me.
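(For the sake of argument, here is how I imagine that comparison step would have to go. This is my own toy sketch, with an invented source list and Gaussian PSF, not anything the authors have actually specified: render the model's sources through my image's PSF, subtract, and hunt through the residuals for whatever the model cannot explain.)

import numpy as np

size, sigma_psf, noise = 64, 2.0, 1.0
yy, xx = np.mgrid[0:size, 0:size]

def render(sources):
    # Render a list of (x, y, flux) sources with a circular Gaussian PSF
    img = np.zeros((size, size))
    for x, y, flux in sources:
        img += flux * np.exp(-((xx - x) ** 2 + (yy - y) ** 2) / (2 * sigma_psf ** 2))
    return img

# What the hypothetical sky model already knows about this field
model_sources = [(20.0, 20.0, 300.0), (45.0, 30.0, 150.0)]

# My image: the second star has brightened, and a faint new source sits
# blended with the first one
rng = np.random.default_rng(1)
my_sources = [(20.0, 20.0, 300.0), (45.0, 30.0, 280.0), (23.0, 21.0, 40.0)]
image = render(my_sources) + rng.normal(0.0, noise, (size, size))

# Subtract the model's prediction; whatever it cannot explain is left behind
residual = image - render(model_sources)
print(np.sum(residual > 5 * noise), "pixels more than 5 sigma above the model")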

Well, then, I might conclude, I ought to analyze my measurements, come up with detailed descriptions of the variable stars, and add them to the model. The first part is easy: I have to write down compact descriptions of the variability when I publish my paper on the targets. But the second part --- how do I add this new information to the model? Can any astronomer just reach in and modify the model? Are there special users with admin power? How are conflicts over changes resolved? How frequently are changes committed?

There are some devils in these details.

If all goes correctly, then eventually a better model evolves, which not only has a list of the stars in that region of sky, but also analytic expressions for the variability of some of them. In the future, astronomers will find the proper brightnesses for all those variable stars when they ask for properties of that portion of the sky, right?

If I'm interested in stars in our own Milky Way, how much does this new model provide above and beyond the information stored in a small set of, say, 5 or 10 catalogs? I don't see much of a gain, in this particular case.

Which type of astronomer gains the most from this detailed model of the entire sky? Is there some type of person who will be much better off?