2012-09-19

Can you fix charge-transfer inefficiency without a theory-driven model?

The Gaia mission needs to centroid stars with accuracies at the 10^-3-pixel level. At the same time, the detector will be affected by charge-transfer inefficiency degradation as the instrument is battered by cosmic radiation; this causes significant magnitude-dependent centroid shifts. The team has been showing that with reasonable models of charge-transfer inefficiency, they can reach their scientific goals. One question I am interested in—a boring but very important question—is whether it is possible to figure out and fix the CTI issues without a good model up-front. (I am anticipating that the model won't be accurate, although the team is analyzing lab CCDs subject to sensible, realistic damage.) The shape and magnitude of the effects on the point-spread function and positional offsets will be a function of stellar magnitude (brightness) and position on the chip. They might also have something to do with what stars have crossed the chip in advance of the current star. The idea is to build a non-trivial fake data stream and then analyze it without knowing what was put in: Can you recover and model all the effects at sufficient precision after learning the time-evolving non-trivial model on the science data themselves? The answer—which I expect to be yes—has implications for Gaia and every precision experiment to follow.

In order to work on such subjects I built a one-dimensional (yes, the sky is a circle, not a 2-sphere) Gaia simulator. It currently doesn't do what is needed, so fork it and start coding! Or build your own. Or get serious and make a full mission simulator. But my point is not "Will Gaia work?"; it is "Can we make Gaia analysis less dependent on mechanistic CCD models?" In the process we might make it more precise overall. Enhanced goal: Analyze all of Gaia's mission choices with the model.
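
For concreteness, here is a minimal sketch of the kind of toy data stream I have in mind: a single trap species with a fixed trap density per transfer and exponential release, applied to a one-dimensional scan. Everything here (function names, parameter values, the trap physics) is invented for illustration and is not how Gaia's CCDs actually behave.

```python
import numpy as np

def apply_cti(scanline, trap_density=0.5, release_tau=3.0):
    """Toy 1-D charge-transfer-inefficiency model with a single trap species.

    `scanline` is the charge (electrons) per pixel along the readout
    direction.  At each transfer, up to `trap_density` electrons are captured
    by traps and later re-emitted on an exponential timescale `release_tau`
    (in units of pixel transfers), producing a charge trail.  Because the
    absolute loss per transfer is capped, faint stars lose a larger
    *fraction* of their charge, which is what makes the centroid shifts
    magnitude-dependent.
    """
    out = np.array(scanline, dtype=float)
    trapped = 0.0  # charge currently held in traps
    for i in range(len(out)):
        released = trapped * (1.0 - np.exp(-1.0 / release_tau))
        captured = min(trap_density, out[i])  # traps refresh each transfer (toy assumption)
        out[i] += released - captured
        trapped += captured - released
    return out

def centroid(scanline):
    x = np.arange(len(scanline))
    return np.sum(x * scanline) / np.sum(scanline)

# Demo: a faint star's centroid shifts more than a bright star's.
x = np.arange(200)
psf = np.exp(-0.5 * ((x - 100.0) / 2.0) ** 2)
for flux in (100.0, 10000.0):
    star = flux * psf / psf.sum()
    print(flux, centroid(apply_cti(star)) - centroid(star))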

2012-09-16

scientific reproducibility police

At coffee this morning, Christopher Stumm (Etsy), Dan Foreman-Mackey (NYU), and I worked up the following idea of Stumm's: Every week, on a blog or (I prefer) in a short arXiv-only white paper, one refereed paper is taken from the scientific literature and its results are reproduced, as well as possible, given the content of the paper and the available data. I expect almost every paper to fail (that is, not be reproducible), of course, because almost every paper relies on proprietary code or data, or else is too vague about what was done. The astronomical literature is particularly interesting for this because many papers are based on public data; for those it comes down only to code and procedures. Indeed, I remember Bob Hanisch (STScI) giving a talk at ADASS showing that it is very hard to reproduce the results of typical papers based on HST data, even though all the data and almost all the code people use on them are public.

Stumm, Foreman-Mackey, and I discussed economic models and incentive models to make this happen. I think whoever did this would succeed scientifically, if he or she did it well, both because it would have huge impact and because it would create many new insights. But on the other hand it would take significant guts and a hell of a lot of time. If you want to do it, sign me up as one of your reproducibility agents! I think anyone involved would learn a huge amount about the science (more than they learn about reproducibility). In the end, it is the community that would benefit most, though. Radical!

2012-09-15

standards for point-spread-function metadata

When we share astronomical images, we expect the images to have standards-compliant descriptions of their astrometric calibration—the mapping between image position and sky position—in their headers. Naturally, it is just as important to have descriptions of the point-spread function for almost any astronomical activity (like photometry, source matching, or color measurement). And yet we have no standards. (Even the WCS standard for astrometry is seriously out of date.) Develop a PSF standard!

Requirements include: It should be very flexible. It should permit variations of the PSF with position in the image. It should have a specified relationship between the stellar position and the position of the mean, median, or mode of the PSF itself. That latter point relates to the fact that astrometric distortions can be sucked up into PSF variations if you permit the mode of the PSF to drift relative to the star position. I like that freedom, but whether or not you permit it, the choice should be explicit.
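
To make the requirements concrete, here is one shape such metadata could take, loosely in the spirit of existing PSF fitters that expand the PSF in basis images with polynomially varying coefficients. All keyword names and numbers below are invented for illustration; this is not a proposal for the actual standard.

```python
import numpy as np

# Hypothetical PSF metadata: the PSF at image position (x, y) is a linear
# combination of basis images, with coefficients that are low-order
# polynomials in (x, y).  Keyword names are made up for this sketch.
psf_meta = {
    "PSF_NAXIS": (25, 25),     # postage-stamp size of each basis image
    "PSF_NBASIS": 3,           # number of basis images
    "PSF_POLYDEG": 1,          # polynomial degree of the spatial variation
    "PSF_CENTERED": True,      # is the PSF mode tied to the stellar position?
    # coefficient table: one row per basis image, columns are (1, x, y) terms
    "PSF_COEFFS": np.array([[1.0, 1e-4, -2e-4],
                            [0.0, 3e-4, 0.0],
                            [0.0, 0.0, 5e-4]]),
}

def evaluate_psf(meta, basis_images, x, y):
    """Return the PSF stamp at image position (x, y).

    `basis_images` has shape (PSF_NBASIS, *PSF_NAXIS).
    """
    monomials = np.array([1.0, x, y])          # degree-1 spatial polynomial
    weights = meta["PSF_COEFFS"] @ monomials   # one weight per basis image
    psf = np.tensordot(weights, basis_images, axes=1)
    return psf / psf.sum()                     # normalize to unit flux
```

Note that something like PSF_CENTERED is exactly where the standard would have to be explicit about whether the expansion is allowed to move the PSF's mode relative to the nominal stellar position—the astrometry-versus-PSF degeneracy mentioned above.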

2012-09-12

impute missing data in spectra

Let me say at the outset that I don't think that imputing missing data is a good idea in general. However, missing-data imputation is a form of cross-validation that provides a very good test of models or methods. My suggestion would be to take a large number of spectra (say stars or galaxies in SDSS), censor patches (multi-pixel segments) of them randomly, saving the censored patches. Build data-driven models using the uncensored data by means of PCA, HMF, mixture-of-Gaussians EM, and XD, at different levels of complexity (different numbers of components). Compare the models in their ability to reconstruct the censored data. Then use the best of the methods as your spectral models for, for example, redshift identification! Now that I type that I realize the best target data are the LRGs in SDSS-III BOSS, where the (low) redshift failure rate could be pushed lower with a better model. Advanced goal: Go hierarchical and infer/understand priors too.
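
Here is a minimal sketch of the censor-and-reconstruct test, with PCA standing in for the model; HMF, the mixture of Gaussians, and XD would slot into the same loop. The function names and the censoring scheme are just illustrative.

```python
import numpy as np

def censor_patches(spectra, patch_width=20, n_patches=3, rng=None):
    """Randomly censor contiguous multi-pixel patches; return mask (True = kept)."""
    rng = np.random.default_rng(rng)
    mask = np.ones(spectra.shape, dtype=bool)
    n_spec, n_pix = spectra.shape
    for i in range(n_spec):
        starts = rng.integers(0, n_pix - patch_width, size=n_patches)
        for s in starts:
            mask[i, s:s + patch_width] = False
    return mask

def pca_impute(train, test, mask, n_components=10):
    """Fit PCA on complete training spectra, then predict the censored pixels
    of each test spectrum from its uncensored pixels by least squares."""
    mean = train.mean(axis=0)
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    basis = vt[:n_components]                       # shape (K, n_pix)
    imputed = np.array(test, dtype=float)
    for i in range(imputed.shape[0]):
        keep = mask[i]
        coeffs, *_ = np.linalg.lstsq(basis[:, keep].T,
                                     (imputed[i] - mean)[keep], rcond=None)
        reconstruction = mean + coeffs @ basis
        imputed[i, ~keep] = reconstruction[~keep]
    return imputed
```

Score each model (and each number of components) on the censored pixels only, ideally in units of the reported per-pixel uncertainties.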

2012-09-11

galaxy photometric redshifts with XD

Data-driven models tend to be very naive about noise. Jo Bovy (IAS) built a great data-driven model of the quasar population that makes use of our highly vetted photometric noise model, to produce the best-performing photometric redshift system for quasars (that I know of). This has been a great success of Bovy's extreme deconvolution (XD) hierarchical distribution modeling code. Let's do this again but for galaxies!

We know more about galaxies than we do about quasars—so maybe a data-driven model doesn't make much sense—but we also know that data-driven models (even ones that don't take account of the noise) perform comparably well to theory-driven models when it comes to galaxy photometric-redshift prediction. So a data-driven model that does take account of the noise might kick ass. This was strongly recommended to me by Emmanuel Bertin (IAP). In other news, Bernhard Schölkopf (MPI-IS) opined to me that it might be the causal nature of the XD model that makes it so effective. I guess that's a non-sequitur.
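
The prediction step is worth spelling out, because it is where the noise model pays off twice. Suppose XD (Bovy's extreme-deconvolution code, say) has already delivered an underlying, noise-deconvolved Gaussian mixture over the joint space of colors and redshift; then the photometric-redshift posterior for a new galaxy follows from conditioning each Gaussian on the observed colors, with the observational covariance added back in. A sketch, assuming the fitted mixture parameters are already in hand (function name and interface are mine, not XD's):

```python
import numpy as np
from scipy.stats import multivariate_normal

def photoz_from_xd(colors, color_cov, amps, means, covs):
    """Posterior mean and variance of redshift given observed colors, from a
    Gaussian mixture over (colors..., z), redshift in the last coordinate.

    colors:    (D,) observed colors;  color_cov: (D, D) their noise covariance
    amps:      (K,) mixture amplitudes
    means:     (K, D+1) component means
    covs:      (K, D+1, D+1) component covariances
    """
    D = len(colors)
    K = len(amps)
    resp = np.empty(K)
    cond_mean = np.empty(K)
    cond_var = np.empty(K)
    for k, (a, m, C) in enumerate(zip(amps, means, covs)):
        Ccc = C[:D, :D] + color_cov        # add observational noise to the colors
        Ccz = C[:D, D]
        gain = np.linalg.solve(Ccc, Ccz)   # = Ccc^{-1} Ccz
        resp[k] = a * multivariate_normal.pdf(colors, mean=m[:D], cov=Ccc)
        cond_mean[k] = m[D] + gain @ (colors - m[:D])
        cond_var[k] = C[D, D] - Ccz @ gain
    resp /= resp.sum()
    zmean = resp @ cond_mean
    zvar = resp @ (cond_var + (cond_mean - zmean) ** 2)   # law of total variance
    return zmean, zvar
```

The per-component responsibilities also flag multi-modal redshift posteriors, which is useful for identifying catastrophic-outlier candidates.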

2012-09-10

de-blur long exposures that show the rotation of the sky

Here at Astrometry.net headquarters we get a lot of images of the night sky where the exposure is long and the stars have trailed into partial circular arcs. If we could de-blur these into images of the sky, this would be great: Every one of these trailed images would provide a photometric measurement of every star. Advanced goal: Every one of these trailed images would provide a photometric light curve of every star. That would be sweet! Not sure if this is really research, but it would be cool.

The problem is easy, because every star traverses the same angle in a circle with the same center. Easy! But the problem is hard because the images are generally taken with cameras that have substantial field distortions (distortions in the focal plane away from a pure tangent-plane projection of the sky). Still, it seems totally do-able!
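
Here is a minimal sketch of the photometry step in the distortion-free case, assuming you already know the rotation center (the pole's pixel position), each catalog star's radius and starting position angle, and the total angle swept during the exposure. All names are illustrative.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def trail_photometry(image, center, radius, theta0, dtheta, n_samples=1000):
    """Sample the image along the circular arc a star traces during a long
    exposure (pure rotation, no field distortion).

    center:  (x0, y0) pixel coordinates of the rotation center
    radius:  distance of the star from the center, in pixels
    theta0:  position angle of the star at the start of the exposure (radians)
    dtheta:  total angle swept during the exposure (radians)
    """
    x0, y0 = center
    theta = theta0 + np.linspace(0.0, dtheta, n_samples)
    xs = x0 + radius * np.cos(theta)
    ys = y0 + radius * np.sin(theta)
    values = map_coordinates(image, [ys, xs], order=1)   # bilinear samples along the arc
    arc_length = radius * abs(dtheta)                    # in pixels
    # crude line integral along the trail ridge: a brightness proxy that
    # ignores the PSF width and the sky background
    return values.mean() * arc_length
```

For the light-curve version, return the sampled values against theta instead of their mean; field distortion would enter by warping the (xs, ys) samples through the camera's distortion model before interpolating.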

Pedants beware: Of course I know that it is the Earth rotating and not the sky rotating! But yes, I have made that pedantic point on occasion too.

2012-09-07

design strategy for vector and tensor calibration

In Holmes et al. (2012) (new version coming soon) we showed practical methods for designing an imaging survey for high-quality photometric calibration: You don't need a separate calibration program (separate from the main science program) if you design it our way. This is like a scalar calibration: We are asking "What is the sensitivity at every location in the focal plane?" We could have asked "What is the astrometric distortion away from a tangent-plane projection at every location in the focal plane?", which is a vector calibration question, or "What is the point-spread function at every location in the focal plane?", which is a tensor calibration question. Of course the astrometry and PSF vary with time in ground-based surveys, but for space-based surveys these are relevant self-calibration questions. We learned in the above-cited paper that certain kinds of redundancy and non-redundancy make scalar calibration work, but the requirements will go up as the rank of the calibration goes up too. So repeat for these higher-order calibrations! Whatever you do might be highly relevant for Euclid or WFIRST, which both depend crucially on the ability to calibrate precisely. Even ground-based surveys, though dominated by atmospheric effects, might have fixed distortions in the WCS and PSF that a good survey strategy could uncover better than any separate calibration program.
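
As a reference point, here is the scalar case in its simplest form, roughly in the spirit of self-calibration but not the actual algorithm from the paper: alternate between solving for star fluxes and for focal-plane sensitivities using the overlaps the survey strategy provides. The vector and tensor versions would replace the scalar gain with an astrometric offset or a PSF model per focal-plane patch.

```python
import numpy as np

def self_calibrate(star_id, patch_id, counts, n_stars, n_patches, n_iter=20):
    """Alternating least-squares self-calibration for the scalar case:
    counts[i] ~ flux[star_id[i]] * gain[patch_id[i]].

    Converges to a unique answer (up to one overall scale) only if the
    survey strategy links the focal-plane patches through shared stars.
    """
    flux = np.ones(n_stars)
    gain = np.ones(n_patches)
    for _ in range(n_iter):
        # best-fit flux for each star, holding the gains fixed
        num = np.bincount(star_id, weights=counts * gain[patch_id], minlength=n_stars)
        den = np.bincount(star_id, weights=gain[patch_id] ** 2, minlength=n_stars)
        flux = num / np.maximum(den, 1e-12)
        # best-fit gain for each focal-plane patch, holding the fluxes fixed
        num = np.bincount(patch_id, weights=counts * flux[star_id], minlength=n_patches)
        den = np.bincount(patch_id, weights=flux[star_id] ** 2, minlength=n_patches)
        gain = num / np.maximum(den, 1e-12)
        gain /= gain.mean()   # fix the overall normalization degeneracy
    return flux, gain
```

The interesting question is which dither and scan strategies keep the analogous normal equations well conditioned once the per-patch unknowns are vectors or PSFs rather than scalars.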

2012-09-06

track covisibility of stars

The Astrometry.net system sees a huge amount of heterogeneous data, from wide-field snapshots to very narrow-field professional images to all-sky fish-eye cloud cameras. Any image that is successfully calibrated by the system has been matched to a database of four-star figures (quads) and then verified probabilistically using all the stars in the image and in the USNO-B1.0 Catalog in that region (down to some effective magnitude cut). Of course the quad index and the catalog are both suspect, in the sense that they both contain stars that are either non-existent or else have wrong properties. The amusing thing is that we could construct a graph in which the nodes are catalog entries and the edges are instances in which pairs of stars have been observed in the same image.

This graph would contain an enormous amount of information about the sky. For example, the network could be used to create a brightness ordering of stars on the sky, which would be amusing. But more importantly for us, the covisibility information would tell us which pairs of stars we should be using together in quads, and which pairs we shouldn't. That analysis would take account not just of their relative magnitudes, but also of the typical angular scales of the images in which stars of that magnitude tend to be detected. It would also identify (as nodes with few or no edges) catalog entries that don't correspond to stars, and groups of catalog entries that are created by certain kinds of artifacts (like handwriting on the photographic plates) that generate certain kinds of false-positive matches in our calibrations.
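
The bookkeeping is trivial; a sketch, assuming each successful calibration hands back the list of matched catalog IDs (networkx is just one convenient container):

```python
import itertools
import networkx as nx

def build_covisibility_graph(images):
    """Build the covisibility graph from calibration results.

    `images` is an iterable of lists of catalog IDs, one list per
    successfully calibrated image (the catalog stars matched in that image).
    Edge weights count how many images contained both stars.
    """
    graph = nx.Graph()
    for star_ids in images:
        graph.add_nodes_from(star_ids)
        for a, b in itertools.combinations(sorted(set(star_ids)), 2):
            if graph.has_edge(a, b):
                graph[a][b]["weight"] += 1
            else:
                graph.add_edge(a, b, weight=1)
    return graph

# Catalog entries that never share an image with anything are suspects:
# suspects = [n for n, degree in graph.degree() if degree == 0]
```

Edge attributes could also carry the image scale and bandpass, which is what the quad-selection and image-classification ideas above would need.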

This idea was first suggested to Dustin Lang (CMU) and me by Sven Dickinson (Toronto) at Lang's PhD defense. Advanced goal: Make a directed graph, with arrows going from brighter to fainter. Then use statistics of edge directions to do a better job on brightness ranking and also classify images by bandpass, etc. Even more advanced goal: Evolve away from star catalogs to covisible-asterism catalogs! At the bright end (first or second magnitude), we might be able to propose a better set of constellations.

2012-09-05

show that low-luminosity early-type galaxies are oblate

Here's an old one from the vault: Plot the surface brightness of early-type galaxies (red, dead) as a function of ellipticity and show that surface brightness rises with ellipticity. This is what is expected if early-type galaxies are transparent and oblate. I know from nearly completing this project many years ago that this will work well for lower-luminosity early types and badly for higher-luminosity early types. The cool thing is that, under the oblate assumption, the distribution functions of true three-dimensional axis ratio and central stellar density can be inferred from the observed two-dimensional distributions under the (weak) assumption that the viewing angles are isotropic. That assumption isn't perfectly true but it is close. You can use high signal-to-noise imaging and SDSS spectroscopy to do the object selection, so observational noise in selection and measurement won't pose big problems.
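
The forward model is simple enough to state in a few lines: a transparent oblate spheroid with intrinsic axis ratio q, viewed with its symmetry axis at angle theta to the line of sight, shows an apparent axis ratio sqrt(cos^2 theta + q^2 sin^2 theta), and isotropic viewing means cos theta is uniformly distributed. A toy Monte Carlo that turns a trial intrinsic-shape distribution into a predicted apparent-axis-ratio distribution (names illustrative):

```python
import numpy as np

def apparent_axis_ratios(q_intrinsic, n_views=100000, rng=None):
    """Monte Carlo the apparent (projected) axis-ratio distribution for
    transparent oblate spheroids with intrinsic axis ratios drawn from
    `q_intrinsic`, viewed from isotropically distributed directions."""
    rng = np.random.default_rng(rng)
    q = rng.choice(np.asarray(q_intrinsic), size=n_views)   # intrinsic shapes
    cos_theta = rng.uniform(0.0, 1.0, size=n_views)         # isotropic viewing angles
    sin2 = 1.0 - cos_theta ** 2
    return np.sqrt(cos_theta ** 2 + q ** 2 * sin2)
```

Comparing that histogram (and the ellipticity dependence of surface brightness) with the data, as a function of luminosity, is essentially the whole project; inverting back to the intrinsic-q distribution is a one-dimensional linear inverse problem, because each intrinsic q maps to an apparent-axis-ratio kernel that is known analytically under isotropy.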

This is another Scott Tremaine (IAS) project. Mike Blanton (NYU) and I basically did this many years ago with SDSS data, but we never took it through the last mile to publication, so it is wide open. Actually, it seems likely that someone has done this previously, so start with a literature search! Bonus points: Figure out what's up with the high-luminosity early types. They are either triaxial or a mix of oblate and prolate.

2012-09-04

find catastrophes in the stellar distribution

In Zolotov et al. (2011) we asked the question: Might tiny dwarf galaxy Willman 1 be just a cusp in the stellar distribution of the Milky Way? If you generically have lines and sheets in phase space—and we very strongly believe that the Milky Way does—then generically you will have folds in those (in non-trivial projections they are required), and those folds generically produce catastrophes (localized regions of very high density) of various kinds (folds, cusps, swallowtails, and so on), which could mimic gravitationally bound or recently disrupted overdensities in the stellar distribution. The cool thing is that the catastrophes have quantitative two-dimensional morphologies that are very strongly constrained by mathematics (not just physics). The likelihood test we did in the Zolotov paper could easily be expanded into a search technique, maybe with some color-magnitude-diagram filtering mixed in. The catastrophes pretty much have to be there so get ready to get rich and famous! If you go there, send email to Scott Tremaine (IAS), who first proposed this idea to me.
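
The simplest case already gives a concrete, universal template to search for: near a fold, the projected stellar density generically rises as an inverse square root on one side of the caustic line. Schematically (with x the coordinate perpendicular to the caustic at x_f, A an amplitude set by the phase-space sheet, and Theta the step function),

```latex
\Sigma(x) \;\simeq\; \Sigma_{\rm smooth} \;+\; \frac{A}{\sqrt{x - x_{\rm f}}}\,\Theta(x - x_{\rm f}) .
```

Cusps, swallowtails, and the rest have analogous universal local forms, which is what would let a matched-filter search distinguish them from bound satellites.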

2012-09-03

analyze quadratic star centroiding

Inside the core SDSS pipelines and inside the Astrometry.net source-detection code simplexy, centroiding—measurement of star positions in the image—is performed by fitting two-dimensional second-order polynomials to the 3×3 pixel patch centered on the brightest pixel of each star. This is known to work far better than taking first moments of the light distribution (integrals of x and y times the brightness above background) for the (possibly obvious) reason that it is a quasi-justified (in terms of likelihood) fit.
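
For concreteness, here is a minimal version of that fit (my own sketch, not the actual SDSS or simplexy code): least-squares fit a six-parameter quadratic to the nine pixels and put the centroid at the peak of the fitted paraboloid. The patch is assumed to be indexed [row, column] = [y, x].

```python
import numpy as np

# Design matrix for a 2-D quadratic over the 3x3 grid of offsets (-1, 0, +1).
_DX, _DY = np.meshgrid([-1.0, 0.0, 1.0], [-1.0, 0.0, 1.0])
_A = np.column_stack([np.ones(9), _DX.ravel(), _DY.ravel(),
                      _DX.ravel() ** 2, _DX.ravel() * _DY.ravel(), _DY.ravel() ** 2])

def quadratic_centroid(patch):
    """Sub-pixel centroid from a 3x3 patch centered on a star's brightest pixel.

    Fits f(dx, dy) = c0 + c1 dx + c2 dy + c3 dx^2 + c4 dx dy + c5 dy^2 by
    least squares and solves grad f = 0 for the peak of the paraboloid
    (assumes the fitted surface is actually peaked)."""
    c = np.linalg.lstsq(_A, np.asarray(patch, dtype=float).ravel(), rcond=None)[0]
    hessian = np.array([[2.0 * c[3], c[4]],
                        [c[4], 2.0 * c[5]]])
    dx, dy = np.linalg.solve(hessian, -np.array([c[1], c[2]]))
    return dx, dy   # (x, y) offsets in pixels from the brightest pixel
```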

Of course not all of the information about a star's position is contained in that 3×3 pixel patch (and the method doesn't make use of any point-spread function information to boot). For this reason, Jo Bovy (IAS) and I did some work a few years ago to test it. Things intervened and we never finished, but our preliminary results were really surprising: For well-behaved point-spread functions, the two-dimensional quadratic fit in the 3×3 patch performed almost indistinguishably from fits that made use of the true point-spread function and larger patches. That is, it appeared in our early tests that the 3×3 patch does contain most of the centroiding information! A good research project would see how the 3×3 patch inference degrades relative to the point-spread-function inference, as a function of PSF properties and signal-to-noise, with an eye to identifying when we need to do better. I will call out Adrian Price-Whelan (Columbia) here, because he is all set up with the machinery to do this!

2012-09-02

Find more of the GD-1 stream

The GD-1 stream spans many tens of degrees in the SDSS data. The stellar density in the stream is inhomogeneous, but the stream appears to be terminated by the survey boundary, and not before. So we should be able to find much more of it! And more stream means better constraints on the mass model for the Milky Way and the formation of cold streams. A few years back we made a model of GD-1, so we can predict where the stream will be on the unobserved parts of the sky and at what heliocentric distance. These properties of the stream set the parameters for a simple (say) three-color ground-based imaging survey to recover the stream in the Southern Hemisphere. Before you go get the observing time, I would recommend looking in the various data archives; there might already be sufficient data out there to map parts of the stream right now.

2012-09-01

remove satellite trails from arbitrary astronomical imaging

Satellite trails appear as long lines in astronomical imaging, often nearly unresolved or slightly resolved. They are easy to find, fit, and subtract away, at least in principle. I have had several undergraduate researchers, however, who got close but couldn't deliver a robust, reliable piece of code.

The code I imagine takes an image (and an optional inverse variance image). It identifies whether the image contains a satellite trail (possibly using the Hough transform and some heuristics). If it does, it fits the trail using robust fitting techniques. If that all works, it returns to the user an updated image and an updated inverse variance map. Not hard! The only hard parts are making it robust and making it fast. I have a lot of good ideas on both parts of that; I think this is very do-able, and it is only a few weeks of work for the right person. It would be hella useful too, especially for the human-viewable image projects I am working on. Enhanced goal: Fit for satellite tumbling or blinking (both things are common in the data I have).
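
A sketch of the detection-plus-masking half of this, using scikit-image's Hough transform; it masks rather than fits-and-subtracts, it assumes the trail is bright relative to the sky, and all the thresholds are placeholders, so it is nowhere near the robust tool described above.

```python
import numpy as np
from skimage.transform import hough_line, hough_line_peaks

def mask_satellite_trails(image, invvar=None, nsigma=5.0, trail_width=5.0):
    """Detect bright straight trails with a Hough transform and zero out the
    inverse variance of pixels near each detected line."""
    if invvar is None:
        invvar = np.ones_like(image, dtype=float)
    # crude detection image: pixels well above a robust background estimate
    background = np.median(image)
    noise = 1.4826 * np.median(np.abs(image - background))  # MAD-based sigma
    detection = image - background > nsigma * noise

    hspace, angles, dists = hough_line(detection)
    _, peak_angles, peak_dists = hough_line_peaks(hspace, angles, dists)

    cols, rows = np.meshgrid(np.arange(image.shape[1]), np.arange(image.shape[0]))
    new_invvar = invvar.copy()
    for angle, dist in zip(peak_angles, peak_dists):
        # perpendicular distance of every pixel from the detected line
        d = np.abs(cols * np.cos(angle) + rows * np.sin(angle) - dist)
        new_invvar[d < trail_width] = 0.0   # tell downstream code to ignore these pixels
    return new_invvar
```

The real work is in the heuristics: deciding when a Hough peak is genuinely a trail, and replacing the masking step with a robust fit of a trailed (possibly tumbling or blinking) profile so that the image itself can be repaired.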