Retrieving and downloading data

get_data – Get data from MAST or hard disk

Kepler FITS retrieval. This module retrieves Kepler lightcurve FITS files from a specified source. It defines a class, DataStream, that stores the relevant lightcurve data for one or more Kepler targets and one or more Quarters as a 64-bit encoded string, which is passed to other modules directly through memory. Reference: http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/

class get_data.DataStream(arrays=None)

Wrapper for a string that holds time, flux, and flux error values.

pprint()

Prints the data stream to the screen in a human-readable format.
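As a rough illustration of how time, flux, and flux error arrays might be packed into a single string and recovered later, here is a minimal Python 3 sketch. It assumes that "64-bit encoded" means Base64 and that the arrays are serialized as JSON first; both are guesses, and the function names encode_lightcurve and decode_lightcurve are hypothetical and not part of get_data.

```python
import base64
import json


def encode_lightcurve(time, flux, flux_err):
    """Pack three equal-length sequences into one Base64 text string.

    Assumption: JSON + Base64 is used here purely for illustration;
    the actual DataStream encoding may differ.
    """
    payload = json.dumps(
        {"time": list(time), "flux": list(flux), "flux_err": list(flux_err)}
    )
    # Base64 output contains no tabs or newlines, so it is safe to use
    # as a single field in line-oriented, tab-separated streams.
    return base64.b64encode(payload.encode("utf-8")).decode("ascii")


def decode_lightcurve(encoded):
    """Reverse encode_lightcurve and return (time, flux, flux_err) lists."""
    data = json.loads(base64.b64decode(encoded).decode("utf-8"))
    return data["time"], data["flux"], data["flux_err"]
```

A string built this way fits on one line, which is convenient when records are passed between processes over standard input and output.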

class get_data.DiskDataLoader(data, datapath, outstream=None)

Retrieves data from disk, given a root directory (relative or absolute). The FITS files are expected under that directory at the path <root directory>/<4-digit short KepID>/<full KepID>/

_DiskDataLoader__get_fits_path(datapath, kepler_id, quarter, suffix)

Construct file path on disk, given the base path, Kepler ID, and quarter.
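The directory layout <root directory>/<4-digit short KepID>/<full KepID>/ can be sketched as follows. This Python 3 example assumes that the full KepID is the Kepler ID zero-padded to nine digits and that the short KepID is its first four digits, as in the MAST archive layout; that convention is an assumption here, and the function name kepler_directory is hypothetical. The sketch builds only the directory, because the file name (which depends on the quarter and suffix) is not specified in this reference.

```python
import os


def kepler_directory(root, kepler_id):
    """Return the directory expected to hold FITS files for one target.

    Assumptions (not confirmed by get_data): the full KepID is zero-padded
    to 9 digits, and the short KepID is its first 4 digits.
    """
    full_id = "%09d" % int(kepler_id)
    short_id = full_id[:4]
    return os.path.join(root, short_id, full_id)
```

For example, kepler_directory("data", 1429092) joins "data", "0014", and "001429092" with the platform's path separator.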

_DiskDataLoader__read_fits_file(input_fits_file, kepler_id)

Read FITS file from disk for each quarter and each object into memory.

class get_data.MASTDataDownloader(data, outstream=None)

Retrieves data from the MAST archive over the web.

_MASTDataDownloader__download_file_serialize(uri)

Downloads the FITS file at the given URI; if that fails, attempts to download the file from the backup URI. On success, returns a raw character stream. On failure, returns an empty string.
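The fallback behaviour described above could look roughly like the following Python 3 sketch. The function name download_with_fallback, the explicit backup_uri argument, and the injectable opener are illustrative assumptions and not part of get_data; the real method takes a single URI and presumably derives the backup itself. Because the sketch works with bytes, its failure value is an empty bytes object rather than an empty string.

```python
from urllib.request import urlopen


def download_with_fallback(primary_uri, backup_uri, opener=urlopen):
    """Try the primary URI, then the backup; return b"" if both fail.

    `opener` must return a context manager with a read() method.
    It defaults to urllib.request.urlopen and can be swapped in tests.
    """
    for uri in (primary_uri, backup_uri):
        try:
            with opener(uri) as response:
                return response.read()
        except (OSError, ValueError):
            # URLError and HTTPError are OSError subclasses; ValueError
            # covers malformed URIs. Move on to the next candidate.
            continue
    return b""
```

Returning an empty value instead of raising lets the caller skip a missing quarter without interrupting the rest of the stream.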

_MASTDataDownloader__get_mast_path(kepler_id, quarter, suffix)

Construct download path from MAST, given the Kepler ID and quarter.

_MASTDataDownloader__process_fits_object(fits_string)

Processes a FITS file object and extracts its contents. Returns the temporary file name and the DataStream object. Reference: http://stackoverflow.com/questions/11892623/python-stringio-and-compatibility-with-with-statement-context-manager

_MASTDataDownloader__tempinput(*args, **kwds)

Handles legacy code that requires a filename rather than streamed file content.
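One common way to satisfy code that insists on a filename is a context manager that writes the in-memory content to a temporary file, yields its name, and removes the file afterwards. The Python 3 sketch below shows that pattern; the name temp_input_file is hypothetical, and the actual implementation in get_data may differ.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def temp_input_file(data):
    """Write bytes to a temporary file and yield its path.

    The file is closed before the path is yielded so that other code can
    open it by name, and it is deleted when the with-block exits.
    """
    tmp = tempfile.NamedTemporaryFile(delete=False)
    try:
        tmp.write(data)
        tmp.close()
        yield tmp.name
    finally:
        tmp.close()  # harmless if already closed
        os.unlink(tmp.name)
```

Inside a block such as "with temp_input_file(content) as path:", the path can be handed to any reader that requires a filename.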

get_data.__read_input(file)
get_data.main(source, datapath, instream=sys.stdin, outstream=None)

Get data from the specified source and optional data path.

Parameters:
  • source (str) – Either “disk” or “mast”
  • datapath (str) – If source is “disk”, then the path to the files; ignored otherwise

join_quarters – Stitch multiple quarters of data together

A more advanced Reducer, using Python iterators and generators. From http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/

join_quarters.main(instream=sys.stdin, outstream=None)
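The iterator-and-generator reducer style mentioned above can be sketched in Python 3 as follows. The sketch assumes each input line has the form "key<TAB>value" and that lines arrive already sorted by key, as Hadoop streaming delivers them to a reducer. The record format and the names read_records and join_by_key are illustrative assumptions, not the actual join_quarters interface.

```python
from itertools import groupby
from operator import itemgetter


def read_records(stream, separator="\t"):
    """Lazily yield (key, value) pairs from a line-oriented stream."""
    for line in stream:
        key, sep, value = line.rstrip("\n").partition(separator)
        if sep:  # skip blank lines and lines without a separator
            yield key, value


def join_by_key(stream, separator="\t"):
    """Yield (key, [values]) for each run of consecutive equal keys.

    groupby only merges adjacent records, so the input must already be
    sorted (or at least grouped) by key.
    """
    records = read_records(stream, separator)
    for key, group in groupby(records, key=itemgetter(0)):
        yield key, [value for _, value in group]
```

Iterating over join_by_key(sys.stdin) processes one key at a time, so only the values for the current key are held in memory.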