Kepler Data Tutorial : What can you do?

2012-Jan-13, 10:24 PM
I am in no way a scientist (I always wish I were), but I find science of all kinds extremely fascinating, as I know most people on this forum do. In that spirit, I present a basic tutorial that anyone can follow, and I ask for suggestions from other folks who might be interested. This tutorial will teach you how to download public Kepler light curves, extract specific data from within the dataset and view it using a spreadsheet program. You will need software to extract the data, which you can find at http://heasarc.gsfc.nasa.gov/ftools/fv/fv_download.html

Step 1 : Which light curve should I look at?

Most if not all of the 150000+ light curves collected over the first 6 quarters of Kepler's mission are publicly available. They can be viewed online through a search form that lets you narrow down what you are looking for. Your search can be broad or very narrow, depending on your criteria. The search form is at the following website:

archive.stsci.edu (http://archive.stsci.edu/kepler/data_search/search.php)

If you want to view one of the already released planet candidates or confirmed planets, you can find a complete list here (http://exoplanetarchive.ipac.caltech.edu/cgi-bin/ExoTables/nph-exotbls?kepler=1) at NASA's Exoplanet Archive. The 'KepID' column gives the number you can use in the search form on the previously mentioned website to view the light curve.

For the purposes of this demonstration, I will use the confirmed planet Kepler-17b (row #44, KOI #203). The 'KepID' given is 10619192.

To view the light curve, type the KepID into the 'Kepler ID' input box and hit the search button. You will see that data is available for Quarters 1-6. Check the box for Q1 and click the 'Plot marked light curves' button. This will show you the raw and corrected data for the star:


Step 2 : Download and view

Now we want to download the data that produced this light curve so that we can manipulate it. To do this, go to http://archive.stsci.edu/pub/kepler/lightcurves/ and note that the data have been grouped by the first four digits of the Kepler ID (zero-padded to nine digits, so 10619192 becomes 010619192), e.g. 0007, 0008....0129. Under each of these directories, there is a directory for each public Kepler ID, where all public Kepler lightcurves are stored. [1] (http://archive.stsci.edu/kepler/publiclightcurves.html)

Ultimately, the file we want is the first file at http://archive.stsci.edu/pub/kepler/lightcurves/0106/010619192/
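For anyone who would rather build that directory address from a Kepler ID than browse to it, here is a minimal Python sketch. It assumes only what is described above: the ID is zero-padded to nine digits and the first four digits name the parent directory. The function name `lightcurve_dir_url` is made up for this example.

```python
BASE_URL = "http://archive.stsci.edu/pub/kepler/lightcurves/"

def lightcurve_dir_url(kepler_id):
    """Return the archive directory URL for one Kepler ID."""
    padded = f"{kepler_id:09d}"          # 10619192 -> "010619192"
    return f"{BASE_URL}{padded[:4]}/{padded}/"

# Kepler-17 (KepID 10619192), the example used in this tutorial
print(lightcurve_dir_url(10619192))
```

This only builds the address of the directory. Picking the file for a particular quarter is still done by eye from the directory listing.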

Right click the file and choose 'save link as' to download it. Note where you put the file on your computer. Once you have installed your viewer, you can open the file. (I have renamed my file to the Kepler ID and the quarter):


You can view the light curve with the software by clicking 'Plot'. Highlight TIME and click the X button. Highlight PDCSAP_FLUX and click the Y button. Then click Plot.

Step 3 : Extraction

Now click the 'Select' button on the LIGHTCURVE row of the summary panel:


Check only TIME and PDCSAP_FLUX and click 'Display Table'. In the table panel, click 'File' > 'Export as Text'. Click save and a dialog box will come up. Save the text file in CSV format.

Open the text file and a spreadsheet editor. I use OpenOffice, so I don't know the exact steps in other programs, but they are probably similar. Copy the text from the text file. Right click the spreadsheet (cell A1) and choose 'Paste Special'. Follow the options in the dialog box to have the spreadsheet program separate the columns by comma.
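If you would rather read the exported text with a script than paste it into a spreadsheet, something like the following Python sketch should work. It assumes the export is two comma-separated columns (TIME, then PDCSAP_FLUX), possibly with a header line, and that missing flux values appear as blank or NaN. I have not checked the exact layout the viewer writes, so treat that as an assumption. The sample values are made up.

```python
import csv
import io
import math

def to_number(cell):
    """Convert one cell to a float, or None if it is blank or NaN."""
    cell = cell.strip()
    if not cell:
        return None
    value = float(cell)                  # raises ValueError for header text
    return None if math.isnan(value) else value

def read_light_curve(text):
    """Parse exported text into parallel lists of time and flux."""
    times, fluxes = [], []
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 2:
            continue                     # blank or malformed line
        try:
            t, f = to_number(row[0]), to_number(row[1])
        except ValueError:
            continue                     # header or other non-numeric row
        if t is None:
            continue                     # a point without a time is unusable
        times.append(t)
        fluxes.append(f)                 # None keeps the gap visible
    return times, fluxes

# Made-up sample in the assumed layout
sample = "TIME,PDCSAP_FLUX\n131.512,10000.5\n131.532,NaN\n131.553,9990.25\n"
times, fluxes = read_light_curve(sample)
print(times)
print(fluxes)
```

Keeping missing points as None, rather than dropping them, means every row still corresponds to one time step, which matters when the data is later split into periods by row count.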

Now you have the data in your spreadsheet and can view it in a number of ways. One way is described here:

Step 4 : Viewing the data

Select the first 200 rows of data and (in OpenOffice) insert chart. Designate the first column as a label. You should end up with something similar to this:


Look familiar? The first 2 dips in the light curve are the first 2 dips in the light curve displayed in Step 1. Now for something that you can't do unless you have these tools.

Every 72 (or so) data points in column B make up one period of Kepler-17b. By reorganizing the data and averaging out the periods, we can get new ways to display the data:
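The reorganizing and averaging can also be done outside the spreadsheet. Below is a rough Python sketch of the idea, not the exact procedure used for the chart: cut the flux column into chunks of a fixed number of points and average the values that fall at the same position in each chunk. The function name and the tiny made-up dataset are only for illustration.

```python
def fold_and_average(flux, points_per_period):
    """Average the flux at each position within the period.

    flux is a list of numbers, with None marking a missing point.
    Returns one average per position, or None where no data landed.
    """
    sums = [0.0] * points_per_period
    counts = [0] * points_per_period
    for i, value in enumerate(flux):
        if value is None:
            continue
        slot = i % points_per_period     # position inside the current period
        sums[slot] += value
        counts[slot] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]

# Made-up data: a period of 4 points with a dip at position 2 each time
flux = [10, 10, 8, 10,  10, 10, 9, 10,  10, None, 7, 10]
print(fold_and_average(flux, 4))   # [10.0, 10.0, 8.0, 10.0]
```

For the real data you would pass 72 as `points_per_period`. Because the period is only roughly 72 points, the dip slowly drifts from one period to the next, so folding by the TIME column instead of by row count should give a sharper result over many periods.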


I hope this helps anyone interested, and would love to see your results!

2012-Jan-14, 07:31 AM
On a very related note - back when I had a lot more time on my hands (when Planethunters (http://www.planethunters.org/) first started) I wrote a bunch of blog entries describing how to use and interpret the data there.

You can find the entries here: http://evildrganymede.net/tag/planethunters/

Start at the bottom of the page and work your way up, and that should get you started on Planethunters and finding lots of interesting stuff (not just planets - all sorts of really cool things like eclipsing binaries and variable stars!).

2012-Jan-14, 03:06 PM
That is so cool. I wish I had seen that when you posted it. Find anything interesting?

2012-Jan-14, 09:01 PM
I did find something weird - it's probably a low mass stellar companion in an eclipsing binary, it seemed too big to be a jovian or a brown dwarf but its eclipse profile looked more like a planet than a star. I don't think it really got investigated unfortunately (I attempted to model it quite a bit, but didn't get very far).

2012-Jan-21, 03:43 PM
So, I've started to form my own techniques for looking at Kepler data, and used Kepler 11 (KID 6541920) as a test subject. I used Quarters 1-6 of the publicly available long cadence data. I used the methods above to extract the data, then adjusted each quarter so that it all formed one continuous, coherent flow of data. I then created a MySQL database that I could access with the PHP programming language. My intention was to be able to split up the data by an adjustable period, and to highlight data that would be considered a 'dip in the light'. This is the result of splitting the data into the various periods of the already confirmed Kepler 11 system.


The total number of days for Q1-Q6 is 497.7844, which means an orbit of up to about 165 days can be seen with 3 passes (497.7844 / 3 is about 165.9).

Kepler 11 b has a ~10.3 day orbit. Splitting the data into 10.3 day periods puts the data in a grid of 504 rows (data points per period) by 45 columns (periods). A white pixel represents a value that is higher than a predetermined one (F), and a darker pixel represents a value that is lower than F. A red pixel is missing data.
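To make the grid idea concrete, here is a small Python sketch. It is not the PHP/MySQL code described above, just one way the same layout could be produced. The names are made up, the point spacing of 0.02043 days is an assumed value for the ".02(ish)" long cadence step, and treating a value exactly equal to F as dark is also an assumption.

```python
CADENCE_DAYS = 0.02043                   # assumed spacing between data points

def period_grid(flux, rows_per_period, threshold):
    """Split flux into one column per period and classify every cell.

    Cells are "white" (above the threshold F), "dark" (at or below it)
    or "red" (missing data, marked as None in the input).
    """
    columns = []
    for start in range(0, len(flux), rows_per_period):
        column = []
        for value in flux[start:start + rows_per_period]:
            if value is None:
                column.append("red")
            elif value > threshold:
                column.append("white")
            else:
                column.append("dark")
        # Pad a short final column so every column has the same height
        column += ["red"] * (rows_per_period - len(column))
        columns.append(column)
    return columns

# Rows per period for a 10.3 day orbit at the assumed spacing
rows_per_period = int(10.3 / CADENCE_DAYS)
print(rows_per_period)                   # 504

# Tiny made-up example: 3 points per period, F = 4
columns = period_grid([5, 3, None, 5, 2, 5, 5], 3, 4)
for column in columns:
    print(column)
```

Each inner list is one period, so drawing the lists side by side as pixel columns gives the kind of picture described above, with a repeating dip showing up as a dark horizontal band.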

Kepler 11 g has a ~118.38 day orbit.

This star also has short cadence data available for Q1-Q6, but the files are about 10 times larger and there are 3 times more of them. I'm finding it currently takes me about 3 hours to prepare a dataset, and I literally need a supercomputer for the long cadence data. It is a mind-blowing amount of data that I had not wrapped my head around until working with it, and there are still 6 more quarters, even without an extended mission, that will eventually get released.

2012-Feb-07, 12:33 AM
The graphic above shows the data from Kepler 11, color coded according to its value. Each column represents a fixed time period. Each row represents one .02(ish) day increment, across all of the periods. Another way to visualize data from a star is to average the values in each row (skipping any value that is 0). The following animation displays the averages across each increment. The period increases in steps of .02 days and halts briefly where planets are known to be.
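As a rough illustration of the row averaging and the stepping period, here is a Python sketch. It is not the script behind the animation. It folds the data by time at each trial period, averages every .02 day row while skipping missing or zero values, and records the lowest row average for that period. All names and the test signal are made up.

```python
def row_averages(times, flux, period, step=0.02):
    """Fold by time at `period` and average the flux in each `step`-day row."""
    n_rows = max(1, round(period / step))
    sums = [0.0] * n_rows
    counts = [0] * n_rows
    for t, f in zip(times, flux):
        if f is None or f == 0:
            continue                     # skip missing values
        row = int((t % period) / period * n_rows) % n_rows
        sums[row] += f
        counts[row] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]

def sweep(times, flux, start, stop, step=0.02):
    """Return (trial period, lowest row average) for each trial period."""
    results = []
    n_steps = round((stop - start) / step)
    for k in range(n_steps + 1):
        period = start + k * step
        rows = [r for r in row_averages(times, flux, period, step) if r is not None]
        if rows:
            results.append((period, min(rows)))
    return results

# Made-up signal: one point every 0.02 days, with a 1% dip lasting 5 points
# that repeats every 100 points, so the true period is 2.0 days.
times = [i * 0.02 for i in range(2000)]
flux = [0.99 if i % 100 < 5 else 1.0 for i in range(2000)]

best_period, best_value = min(sweep(times, flux, 1.9, 2.1), key=lambda pair: pair[1])
print(round(best_period, 2), round(best_value, 4))
```

On this made-up signal the deepest row average turns up at the 2.0 day trial period, because only there do the dips from every period land in the same rows. At the neighbouring trial periods the dips are smeared across many rows and mostly average away.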


The visualization was created using a script that reads the data. As you can see in the video, additional dips occur across many periods. All additional dips in this particular video are directly related to the known planets. This happens when the lower values associated with transiting planets line up and stack at the trial period, which gives the appearance of movement as the animation progresses. At periods of 1/4, 1/2, 2x, 2.5x, 3x and even 1.33x the true period, the data stacks to reveal a dip. My next step with this particular star is to modify the data to 'delete' the planets, then run the script again to see what, if anything, is left over.
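One possible way to 'delete' a known planet is to blank out every data point that falls inside one of its transits. The Python sketch below does that, assuming you already know the planet's period, the time of one mid-transit and the transit duration. The function name and parameters are hypothetical, and this is not necessarily how the data was modified for the later results.

```python
def remove_transits(times, flux, period, first_transit, duration):
    """Return a copy of flux with points inside known transits set to None.

    period, first_transit (the time of any one mid-transit) and duration
    must be in the same units as times.
    """
    half = duration / 2.0
    cleaned = []
    for t, f in zip(times, flux):
        phase = (t - first_transit) % period
        offset = min(phase, period - phase)   # time to the nearest mid-transit
        cleaned.append(None if offset <= half else f)
    return cleaned

# Made-up example: transits centred at t = 1, 5, 9 (period 4, duration 1)
times = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
flux = [10, 7, 10, 10, 10, 7, 10, 10, 10, 7]
print(remove_transits(times, flux, period=4, first_transit=1, duration=1))
# [10, None, 10, 10, 10, None, 10, 10, 10, None]
```

Running this once per known planet leaves gaps instead of dips, so anything that still stacks afterwards is not coming from those planets.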

2012-Feb-15, 03:24 PM
Here is a video that shows the exact same data as the video above, except that every data point is represented, rather than summed with other values. It's easier to see from this video why periodic dips occur at regular intervals. 'Deleting' the planets from the data may have revealed a couple of areas that are similar in structure to other dips that occur, but data from Q1-Q6 is not enough to really tell. If there is anything, it has a period of more than 120 days.

The 6 planets of Kepler 11 are represented in this video by different colors.

Kep-b is red
Kep-c is blue
Kep-d is purple
Kep-e is cyan
Kep-f is yellow
Kep-g is green (shown at the end, but not stacked.)


One question that has popped into my mind while studying this data is, can we constrain the inclination of the system relative to us? I intend to look at the data where multiple planets are transiting at the same time to see the difference in transit depth as planets are in the same view.

More about Kepler 11 (http://arxiv.org/abs/1102.0291)

One of the next things I would like to do with this system of looking at data is to add the ability to apply polynomials to data that I have entered. Since I didn't know what a polynomial was 2 weeks ago, I guess I should get started on learning about how to create and apply them. [Has anyone ever used Eureqa?] This should be able to 'smooth' out the data from binary stars, thereby making those systems much easier to analyze with my methods. Especially exciting because of the discovery that these types of systems could harbor tons of planets.
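For what it's worth, one common reading of "applying a polynomial" here is to fit a low-order polynomial to the flux by least squares and subtract it, so that slow changes are removed and short dips stand out. The Python sketch below does this with nothing but the standard library. It is only an illustration with made-up names and data, not a statement of what Eureqa or any other tool does.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares fit; returns coefficients c[0] + c[1]*x + c[2]*x**2 ..."""
    n = degree + 1
    # Normal equations: (A^T A) c = A^T y, where A[i][j] = xs[i] ** j
    ata = [[sum(x ** (r + c) for x in xs) for c in range(n)] for r in range(n)]
    aty = [sum(y * x ** r for x, y in zip(xs, ys)) for r in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(ata[r][col]))
        ata[col], ata[pivot] = ata[pivot], ata[col]
        aty[col], aty[pivot] = aty[pivot], aty[col]
        for r in range(col + 1, n):
            factor = ata[r][col] / ata[col][col]
            for c in range(col, n):
                ata[r][c] -= factor * ata[col][c]
            aty[r] -= factor * aty[col]
    # Back substitution
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(ata[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = (aty[r] - tail) / ata[r][r]
    return coeffs

def evaluate(coeffs, x):
    """Value of the polynomial at x."""
    return sum(c * x ** k for k, c in enumerate(coeffs))

def detrend(xs, ys, degree=2):
    """Subtract the fitted polynomial, leaving only the residuals."""
    coeffs = fit_polynomial(xs, ys, degree)
    return [y - evaluate(coeffs, x) for x, y in zip(xs, ys)]

# Made-up data that follows 1 + 2x + 3x^2 exactly
xs = [0, 1, 2, 3, 4]
ys = [1, 6, 17, 34, 57]
coeffs = fit_polynomial(xs, ys, 2)
residuals = detrend(xs, ys, 2)
print([round(c, 6) for c in coeffs])
```

With real light curves it helps to subtract the first time value from every time before fitting, so the powers of x stay small, and to fit short stretches separately rather than a whole quarter at once.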

2012-Feb-24, 06:31 PM
Can someone point me to which resource would be the best at finding out about a variable star? I'm using the NVO website:


I'm trying to find out about KID 892667. I've added a method of viewing the data that results in this:


I've extracted this data in the same way described in post #6, but instead of simply finding 'periods' that show an average intensity comparison, the range of intensity is plotted. The values seem to correlate: every 2.2633765 days there is an intense reduction in flux compared to other periods.
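Plotting the range instead of the average can be sketched in Python as follows. The data is folded by time at a trial period and, for each row, the difference between the highest and lowest value is kept. The function name, the number of rows and the sample data are all made up for the example.

```python
def row_ranges(times, flux, period, n_rows):
    """Fold by time and return (max - min) of the flux in each row."""
    lows = [None] * n_rows
    highs = [None] * n_rows
    for t, f in zip(times, flux):
        if f is None:
            continue                     # skip missing values
        row = int((t % period) / period * n_rows) % n_rows
        lows[row] = f if lows[row] is None else min(lows[row], f)
        highs[row] = f if highs[row] is None else max(highs[row], f)
    return [None if lo is None else hi - lo for lo, hi in zip(lows, highs)]

# Made-up data: period of 1.0 split into 2 rows
times = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
flux = [10, 7, 10, 9, None, 8]
print(row_ranges(times, flux, period=1.0, n_rows=2))   # [0, 2]
```

A row with a small range at the right trial period means the star does nearly the same thing at that point of every cycle, which is a different clue from the row average.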

Here is one last visual for this star:


The grey area is individual data points, and the yellow-black area is the average intensity per row of data. The red arrows represent the area that is darkest and the green arrows show a second ... oscillation? I have no choice but to learn the correct terminology at this point, which is really kind of the most enjoyable part of this newfound 'hobby'. I'm not sure if there is actually any interest in me posting this stuff though, so I'll post more based on feedback.

EDIT: Just came across this article on amateur data mining. Is anyone else on this forum mining data?


2012-Mar-20, 07:48 PM
This work is now being continued at extrasolar.us/AKO (http://www.extrasolar.us/AKO) if interested.