Monday, June 30, 2008

Perl vs. Python - update 1

In one of my previous posts on Perl and Python, I stated that according to some online sources, Perl is the better choice for compatibility with other programming languages such as C++. The sources also mentioned that while Perl can do everything Python can, the same cannot necessarily be said for Python.

According to Kurt, this is not true. As an avid Python user (and a Perl user too, I believe, though I am not sure), he says that Python can certainly do everything Perl can, and it is definitely compatible with other languages. Python can also apparently export into an ArcGIS format, something that I find really cool. Sigh, I really do have an inner geek.

So far, Python is definitely in the lead!

Sunday, June 29, 2008

Open-Source GIS!

I just stumbled across this open-source GIS suite while reading a journal paper.


"Commonly referred to as GRASS, this is a Geographic Information System (GIS) used for geospatial data management and analysis, image processing, graphics/maps production, spatial modeling, and visualization. GRASS is currently used in academic and commercial settings around the world, as well as by many governmental agencies and environmental consulting companies. GRASS is official project of the Open Source Geospatial Foundation."

I just installed it and am excited to play around with it once I have some free time. Installing on a Windows machine requires Cygwin, but it was a breeze. There are also install packages for MacOS and a bunch of Unix-based operating systems (such as Ubuntu).

Perls, Pythons, and Rs...

I am not a coder, plain and simple. I may work in Matlab some, but even that involves a lot of trial and error for me. What would take some of my friends 5 minutes can take me an hour or more as I muddle my way through function calls and syntax. I am definitely keen to learn though. In my geodesy class we had a semester-long project in which we became a virtual GPS receiver and had to solve for different position fixes (basically using multiple iterations of a least-squares analysis). Most people pick Excel to do it in, including the prof. All that copy and paste and only one line in which to type your equations just gets me, and I cannot do it. I decided on Matlab. I definitely struggled with it a lot, but it was relatively easy to debug when there were issues, and in the end it actually worked.

Now one of my other profs is trying to sway me towards R instead of Matlab. The draw of R is that it is free, highly supported by users, and completely object-oriented. Apparently it is rapidly becoming the main coding software of statisticians. The drawback, for me obviously, is switching over and starting from scratch all over again. Other profs tell me to stick with Matlab because it is more mainstream and therefore more widely accepted. I wonder though, which is better? Perhaps R is more intuitive, which would certainly be nice...

Perl versus Python is another decision point for me. I recently worked with some Python code involving the same tide data from the post below. Python seems relatively straightforward at first glance and is pretty powerful. It certainly ran through an entire month of 12-second tide data and converted all the raw temperatures, water levels, and time stamps in mere seconds. Mightily impressive. Then someone told me about Perl. It is also a great language and can do the same things as Python. If you look it up online, people rant and rave about both. A lot of folks seem to think that while Python is somewhat easier to script in than Perl, Perl is still the better choice. Python appears to be more user-friendly, but Perl is said to work with other programming languages (such as C++) better than Python. Also, it seems that while Perl can do everything Python can, the reverse is not necessarily true. My question is this: for someone who really has no experience with either, but wants to learn one to help with data processing, which one is better? Should I go for the one that is easier to use, or the one that has more compatibility? Does it even really matter for what I will be doing? How much coding will analyzing full-waveform lidar data even require?

Blah, why are there so many options?

Saturday, June 28, 2008

Tide data filtering in Matlab and the mysterious filtfilt issue

Last week I found myself trying to deal with a ton of tide data retrieved from a tide gauge installed in Castine, ME. The gauge takes 4 seconds to capture a data point. Raw (unconverted) water level is recorded first, followed by raw temperature 4 seconds later, and a reference value 4 seconds after that. A full record, therefore, is generated every 12 seconds. This produces a lot of data.

In order to facilitate the comparison of this gauge data to a NOAA primary station, I wanted to automate the task of determining the higher highs and lower lows. A six-minute averaging scheme was developed in Matlab by my friend Val in order to generate a six-minute record similar to the NOAA format. This still left us with squared peaks, so I decided to apply a filter to the data.
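For anyone curious what a six-minute averaging step might look like, here is a minimal Python sketch (not Val's Matlab code). It assumes one water-level value every 12 seconds with no gaps, so 30 records make up each six-minute bin. The function name and the synthetic data are made up for illustration.

```python
import numpy as np

RECORD_INTERVAL_S = 12                          # one full gauge record every 12 seconds
BIN_S = 6 * 60                                  # six-minute bins
RECORDS_PER_BIN = BIN_S // RECORD_INTERVAL_S    # 30 records per bin


def six_minute_means(levels):
    """Average consecutive 12-second records into six-minute means.

    Hypothetical helper: assumes evenly spaced records with no gaps;
    a trailing partial bin is dropped.
    """
    levels = np.asarray(levels, dtype=float)
    n_bins = len(levels) // RECORDS_PER_BIN
    trimmed = levels[: n_bins * RECORDS_PER_BIN]
    # One row per six-minute bin, then average across each row
    return trimmed.reshape(n_bins, RECORDS_PER_BIN).mean(axis=1)


# One hour of synthetic data: 300 records -> 10 six-minute means
hour_of_levels = np.arange(300, dtype=float)
means = six_minute_means(hour_of_levels)
print(len(means), means[:3])
```

Real gauge data will have gaps and dropped records, so a version that bins on the actual time stamps would probably be safer than counting rows.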

Problem: I wanted to use filtfilt, which filters data in the forward direction and then again in the reverse direction, resulting in zero phase distortion. I tested filtfilt out on the full 12-second water level data and everything seemed hunky dory. The problem came when I tried it out on the 6-minute averaged data. The filter seemed to run okay with no errors, but the result was all NaN (not-a-number) values. At first I thought it was caused by zeros in my vector data, so I changed all 0.00 into 0.01. This did not solve the problem. To make the problem even more frustrating, filtfilt worked on every other column of data coming out of the 6-minute averaging scheme (time, std, etc.), just not on water level. What is up with that?
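One possibility worth checking (an assumption, not a confirmed cause) is whether the averaged water-level column already contains a NaN, for example from a bin with missing data. With a recursive (IIR) filter, a single NaN spreads through the whole output once the filter has run forward and then backward. The sketch below shows this in Python with SciPy's filtfilt rather than Matlab's. The Butterworth filter and the synthetic series are assumptions chosen only for illustration.

```python
import numpy as np
from scipy.signal import butter, filtfilt

# Synthetic "six-minute" series: a smooth tide-like curve (illustrative only)
t_hours = np.arange(480) * 0.1
levels = np.cos(2 * np.pi * t_hours / 12.42)

# Assumed low-pass filter; the post does not say which filter was used
b, a = butter(2, 0.1)

clean = filtfilt(b, a, levels)            # no NaN in the input

# Put a single NaN in the middle, as an empty averaging bin might
with_gap = levels.copy()
with_gap[240] = np.nan
contaminated = filtfilt(b, a, with_gap)

print("NaNs in clean result:", int(np.isnan(clean).sum()))
print("NaNs in result with one gap:", int(np.isnan(contaminated).sum()),
      "of", contaminated.size)

# A quick check to run on a real column before filtering
has_nan = bool(np.isnan(with_gap).any())
```

If the check turns up NaN values, interpolating across them or dropping them before filtering would be one way to test whether they are the culprit.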

Workaround: I used the convn function in Matlab in order to apply a convolution to the data. This filter can cause a phase distortion. However, by specifying the shape as 'same', the central part of the convolution is returned and the distortion is minimized.
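The same idea can be sketched in Python with NumPy's convolve instead of Matlab's convn. Assuming a symmetric, odd-length kernel (here a simple 5-point moving average, chosen for illustration and not necessarily the kernel used above), mode="same" returns the central part of the convolution, so the output has the same length as the input and a symmetric peak stays where it was.

```python
import numpy as np

# Synthetic symmetric pulse centred on sample 50 (illustrative data only)
n = np.arange(101)
signal = np.exp(-0.5 * ((n - 50) / 8.0) ** 2)

# Symmetric, odd-length moving-average kernel (an assumed choice)
kernel = np.ones(5) / 5.0

# mode="same" keeps the central part, so output length matches the input
smoothed = np.convolve(signal, kernel, mode="same")

peak_before = int(np.argmax(signal))
peak_after = int(np.argmax(smoothed))
print(len(signal), len(smoothed), peak_before, peak_after)
```

Note that the first and last couple of samples are averaged against implicit zeros, so the ends of the smoothed series should be treated with some suspicion.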

For the comparison of our preliminary gauge to the NOAA primary gauge in Bar Harbor, a slight phase distortion will not affect the results. In order to really be correct, however, a filter that guarantees no phase distortion should have been used. If anyone has any ideas why filtfilt would not work on a running-averaged dataset, I am all ears. Even one of my profs could not figure it out.

I must say though, the results of my code using convn do look pretty nice on first inspection: rounded peaks and automatically picked highs and lows (using the extrema function, modified so that returns are in linear time order, not descending).
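For readers working in Python, a rough equivalent of the peak-picking step can be sketched with SciPy's find_peaks, which returns indices in time order. This is not the Matlab extrema function mentioned above, and the synthetic tide (a single cosine with a 12.42-hour period) is an assumption for illustration.

```python
import numpy as np
from scipy.signal import find_peaks

# Two days of synthetic six-minute water levels (illustrative only)
t_hours = np.arange(480) * 0.1
levels = np.cos(2 * np.pi * t_hours / 12.42)

# Highs are local maxima; lows are local maxima of the negated series
high_idx, _ = find_peaks(levels)
low_idx, _ = find_peaks(-levels)

for i in high_idx:
    print(f"high at {t_hours[i]:5.1f} h, level {levels[i]:+.2f}")
for i in low_idx:
    print(f"low  at {t_hours[i]:5.1f} h, level {levels[i]:+.2f}")
```

find_peaks does not report extrema at the very first or last sample, and on noisy data it would likely need its distance or prominence options (or smoothing first) to avoid picking every small wiggle.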

Wednesday, June 11, 2008

No spoilers here...

You may have noticed (and by you I mean the 3 people I know who have read this blog) that a couple of posts have been edited already. Apparently 2 out of 3 people recommend not explicitly stating your research ideas, especially if they are things you want to publish. Therefore, I will still continue to post about lidar- and multibeam-related news and some of the general research I do, but there will be no spoilers here. If you are interested I would say wait for the book in 5 years, but hopefully you will get your chance in the next year and a half or so as papers start coming out. :)

Sunday, June 8, 2008

A simplified lidar waveform

Here is a simplified schematic of the green laser pulse waveform from lidar. Both surface return and bottom return times are picked at the half-peak amplitude. The volume backscatter is caused by the light reflecting off particles and colored dissolved organic material (CDOM) in the water column.
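Here is a minimal Python sketch of half-peak picking on a single return. It assumes the time is taken on the leading (rising) edge and refined by linear interpolation between samples. Both of those choices, the function name, and the idealized triangular pulse are assumptions for illustration, not details from a specific lidar system.

```python
import numpy as np


def half_peak_time(times, waveform):
    """Time at which the rising edge first reaches half the peak amplitude.

    Hypothetical helper: assumes a single return in the window and a
    baseline of zero; uses linear interpolation between samples.
    """
    times = np.asarray(times, dtype=float)
    waveform = np.asarray(waveform, dtype=float)
    half = 0.5 * waveform.max()
    i = int(np.argmax(waveform >= half))   # first sample at or above half peak
    if i == 0:
        return times[0]
    # Interpolate between the samples on either side of the crossing
    frac = (half - waveform[i - 1]) / (waveform[i] - waveform[i - 1])
    return times[i - 1] + frac * (times[i] - times[i - 1])


# Idealized triangular return peaking at t = 10 ns (not real lidar data)
t_ns = np.arange(21.0)
pulse = 1.0 - np.abs(t_ns - 10.0) / 10.0
pick = half_peak_time(t_ns, pulse)
print(pick)
```

On a real waveform the surface and bottom returns would each need their own window, and the noise would probably have to be smoothed before a threshold like this is meaningful.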

In reality, these curves would not be smooth but would actually be quite noisy. As I start to work with some of the lidar data, I will post more information about the waveforms.

Friday, June 6, 2008

Mars Weather: Brought to you by lidar!

Lidar is providing the weather reports from Mars!

Check it out: Lidar Mars weather

What is lidar?

Lidar: light detection and ranging (think sonar, only with light!)

There are both land- and water-based applications of lidar.

The basic principle of bathymetric lidar is that a laser mounted on an airplane or helicopter generates both a green (532 nm) and an infrared (1064 nm) light pulse. The infrared pulse, which essentially does not penetrate water, reflects off the air-sea interface and provides a surface return. The green pulse, which is attenuated far less, penetrates the water and reflects off the seabed, giving a bottom return. The difference in travel time between the surface return and the bottom return can be used to estimate water depth. Because lidar is optical, it is constrained to the photic zone and is heavily dependent on water clarity. Therefore, lidar typically will only work in 50 meters of water or less. A couple of the tech specs claim 70 meters, but this has yet to be actually achieved in the field. If I learned anything at all during this tech review, it is to take company technical specifications with a huge grain of salt!
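As a rough worked example of the travel-time idea: the green pulse travels down through the water and back up, so depth is about half the time difference multiplied by the speed of light in water. The sketch assumes a vertical (nadir) beam and a refractive index of 1.33 for water; real systems scan off-nadir and apply further corrections. The function name and the 100 ns example value are made up for illustration.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0   # speed of light in vacuum
WATER_REFRACTIVE_INDEX = 1.33        # assumed typical value for water


def depth_from_time_difference(dt_seconds):
    """Estimate water depth from the surface-to-bottom return time difference.

    Assumes a vertical beam; divides by 2 because the pulse travels
    down to the seabed and back up.
    """
    speed_in_water = SPEED_OF_LIGHT_M_S / WATER_REFRACTIVE_INDEX
    return speed_in_water * dt_seconds / 2.0


# Hypothetical example: returns separated by 100 nanoseconds
depth_m = depth_from_time_difference(100e-9)
print(f"{depth_m:.2f} m")
```

With these assumptions, a 100 ns separation works out to roughly 11 m of water, which gives a feel for how fine the timing has to be.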

Here is a nice little schematic of the green energy pulse that I modified from Guenther (2001).

When the green energy pulse hits the water, a small portion of it reflects off the surface. The remainder penetrates the water column, where it is subjected to refraction, absorption, and scattering. When the light wave hits the seabed, it reflects back towards the laser and is again subjected to refraction, absorption, and scattering. Only about 4% of the incident green energy reaches the seafloor.

If you are interested in lidar, the Guenther chapter is a great introduction.