Peak Picking Algorithm
Recently, I was hired for a project for which I had to develop a peak-finding algorithm, graph a set of data points, and mark the peaks. Internet searches revealed no easy-to-follow instructions on how to actually find peaks within a data set, so I decided to tackle the problem myself. Although I came up with a solution specific to chemical spectroscopic datasets, this should work for all sorts of data. Read on for instructions!
Chemical spectroscopy comes with a lot of data. Each spectrum has roughly 1700 data points, so be mindful to write your algorithms efficiently (read up on complexity theory, maybe?). My job was to take a set of data points, graph them (I used amCharts, a wonderful free flash based graphing utility. It has some drawbacks in the data format, though…), and mark all the peaks.
Wait. All of the peaks?
If you look at a chemical spectrum, you can clearly see where the peaks are. In mathematics, to find a peak, you take the differential (derivative) of a function, and find the point at which it changes from positive to negative. But, mathematically, a data set of discrete non-continuous points can have tons of peaks, especially where noise is involved. So, we have to introduce two concepts (both of which should be parameterizable: derivative smoothing and derivative tolerance.
Derivative smoothing involves taking the derivative (slope) over a range of points, not just two adjacent points. So, iterate through your set of points, assigning a slope to each one. Calculate the slope not by looking at the dy/dx between your current point and the next one, but instead by looking at your current point and the 5th (or 10th, or 300th–this should be parameterized) one away. Now you should have an array of slopes, significantly smoother than the original data set.
Now you can look through the array of slopes (or you can combine all the operations into one iteration through your dataset to be a little more efficient) for changing derivatives. If your current slope is positive, and the next slope is negative, you have a peak (don’t forget to offset the point you mark as a peak by the derivative tolerance you used). See a problem? You should. Although the derivative is much smoother, this will mark off even the most shallow peaks (for example, a slope change of 0.005 changing to -0.0001 will mark as a peak). So, introduce a tolerance.
Derivative tolerance is another parameterized value that you can use to reject shallow or sloping peaks (you get these a lot in chemical spectra–only sharp peaks should be marked). Simply find the difference in slopes, and if this value is less than a certain amount, reject that point as a peak.
So, all should be good now. You should now have wonderful peaks in all the right places–sometimes. Depending on the values you’ve used (odd or even), you may get a mis-marked peak, potentially one off to the left or the right of the real peak. I took care of this pretty easily by looking at my list of peaks, and for each one, looking at it’s neighboring points. If the neighboring point was higher, then that’s the real peak.
So, in summary: 1. Smooth your derivatives 2. Set a tolerance on peaks 3. Make sure what you have is really a peak
Happy peak finding!
Comments
No comments have been made. post comment