Monday, June 4, 2007

TextPad Searches

I use TextPad a lot because of its cool features. Some of my common use cases are:
1) Typing plain text without worrying about formatting, and still seeing it displayed nicely.
2) Searching for specific words across a group of files.
3) Searching for a word and replacing it with another word.

Here I want to write about something interesting I did with the third case. I came across some data in HTML format, mixed in with lots of other details, and I wanted to extract some numbers and use them in Excel to compute averages, percentiles, etc.

The screenshot below shows the content I was looking at. The numbers in the second-to-last column were the ones I was interested in.



So I copied the whole content and pasted it into TextPad. Most of the unwanted data repeats, so I figured I could remove it with a few search-and-replace passes, until I stumbled on the date field. The date field is tricky to replace because every entry carries a different timestamp. I decided to use a regular expression to match all the date entries. Although I knew TextPad supports regular expressions, I had never tried them, because I was too lazy to search the net for a good quick reference for REs (regular expressions). This time I decided to give it a try, and it occurred to me to use TextPad's built-in help.

I found the regular expression reference in the TextPad help very useful, and within minutes I had written a basic, unoptimized regular expression:

9/26/2008 [0-9]*:[0-9]*:[0-9]* [A-Z]*

It matched date entries such as 9/26/2008 6:16:16 AM. This looked cool to me: a way to use my technical skills on an everyday problem... lol..
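To show what the pattern does, here is a small Python sketch that uses the standard re module as a stand-in for TextPad's own regex engine, whose syntax may differ in details. The sample line and the names LOOSE, TIGHT and strip_timestamps are made up for illustration. Note that [0-9]* means "zero or more digits", so the original pattern would also accept a date with empty time fields; the hypothetical TIGHT variant requires digits and allows only AM or PM.

```python
import re

# The pattern from the post, written for Python's re module
# (assumption: TextPad's engine treats it the same way).
# [0-9]* means "zero or more digits", so empty time fields also match.
LOOSE = re.compile(r"9/26/2008 [0-9]*:[0-9]*:[0-9]* [A-Z]*")

# Hypothetical tighter variant: any M/D/YYYY date, digits required
# in every time field, and only AM or PM at the end.
TIGHT = re.compile(r"\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}:\d{2} [AP]M")


def strip_timestamps(text, pattern):
    """Delete every match, like Replace All with an empty replacement."""
    return pattern.sub("", text)


# Made-up sample line for illustration only.
sample = "row 1 9/26/2008 6:16:16 AM 42 ms"
print(LOOSE.search(sample).group())      # 9/26/2008 6:16:16 AM
print(strip_timestamps(sample, TIGHT))   # row 1  42 ms
```

Deleting the matches leaves the surrounding spaces behind, which is harmless when the goal is to keep only the numbers.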

I was finally able to reduce the data to just the numbers, pasted them into a single column in Excel, and got what I wanted. Before I finished this post, I told my friend Sandeep what I was going to write about, and he said I could have used Excel's Text to Columns feature. :)) ... Here is my conclusion: Text to Columns is a very easy feature for simple data, but regular expressions stand out for complex data with complex separation rules.
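For comparison with the Excel route, here is a rough Python sketch of the whole job: pull the second-to-last column out of each line with a regular expression, then compute an average and a percentile. The rows are invented, since the real columns from the screenshot are not reproduced here, and the names ROWS, SECOND_TO_LAST and extract_values are hypothetical. For tidy rows like these, a plain line.split()[-2] would work just as well, which is exactly the kind of simple data where Text to Columns is enough.

```python
import re
import statistics

# Invented rows standing in for the pasted HTML table text;
# the real layout from the screenshot is not known.
ROWS = """\
job-a 9/26/2008 6:16:16 AM 120 done
job-b 9/26/2008 6:16:17 AM 95 done
job-c 9/26/2008 6:16:19 AM 180 done
job-d 9/26/2008 6:16:20 AM 105 done
"""

# An integer followed by exactly one more field at the end of the line,
# i.e. the second-to-last column.
SECOND_TO_LAST = re.compile(r"(\d+)\s+\S+\s*$")


def extract_values(text: str) -> list[int]:
    """Return the second-to-last column of every line that has one."""
    values = []
    for line in text.splitlines():
        m = SECOND_TO_LAST.search(line)
        if m:
            values.append(int(m.group(1)))
    return values


values = extract_values(ROWS)
average = statistics.mean(values)
# 90th percentile; method="inclusive" interpolates the same way as
# Excel's PERCENTILE.INC.
p90 = statistics.quantiles(values, n=10, method="inclusive")[-1]

print(values)   # [120, 95, 180, 105]
print(p90)      # 162.0
print(average)
```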

