Friday, March 16, 2007

Groovy: Automatic Text Replacement Script

I have been doing graduate studies in Industrial Engineering at Bogazici University for a few months. Normally I use a notebook to take my notes, but for the last few weeks I have been trying to take them on my laptop. This has some difficulties. I use abbreviations for frequently used terms and then expand them with find/replace in Word. This is a manual, repetitive task, so it is a very good case for automation. Since I have been learning Groovy these past weeks, it was a good opportunity to try out my knowledge.



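This is the code. It reads two files. One file, "SD.txt", contains my lecture notes. The other file, "kelimeler.txt" ("kelimeler" is Turkish for "words"), contains the abbreviations together with their replacements, one whitespace-separated pair per line. The result is written to "out.txt".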


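import java.util.regex.Matcher
import java.util.regex.Pattern

// Load the abbreviation table: each line is "abbreviation replacement".
// Read it with the same encoding as the notes so Turkish characters survive.
map = [:]
new File('kelimeler.txt').newReader('ISO-8859-9').eachLine { entry ->
    def tokens = entry.tokenize()
    if (tokens.size() >= 2) map[tokens[0]] = tokens[1]  // skip blank or incomplete lines
}

output = new File('out.txt')
output.write('')  // truncate any previous output
new File('SD.txt').newReader('ISO-8859-9').eachLine { line ->
    output.append(convert(line) + '\n', 'ISO-8859-9')
}

// Expand every abbreviation that appears as a whole word in the line.
String convert(String line) {
    // (?U) makes \w and \b Unicode-aware, so words with Turkish letters match too
    def words = (line =~ /(?U)\w+/).collect { it }

    map.keySet().grep(words).each { key ->
        // quote both sides so special regex characters are taken literally
        line = line.replaceAll(/(?U)\b${Pattern.quote(key)}\b/, Matcher.quoteReplacement(map[key]))
    }
    return line
}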
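
To try the idea without creating any files, here is a minimal sketch that works on an in-memory table. The abbreviations in it are hypothetical examples, not the contents of my real kelimeler.txt, and the closure name "expand" is made up for the illustration.

```groovy
// Hypothetical abbreviation table; the script above loads it from kelimeler.txt.
def table = [opt: 'optimization', prob: 'probability', dist: 'distribution']

// Same idea as convert(): replace each abbreviation only where it is a whole word.
def expand = { String line ->
    table.each { abbr, full ->
        line = line.replaceAll(/\b${abbr}\b/, full)
    }
    line
}

assert expand('the prob dist of the opt problem') ==
        'the probability distribution of the optimization problem'

// 'option' starts with 'opt' but is left alone because of the \b word boundaries.
assert expand('an option') == 'an option'
```

The word boundaries are what make this safer than a plain find/replace: "problem" and "option" are not touched even though they begin with an abbreviation.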


Note that I am using the ISO-8859-9 encoding, which is the standard non-Unicode encoding for Turkish characters.
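
Why the encoding has to be named explicitly can be seen in a small sketch. It assumes the JVM provides the ISO-8859-9 charset (the script above relies on that too), and it writes the Turkish letter as a Unicode escape so the example does not depend on the encoding of the script file itself.

```groovy
// Turkish 's with cedilla' (U+015F), written as an escape sequence.
def text = '\u015F'

def latin5 = text.getBytes('ISO-8859-9')
def utf8 = text.getBytes('UTF-8')

assert latin5.length == 1   // a single byte in ISO-8859-9
assert utf8.length == 2     // two bytes in UTF-8

// Decoding with the matching charset gives the original character back.
assert new String(latin5, 'ISO-8859-9') == text

// Decoding the same byte as ISO-8859-1 silently yields a different letter (U+00FE).
assert new String(latin5, 'ISO-8859-1') == '\u00FE'
```

So if the notes are read with one charset and written with another, the Turkish characters come out wrong without any error being reported.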
