Pyp: a text-processing tool to replace sed and awk

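Linux has no shortage of text-processing tools. Classics like cut, tr, join, split, paste, sort, uniq, sed and awk are dizzying in number, but they are all quite old and none of them is particularly user-friendly, awk least of all, whose syntax is downright hostile to humans. That is why scripting languages such as Perl, Python and Ruby are so popular. I lean toward Python, but for simple tasks Python is still rather clunky and can't get the job done in a single command line. Then I discovered Pyp!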

Pyp (Pyed Piper): a sed/awk-like text-processing tool written in Python. Simple, elegant and powerful.

Installation:

It is available in the official Ubuntu repositories:

aptitude install pyp

Basic usage:

echo 'string' | pyp "command"

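Some examples:

A pyp command is wrapped in double quotes (""), and any string inside it is wrapped in single quotes (').

The variable p: each line is treated as a string, and p is that string, so every Python string method is available. For example, character replacement: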

cat test.txt |pyp "p.replace('123','abc')"
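For comparison, here is roughly what that one-liner does in plain Python. This is a minimal sketch assuming pyp simply evaluates the expression once per input line; the function name `replace_each_line` is hypothetical and not part of pyp.

```python
def replace_each_line(lines, old, new):
    # Treat each line as p and apply str.replace, as the pyp expression does per line
    return [p.replace(old, new) for p in lines]

# In a real script the lines would come from sys.stdin.read().splitlines()
for line in replace_each_line(["id=123", "no match", "123123"], "123", "abc"):
    print(line)
```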

The variable pp: the whole text is treated as a list with one element per line, so list methods are available. For example, sorting lines:

cat test.txt |pyp "pp.sort()"
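The plain-Python equivalent needs a little more ceremony, which is exactly the boilerplate pyp saves you. A sketch with a hypothetical helper `sort_lines`: note that in ordinary Python `list.sort()` sorts in place and returns `None`, so the list itself has to be returned afterwards. How pyp turns its result back into output is not modeled here.

```python
def sort_lines(text):
    # pp: the whole input as a list, one element per line
    pp = text.splitlines()
    # list.sort() works in place and returns None, so return pp itself
    pp.sort()
    return pp

print("\n".join(sort_lines("pear\napple\nfig")))
```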

Pipes:

Pipes can be embedded inside a pyp command. After a pipe, p or pp refers to the output of the previous stage, much like a standard Unix pipe:

echo 'FOO IS AN ' | pyp "p.replace('FOO','THIS') | p + 'EXAMPLE'"

Here the string produced by replace is piped back in as p, and the extra string 'EXAMPLE' is appended to it, giving "THIS IS AN EXAMPLE".
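Conceptually, each `|` inside the pyp expression feeds one stage's result into the next stage as the new p. A minimal sketch of that idea in plain Python, assuming stages are applied strictly left to right; `run_pipeline` is a hypothetical helper, not pyp's implementation.

```python
def run_pipeline(line, stages):
    # Each stage receives the previous stage's output as its p
    p = line
    for stage in stages:
        p = stage(p)
    return p

stages = [
    lambda p: p.replace("FOO", "THIS"),  # first stage: p.replace('FOO','THIS')
    lambda p: p + "EXAMPLE",             # second stage: p + 'EXAMPLE'
]
print(run_pipeline("FOO IS AN ", stages))  # THIS IS AN EXAMPLE
```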

Splitting:

echo /this/is/a/splitting/example | pyp "p.split('/')"

This produces output in which each element is numbered.
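In plain Python the same split gives a list whose first element is an empty string, because the input starts with '/', so field 0 is empty. The sketch below pairs each field with its index; the `[i]field` layout and the helper name `numbered_fields` are assumptions for illustration, not necessarily how pyp formats its output.

```python
def numbered_fields(line, sep):
    fields = line.split(sep)
    # Pair each field with its index (hypothetical display format)
    return [f"[{i}]{field}" for i, field in enumerate(fields)]

for item in numbered_fields("/this/is/a/splitting/example", "/"):
    print(item)
```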

Arithmetic:

Arithmetic must be wrapped in parentheses ():

echo 'qwe665' | pyp "(int(p[3:]) + 1)"
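Worked through in plain Python: `p[3:]` drops the first three characters ('qwe'), `int()` turns the remaining '665' into a number, and adding 1 gives 666. The helper name `bump_suffix` is hypothetical.

```python
def bump_suffix(p, start=3):
    # Slice off the prefix, convert the rest to an int, then add 1
    return int(p[start:]) + 1

print(bump_suffix("qwe665"))  # 666
```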

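Processing two texts at once:

Use the "--text_file" flag to process a second text. Just as with "p" and "pp", the second text's individual lines and its whole content are available as the variables "fp" and "fpp":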

cat a.txt | pyp "p + fp" --text_file b.txt
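A rough plain-Python analogue, assuming that fp is the line of b.txt at the same position as p in a.txt. That pairing rule is an assumption, and so is the length handling: `zip()` stops at the shorter file, which may differ from what pyp does. `concat_parallel` is a hypothetical name.

```python
def concat_parallel(a_lines, b_lines):
    # p comes from a.txt, fp from b.txt at the same line index (assumed pairing)
    return [p + fp for p, fp in zip(a_lines, b_lines)]

print(concat_parallel(["a1", "a2"], ["b1", "b2"]))
```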

Regular expressions:

pyp supports regular expressions too; just use p.re(regex):

cat a.txt  | pyp  "p.replace(p.re('^#.*'),'')"

This strips out every comment line.
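A plain-Python sketch of the same idea. It assumes `p.re(...)` returns the text matched by the pattern, or an empty string when nothing matches; that behavior is an assumption, not something stated above. Replacing the match leaves an empty line behind, so the sketch also shows the simpler option of filtering comment lines out altogether. Both helper names are hypothetical.

```python
import re

COMMENT = re.compile(r"^#.*")

def blank_comment_lines(lines):
    out = []
    for p in lines:
        m = COMMENT.search(p)
        # Assumed p.re() behavior: the matched text, or '' when there is no match
        matched = m.group(0) if m else ""
        out.append(p.replace(matched, "") if matched else p)
    return out

def drop_comment_lines(lines):
    # Remove lines that start with '#' instead of leaving them blank
    return [p for p in lines if not COMMENT.search(p)]

sample = ["# config", "x = 1", "y = 2  # trailing"]
print(blank_comment_lines(sample))
print(drop_comment_lines(sample))
```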

pyp reference:

Special variables:

p

pp

original //original line by line input to pyp

o //same as original

sp //second stream line input, just like p, but from all non-flag arguments AFTER pyp

quote //a literal " (double quotes can't be used in a pyp expression)

paran //a literal '

dollar //a literal $

n //line counter (1st line is 0, 2nd line is 1, …); use the form "(n+3)" to modify this value

nk //n + 1000

date //date and time. Returns the current datetime.datetime.now() object.

pwd //present working directory

history //history array of all previous results: so pyp "a|u|s|i|h[-3]" shows eval of s

h //same as history

letters //abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ

digits //0123456789

punctuation //!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
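The line counters are easy to picture with `enumerate`: n starts at 0 for the first line and nk is n + 1000, as listed above. A sketch with a hypothetical `number_lines` helper; the `n + 3` in the loop stands in for a pyp expression like "(n+3)".

```python
def number_lines(lines):
    rows = []
    for n, p in enumerate(lines):
        nk = n + 1000  # nk is defined as n + 1000
        rows.append((n, nk, p))
    return rows

for n, nk, p in number_lines(["alpha", "beta"]):
    # "(n+3)" in pyp would correspond to n + 3 here
    print(n, n + 3, nk, p)
```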

Variables related to split and join:

s OR slash //p split/joined on "/"

d OR dot //p split/joined on "."

w OR whitespace //p split on whitespace (on spaces, tabs, etc.), joined on spaces

u OR underscore //p split/joined on '_'

c OR colon //p split/joined on ':'

mm OR comma //p split/joined on ','

m OR minus //p split/joined on '-'

a OR all //p split on [' '-_=$…] (on "All" metacharacters)
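These shortcuts map onto ordinary `str.split` and `str.join`. A sketch of "s"/"slash" and "w"/"whitespace", assuming (as the table says) that piping a list into one of them joins it with the same delimiter. The helper names are hypothetical.

```python
def split_slash(p):
    # "s" / "slash": split p on '/'
    return p.split("/")

def join_slash(fields):
    # Piping a list into "s" joins it back with '/'
    return "/".join(fields)

def normalize_whitespace(p):
    # "w": split on any whitespace, join with single spaces
    return " ".join(p.split())

p = "/usr/local/lib/python3/site-packages/foo.py"
fields = split_slash(p)
print(repr(fields[0]))            # empty string: p starts with '/'
print(join_slash(fields[2:6]))    # like pyp "s[2:6] | s"
print(normalize_whitespace("a \t b   c"))
```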

Variables related to p:

p.dir path DIRECTORY

p.file path FILE

p.ext path EXTENSION
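These properties look like thin wrappers over path splitting. Assuming they behave like the standard `os.path` helpers (not confirmed here; in particular, whether `p.ext` includes the leading dot is an assumption), the same pieces can be obtained in plain Python as follows. `path_parts` is a hypothetical name.

```python
import os.path

def path_parts(p):
    # Roughly p.dir and p.file
    directory, filename = os.path.split(p)
    # Roughly p.ext; os.path.splitext keeps the leading dot
    ext = os.path.splitext(filename)[1]
    return directory, filename, ext

print(path_parts("/home/user/notes.txt"))
```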

Line-level operations:

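pyp "p" //print each line unchanged

pyp "p + 'FOO'" //append a string to each line

pyp "p + 'FOO' | p + o" //append a string, then combine the result with the original line (o)

pyp "p.replace('FOO','GOO')" //string replacement

pyp "p.kill('GOO')" //delete a specific string

pyp "'%s FOO %s %s GOO'%(p,p,5)" //string substitution

pyp "p.split('FOO')" //split into a list

pyp "slash" //shorthand for splitting on '/' into a list

pyp "slash[0]" //split on '/' and take the first field

pyp "s[2:6]" //split on '/' and take a range of fields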

pyp "s[2:6] | s" //split on '/', take a range of fields, then join them back with '/'

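echo 'qwe665' | pyp "(int(p[3:]) + 1)" //arithmetic (must be wrapped in ())

pyp "p.replace(p.re(REGEX),STR)" //regular expression

pyp "p.letters()" //output letters only

pyp "p.digits()" //output digits only

pyp "p.punctuation()" //output punctuation only

pyp "p.clean(DELIM)" //clean out junk characters, replacing them with DELIM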
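The character filters and `p.kill` have straightforward plain-Python counterparts built on the standard `string` module. This sketch assumes `p.letters()` keeps ASCII letters only and that `p.kill(x)` is equivalent to replacing x with an empty string; the helper names `keep_only` and `kill` are hypothetical.

```python
import string

def keep_only(p, allowed):
    # Keep the characters of p that appear in `allowed`, in order
    return "".join(ch for ch in p if ch in allowed)

def kill(p, sub):
    # Delete every occurrence of sub
    return p.replace(sub, "")

line = "Order #42: 3 items, total $19.99!"
print(keep_only(line, string.ascii_letters))  # like p.letters()
print(keep_only(line, string.digits))         # like p.digits()
print(keep_only(line, string.punctuation))    # like p.punctuation()
print(kill("FOOGOOBAR", "GOO"))               # like p.kill('GOO')
```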

Treating the text as a list:

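pyp "pp" //output the whole text

pyp "pp.sort()" //sort

pyp "pp.uniq" //remove duplicates

pyp "pp.oneline" //merge all list elements into a single string, separated by spaces

pyp "pp.unlist()" //(not sure what this one does)

pyp "pp.divide(N)" //group every N elements into a new list

pyp "pp.before('FOO'[,n])" //output the n lines above the given string (default 1)

pyp "pp.after('FOO'[,n])" //output the n lines below the given string (default 1)

pyp "pp.matrix('FOO'[,n])" //output the n lines above and below the given string (default 1)

pyp "[x for x in pp]" //iterate over the list

pyp "pp.sort() | p" //process the text as a list, then turn it back into text

pyp "pp.delimit(DELIM)" //use a custom delimiter instead of the default newline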
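Most of these list operations are short in plain Python too. The sketch below reimplements a few of them purely from the one-line descriptions above, so the details are assumptions rather than pyp's actual code: `uniq` keeps the first occurrence of each line anywhere in the list (unlike the Unix `uniq` command, which only collapses adjacent duplicates), and `before`/`after` return the context lines without the matching line itself. All helper names are hypothetical.

```python
def uniq(pp):
    # Order-preserving de-duplication (assumed semantics)
    return list(dict.fromkeys(pp))

def oneline(pp):
    # Join all elements into one space-separated string
    return " ".join(pp)

def divide(pp, n):
    # Group every n elements into a sub-list; the last group may be shorter
    return [pp[i:i + n] for i in range(0, len(pp), n)]

def before(pp, needle, n=1):
    # The n lines above each line containing needle (matching line excluded)
    out = []
    for i, line in enumerate(pp):
        if needle in line:
            out.extend(pp[max(0, i - n):i])
    return out

def after(pp, needle, n=1):
    # The n lines below each line containing needle (matching line excluded)
    out = []
    for i, line in enumerate(pp):
        if needle in line:
            out.extend(pp[i + 1:i + 1 + n])
    return out

lines = ["x", "y", "FOO", "z"]
print(before(lines, "FOO", 2), after(lines, "FOO"))
```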