awk - 数据提取和报告工具 (2) 进阶

  • 原创
  • Madman
  • /
  • 2018-04-13 09:10
  • /
  • 0
  • 312 次阅读

awk - 数据提取和报告工具 (2) 进阶-min.png

Synopsis: 详细描述了awk编程语言的基础语法,其中操作符和正则表达式组合后,即为awk中的匹配模式。action是一系列的执行语句,一般包括赋值语句、条件语句、循环语句、输入/输出语句等。awk数组也叫关联数组,其下标既可以是数字也可以是字符串,awk提供了逻辑上模拟二维数组的访问方式,其本质上还是一维数组。可以将数据重定向到一个文件,重定向出现在print或printf语句之后。awk中的重定向与shell命令中的重定向一样,只是它们写在awk程序中。要输出美观的报告,printf格式化怎么少得了

awk系列:


1. 操作符

1.1 算术运算符

1. 加
# awk 'BEGIN { a = 50; b = 20; print "(a + b) = ", (a + b) }'
(a + b) =  70

2. 减
# awk 'BEGIN { a = 50; b = 20; print "(a - b) = ", (a - b) }'
(a - b) =  30

3. 乘
# awk 'BEGIN { a = 50; b = 20; print "(a * b) = ", (a * b) }'
(a * b) =  1000

4. 除
# awk 'BEGIN { a = 50; b = 20; print "(a / b) = ", (a / b) }'
(a / b) =  2.5

5. 余数
# awk 'BEGIN { a = 50; b = 20; print "(a % b) = ", (a % b) }'
(a % b) =  10

6. 指数
# awk 'BEGIN { a = 10; a = a ^ 2; print "a =", a }'
a = 100
或者:
# awk 'BEGIN { a = 10; a = a**2; print "a =", a }'
a = 100

1.2 递增和递减运算符

1. 先自增,再赋值
# awk 'BEGIN { a = 10; b = ++a; printf "a = %d, b = %d\n", a, b }'
a = 11, b = 11

2. 先自减,再赋值
# awk 'BEGIN { a = 10; b = --a; printf "a = %d, b = %d\n", a, b }'
a = 9, b = 9

3. 先赋值,再自增
# awk 'BEGIN { a = 10; b = a++; printf "a = %d, b = %d\n", a, b }'
a = 11, b = 10

4. 先赋值,再自减
# awk 'BEGIN { a = 10; b = a--; printf "a = %d, b = %d\n", a, b }'
a = 9, b = 10

1.3 复合赋值运算符

1. 简单赋值
# awk 'BEGIN { name = "Jerry"; print "My name is", name }'
My name is Jerry

2. 左边变量加上右边的值,再赋值给左边的变量
# awk 'BEGIN { cnt = 10; cnt += 10; print "Counter =", cnt }'
Counter = 20

3. 左边变量减去右边的值,再赋值给左边的变量
# awk 'BEGIN { cnt = 100; cnt -= 10; print "Counter =", cnt }'
Counter = 90

4. 左边变量乘以右边的值,再赋值给左边的变量
# awk 'BEGIN { cnt = 10; cnt *= 10; print "Counter =", cnt }'
Counter = 100

5. 左边变量除以右边的值,再赋值给左边的变量
# awk 'BEGIN { cnt = 100; cnt /= 5; print "Counter =", cnt }'
Counter = 20

6. 左边变量与右边的值求余数,再赋值给左边的变量
# awk 'BEGIN { cnt = 100; cnt %= 8; print "Counter =", cnt }'
Counter = 4

7. 左边变量为底数,右边的值为指数,求出指数,再赋值给左边的变量
# awk 'BEGIN { cnt = 2; cnt ^= 4; print "Counter =", cnt }'
Counter = 16
或者:
# awk 'BEGIN { cnt = 2; cnt **= 4; print "Counter =", cnt }'
Counter = 16

1.4 关系运算符

表达式成立时返回true,执行后续的命令;否则返回false,不执行后续的命令

# awk 'BEGIN { a = 10; b = 10; if (a == b) print "a == b" }'
a == b
# awk 'BEGIN { a = 10; b = 20; if (a != b) print "a != b" }'
a != b
# awk 'BEGIN { a = 10; b = 20; if (a < b) print "a  < b" }'
a  < b
# awk 'BEGIN { a = 10; b = 10; if (a <= b) print "a <= b" }'
a <= b
# awk 'BEGIN { a = 10; b = 20; if (b > a ) print "b > a" }'
b > a
# awk 'BEGIN { a = 10; b = 10; if (b >= a ) print "b >= a" }'
b >= a

1.5 逻辑运算符

1. expr1 && expr2 逻辑与,同时为真时结果才为真。也叫短路与,当且仅当expr1为真时,才验证expr2
# awk 'BEGIN { num = 5; if (num >= 0 && num <= 7) printf "%d is in octal format\n", num }'
5 is in octal format

2. expr1 || expr2 逻辑或,只要一个为真结果就为真。也叫短路或,当且仅当expr1为假时,才验证expr2
# awk 'BEGIN { ch = "\n"; if (ch == " " || ch == "\t" || ch == "\n") print "Current character is whitespace." }'
Current character is whitespace.

3. ! expr1 逻辑非,取反,当expr1为真时,结果为0;当expr1为假时,结果为1。说明: awk中0代表false,非0代表true
# awk 'BEGIN { name = ""; if (! length(name)) print "name is empty string." }'
name is empty string.

1.6 三元运算符

语法为condition expression ? statement1 : statement2,如果表达式返回真,则执行statement1,否则执行statement2

# awk 'BEGIN { a = 10; b = 20; (a > b) ? max = a : max = b; print "Max =", max}'
Max = 20

1.7 正负号

负数的负数是正数

# awk 'BEGIN { a = -10; a = +a; print "a =", a }'
a = -10
# awk 'BEGIN { a = -10; a = -a; print "a =", a }'
a = 10

1.8 字符串连接运算符

空格是合并两个字符串的字符串连接运算符。

# awk 'BEGIN { str1 = "Hello, "; str2 = "World"; str3 = str1 str2; print str3 }'
Hello, World

1.9 数组成员操作符

# awk 'BEGIN { arr[0] = 1; arr[1] = 2; arr[2] = 3; for (i in arr) printf "arr[%d] = %d\n", i, arr[i] }'
arr[0] = 1
arr[1] = 2
arr[2] = 3

1.10 是否匹配正则表达式

规则是pattern { action},其中pattern可以是正则表达式,比如/li/ { print $0 },但是这是说当前记录整个是否与正则表达式匹配,假如我要判断第1列是否匹配/li/呢?这就需要~操作符了。 gawk默认使用GNU ERE,扩展正则表达式。

1. ~ 表示匹配,如果当前记录的第1列匹配/li/正则表达式,则返回true,然后执行后面的命令,因为省略了,默认是{ print $0 }
# awk '$1 ~ /li/' mail-list

2. !~ 表示不匹配
# awk '$1 !~ /li/' mail-list

2. 正则表达式

1. . 匹配除行结尾之外的任何单个字符
# echo -e "cat\nbat\nfun\nfin\nfan" | awk '/f.n/'
fun
fin
fan

2. ^ 待搜索的字串(word)在行首
# echo -e "This\nThat\nThere\nTheir\nthese" | awk '/^The/'
There
Their

3. $ 待搜索的字串(word)在行尾
# echo -e "knife\nknow\nfun\nfin\nfan\nnine" | awk '/n$/'
fun
fin
fan

4. 在[ ]当中仅代表一个待查找的字符
# echo -e "Call\nTall\nBall" | awk '/[CT]all/'
Call
Tall

5. 在[ ]当中^表示反向选择的意思
# echo -e "Call\nTall\nBall" | awk '/[^CT]all/'
Ball

6. | 允许正则表达式进行逻辑或运算
# echo -e "Call\nTall\nBall\nSmall\nShall" | awk '/Call|Ball/'
Call
Ball

7. ? 匹配前一个字符的零次或一次
# echo -e "Colour\nColor" | awk '/Colou?r/'
Colour
Color

8. * 匹配前一个字符的零次或多次
# echo -e "ca\ncat\ncatt" | awk '/cat*/'
ca
cat
catt

9. + 匹配前一个字符的一次或多次
# echo -e "111\n22\n123\n234\n456\n222"  | awk '/2+/'
22
123
234
222

10. ( | ) 表示只在这个组里面进行逻辑或运算
# echo -e "Apple Juice\nApple Pie\nApple Tart\nApple Cake" | awk '/Apple (Juice|Cake)/'
Apple Juice
Apple Cake

3. 数组

awk数组语法array_name[index] = value,也叫做关联数组associative array,其下标既可以是数字也可以是字符串。

1. 创建数组
# awk 'BEGIN {
>    fruits["mango"] = "yellow";
>    fruits["orange"] = "orange"
>    print fruits["orange"] "\n" fruits["mango"]
> }'
orange
yellow

2. 删除数组元素
# awk 'BEGIN {
>    fruits["mango"] = "yellow";
>    fruits["orange"] = "orange";
>    delete fruits["orange"];
>    print fruits["orange"]
> }'

二维数组:

awk提供了逻辑上模拟二维数组的访问方式,其本质上还是一维数组,例:可以使用array[1,2]=10,此时awk是把(1,2)整个当作键,并且将逗号转换为一个特殊字符SUBSEP \034,所以真正的键是1\0342

1. 判断数组键值是否存在,if ( (i, j) in array ) { } 注意下标 i 和 j 必须放在小括号 ()# awk 'BEGIN { a[1,1]="My"; a[1,2]="name"; a[2,1]="is"; a[2,2]="wangy"; if( (2, 2) in a) { print "Item is exist!" } else { print "Not found!" } }'
Item is exist!
# awk 'BEGIN { a[1,1]="My"; a[1,2]="name"; a[2,1]="is"; a[2,2]="wangy"; if( (2, 3) in a) { print "Item is exist!" } else { print "Not found!" } }'
Not found!

2. 循环输出数组内容,按整体键
# cat multiple_array.awk
BEGIN{  
   for(i=1; i<=3; i++){  
       for(j=1; j<=3; j++){  
           array[i, j] = i * j;  
           print i," * ",j," = ",array[i, j];  
       }  
   }  
   # output newline
   print;
   # notice: now the k is (i, j)
   for(k in array){  
       print "Key:",k,"\tValue:",array[k];  
   }  
}
# awk -f multiple_array.awk 
1  *  1  =  1
1  *  2  =  2
1  *  3  =  3
2  *  1  =  2
2  *  2  =  4
2  *  3  =  6
3  *  1  =  3
3  *  2  =  6
3  *  3  =  9

Key: 21     Value: 2
Key: 22     Value: 4
Key: 23     Value: 6
Key: 31     Value: 3
Key: 32     Value: 6
Key: 33     Value: 9
Key: 11     Value: 1
Key: 12     Value: 2
Key: 13     Value: 3

3. 循环输出数组内容,将原来键 i\034j 这个字符串再分隔为一个新的数组 a ,此时 i 和 j 分别为数组 a 的值(键为1 和 2# cat multiple_array.awk
BEGIN{  
   for(i=1; i<=3; i++){  
       for(j=1; j<=3; j++){  
           array[i, j] = i * j;  
           print i," * ",j," = ",array[i, j];  
       }  
   }  
   # output newline
   print;
   # notice: now the k is (i, j), a[1]=i, a[2]=j
   for(k in array){
       split(k, a, SUBSEP);
       print a[1]," * ",a[2]," = ",array[k];  
   }  
}
# awk -f multiple_array.awk 
1  *  1  =  1
1  *  2  =  2
1  *  3  =  3
2  *  1  =  2
2  *  2  =  4
2  *  3  =  6
3  *  1  =  3
3  *  2  =  6
3  *  3  =  9

2  *  1  =  2
2  *  2  =  4
2  *  3  =  6
3  *  1  =  3
3  *  2  =  6
3  *  3  =  9
1  *  1  =  1
1  *  2  =  2
1  *  3  =  3

4. 流程控制

像其他编程语言一样,awk提供了条件语句来控制程序的流程

4.1 if语句

基本用法if (condition) action,如果有多个action语句,则要用大括号包括起来if (condition) { action-1; action-2; ... action-n }

# awk 'BEGIN {num = 10; if (num % 2 == 0) printf "%d is even number.\n", num }'
10 is even number.

4.2 if-else语句

# awk 'BEGIN { num = 11; if (num % 2 == 0) printf "%d is even number.\n", num; else printf "%d is odd number.\n", num }'
11 is odd number.

4.3 if-else if嵌套

# awk 'BEGIN { a=30; if(a == 10) print "a = 10"; else if (a == 20) print "a = 20"; else if (a == 30) print "a = 30"; }'
a = 30

5. 循环

循环用于以重复的方式执行一组操作。只要循环条件为真,循环就会继续执行。

5.1 for

# awk 'BEGIN { for (i = 1; i <= 5; ++i) print i }'

5.2 while

# awk 'BEGIN {i = 1; while (i < 6) { print i; ++i } }'

5.3 do-while

# awk 'BEGIN {i = 1; do { print i; ++i } while (i < 6) }'

5.4 break

# awk 'BEGIN { sum = 0; for (i = 0; i < 20; ++i) { sum += i; if (sum > 50) break; else print "Sum =", sum } }'

5.5 continue

# awk 'BEGIN { sum = 0; for (i = 1; i <= 20; ++i) { if (i % 2 == 0) print i ; else continue } }'

5.6 exit

# awk 'BEGIN { sum = 0; for (i = 0; i < 20; ++i) { sum += i; if (sum > 50) exit(10); else print "Sum =", sum } }'

# echo $?
10

6. 输出重定向

到目前为止,我们在标准输出流中显示数据。我们也可以将数据重定向到一个文件。重定向出现在print或printf语句之后。awk中的重定向与shell命令中的重定向一样,只是它们写在awk程序中。

1. > 覆盖
# echo "Old data" > /tmp/message.txt
# awk 'BEGIN { print "Hello, World !!!" > "/tmp/message.txt" }'
# cat /tmp/message.txt 
Hello, World !!!

2. >> 追加
# echo "Old data" > /tmp/message.txt
# awk 'BEGIN { print "Hello, World !!!" >> "/tmp/message.txt" }'
# cat /tmp/message.txt 
Old data
Hello, World !!!

3. | 管道
# awk 'BEGIN { print "hello, world !!!" | "tr a-z A-Z" }'
HELLO, WORLD !!!

7. printf格式化输出

7.1 转义序列

1. 换行符
# awk 'BEGIN { printf "Hello\nWorld\n" }'

2. 水平制表符
# awk 'BEGIN { printf "Sr No\tName\tSub\tMarks\n" }'

3. 垂直制表符
# awk 'BEGIN { printf "Sr No\vName\vSub\vMarks\n" }'

4. 退格符,删除前一个字符
# awk 'BEGIN { printf "Field 1\bField 2\bField 3\bField 4\n" }'

5. 回车符,在以下示例中,在打印每个字段后,我们执行回车并在当前打印值之上打印下一个值。 这意味着,在最终的输出中,只能看到字段4,因为它是在所有以前的字段之上打印的最后一个东西
# awk 'BEGIN { printf "Field 1\rField 2\rField 3\rField 4\n" }'

6. 换页符
# awk 'BEGIN { printf "Sr No\fName\fSub\fMarks\n" }'

7.2 格式说明符

1. %c ,它打印一个字符。 如果用于%c的参数是数字,则将其视为字符并进行打印。 否则,该参数被假定为一个字符串,并且该字符串的唯一第一个字符被打印出来
# awk 'BEGIN { printf "ASCII value 65 = character %c\n", 65 }'
ASCII value 65 = character A

2. %d 和 %i, 它仅打印十进制数的整数部分
# awk 'BEGIN { printf "Percentags = %d\n", 80.66 }'
Percentags = 80

3. %e 和 %E
# awk 'BEGIN { printf "Percentags = %e\n", 80.66 }'
Percentags = 8.066000e+01
# awk 'BEGIN { printf "Percentags = %E\n", 80.66 }'
Percentags = 8.066000E+01

4. %f
# awk 'BEGIN { printf "Percentags = %f\n", 80.66 }'
Percentags = 80.660000

5. %o, 八进制; %u, 十进制; %x/%X, 十六进制

6. %s,打印字符串
# awk 'BEGIN { printf "Name = %s\n", "Sherlock Holmes" }'

7. 控制宽度,默认是右对齐
# awk 'BEGIN { num1 = 10; num2 = 20; printf "Num1 = %10d\nNum2 = %10d\n", num1, num2 }'
Num1 =         10
Num2 =         20

8. 以0填充
# awk 'BEGIN { num1 = -10; num2 = 20; printf "Num1 = %05d\nNum2 = %05d\n", num1, num2 }'
Num1 = -0010
Num2 = 00020

9. 左对齐
# awk 'BEGIN { num = 10; printf "Num = %-5d\n", num }' | cat -vte
Num = 10   $

10. 它总是在数字值前添加一个符号,即使该值为正数
# awk 'BEGIN { num1 = -10; num2 = 20; printf "Num1 = %+d\nNum2 = %+d\n", num1, num2 }'
Num1 = -10
Num2 = +20

11. 井号, 对于%o,它提供了一个前导零。对于%x和%X,只有当结果不为零时,它才分别提供前导0x或0X。 对于%e,%E,%f和%F,结果总是包含一个小数点。 对于%g和%G,尾随的零不会从结果中删除
# awk 'BEGIN { printf "Octal representation = %#o\nHexadecimal representaion = %#X\n", 10, 10 }'
Octal representation = 012
Hexadecimal representaion = 0XA
分类: Linux
标签: awk gawk GNU
未经允许不得转载: LIFE & SHARE - 王颜公子 » awk - 数据提取和报告工具 (2) 进阶

分享

作者

作者头像

Madman

如果博文内容有误或其它任何问题,欢迎留言评论,我会尽快回复; 或者通过QQ、微信等联系我

0 条评论

暂时还没有评论.

发表评论前请先登录