command awk

info

named after its authors: Aho, Weinberger, and Kernighan
awk is an interpreted programming language focused on processing text
awk was a major influence on perl

variable

CONVFMT conversion format used when converting numbers (default %.6g)
FS regular expression used to separate fields; settable by option -Ffs
NF number of fields in the current record
NR ordinal number of the current record
FNR ordinal number of the current record in the current file
FILENAME the name of the current input file
RS input record separator (default newline)
OFS output field separator (default blank)
ORS output record separator (default newline)
OFMT output format for numbers (default %.6g)
SUBSEP character to separate multiple subscripts (default octal 034, an ASCII control character, not a double quote)
ARGC argument count, assignable
ARGV argument array, assignable; non-null members are taken as filenames
ENVIRON array of environment variables; subscripts are names
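
A minimal sketch of a few of these variables in use. The sample input is made up for illustration; `out` and `sub_out` are just hypothetical shell variable names.

```shell
# Hypothetical sample input: three colon-separated records.
# -F: sets FS, OFS is set in BEGIN, NR/NF/$NF are read per record.
out=$(printf 'a:b:c\nd:e\nf:g:h:i\n' |
  awk -F: 'BEGIN { OFS = "-" } { print NR, NF, $NF }')
printf '%s\n' "$out"

# SUBSEP joins the parts of a multi-part subscript such as a[1, 2],
# so splitting the key on SUBSEP recovers the parts.
sub_out=$(awk 'BEGIN {
  a[1, 2] = "x"
  for (k in a) { split(k, p, SUBSEP); print p[1], p[2], a[k] }
}')
printf '%s\n' "$sub_out"
```

The first command prints record number, field count, and last field per line; the second prints `1 2 x`.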

built-in functions

mathematical
```
     exp, log, sqrt, sin, cos, atan2
```

other

     length rand srand int substr index match
     split sub gsub sprintf system tolower toupper
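
A short sketch exercising several of the string functions on one made-up input line (the input text and variable names are only for illustration):

```shell
out=$(echo 'hello world' | awk '{
  n = split($0, w, " ")   # n is the number of pieces; w[1], w[2] hold them
  s = $0
  gsub(/o/, "0", s)       # replace every "o" in the copy s
  print n, length($0), index($0, "wor"), substr($0, 1, 5), toupper(w[2]), s
}')
printf '%s\n' "$out"
```

This prints `2 11 7 hello WORLD hell0 w0rld`: `index` and `substr` count positions from 1, and `gsub` edits its third argument in place.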

demo

computerhope

print only lines longer than 72 characters
```
 $ awk 'length($0) > 72' text.txt
```

print the first two fields of data in opposite order

 $ pico text.txt
 red apple blue ...

 $ awk '{ print $2, $1 }' text.txt
 apple red blue ...

execute awk program in prog.awk

 $ awk -f prog.awk text.txt

 # add up first column of input file
 # print `sum` and `average`
 $ pico prog.awk
 { s += $1 }
 END { print "sum is", s, " average is", s/NR }

 # print all lines found between `start` and `stop`
 $ pico prog.awk
 /start/, /stop/

 # simulates `echo` command
 $ pico prog.awk
 BEGIN {
 for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
 printf "\n"
 exit }

thegeekstuff demo
1. remove duplicate lines using awk
```
 $ awk '!($0 in array) { array[$0]; print }' temp
```
2. print all lines from /etc/passwd that have the same uid and gid
```
 $ awk -F ':' '$3==$4' /etc/passwd
```
3. print only specific field from a file
```
 $ awk '{print $2,$5;}' text.txt
```
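
Item 1 works because merely referencing `array[$0]` creates that key, so the `in` test is false only the first time a line is seen. A commonly used shorter form of the same idea, sketched here on made-up input (`seen` is an arbitrary array name):

```shell
# seen[$0]++ is 0 (false) the first time a line appears, so !seen[$0]++
# is true only on first sight and the default action prints the line.
out=$(printf 'a\nb\na\nc\nb\n' | awk '!seen[$0]++')
printf '%s\n' "$out"
```

Both forms keep the first occurrence of each line and preserve input order.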

thegeekstuff demo: powerful built-in variables

FS input field separator

 # 1. using the `-F` command line option (uppercase; lowercase `-f` names a program file)
 $ awk -F 'FS' 'commands' input.txt

 # 2. can be set like a normal variable
 $ awk 'BEGIN{FS="FS";}'

 $ pico prog.awk
 BEGIN{
 FS=":";
 print "name\tuserid\tgroupid\thomedirectory";
 }
 {
     print $1"\t"$3"\t"$4"\t"$6;
 }
 END {
     print NR, "records processed";
 }

 $ awk -f prog.awk /etc/passwd

OFS output field separator

 # default is a single space character
 $ awk -F':' '{print $3,$4;}' /etc/passwd

 $ awk -F':' 'BEGIN{OFS="=";} {print $3,$4;}' /etc/passwd
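
One detail worth a sketch: OFS is used when `print` gets comma-separated arguments or when `$0` is rebuilt after a field assignment, not when an untouched `$0` is printed. The input below is made up for illustration.

```shell
# First print: $0 is untouched, so the original ":" separators remain.
# $1 = $1 forces awk to rebuild $0 using OFS; the second print shows it.
out=$(echo 'a:b:c' | awk -F: 'BEGIN { OFS = "=" } { print; $1 = $1; print }')
printf '%s\n' "$out"
```

This prints `a:b:c` followed by `a=b=c`.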

RS record separator

 # RS defines a line
 # awk reads line by line by default

 # records separated by blank lines
 $ pico student.txt
 Jones
 2143
 78
 84
 77


 Gondrol
 2321
 56
 58
 45


 RinRao
 2122
 38
 37
 65

 $ pico prog.awk
 BEGIN {
     RS="";    # paragraph mode: blank lines separate records (a multi-character RS such as "\n\n" is not portable)
     FS="\n";
 }
 {
     print $1,$2;
 }

 $ awk -f prog.awk student.txt
 Jones 2143
 Gondrol 2321
 RinRao 2122
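
A self-contained sketch of reading blank-line-separated records, assuming awk's paragraph mode (`RS = ""`), in which any run of blank lines separates records. The data is a shortened, made-up version of student.txt.

```shell
# Two records separated by two blank lines; FS = "\n" makes each line a field.
out=$(printf 'Jones\n2143\n78\n\n\nGondrol\n2321\n56\n' |
  awk 'BEGIN { RS = ""; FS = "\n" } { print $1, $2 }')
printf '%s\n' "$out"
```

This prints `Jones 2143` and `Gondrol 2321`, regardless of how many blank lines sit between the records.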

ORS output record separator

 # with one student record per line in student.txt; ORS also follows the last record
 $ awk 'BEGIN{ORS="=";} {print;}' student.txt
 Jones 2143 78 84 77=Gondrol 2321 56 58 45=RinRao 2122 38 37 65=

NR number of records

 # total number of records being processed
 # or line number

 # in the `END` section
 # `NR` holds the total number of records read
 $ awk '{print "processing record - ",NR;} END {print NR, "student records are processed"}' student.txt
 processing record -  1
 processing record -  2
 processing record -  3
 3 student records are processed

NF number of fields in a record

 # total number of fields in the current record
 # useful for validating that every field exists in a record

 $ pico student.txt
 Jones 2143 78 84 77
 Gondrol 2321 56 58 45
 RinRao 2122 38 37

 $ awk '{print NR,"->",NF}' student.txt
 1 -> 5
 2 -> 5
 3 -> 4
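
The validation idea can be sketched as a pattern on NF. The expected count of 5 and the message wording are assumptions for this example.

```shell
# Report any record that does not have exactly 5 fields.
out=$(printf 'Jones 2143 78 84 77\nRinRao 2122 38 37\n' |
  awk 'NF != 5 { print "record " NR ": expected 5 fields, got " NF }')
printf '%s\n' "$out"
```

This prints `record 2: expected 5 fields, got 4` and stays silent for complete records.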

FILENAME name of the current input file

 $ awk '{print FILENAME}' student.txt
 student.txt
 student.txt
 student.txt

FNR number of records relative to the current input file

 # when reading multiple input files,
 # `NR` gives the running record count
 # across all input files

 # `FNR` gives the record number within the current input file

 $ awk '{print FILENAME, FNR;}' student.txt book.txt
 student.txt 1
 student.txt 2
 student.txt 3
 book.txt 1
 book.txt 2
 book.txt 3
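
Because NR keeps counting across files while FNR restarts, `NR == FNR` is true only while the first file is being read. A common use is a two-file lookup, sketched below with made-up files (names.txt, scores.txt, and the `want` array are hypothetical).

```shell
dir=$(mktemp -d)
printf 'Jones\nRinRao\n' > "$dir/names.txt"
printf 'Jones 78\nGondrol 56\nRinRao 38\n' > "$dir/scores.txt"

# First file: remember each name and skip to the next record.
# Second file: print only records whose first field was remembered.
out=$(awk 'NR == FNR { want[$1]; next }
           $1 in want' "$dir/names.txt" "$dir/scores.txt")
printf '%s\n' "$out"
rm -r "$dir"
```

This prints `Jones 78` and `RinRao 38`. The idiom assumes the first file is non-empty; with an empty first file, `NR == FNR` would also hold for the second file.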