Wednesday, December 28, 2011

Fedora 15 installation notes

These are some notes for my Fedora 15 setup.  Not very interesting, just a reminder of the steps I need to perform to get a fresh system up and running.

Steps

  1. read release notes
  2. read bug list
  3. install Software Development packages
  4. set network proxy (if installing at corp)
  5. enable network card - activate on boot
  6. edit /boot/grub/menu.lst
  7. add xterm to task bar, change default profile to unlimited, 9pt font
  8. add System Monitor to taskbar, change prefs to all monitors
  9. add my user to sudoers "su -", then edit /etc/sudoers
  10. download and install VMware Player "sudo sh VMwarePlayer.bundle"
  11. setup ntlmaps (if installing at corp), point firefox to use localhost:5865
  12. add free/non-free repos
  13. setup Google yum repo, download Chrome
  14. setup Google as default home page in Firefox
  15. install cinelerra dependencies
  16. download cinMonty
  17. compile cinMonty (autogen.sh/make --disable-mmx/make install)
  18. update VMware Tools
  19. remove nouveau (rpm -e xorg-x11-drv-nouveau --nodeps, nouveau.modeset=0 rdblacklist=nouveau in grub, yum install kmod-nvidia)
  20. install NVidia driver
  21. install rdesktop, setup sessions
  22. setup .bashrc with aliases
  23. install iptraf, mplayer, libcurl, iotop, ImageMagick, xfce4, xfce4 power, session and window managers
  24. run update
  25. System -> Sessions and Startup -> Auto save session on logout
  26. install system-config-lvm
That will provide for a decent base Fedora system.
TAG

Tuesday, December 27, 2011

scary awk magic

AWK is quite a powerful program that can do all sorts of gymnastics with structured text files.  Here's one example.

Say you have a file with four columns (depicted below) delimited by tabs:
-an id field
-a JSON string of dates
-another JSON string of values associated to those dates
-a last column that has the total of those values in column 3

Source File
6237    [20111102,20111103,20111104,20111105,20111106,20111107,20111108,20111109,20111110,20111111,20111112,20111113,20111114,20111115,20111116,20111117,20111118,20111119,20111120,20111121,20111122,20111123,20111124,20111125,20111126,20111127,20111128,20111129,20111130,20111201,20111202,20111203,20111204,20111205,20111206,20111207,20111208,20111209,20111210,20111211,20111212,20111213,20111214,20111215,20111216]  [286,291,276,274,339,424,428,408,458,401,354,398,543,571,631,581,515,516,571,721,768,943,960,917,899,941,1168,1274,1256,1527,1472,1315,1323,1741,1826,1958,1988,1760,1662,1770,2088,2117,2325,2588,2214]    47786
6237    [20111102,20111103,20111104,20111105,20111106,20111107,20111108,20111109,20111110,20111111,20111112,20111113,20111114,20111115,20111116,20111117,20111118,20111119,20111120,20111121,20111122,20111123,20111124,20111125,20111126,20111127,20111128,20111129,20111130,20111201,20111202,20111203,20111204,20111205,20111206,20111207,20111208,20111209,20111210,20111211,20111212,20111213,20111214,20111215,20111216]  [286,291,276,274,339,424,428,408,458,401,354,398,543,571,631,581,515,516,571,721,768,943,960,917,899,941,1168,1274,1256,1527,1472,1315,1323,1741,1826,1958,1988,1760,1662,1770,2088,2117,2325,2588,2214]    47786
6237    [20111102,20111103,20111104,20111105,20111106,20111107,20111109,20111110,20111111,20111112,20111113,20111114,20111115,20111116,20111117,20111118,20111119,20111120,20111121,20111122,20111123,20111124,20111125,20111126,20111127,20111128,20111129,20111130]       [286,291,276,274,339,424,408,458,401,354,398,543,571,631,581,515,516,571,721,768,943,960,917,899,941,1168,1274,1256]        17684
6237    [20111102,20111103,20111104,20111105,20111106,20111107,20111108,20111109,20111110,20111111,20111112,20111113,20111114,20111115,20111116,20111117,20111118,20111119,20111120,20111121,20111122,20111123,20111124,20111125,20111126,20111127,20111128,20111129,20111130,20111201,20111202,20111203,20111204,20111205,20111206]        [286,291,276,274,339,424,428,408,458,401,354,398,543,571,631,581,515,516,571,721,768,943,960,917,899,941,1168,1274,1256,1527,1472,1315,1323,1741,1826]      27316
10778   [20111102,20111103,20111104,20111105,20111106,20111107,20111108,20111109,20111110,20111111,20111112,20111113,20111114,20111115,20111116,20111117,20111118,20111119,20111120,20111121,20111122,20111123,20111124,20111125,20111126,20111127,20111128,20111129,20111130,20111201,20111202,20111203,20111204,20111205,20111206]        [48,47,37,41,49,35,49,47,59,46,37,41,39,67,60,59,45,39,44,58,62,47,62,56,70,55,57,53,35,59,57,42,45,61,63]  1771
10778   [20111102,20111103,20111104,20111105,20111106,20111107,20111109,20111110,20111111,20111112,20111113,20111114,20111115,20111116,20111117,20111118,20111119,20111120,20111121,20111122,20111123,20111124,20111125,20111126,20111127,20111128,20111129,20111130]       [48,47,37,41,49,35,47,59,46,37,41,39,67,60,59,45,39,44,58,62,47,62,56,70,55,57,53,35]   1395
10778   [20111102,20111103,20111104,20111105,20111106,20111107,20111108,20111109,20111110,20111111,20111112,20111113,20111114,20111115,20111116,20111117,20111118,20111119,20111120,20111121,20111122,20111123,20111124,20111125,20111126,20111127,20111128,20111129,20111130,20111201,20111202,20111203,20111204,20111205,20111206,20111207,20111208,20111209,20111210,20111211,20111212,20111213,20111214,20111215,20111216]  [48,47,37,41,49,35,49,47,59,46,37,41,39,67,60,59,45,39,44,58,62,47,62,56,70,55,57,53,35,59,57,42,45,61,63,55,10633,51,41,36,37,62,41,53,45] 12825

The Goal
Notice that the row id's repeat a variable number of times.  My task was to use the first column as an index, and only take the last line in the file related to that index.


The Solution
The solution looks something hideous like this:
awk -F"\t" 'NR!=1 {a[$1]; b[$1]=$2 ; c[$1]=$3; d[$1]=$4} \
END{OFS="\t";for (i in a) printf("%s\t%s\t%s\t%s\n", i, b[i], c[i], d[i])}' \
file.txt

Breaking it apart into separate lines helps a bit (the backslash allows you to enter the pieces of the command on multiple lines in a Linux terminal window):
awk -F"\t"\
'NR!=1 {a[$1]; b[$1]=$2 ; c[$1]=$3; d[$1]=$4} \
END\
{OFS="\t";\
for (i in a)\ 
printf("%s\t%s\t%s\t%s\n", i, b[i], c[i], d[i])}'|\
file.txt

Let's take this apart piece by piece to see what it means.

awk -F"\t" start awk and use the tab delimiter to separate out the columns
NR!=1 read in all but the first record
a an array of values called a
a[$1] assign values to array named a, the values are to be found in the first column ($1) of the data file
b[$1]=$2 for the array b, at the index of $1, assign value of the second column ($2) of the data file
c[$1]=$3 for the array c, at the index of $1, assign value of the third column ($3) of the data file
d[$1]=$4 for the array d, at the index of $1, assign value of the fourth column ($4) of the data file
END do the following at the last
OFS="\t" output field separator is a tab
for (i in a) for each value in the array a, do the following
print.. print the current value for i (the index, the first column)
print the last value for the second column at index i: b[i]
print the last value for the third column at index i: c[i]
print the last value for the fourth column at index i: d[i]

Final Output
Now when I run the program, I get this for the output:
6237    [20111102,20111103,20111104,20111105,20111106,20111107,20111108,20111109,20111110,20111111,20111112,20111113,20111114,20111115,20111116,20111117,20111118,20111119,20111120,20111121,20111122,20111123,20111124,20111125,20111126,20111127,20111128,20111129,20111130,20111201,20111202,20111203,20111204,20111205,20111206]        [286,291,276,274,339,424,428,408,458,401,354,398,543,571,631,581,515,516,571,721,768,943,960,917,899,941,1168,1274,1256,1527,1472,1315,1323,1741,1826]      27316
10778   [20111102,20111103,20111104,20111105,20111106,20111107,20111108,20111109,20111110,20111111,20111112,20111113,20111114,20111115,20111116,20111117,20111118,20111119,20111120,20111121,20111122,20111123,20111124,20111125,20111126,20111127,20111128,20111129,20111130,20111201,20111202,20111203,20111204,20111205,20111206,20111207,20111208,20111209,20111210,20111211,20111212,20111213,20111214,20111215,20111216]  [48,47,37,41,49,35,49,47,59,46,37,41,39,67,60,59,45,39,44,58,62,47,62,56,70,55,57,53,35,59,57,42,45,61,63,55,10633,51,41,36,37,62,41,53,45] 12825

Note there are only two lines now instead of seven and that I only have the last two instances of the value indexed in column one.

And that's some funky AWK magic.

Reference
http://en.wikipedia.org/wiki/AWK
Feel free to drop me a line or ask me a question.