1 #! perl -lw
2 # prep aid to solving Chu-Carrol's cryptogram statistically
3
4
I was annoyed that a mathematician's introduction to Crytanalysis
http://scienceblogs.com/goodmath/2008/08/introducing_cryptanalysis.php
was in the worked example largely wordsmithing and inadequately statistical. While the crypto puzzle fans have tools that run on MSWindows, I don't by choice. So I hacked together the following. I added Getopt to suppress later enhancements by default for presenting this code to Boston.PM.
This script does word counts only, to see the common words.
30 31 use warnings; 32 use strict; 33 34 use List::Util qw/max/; 35 use Getopt::Long; 36 my ($verbose, $file, $dump,); 37 GetOptions( 38 "file=s" => \$file, # solve a different example 39 "verbose" => \$verbose, # flag 40 "dump" => \$dump, # flag 41 ); 42 43 44
since lower case is easier to read as natural language and upper case is easier to see non-liguistic patterns, it is conventional to write the enciphered text in upper case, eg WSSHU FDVH and clear text in all hower case as in this paragraph. (in military circles, red pencil or ink is used for cleartext, if only to ensure worksheets with clear is burned.)
51
52
53
54 # printf "%s=>'%s', ",$_,$Xlate{$_} for sort keys %Xlate ;
55
56
57
58
The Chu-Carroll cryptogram is appended to the source so read from data.
64 65 # slurp ... must be in {} or after dict load 66 67 my $all = do { local $/; <DATA>; }; 68 69 if (@ARGV) 70 { 71 local $/; 72 $all = <>; 73 } 74 75 # enforce convention 76 $all =~ y/a-z/A-Z/; 77 78 79 if ($dump){ 80 print $all; 81 exit; 82 } 83 84 85 86 my %Count; # used repeatedly 87 88 89 90 91 92 93 94 95 my @Words = split /[\W]/, $all; 96 $Count{$_}++ for @Words; 97 98 99 100 my $n=30; 101 my $seen=0; 102 103 # _score is length * count, except 20*count if length==1 104 # because the words A, I, and plural 'S are very important 105 sub _score {my $w = shift; 106 return $Count{$w}*(1==length($w) ? 20 : length($w)); }; 107 108 for my $w ( 109 sort { +1 * ( _score($b) <=> _score($a) ) } 110 grep { 111 /\w/ 112 113 # ... and $Count{$_}>1 114 } keys %Count 115 ) 116 { 117 118 119 print sprintf("%-12s\t%3s\t%3s", 'word', 'Count', 'length' ) 120 unless $seen++; 121 print sprintf("%-12s\t%3d\t%3d", $w, $Count{$w}, length($w) ); 122 123 124 125 last if $n-- == 0; 126 } 127 128 129 130 131 132 133 134 135 136
140 141 __END__ 142 b czfbczc bh bu gqlvh hbxz hl uhgih vd xt lkp qrly. lpz hjbpy hjgh b jgaz 143 plhbfzc bp xt hbxz qilkubpy hjz qrlyludjziz bu hjgh hjziz giz g kjlrz 144 rlh lm hziibmbf ufbzpfz qrlyu lvh hjziz: ligf, djgitpyvrg, gzhblrlyt, 145 evuh hl pgxz g mzk. qvh hjziz bu plh pzgirt ul xvfj lvh hjziz czcbfghzc 146 hl xghj - gpc bp dgihbfvrgi hl hjz xbuvuz lm xghj. b hjbpw hjgh hjgh bu 147 g cgxp ujgxz, qzfgvuz bp xt zodzibzpfz, lpz lm hjz xluh frzgi kgtu lm 148 bczphbmtbpy g figfwdlh bu hjilvyj xghj. pl xghhzi hjz udzfbmbf uvqezfh, 149 hjz figfwdlhu grkgtu zbhjzi galbc li ufizk vd hjz xghj. kjzhjzi bh'u 150 hjz "xzifvit fgvuzu gvhbux" mlrwu, hjz azrbwlauwbgpu, fizghblpbuhu, 151 grh-xzcbfbpz svgfwu, izdvqrbfgp dlrruhziu, li ufbzphlrlybuhu - tlv fgp 152 grkgtu izflypbnz hjz figfwdlhu qt hjzbi xghj. ul b gx ylbpy hl cl xt 153 qzuh hl dilabcz g albfz lm xghjzxghbfgr ugpbht - qlhj qt 154 ujlkbpy kjgh'u kilpy kbhj hjz qgc xghj urld dvxdzc lvh qt hjz rllpbzu, 155 gpc qt ujlkbpy jlk yllc xghj kliwu. 156