unroll.pl

    #!/usr/bin/env perl
    use v5.24;    # postfix dereference (->@*) is used throughout
    use warnings;
    use utf8;
    use open qw/:std :utf8/;

unroll - a Twitter thread unroller for offline storage

usage

Modules used

Twitter::API

A useful wrapper for Twitter's API.

Data::Dump

Used only for pp, to pretty-print debug dumps of nested HoAoH structures.

Try::Tiny

Everyone's favorite error handler, needed because remote APIs can choke.

constant

Provides a DEBUG symbol, so statements can be guarded with if DEBUG.

    use Twitter::API;
    use Data::Dump qw/pp/;
    use Twitter::API::Util 'is_twitter_api_error';
    use Try::Tiny;
    use constant { DEBUG => 0 };

Before using any Twitter API application, you must use OAuth to authorize API access with your account.

See oauth_desktop.pl in the Twitter::API distribution for details.

A permanent application should create its own application key, but test apps may use the Twitter::API module's own.

The application can read the four security parameters from a protected config file or process environment. This demo reads from ENV.

    my $client = Twitter::API->new_with_traits(
        traits => [ qw/ApiMethods RateLimiting DecodeHtmlEntities NormalizeBooleans/ ],
        consumer_key        => $ENV{CONSUMER_KEY},
        consumer_secret     => $ENV{CONSUMER_SECRET},
        access_token        => $ENV{ACCESS_TOKEN},
        access_token_secret => $ENV{ACCESS_TOKEN_SECRET},
    );

    my $r = $client->verify_credentials;
    # say "$$r{screen_name} is authorized";

    my $mentions = $client->mentions;
    # for my $status ( @$mentions ) {
    my $status;

The sole command-line argument is a Twitter status number: the last tweet in a chain, from which to unroll backwards.

The immediate purpose of this unroller was to get a non-caching, non-tracking unroll of a historical project's thread on #ordainedslavery: Massachusetts Bay Puritan preachers who owned human beings.

The 101st entry in the thread is the default start in this script: https://twitter.com/elevennames/status/1509876985744355329

As a bonus, in addition to the unroll, this script makes a heuristic attempt to build a town index, so it collects tweets in a reversed list @Keepers and index entries in an HoA %Towns.

$id is the next tweet to process, starting from the tail of the thread (given as the argument, or the default).

    my $id = shift @ARGV // 1509876985744355329;    # latest end of thread; argument overrides the default

    my @Keepers;
    my %Towns;
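
Because the default is documented as a URL while the argument is a bare status number, it may be convenient to accept either form. A minimal sketch, assuming a hypothetical helper status_id_from_arg that is not in the original script:

```perl
use v5.24;
use warnings;

# Hypothetical helper: accept a bare numeric status ID or a status URL
# and return the ID as a string, or undef if neither form is recognized.
sub status_id_from_arg {
    my ($arg) = @_;
    return undef unless defined $arg;
    return $1 if $arg =~ m{\A(\d+)\z};
    return $1 if $arg =~ m{/status/(\d+)(?:[/?#]|\z)};
    return undef;
}

my $default  = '1509876985744355329';
my $start_id = status_id_from_arg( $ARGV[0] ) // $default;
say "Starting from status $start_id";
```

Keeping the ID as a string sidesteps any concern about very large integers on builds without 64-bit integer support.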

Loop logic is simple: keep looking up $id and following the reply chain until we reach the beginning.

    while ($id) {

        undef $status;    # so a failed lookup cannot reuse the previous tweet
        try {
            $status = $client->show_status($id, { cache=>'none', tweet_mode=>'extended' } );
            say ref $status if DEBUG;
        }
        catch {
            die $_ unless is_twitter_api_error($_);

            # The error object includes plenty of information
            say $_->http_request->as_string;
            say $_->http_response->as_string;
            say 'No use retrying right away' if $_->is_permanent_error;
            if ( $_->is_token_error ) {
                say "There's something wrong with this token.";
            }
            if ( $_->twitter_error_code == 326 ) {
                say "Oops! Twitter thinks you're spam bot!";
            }

        };
        last unless $status;    # lookup failed: stop and print what we have

        # say $status->{user}->{screen_name}, q(: ), $status->{full_text};

$status->{full_text} is the message body needed.

Heuristically grab the serial number, the prelate's name, and the town from the tweet.

Hashtags that apply to the whole series are skipped, but any others likely indicate the town.

User mentions are most likely a historical society's account, and so also indicate a town.

This heuristic section is tuned to this specific use and would be much simpler for generic use. For a conversational thread you would want to capture user names (handle and/or display name), but since the purpose was unrolling a soliloquy thread, that isn't done here.

        # What to save
        my @Temp = ($status->{full_text});

        # grab post number, and lead name if possible.
        # NOT case-insensitive to avoid needing stop-words
        my ($num, $reverend) = ($status->{full_text}) =~ / (?: ^ | \s)  (\d+(?: [.][0-9]+)? ) [.]? \s+ ((?: Rev[erend]*[.]? \s* )? (?: Mr[.]? \s*)?  (?: [[:upper:]][[:word:]]+ \s* )+ )? /xsm;
        $num //= q(??);
        $reverend //= q();
        # dump tweets the heuristic could not parse (sub-numbered posts need no name)
        say pp($status) if  $num eq q(??) or ($reverend eq q() and $num !~ /\d+[.]\d+/);

        if ($status->{entities}->{hashtags}) {
            my @tags = grep {not $_ =~ m/slavery/ }
                    map {$_->{text}}
                         $status->{entities}->{hashtags}->@* ;
            # unshift keeps each town's entries in thread order
            unshift $Towns{$_}->@*, "$num. $reverend" for @tags;
        }
        if ( scalar $status->{entities}->{user_mentions}->@* ){
            unshift $Towns{$_}->@*, "$num. $reverend" for map {$_->{screen_name}} $status->{entities}->{user_mentions}->@* ;
        }
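
To see what the pattern captures, it can be tried on a few strings in isolation. A minimal sketch: the regular expression is the one from the script, while the helper name parse_lead and the sample texts are invented for illustration and are not tweets from the thread.

```perl
use v5.24;
use warnings;

# Hypothetical helper wrapping the script's pattern.
# Returns (number, name); the name has trailing whitespace trimmed.
sub parse_lead {
    my ($text) = @_;
    my ($num, $name) = $text =~ / (?: ^ | \s)  (\d+(?: [.][0-9]+)? ) [.]? \s+ ((?: Rev[erend]*[.]? \s* )? (?: Mr[.]? \s*)?  (?: [[:upper:]][[:word:]]+ \s* )+ )? /xsm;
    $num  //= q(??);
    $name //= q();
    $name =~ s/\s+\z//;
    return ($num, $name);
}

# Invented sample texts.
for my $text (
    '7. Rev. John Example of Sometown owned two people.',
    '7.1 more detail on the previous entry',
    'no serial number here',
) {
    my ($num, $name) = parse_lead($text);
    say "[$num] [$name]";
}
```

The name group also swallows the whitespace after the last capitalized word, which is why the helper trims it. Matching stays case-sensitive, so a lowercase word such as "of" ends the name.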

Annoyingly, pictures ("media") and links ("urls") are in two different forks of the nested HoAo? structure.

        # Need to use extended_entities to see > 1 photo.
        # Expanded URLs all have /1, not /1 .. /4, so need media_url instead.

        if ($status->{extended_entities}->{media}){
            # say "Media ", $_->{media_url} for $status->{extended_entities}->{media}->@* ;
            push @Temp, $_->{media_url}   for $status->{extended_entities}->{media}->@* ;
        }

        # But links are ok in plain entities.
        if ($status->{entities}->{urls}){
            # say "Link ", $_->{expanded_url} for $status->{entities}->{urls}->@* ;
            push @Temp, $_->{expanded_url} for $status->{entities}->{urls}->@* ;
        }
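
The two forks can be exercised without the API by feeding a hand-built structure through the same logic. A minimal sketch, assuming a hypothetical helper attachments and an invented status hash with placeholder URLs:

```perl
use v5.24;
use warnings;

# Hypothetical helper mirroring the two branches above: photo URLs come
# from extended_entities, link URLs from plain entities.
sub attachments {
    my ($status) = @_;
    my @found;
    push @found, map { $_->{media_url} }    @{ $status->{extended_entities}{media} // [] };
    push @found, map { $_->{expanded_url} } @{ $status->{entities}{urls}           // [] };
    return @found;
}

# Invented status structure, shaped like the fields the script reads.
my $status = {
    full_text         => 'Two photos and one link',
    entities          => { urls => [ { expanded_url => 'https://example.org/page' } ] },
    extended_entities => {
        media => [
            { media_url => 'http://example.org/photo1.jpg' },
            { media_url => 'http://example.org/photo2.jpg' },
        ],
    },
};

say for attachments($status);
```

Defaulting each fork to an empty array reference keeps the helper safe for tweets that carry neither photos nor links.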

Re-establish our loop invariant.

(Debug code here will dump the tail tweet and bail out of the loop; good for debugging the heuristic collection.)

        # loop chaining
        $id = $status->{in_reply_to_status_id};
        say "PREVIOUS ", $id // '(none)' if DEBUG;

        say pp($status) if  DEBUG;

        last if DEBUG;

By unshifting onto @Keepers, the list of tweets is reversed as it's collected. (If we pushed, we'd have to do a pop loop or an explicit reverse.)

Effect is as if we'd done unshift @Keepers, [ "full text", "url", ...];

        unshift @Keepers, \@Temp;
    } # while
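
The whole walk can be illustrated offline. A minimal sketch, assuming an invented three-tweet thread held in a hash and a hypothetical unroll function standing in for the API lookups:

```perl
use v5.24;
use warnings;

# Invented thread keyed by status ID; each reply points at its parent.
my %fake_thread = (
    30 => { full_text => 'third',  in_reply_to_status_id => 20 },
    20 => { full_text => 'second', in_reply_to_status_id => 10 },
    10 => { full_text => 'first',  in_reply_to_status_id => undef },
);

# Hypothetical stand-in for the API walk: follow reply links from the
# tail, unshifting so the result ends up in original posting order.
sub unroll {
    my ($thread, $id) = @_;
    my @keepers;
    while ($id) {
        my $status = $thread->{$id} or last;    # unknown ID: stop
        unshift @keepers, [ $status->{full_text} ];
        $id = $status->{in_reply_to_status_id};
    }
    return @keepers;
}

say $_->[0] for unroll( \%fake_thread, 30 );
```

The loop ends naturally when the first tweet's in_reply_to_status_id is undefined, which is the same invariant the script relies on.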

First output is the %Towns index, which is produced in sorted order.

This uses the modern postfix-deref notation.

    # Give town index
    for my $town  (sort keys %Towns ) {
        my $aref = $Towns{$town};
        say "$town: ", join(q(, ), $aref->@* );
    }
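
For readers who have not met postfix dereference, the index loop can be compared with the classic circumfix form. A minimal sketch, using invented index data and a hypothetical helper index_lines:

```perl
use v5.24;
use warnings;

# Invented index data in the same hash-of-arrays shape as %Towns.
my %towns = (
    Sometown  => [ '3. Rev. A Example', '7. Rev. B Example' ],
    Othertown => [ '5. Rev. C Example' ],
);

# Hypothetical helper: one formatted line per town, sorted by town name.
sub index_lines {
    my ($towns) = @_;
    my @lines;
    for my $town ( sort keys %$towns ) {
        push @lines, "$town: " . join( q(, ), $towns->{$town}->@* );
    }
    return @lines;
}

say for index_lines( \%towns );

# Classic circumfix dereference yields the same list as the postfix form.
my @classic = @{ $towns{Sometown} };
my @postfix = $towns{Sometown}->@*;
say "same" if "@classic" eq "@postfix";
```

Postfix dereference reads left to right, which suits long chains such as $status->{entities}->{hashtags}->@*.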

And finally, print the message thread, in original sequence, full text with saved media links.

    # Give full list in order
    # say pp \@Keepers;
    say "\n\n";

    for my $kept (@Keepers){
        say $_ for $kept->@*;
        say "";
    }