Spinning and Text Processing
I have a dirty secret to share, and I hope you won't think less of me once you learn it. I used to be in the internet marketing world and pitched my coaching programs and DVD sets from stages around the United States. Yes, for $999, I'd teach you how to make money online, and if you were one of the first three to sign up, I'd even throw in my friend's dynamite ebook absolutely free!
Truth is, I didn't last long in that space because I'm much more of a do-er than a salesperson, and it would bug me to no end when people would buy my coaching package—at 20% off, but only if you sign up right now!—and then never actually open it and use it to at least try their hand at creating an online business.
That's all in the past, fortunately, but I've retained an interest in those business opportunity pitches and what they're actually selling. Just like the cliché envelope-stuffing job (you know: "Send me $200 in an envelope, and I'll show you how to ask people to send you money!"), it turns out that a lot of online businesses still are predicated on gaming search engines to gain traffic to pages selling daft and usually worthless things.
And, one way that these entrepreneurs game Google and other search engines is by "spinning" to produce lots and lots of content from a single article that they've paid someone a few bucks to write in the first place.
It's all rather uninspiring, except the spinning idea itself is rather interesting, and I've been toying with writing a shell script to allow easy article spinning for quite a long time. There are more prosaic, less questionable uses for this technology too, such as programs or even games whose text messages benefit from some variety.
The {idea|concept|inspiration} is that each time you'd use a {word|phrase} you instead list a set of {similar words|synonyms|alternative words} and the software automatically picks one {randomly|at random}.
So the previous sentence might come out of the spinner as "The idea is that each time you'd use a phrase you instead list a set of alternative words and the software automatically picks one at random." Got it? Easy enough.
A more advanced spinner might actually tap a thesaurus, and each time it sees a word, push out a set of synonyms automatically, which the other script then randomly simplifies each time it's invoked.
In fact, go read spam blog comments or spam email, and you'll see the output of these sorts of contextless sentence manipulations. They can be...weird, like this:
she's got arriving in can easily dresses, still Beth may be 36 yr old men's city servant, outdoors of waking time 'en femme'. she's single, symmetrical in addition thinks to achieve marital, "Eventually..."
But hey, just because there are bad uses, doesn't mean it's not an interesting project to try to code, right? I trust you to exercise good judgment of your own when you explore this script, okay?
Spinning Out the Spinner
The basic tasks of the script are straightforward: parse the input, isolate each word-choice block, pick one at random, then reassemble everything and display it.
To make things a bit easier, I'm going to start by using fmt to make each paragraph one really long line. That way, I then can break the input into lines that don't have a word-choice block and those that do:
fmt -w$bigwidth "$1" | tr '{' '\n' | tr '}' '\n'
For example, this input line:
An input line like {this|demo} would then transform.
comes out as three lines:
An input line like
this|demo
would then transform.
See how that works? I'm going to use fmt again at the end of the process to clean up the output.
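To try the splitting step on its own, here's a minimal sketch that feeds a literal string through the same two tr calls instead of reading a file through fmt. The split_blocks name is made up for this illustration and isn't part of the script.

```shell
# split_blocks is a hypothetical helper, not part of the article's script.
# It puts every {...} block on its own line by turning braces into newlines.
split_blocks() {
  printf '%s\n' "$1" | tr '{' '\n' | tr '}' '\n'
}

split_blocks 'An input line like {this|demo} would then transform.'
```

The middle line of the output is the bare choice list, this|demo, which is exactly what the rest of the script needs to spot.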
One facet of shell script programming that most people don't realize is that a loop is a command like any other, so it can sit in a pipeline and read its input from the previous command (in bash, by default, it then runs in its own subshell). So rather than waste space and time with a temporary file, I'll pipe the output of the fmt|tr sequence directly into a while loop:
fmt -w$bigwidth "$1" | tr '{' '\n' | tr '}' '\n' | \
while read line
do
  if [ $( echo "$line" | grep -c '|' ) -gt 0 ] ; then
    echo "SPIN THIS: $line"
  else
    echo "$line"
  fi
  lines=$(( $lines + 1 ))
done
See how the fmt line ends with | \, and that feeds directly into the while loop? Very handy structure!
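One consequence of that subshell is worth a quick sketch: in bash, a variable such as lines that's updated inside the piped loop isn't visible once the pipeline ends. A common workaround, shown here with a made-up count_spin_lines function and a literal string rather than a file, is to group the loop and whatever reports the result inside the same pair of braces.

```shell
# count_spin_lines is a hypothetical helper for illustration only.
# It counts how many lines contain a choice block after splitting on braces.
count_spin_lines() {
  printf '%s\n' "$1" | tr '{' '\n' | tr '}' '\n' | {
    n=0
    while read -r line
    do
      case $line in
        *'|'*) n=$(( n + 1 )) ;;   # this line holds a choice block
      esac
    done
    # Still inside the braces, so n has the value the loop left behind.
    echo "$n"
  }
}

count_spin_lines 'The {idea|concept} is that you use a {word|phrase}.'
```

Because the echo lives in the same group as the loop, the count survives no matter which side of the pipe the shell decides to run in a subshell.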
Now I'm going to run this code snippet with the sample input file to see what happens:
$ sh spinner.sh spinme.txt
The
SPIN THIS: idea|concept|inspiration
is that each time you'd use a
SPIN THIS: word|phrase
you instead list a set of
SPIN THIS: similar words|synonyms|alternative words
and the software automatically picks one
SPIN THIS: randomly|at random
.
That pesky period on its own line is a glitch that'll need to be fixed later, but the basic structure of the script is sound: you can parse and break down the input file data and identify which new lines are selector lines.
The Spinning Function
Instead of just prepending SPIN THIS: before a line that has choices, that's a perfect place to put in a function call to a separate block of code that does the actual work.
One of the most interesting parts of the function is how it figures out how many options there are in the given string. It's a specific instance of the general question "how many occurrences of X are in string Y?", and it exploits the little-known -o flag to grep:
grep -o '|' <<< "$*" | wc -l
Take a deep breath; I can talk you through this one! The <<< notation is a here string, a variation on the here document (<<) you've hopefully already seen in scripts. It's supported by bash, though not by every /bin/sh. The difference is that the text after <<< is expanded and fed to the command as a single string on stdin.
The "$*" expands to all the arguments given to the function in the main block of the script, joined into a single string; the | is the character being counted; -o makes grep print each match on its own line; and of course, wc -l counts those lines (in this case, the number of delimiters in the line).
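The general "how many occurrences of X are in string Y?" question can be wrapped up in a small function. This is only a sketch: the count_occurrences name and its argument order are assumptions of mine, and it pipes the string in with printf rather than <<< so it also runs in shells that lack here strings.

```shell
# count_occurrences NEEDLE HAYSTACK  (hypothetical helper, not from the script)
# Prints how many times the fixed string NEEDLE appears in HAYSTACK.
count_occurrences() {
  # -o prints each match on its own line; -F treats NEEDLE as literal text.
  # The arithmetic expansion also strips the padding some wc versions add.
  echo $(( $(printf '%s\n' "$2" | grep -o -F -- "$1" | wc -l) ))
}

count_occurrences '|' 'similar words|synonyms|alternative words'
```

For the sample line, that prints 2: two delimiters, which is one fewer than the number of choices.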
All that, and it's not quite what I want, because a line like word|phrase has one delimiter, but two choices. Here's how I solve that in this first, skeletal version of the function:
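function spinline()
{
  source="$*"
  # grep -o prints one line per delimiter, so wc -l counts the delimiters
  choices=$(grep -o '|' <<< "$*" | wc -l)
  # there's always one more choice than there are delimiters
  choices=$(( $choices + 1 ))
  echo "$choices options, spinning --- $source"
}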
In use:
$ sh spinner.sh spinme.txt
The
3 options, spinning --- idea|concept|inspiration
is that each time you'd use a
2 options, spinning --- word|phrase
you instead list a set of
3 options, spinning --- similar words|synonyms|alternative words
and the software automatically picks one
2 options, spinning --- randomly|at random
.
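Putting the pieces together, here's one way the main loop might hand each choice block to the function instead of printing SPIN THIS:. It's a sketch under a few assumptions: the spin_text wrapper is my own name, the input comes from stdin rather than through fmt and a file, and the function counts delimiters through a printf pipe so the example doesn't depend on <<<.

```shell
# Skeletal spinline: report the number of choices in a block.
spinline() {
  choices=$(printf '%s\n' "$*" | grep -o '|' | wc -l)
  choices=$(( $choices + 1 ))      # one more choice than delimiters
  echo "$choices options, spinning --- $*"
}

# spin_text is a hypothetical wrapper: it reads text on stdin, splits out the
# {...} blocks, and routes each block through spinline.
spin_text() {
  tr '{' '\n' | tr '}' '\n' | while read -r line
  do
    case $line in
      *'|'*) spinline "$line" ;;   # choice block: hand it to the function
      *)     echo "$line" ;;       # ordinary text: pass it through
    esac
  done
}

printf '%s\n' 'Pick a {word|phrase} here.' | spin_text
```

The case statement stands in for the grep -c test from earlier; either works, but case avoids starting a new process for every line.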
That's it for this month. Next month, I'll finish up the function, including implementing a way to pick one entry randomly from a set of n choices, then output the cleaned up copy, ready to use in whatever program or utility you'd like.