Winston Chang’s R Graphics Cookbook

Winston, who is a total class act and a huge contributor to the success of R (not to mention a Twin Cities person), will be at the Twin Cities R Meetup on 6 June at Common Roots Cafe.

There’s a meetup.com for it here.

Come out, buy a copy of Winston’s book and help support one of the people who helped make R as wonderfully usable as it is today.

Tell Me A Story

It’s so easy to get caught up in the shiny part of data stuff: software and packages and plugins and algorithms and and and, that one can completely forget that what we analysts do isn’t just analyze.

We also communicate.

Here are the basics, from Pixar:

HOW TO TELL A STORY:

Once upon a time, there was ____________.

Every day, ___________.

One day, ____________.

Because of that, _____________.

And because of THAT, ____________.

Until finally, _______________.

The End.

 

Using Google Authenticator on Raspberry Pi

Well, this was a lot easier than I thought it would be.

So if you’re the type who wants to have a *nix box accessible from the outside world (maybe you had a find/grep/sed/awk emergency in your past and now you’re traumatized) but don’t want to have people just beating on your machine all day until they crack the password (which they will if they’re patient and smart enough), you can set up Google two-factor auth on your Pi.

Simple steps:

  • Log into your Pi as whatever user you need. I’m guessing most people just use the pi username.
  • Do a sudo apt-get install libpam-google-authenticator. This will install the PAM module and also install the dependent library (the name of which eludes me at the moment).
  • Run google-authenticator at a shell prompt (no need for a GUI here) and it’ll generate your QR code for scanning (again, works just fine in a terminal window) and give you some options about timing, rate limiting, etc. I make things as secure as I can because otherwise, why would I bother adding 2-factor auth to start with?
  • Scan that code with your phone/tablet/whatever.  It’ll add your username automagically.
  • Do a sudo vim /etc/ssh/sshd_config and change ChallengeResponseAuthentication to yes (it defaults to no). Be sure to comment when and why you changed it and what it was originally.
  • Do another sudo vim /etc/pam.d/sshd and add this at the very top of the file (with a nice comment about when, who and why you added it, of course): auth required pam_google_authenticator.so
  • Roll your sshd process with sudo service ssh restart

That’s it.  Log out, log back in.  It’ll ask for your authentication code before it asks for your password.

Git Along, Little Dogies

So a thing is happening that has really got me thinking about my analysis and coding and other stuff.

So I’m taking the opportunity to pick up my game, and have started diving back into Project Euler. Specifically, I’m going through my 23 completed problems that were done in Python and Perl, and reworking them in R.

Project Euler, generally speaking, is a site where you can work math problems computationally, submit the answers, discuss them on a forum, and badge up as you go. It really seems like a “Many are called, few are chosen” type of thing.  From their site:

  • There are a total of 298314 registered members who have solved at least one problem.
  • 4580318 correct solutions have been submitted which is an average of 15.4 per member.
  • So far 49766 members have solved 25 or more problems, which represents 16.68% of all members.
  • There are 123 outstandingly talented members at the current maximum level (solved 375+ problems).

Which means that I’ve already done more than the average and I’m nearing that 16.68% group. That’s pretty cool. Especially since I started this knowing pretty much jack squat about applied math (doing Euler actually got me inspired to start a BS degree in Applied Math at my local state university, that’s how cool this stuff is) or Python. I knew some Perl but I certainly wasn’t and am still not in any way a guru, ninja, rockstar or bro. I don’t even lift.

So back to the R part of this. I can hear you complaining already. R has a weird syntax, it’s difficult, etc. All of which are true. But in its own way, it’s kind of an elegant creature. As I go back through these problems, I find myself giggling in that, “Oh, that is beautiful” way that happens when I have a geek win.  I didn’t do that so much with Python or Perl solutions to the Euler problems.

I think a lot of that comes from that when I find myself struggling (read: failing repeatedly) with these in R, it’s almost always because I’m not doing them R-ly.  I’m trying to force the old Python or Perl code into R. But the beauty of R is that it’s a vector language and once you can get your code vectorized, it’s smooth like a buttered bobsled run.
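To make the loop-versus-vector point concrete, here’s a minimal sketch using the first Euler problem (sum the multiples of 3 or 5 below 1000). The variable names are made up for this illustration, and it isn’t necessarily how the solutions in the repo are written.

```r
# Loop version: the shape you get when porting straight from Python or Perl.
loop_total <- 0
for (i in 1:999) {
  if (i %% 3 == 0 || i %% 5 == 0) {
    loop_total <- loop_total + i
  }
}

# Vectorized version: build the whole vector, keep the elements that pass
# a logical mask, and sum what is left. No explicit loop.
n <- 1:999
vectorized_total <- sum(n[n %% 3 == 0 | n %% 5 == 0])

# Both approaches agree.
print(loop_total == vectorized_total)
```

The vectorized line does the same work as the whole loop; the mask n %% 3 == 0 | n %% 5 == 0 is itself a vector of TRUE/FALSE values, one per element of n.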

For bonus points, after having finally upgraded This Old Mac (srsly, it’s a 2007 aluminum core duo iMac.  But I’m bound and determined to get another 2 yrs out of it before I sell it and upgrade. Because I’m stubborn that way.) to Mountain Leon from Snow Leonard, I can finally use the GitHub client again. So I decided to learn how to use git and GitHub by pushing my Euler R code out there for the world to see (and laugh at).

https://github.com/ClintWeathers/ProjectEulerInR

So there you have it.  Comment, criticize, fork at will.

Avoid The Headaches of Pip

So since my compadre Geo got his Raspberry Pi working, I decided to make another foray (one might remember my very fun explorations of getting Debian going on a hacked Pink Pogoplug) and got my Raspberry Pi rev 2 back out.

Previously, to be honest, it wasn’t a whole lot of fun.  The OS install was way simple, there wasn’t much to it and once I got it going, I realized “Well, Ok.  I have a small Linux box. Woot.” And then I pretty much put it back in the box.

But since I need access to a Calibre server I figured I’d use the Pi this time instead of the Pogoplug.

So.

After another easy OS install (see: The Googs), I got thinking maybe this would be a fun platform for some nerd stuff.  Euler problems, going through Hilary Mason/Drew Conway’s data science/machine learning videos, etc.

Let me save you hours of headache and hassle and just cut to the chase:

Don’t use pip install for anything if you can avoid it.

I know. It’s counterintuitive. Maybe even blasphemous.

sudo apt-get install is your friend.

Works just peachy for numpy scipy matplotlib Pycluster hcluster and all that stuff the cool kids are using these days.

Just keep your apt-get update updated and your apt-get upgrade upgraded and you’ll have fresh versions.

Friends don’t let friends use pip.

 

Using R Variables As Parameters

Notes for myself of R stuff. Maybe someone will get some use from them.

So I bumped into this problem this week:

I was doing some datamungery of json that I’d gotten out of our Solr engine. Now keep in mind that Solr will return json, xml or csv. But csv is just ugly unless you’re needing stuff to drop directly into a dataframe or Excel and XML wouldn’t work because what I needed was actually being listed as an attribute rather than a result.

<data “The data that Clint needs here”> ,8 </data>

Like that, see?  And using the XML package in R and xmlTreeParse wasn’t getting me anywhere because the only thing I could search on was the results. “,8” in this case.

Eventually I hit on something from StackOverflow that used something like this:

lines2bParsed <- readLines(textConnection(getURL("http://oursolrserver.company.com?querygoeshere")))

which strings together the Curl request (getURL comes from the RCurl package) piped to a text connection piped to a readLines.

So text connections in R are freaking cool. They let you read in data from a character vector like it was a file. How awesome is that? Then you can just readLines it and you get actual text that you can do something with instead of some weird XML/json/HTML tree thing object. (Note, I tried readLines(“URL of our solr server with the parameterized query here”) and it didn’t like it, even a little. It really needed the Curl request for the authentication.)
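Here’s a minimal sketch of the text connection trick with no server involved. The string below is a made-up stand-in for whatever the Curl request would hand back; response_body is a hypothetical name, and only textConnection, readLines and close are real base R calls.

```r
# Hypothetical response body: one long string with embedded newlines,
# standing in for what a Curl request would return.
response_body <- "line one\nline two\nline three"

# Wrap the string in a text connection so it can be read like a file.
con <- textConnection(response_body)
lines2bParsed <- readLines(con)
close(con)  # tidy up the connection once the lines are read

# lines2bParsed is now a plain character vector, one element per line.
print(lines2bParsed)
```

Once it’s a character vector of lines, all the usual string tools (grep, sub, gsub and friends) work on it directly.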

I digress.

So after I got all of that, I had to munge and clean it.  Basically get it from json to clean data of the type I needed in the format that my next step needed.

A few regexes later and I’ve got it cleaned (It might be totally uncool, but I really like to do my regexes in increments in case something breaks, that way I can isolate the fault), but I only need the lines from the json string that have the stuff I need. Luckily, the stuff I need kind of stands out, so again from StackOverflow I find out how to subset based on some sort of parameter:

cleanedData <- lines2bParsed[grep("your regex goes here", lines2bParsed)]

But there was a problem. Something in my regex was making the whole thing puke. On an off chance, I tried something I didn’t think would work at all, but did and that’s the whole point of this post: You can use an object in R as your regex in a grep. And probably in lots of other stuff where it calls for a regex.  Check this out:

horribleComplicatedAwfulHeironymousBoschRegEx <- "your reg ex goes here"

Now when you do your subset, you can just drop the name of your object in place of the actual pattern to be grepped. Separate the regex itself as an object and it becomes that much simpler to deal with. Unhairball that stuff.
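Put together, the pattern-as-object idea looks something like this. It’s a sketch on made-up data: the sample lines and the idPattern name are assumptions for illustration, not the real Solr output or the real regex.

```r
# Made-up lines standing in for the cleaned-up json text.
lines2bParsed <- c('"id": 1', '"name": "alpha"', '"id": 2', '"name": "beta"')

# Keep the regex in its own object so it can be inspected and tested alone.
idPattern <- '^"id": [0-9]+$'

# Drop the object in where grep expects the pattern.
cleanedData <- lines2bParsed[grep(idPattern, lines2bParsed)]

# Same result in one step: value = TRUE returns the matching elements
# instead of their positions.
cleanedDataToo <- grep(idPattern, lines2bParsed, value = TRUE)

print(cleanedData)
```

A side benefit of naming the pattern is that you can poke at it by itself (for example with grepl on a single test string) before trusting it on the full data.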

Simplify, simplify, simplify.

SQL Joins as Venn Diagrams

This image has been getting some love, so I thought I’d post it here so I can find it when I need.

It’s pretty self-explanatory, except for my own $.02, which is that I love getting to see discrete math come to life. Set theory is pretty cool stuff, not too tough to wrap your head around, and anytime you can make math useful, that’s big nerd points.

Inner/outer joins have given me that vein in the forehead kind of headache before and I like this chart a lot more than I like having to take Advil.

[Image: sql_joins]
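For the R crowd, the same set-theory picture can be sketched with base R’s merge. The two tiny data frames here are invented for the example; the point is just that an inner join keeps the intersection of the keys and a full outer join keeps the union.

```r
# Two made-up tables that share an id column.
left_tbl  <- data.frame(id = c(1, 2, 3), l = c("a", "b", "c"))
right_tbl <- data.frame(id = c(2, 3, 4), r = c("x", "y", "z"))

# Inner join: only ids present in both tables (the overlap of the circles).
inner_join <- merge(left_tbl, right_tbl, by = "id")

# Left outer join: every id from the left table, NA where the right has no match.
left_join <- merge(left_tbl, right_tbl, by = "id", all.x = TRUE)

# Full outer join: every id from either table (both circles, whole area).
full_join <- merge(left_tbl, right_tbl, by = "id", all = TRUE)

# The keys line up with plain set operations on the id columns.
print(setequal(inner_join$id, intersect(left_tbl$id, right_tbl$id)))
print(setequal(full_join$id, union(left_tbl$id, right_tbl$id)))
```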

Datageeks-BNA

Greetings from Nashville, Tennessee!

I’m down here visiting family and working remote for a bit, so while I’m down here I’m going to dig around and try to find some other #rstats/python/data geeks to talk to.

Sadly, even though useR! 2012 was just 10 miles from where I am currently, I wasn’t able to make it. Hopefully I can make up for that on this trip.

If nothing else, I hope to provide some measure of reproducible data to prove that Kansas City BBQ is indeed better than Tennessee BBQ.