LegNeato! Christian Legnitto's blog about Mozilla, Apple, technology, and random stuff

31 Oct 2011

New tool to generate CSVs from input.mozilla.org data

Our feedback tool (input.mozilla.org) has great data, but I found some of its analysis tools lacking. So I wrote a new (ugly) Python script to generate CSVs from Input data, which I then feed into a spreadsheet program to visualize the results.

This has already become pretty valuable to me for looking at overall feedback, at the positive/negative split for a particular version or query, and so on.

It's a bit rough and hacky, and there may be bugs, but you can find it here:

http://hg.mozilla.org/users/clegnitto_mozilla.com/release_tools/file/default/scrape_input.py

The tool requires:

  • Python (v2.6+, which ships with Mac OS X 10.6+)
  • The module "requests". Install it with 'sudo easy_install requests' or 'sudo pip install requests'
  • The module "argparse" (part of the standard library in Python 2.7+). On Python 2.6, install it with 'sudo easy_install argparse' or 'sudo pip install argparse'

The tool currently dumps the CSV to stdout, so you'll most likely want to redirect the output to a file (e.g. python scrape_input.py --version 8.0 > firefox8.csv).

Input now supports OR queries

While writing the tool I realized I needed to support searching for multiple terms. For example, when looking for feedback about hangs, searching for just "hang" wouldn't give me the complete picture; I needed something more like "hang OR freeze OR responding". At first I supported this in my tool by running a query for each search term and adding up the results. I quickly realized the aggregated counts contained duplicates (a comment mentioning both "hang" and "freeze" was counted twice, and Input doesn't give me unique IDs to de-dupe with), so I could only get an upper bound.
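The over-counting is easy to reproduce with a few made-up comments. This Python 3 sketch (hypothetical data and function names, with a plain case-insensitive substring match standing in for Input's real search) compares summing one query per term against counting each comment once if it matches any term:

```python
# Hypothetical feedback texts; the real Input data did not expose unique IDs.
comments = [
    "Firefox hangs and then the UI is not responding",
    "Page freeze on startup",
    "hang when opening tabs",
    "Love the new tabs",
]
terms = ["hang", "freeze", "responding"]


def matches(text, term):
    """Case-insensitive substring match (a simplification of Input's search)."""
    return term.lower() in text.lower()


def summed_count(comments, terms):
    """One 'query' per term, counts added together: duplicates are counted twice."""
    return sum(sum(1 for c in comments if matches(c, t)) for t in terms)


def or_count(comments, terms):
    """Each comment counted once if it matches any term, like an OR query."""
    return sum(1 for c in comments if any(matches(c, t) for t in terms))


print(summed_count(comments, terms))  # 4
print(or_count(comments, terms))      # 3
```

The first comment matches both "hang" and "responding", so the summed count is one higher. With local data you can de-dupe like this; without unique IDs from Input, the summed number is only an upper bound.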

I hopped into #input on irc.mozilla.org and asked about OR queries. Dave Dash said it'd probably be easy to support, disappeared for a bit, and then, BAM, Input supported OR queries (using '|' in the web UI and ',' in my tool)! Thanks Dave!
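As a sketch of how the two syntaxes relate, the snippet below parses a comma-separated '--search' value with argparse and joins the terms with '|', the web UI's OR operator. This is an assumption for illustration, not code from scrape_input.py: the post doesn't say whether the real tool rewrites ',' to '|' or hands the terms to Input some other way, and to_input_query is a hypothetical name.

```python
import argparse


def to_input_query(search):
    """Hypothetical helper: turn 'hang, freeze,responding' into the web UI's
    OR syntax 'hang|freeze|responding'. Empty terms are dropped."""
    terms = [term.strip() for term in search.split(",") if term.strip()]
    return "|".join(terms)


# Only the --search flag name comes from the post; the rest is illustrative.
parser = argparse.ArgumentParser(description="Sketch of the --search handling")
parser.add_argument("--search", default="",
                    help="comma-separated terms to OR together, e.g. hang,freeze,responding")

# Parse an explicit argument list so the example doesn't depend on sys.argv.
args = parser.parse_args(["--search", "hang,freeze,responding"])
print(to_input_query(args.search))  # hang|freeze|responding
```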

Pivoting on Firefox version (--version)

Details:

  • Shows input types and their ratios
  • Gives a sense of general "quality": the praise/issues ratio is essentially a quality index
  • You can restrict the analysis to a particular search with '--search'
  • Beta and alpha versions are folded into the main version number
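A minimal Python 3 sketch of the two ideas in this list, assuming each feedback item has already been reduced to a (version, type) pair with type "praise" or "issues". Alpha/beta builds are folded into their main version by stripping a trailing 'a<N>' or 'b<N>' (so '8.0b3' becomes '8.0'), and the praise/issues ratio is computed per folded version. The regex, function names, and sample rows are hypothetical; scrape_input.py may do this differently.

```python
import re
from collections import Counter, defaultdict


def fold_version(version):
    """Fold alpha/beta builds into their main version: '8.0b3' -> '8.0',
    '8.0a2' -> '8.0'. Other versions (e.g. '7.0.1') are left unchanged."""
    return re.sub(r"[ab]\d*$", "", version)


def quality_index(rows):
    """rows: iterable of (version, input_type), input_type 'praise' or 'issues'.
    Returns {folded_version: praise/issues ratio}, or None if there are no issues."""
    counts = defaultdict(Counter)
    for version, input_type in rows:
        counts[fold_version(version)][input_type] += 1
    return {
        version: (c["praise"] / c["issues"] if c["issues"] else None)
        for version, c in counts.items()
    }


# Hypothetical sample rows, not real Input data.
rows = [
    ("8.0b3", "praise"),
    ("8.0b4", "issues"),
    ("8.0", "praise"),
    ("8.0a2", "issues"),
    ("7.0", "issues"),
]
print(quality_index(rows))  # {'8.0': 1.0, '7.0': 0.0}
```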

Example usage:

  • python scrape_input.py --version 8.0
  • python scrape_input.py --product mobile --version 8.0
  • python scrape_input.py --version 7.0 --search memory
  • python scrape_input.py --version 6.0 --search hang,freeze,responding

Pivoting on input type (--type)

Details:

  • Tracks input over the life of a product
  • Breaks down input by version and overall
  • You can restrict the analysis to a particular search with '--search'
  • Beta and alpha versions are folded into the main version number
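The type pivot can be sketched along the same lines. Assuming hypothetical (date, version, type) rows, this Python 3 snippet keeps only the requested type ('all' keeps everything), folds alpha/beta builds into the main version as described above, and builds one row per day with an overall count plus one column per version, then writes it as CSV to stdout. Function names and sample data are illustrative only.

```python
import csv
import re
import sys
from collections import Counter, defaultdict


def fold_version(version):
    """Fold alpha/beta builds into the main version: '8.0b2' -> '8.0'."""
    return re.sub(r"[ab]\d*$", "", version)


def version_key(version):
    """Sort '10.0' after '9.0' by comparing numeric components."""
    return tuple(int(part) if part.isdigit() else 0 for part in version.split("."))


def type_pivot(rows, input_type):
    """Return a table: a header, then one row per date with the overall count of
    input_type and a per-version breakdown. 'all' counts every row."""
    per_day = defaultdict(Counter)
    versions = set()
    for date, version, kind in rows:
        if input_type != "all" and kind != input_type:
            continue
        folded = fold_version(version)
        versions.add(folded)
        per_day[date][folded] += 1
    ordered = sorted(versions, key=version_key)
    table = [["date", "overall"] + ordered]
    for date in sorted(per_day):  # ISO dates sort chronologically as strings
        counts = per_day[date]
        table.append([date, sum(counts.values())] + [counts[v] for v in ordered])
    return table


def write_csv(table, out=None):
    """Dump the table as CSV (stdout by default; redirect it to save a file)."""
    csv.writer(out if out is not None else sys.stdout).writerows(table)


# Hypothetical sample rows: (ISO date, version, input type).
SAMPLE = [
    ("2011-10-01", "8.0b1", "issues"),
    ("2011-10-01", "7.0", "issues"),
    ("2011-10-02", "8.0b2", "praise"),
    ("2011-10-02", "8.0b2", "issues"),
]

if __name__ == "__main__":
    write_csv(type_pivot(SAMPLE, "issues"))
```

Running it prints a header row of date,overall,7.0,8.0 followed by one row per day; versions with no matching input on a given day get a 0, which keeps the columns aligned for graphing.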

Example usage:

  • python scrape_input.py --product mobile --type issues
  • python scrape_input.py --type all
  • python scrape_input.py --type issues --search hang,responding,freeze

Manually graphing in Numbers

  1. Open the CSV file
  2. Select the first row and make it a header row
  3. Select the date column and other columns to graph
  4. With the data selected, choose "Share X values" from the gear menu
  5. Click the chart button, then the scatter chart button
  6. Open the inspector (View > Show Inspector)
  7. Click the chart tab
  8. Click the series subtab
  9. Data symbol -> None
  10. Connect points -> Straight

Have fun!

28 Jan 2011

Mining input.mozilla.com for fun and profit

I've been looking at Aakash's awesome Firefox Input site to try to figure out how we are doing with Firefox 4. Unlike others who have been sifting through individual reports and filing bugs, I've been trying to look at the data in aggregate. I ran some ad hoc queries over the lifespan of the Firefox 4 betas and thought the results were interesting. Of course, this is not scientific in any way, but I think the insights are still valuable. I also think Input should get some sort of triggers or alarms that do this analysis automatically (bug coming soon).
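As a rough idea of what such an alarm could look like, here is a hypothetical Python 3 sketch (not anything Input implements) that flags a day when its count of matching feedback exceeds the mean of the previous seven days by more than three standard deviations. The window, threshold, floor on the standard deviation, and sample counts are all made-up assumptions that would need tuning against real data.

```python
from statistics import mean, pstdev


def find_spikes(daily_counts, window=7, threshold=3.0):
    """Return the indices of days whose count exceeds the trailing-window mean
    by more than `threshold` standard deviations. daily_counts: one int per day."""
    spikes = []
    for i in range(window, len(daily_counts)):
        history = daily_counts[i - window:i]
        mu = mean(history)
        # Floor sigma at 1 so a perfectly flat history doesn't flag tiny bumps.
        sigma = max(pstdev(history), 1.0)
        if daily_counts[i] > mu + threshold * sigma:
            spikes.append(i)
    return spikes


# Hypothetical daily counts for some search term.
counts = [4, 5, 3, 4, 6, 5, 4, 5, 30, 6, 4]
print(find_spikes(counts))  # [8]
```

Only index 8 (the jump to 30) is flagged; the following days aren't, partly because the spike itself inflates the trailing standard deviation, which is one of the trade-offs a real alarm would have to think about.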

What prompted me to look specifically at Input data was a mention of bug 628872 on Twitter. It seemed like a bug that should block Firefox 4, but I wanted to know the extent of the problem. Rather than trying to reproduce it, I went to Input and saw the following:

Though the change is likely not statistically significant, there has definitely been an uptick in negative feedback containing "iplayer". The graph even helps narrow down a regression timeframe. Recognizing the usefulness of this approach, I ran a bunch more queries as they came to mind.

I first decided to see if the YouTube player had a similar graph:

From the graph it is easy to tell that YouTube feedback has been relatively steady, spiking each time we ship a release.

In weekly Mozilla meetings there has been talk about a bug on Hotmail's side causing problems for Firefox 4 beta users. Searching for "hotmail", I got:

Clearly users are feeling the pain and letting us know. Similarly, one of the top issues discussed in support reports has been copy and paste. Searching for "copy paste" gave me:

We're on the case (bug 613915) and need to fix the issue before final.

Next I thought about bug 626016 (which is about Facebook chat), so I searched for "chat":

The two spikes are interesting. My completely uninformed guess is that the spike on August 12 came from Facebook (or some other chat service) going down, or from a server-side website release that went wrong and was quickly rolled back. The spike on the right is likely bug 626016.

Up next I looked at "netflix":

This is another interesting graph. The left spike was likely due to the known issue of bad user-agent detection on Netflix's side (bug 522957). The increased displeasure on the right is likely due to bug 598406. Hulu had user-agent sniffing problems similar to that known issue, and the graph for "hulu" suggests they have been resolved:

This method can also be used to gauge general user sentiment. I knew the removal of the status bar was contentious, so I pulled up the graph for "status":

You can clearly see the initial displeasure when the change landed in a release, and the drop-off that followed. Of course, there is still a sustained level of feedback, which has prompted some additional product changes.

Finally, I searched for "apple" with no bug particularly in mind. I was pretty surprised to find this graph:

The graph spiked and faded too suddenly to be a Firefox issue. I did a quick Google search for "apple october 21" and immediately saw what was going on. On the 23rd Apple reported their earnings. Such an event wouldn't normally affect Firefox in any way, but Apple live-streams their earnings report, and because Apple is heavily invested in H.264, they streamed it using that technology. Firefox doesn't support H.264, so Firefox users couldn't access the stream and complained. The complaints lasted only as long as the live stream was relevant and disappeared the next day. Fascinating!

I found this sort of analysis interesting and thought-provoking, and I'm glad I have a tool like Firefox Input available to me (and the world!).