LegNeato! Christian Legnitto's blog about Mozilla, Apple, technology, and random stuff

22 Dec 2010

Another update on Pulse

With Firefox 4 beta 8 and Firefox 4 beta 3 for mobile out (whew!) it's time to give an update on pulse.mozilla.org.

View Pulse messages live from your browser!

I set up a page to show the messages flowing through the system. You can view it at http://pulse.mozilla.org/live. The page makes for a nifty (barebones) status dashboard of work going on around Mozilla.

The technology behind the page is fairly interesting. I am using Node.js to connect to RabbitMQ via AMQP and serve the messages over Socket.io (which uses long polling or WebSockets) to client-side code in the browser, which Django serves via WSGI. That's a lot of alphabet soup but it was surprisingly easy to set up. It's pretty exciting and fun to use the cutting-edge technologies everyone is talking about.

A future blog post will detail exactly what I did, but you can see the Node.js code at http://hg.mozilla.org/users/clegnitto_mozilla.com/pulsewebsite/file/default/node/browser_amqp.js. Simple yet powerful.

General

  • I no longer consider Pulse a prototype. I now consider it in beta and will be doing more evangelism to get people writing tools against the system. Geo in QA has a prototype system written against Pulse and has already suggested some great improvements
  • The system moved to a new, beefier VM in the Phoenix data center. Hopefully the days of running out of disk space and memory are over!
  • I have documented (with video!) how to go from our stock RHEL 5 VM image to a running Pulse instance. I will be writing it up and finishing up the video in the coming weeks

Website

  • The website is finally in Mercurial. It is basically the old static site stuck into a Django template, but at least it lays the groundwork for future work
  • Added the live view functionality as mentioned above

Scrapers

Messages

  • There is now a heartbeat message sent every minute. This message lets people playing around with the system know their code is working and keeps the web view from having periods of inactivity. To see the messages you can use "PulseTestConsumer" from the Python helper library or you can connect to the "org.mozilla.exchange.pulse.test" exchange via a standard AMQP client. There may be other messages sent through that exchange, so to receive only the heartbeat messages, listen for "heartbeat"
  • One of the RelEng buildbot masters is now publishing build messages into Pulse (see bug 614576). To see the messages you can use "BuildConsumer" from the python helper library or you can connect to the "org.mozilla.exchange.build" exchange via a standard AMQP client
  • James Socol and Jeff Balogh have set up some of their Mozilla GitHub repos to publish commit events into Pulse using the GitHub service hook I created. To see the messages you can use "CodeConsumer" from the python helper library or you can connect to the "org.mozilla.exchange.code" exchange via a standard AMQP client. To only listen to GitHub messages filter on "github.#". If you listen for "#" you will also get messages from Mozilla's Mercurial repositories
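
To make the filters above a bit more concrete, here is a minimal Python sketch of how AMQP topic patterns such as "github.#" and "#" match routing keys. It is not the helper library's code and does not talk to a broker; the function name topic_matches and the example routing keys are made up for illustration.

```python
def topic_matches(pattern: str, routing_key: str) -> bool:
    """Return True if an AMQP topic binding pattern matches a routing key.

    Words are separated by dots. '*' stands for exactly one word and
    '#' stands for zero or more words.
    """
    def match(p, k):
        if not p:
            # Pattern exhausted: match only if the key is exhausted too.
            return not k
        head, rest = p[0], p[1:]
        if head == "#":
            # '#' may swallow zero or more words of the key.
            return any(match(rest, k[i:]) for i in range(len(k) + 1))
        if not k:
            return False
        if head == "*" or head == k[0]:
            return match(rest, k[1:])
        return False

    return match(pattern.split("."), routing_key.split("."))


# Hypothetical routing keys, only to show the filtering behaviour.
keys = ["github.push.example", "hg.commit.example", "heartbeat"]
github_only = [k for k in keys if topic_matches("github.#", k)]
everything = [k for k in keys if topic_matches("#", k)]
```

With these made-up keys, "github.#" selects only the first one, while "#" selects all three.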

Code

  • The python helper library now defaults to temporary/non-durable queues. This should make experimentation easier for folks and will lessen the resource requirements on the server
  • The python helper library now specifies its requirements in such a way that easy_install and pip will automatically download necessary dependencies
  • I have started to put example code into Mercurial so new users don't need to copy and paste from the website. The repository is at http://hg.mozilla.org/users/clegnitto_mozilla.com/pulsequickstart/. I intend to expand it a fair amount, add other language examples, etc.

There is also exciting work going on to instrument assorted systems so I can retire the shim/scraper scripts. I'll likely have more to report about that in the coming weeks though.

If anyone has graphic skills and would like to help me out with the website or a logo, or create a cool dashboard using the data flowing through Pulse, get in touch! I've been playing around with interesting ways to visualize the data and hope to have more to show in the coming quarter.

10 Sep 2010

bugzilla-amqp is now bugzilla-push, supports the STOMP protocol

The Bugzilla server-side extension I released previously has been renamed to bugzilla-push. It can now be found at http://github.com/LegNeato/bugzilla-push. The main impetus for the name change was that it now supports STOMP in addition to AMQP. It seemed silly to keep "amqp" in the name when it supports multiple protocols.

The reason for adding STOMP support is to keep my message broker options open. While RabbitMQ is pretty nice, it may not meet the needs of pulse.mozilla.org once it gets out of prototype mode. All open source brokers (HornetQ, Apache ActiveMQ & Qpid, Red Hat MRG, etc.) have pledged to support AMQP eventually, but many have not implemented it yet. Most have implemented STOMP though, as the protocol is both stable and simple. The extension now gives the Bugzilla administrator the option of choosing which protocol to use based on their requirements.

Notable changes since I last blogged:

  1. Pluggable backends with optional CPAN dependencies. If you don't want to use AMQP, you don't need to have those dependencies installed
  2. Added simple support for message security. This was a major hurdle for getting bugzilla-push rolled out on bugzilla.mozilla.org. I intend to beef it up more in the coming weeks as well
  3. Fixed a bug where false values were being sent as "0" in the JSON messages instead of JSON's false
  4. Fixed support for using vhosts that are not "/" (the AMQP default)
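
A small Python sketch of why the false-value fix in item 3 matters to consumers, assuming the old messages carried the string "0". The field name is_private is hypothetical and is not taken from the extension's actual message format.

```python
import json

# The buggy encoding versus the fixed one
# ("is_private" is a hypothetical field name).
buggy = json.loads('{"is_private": "0"}')
fixed = json.loads('{"is_private": false}')

# A non-empty string is truthy in Python, so "0" reads as True,
# while JSON's false decodes to Python's False.
buggy_flag = bool(buggy["is_private"])
fixed_flag = bool(fixed["is_private"])
print(buggy_flag, fixed_flag)
```

A consumer that simply tests the flag would treat the old messages as true, which is the opposite of what the producer meant.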

Planned in the next week:

  1. Supporting YAML for message encoding
  2. Supporting Python's pickle for message encoding
  3. Get the extension rolled out on landfill.bugzilla.org
  4. Test, test, test
  5. Ask for security review from Mozilla's web security team

20 Aug 2010

Push notifications for Bugzilla!

I've had some downtime between Firefox releases and chose to work on a pet project on-and-off for the past week. I'm announcing it today as bugzilla-amqp.

What is bugzilla-amqp?

A server-side Bugzilla extension that sends messages to a message broker via AMQP whenever a Bugzilla object (bug, keyword, component, etc) is created or modified.

Why?

It enables push notifications for interesting events in Bugzilla! This is a big deal. Tools no longer have to poll the various APIs when dealing with bug data...instead they can sit back and get notified! Want to know when you are CC'd? Easy! Want to know when a new bug is written? No problem! Take a look at the quick demo video (webm, theora...warning, large!)

Because it talks AMQP, tools interested in the Bugzilla messages/events can be written in just about any language you want for any platform you want.

The impetus for writing this extension came from the desire to integrate Mozilla Pulse (running RabbitMQ) with bugzilla.mozilla.org, having push messages end-to-end.

Sounds awesome! I want this on bugzilla.mozilla.org now!

It won't be rolled out on bmo for a bit yet. All these need to happen:

  1. There are some features that need to be added first (like, uh, security)
  2. After that, because there is a fair amount of code (as far as Bugzilla extensions go), it will likely need to go through a security review
  3. Performance testing needs to happen so that it doesn't bring down bmo inadvertently
  4. The server running Mozilla Pulse needs to get beefier and the traffic expectations with IT have to be revisited (I promised them it was a prototype after all...)

I have filed bug 589322 to track putting the extension into production on bmo.

Ok, still sounds awesome...where do I get the code?

I've put it at http://github.com/LegNeato/bugzilla-amqp. Let me know if you use it and/or find any issues and feel free to fork away!

Are you some hardcore Bugzilla hacker?

Nope, I'm a Firefox release manager :-) . The Bugzilla extension system is pretty easy...I highly suggest you take a look if you ever wished Bugzilla did something differently or wanted a feature added.

17 Jul 2010

Mozilla Pulse and RabbitMQ

I did a lightning talk at the Mozilla Summit about my pet infrastructure project, Mozilla Pulse. I'll be talking about it in more depth in a future blog post. This post is more a call for help from message broker experts.

I've been running into issues with RabbitMQ (the Erlang message broker that runs on pulse). I griped a little on Twitter and got some responses, so I decided to write a more in-depth description of what I am running into. I'm not going to explain any message broker specific terminology, so feel free to skip this post if you don't know what I am talking about. None of this should be important if you just want to use pulse in the future.

The general idea of using a message broker at Mozilla is to make useful tools on top of infrastructure, with the infrastructure (producers) being loosely coupled from the tools (consumers). Because of this, I came up with this configuration for an initial prototype:

Exchanges

org.mozilla.exchange.bugzilla (topic)

  • All Bugzilla messages are routed in here. Bugzilla is the producer, with permissions of ".*bugzilla" ".*bugzilla" ".*bugzilla". That is, the Bugzilla producer can do anything to the Bugzilla exchange
  • The message routing key hierarchy looks like bug.added, bug.changed.[field], etc
  • The plan was to add more, sticking logic in the producer (that is, bug.changed.resolution when the message data is CLOSED should be elevated to bug.closed instead, etc)
  • The message rate is very high-volume for Mozilla's Bugzilla, as you can imagine
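
The routing key plan above could be sketched roughly like this in Python. This is only an illustration of the idea, not the actual Bugzilla producer: the function name routing_key_for and the shape of the event dict (kind, field, new_value) are assumptions made for the example.

```python
def routing_key_for(event: dict) -> str:
    """Derive a routing key for a (hypothetical) Bugzilla event dict."""
    if event["kind"] == "added":
        return "bug.added"

    field = event["field"]
    # Producer-side logic: elevate a resolution change whose data is
    # CLOSED to the more specific bug.closed key.
    if field == "resolution" and event.get("new_value") == "CLOSED":
        return "bug.closed"

    return f"bug.changed.{field}"


# Example events (made up for illustration).
added = routing_key_for({"kind": "added"})
cc_change = routing_key_for({"kind": "changed", "field": "cc", "new_value": "someone"})
closed = routing_key_for({"kind": "changed", "field": "resolution", "new_value": "CLOSED"})
```

Keeping this logic in the producer means consumers can bind to bug.closed directly instead of inspecting every bug.changed.resolution message.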

org.mozilla.exchange.hg (topic)

  • All hg.mozilla.org messages are routed in here. HG is the producer, with permissions of ".*hg" ".*hg" ".*hg". That is, the HG producer can do anything to the HG exchange
  • The message routing key hierarchy looks like hg.mozilla.central.repo.[opened/closed], hg.releases.mozilla.1.9.2.[commit/push], etc
  • The message rate is not that high-volume, though when watching all repositories it could be a bit bursty

org.mozilla.exchange.build (topic)

  • All build.mozilla.org messages are routed in here. Buildbot is the producer, with permissions of ".*build" ".*build" ".*build". That is, the Buildbot producer can do anything to the build exchange
  • This is currently experimental and the routing keys haven't been figured out to provide the most value
  • Very high-volume, though less so than the Bugzilla exchange

Consumers

These were my general goals for consumers:

  1. Be as simple as possible so people can start playing with pulse, proving the idea and getting some momentum
  2. I do not want to be the bottleneck for experimentation, so no user accounts or administration tasks necessary to just consume messages
  3. Users writing consumers should not need to learn about any of the underlying message broker terminology or technology
  4. Users could be running consumers on their local machines, and when they reconnect all the messages they missed should be there waiting (they could clear the old messages or process them depending on their needs)

Because of those, I came up with the following plan:

  1. Create a user named public with a password of public and permissions of "" "" ".*", which as far as I know means the user can read from anything but not write or create. The public user can still write and create server-created resources, which means when it asks for the foo queue, the server will create it if it doesn't exist and public will then only have access to read from it
  2. Create a trivial shim library in python on top of carrot to abstract out the message broker bits and help Mozilla-specific consumers get up and running quickly
  3. Make sure people testing set a unique string for their applabel, which means their queue will be unique and message delivery will not fall back to round-robin between different people
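
For reference, the permission triples used in this post are three regular expressions (configure, write, read) that the broker checks against resource names. Here is a minimal Python sketch of that check as I understand it. It is a model, not RabbitMQ's code: the names Permissions and allowed are made up, and two details are assumptions, namely that the pattern may match anywhere in the name and that an empty pattern behaves like "^$".

```python
import re
from typing import NamedTuple


class Permissions(NamedTuple):
    configure: str
    write: str
    read: str


def allowed(perms: Permissions, action: str, resource: str) -> bool:
    """Check one action ('configure', 'write' or 'read') on a resource name."""
    # Assumption: an empty pattern is treated like '^$', so it matches
    # no real (non-empty) resource name.
    pattern = getattr(perms, action) or "^$"
    # Assumption: unanchored matching, as with re.search.
    return re.search(pattern, resource) is not None


public = Permissions(configure="", write="", read=".*")
bugzilla = Permissions(".*bugzilla", ".*bugzilla", ".*bugzilla")

can_read = allowed(public, "read", "org.mozilla.exchange.bugzilla")
can_write = allowed(public, "write", "org.mozilla.exchange.bugzilla")
producer_ok = allowed(bugzilla, "write", "org.mozilla.exchange.bugzilla")
producer_elsewhere = allowed(bugzilla, "write", "org.mozilla.exchange.hg")
```

Under this model the public user can read from anything but cannot write or configure, and the Bugzilla producer is confined to resources whose names contain "bugzilla".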

So, seemed like a good plan, right? And it worked! Until...

Issues

Deleting unused queues

It became clear people (myself included) created some queues and then later changed to a different queue. The old queues were sitting there accumulating messages which would never be consumed. I went to delete the queues and.....rabbitmqctl doesn't have a delete queue command. Darn. Ok, I have the BQL plugin installed, so not a huge deal to pop in and delete them through that, but it seems odd this functionality is missing.

Running out of memory with old persister

There were some bugs in the Bugzilla producer which caused messages to be extremely throttled. I fixed them and immediately the broker ran out of memory and fell over. This was because there were 10 or so queues that weren't having messages actively consumed, each with ~1000 messages. I didn't see this in testing because all my testing consumers were running and consuming the messages that were sent without any buildup. Additionally, the server is running on a VM (it's a prototype after all) which doesn't have a bunch of memory to begin with.

I tried to connect to the queues with a python consumer (using carrot) to drain them, but everything just hung. I could not drain the queues and unblock the server, which meant I couldn't write an administration script that removed 500 messages out of any queue with > 500 un-acked messages.

Reading around, a lot of people are running into this problem. The good news is that the new persister is supposed to fix it, though it isn't quite done yet. It looks like the new persister is in QA and many people on the mailing lists are running it, so I decided to take the plunge on this prototype system.

Incompatibilities between RabbitMQ 1.7.x and 1.8.x

The prototype pulse system was running RabbitMQ 1.7.x and everything was working well (except for the out of memory bit above). To get the new persister, I had to update to 1.8 (as the latest persister branch is 1.8 based). I decided to upgrade to the 1.8 release and make sure everything else still worked before adding the additional layer of pre-release code on top. This is what I did:

  1. Downloaded rabbitmq-public-umbrella
  2. Compiled, installed, and then activated some plugins

I deleted the old persister log, started the server, and immediately found an issue.

The public user couldn't seem to create queues anymore. Darn, that meant people wouldn't be able to use my shim lib. Reading around, it looked like it could be caused by having a 1.7.x data directory with 1.8.x, so I deleted the whole data directory and let RabbitMQ recreate it. I then built up the exchanges, users, and permissions exactly as before. The problem was still there.

So, it looks like the RabbitMQ change to the new AMQP semantics in 1.8 broke what I was doing. Apparently, it is no longer possible to have a read-only user create a queue. I guess this makes sense, though it was my (naive) understanding that automatic queue creation was built into the AMQP spec. That is, the read-only user is requesting it, and if it exists it is handed back to the user, otherwise the server creates it on their behalf. Perhaps this is a bug?
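
Here is a rough Python sketch of why a read-only user gets stuck under that kind of model. The mapping of AMQP operations to required queue permissions follows RabbitMQ's access-control documentation from later releases; that 1.8 behaved exactly this way is an assumption on my part, and the names below are made up for illustration.

```python
import re

# Permission each consumer setup step needs on the *queue*
# (queue.bind additionally needs read on the exchange).
# Assumption: taken from later RabbitMQ docs, not verified against 1.8.
QUEUE_PERMISSION_NEEDED = {
    "queue.declare": "configure",
    "queue.bind": "write",
    "basic.consume": "read",
}

# The public user's original permissions: "" "" ".*"
PUBLIC = {"configure": "", "write": "", "read": ".*"}


def blocked_steps(perms: dict, queue_name: str) -> list:
    """List the setup steps the given permissions would not allow."""
    blocked = []
    for step, needed in QUEUE_PERMISSION_NEEDED.items():
        # Assumption: an empty pattern behaves like '^$'.
        pattern = perms[needed] or "^$"
        if re.search(pattern, queue_name) is None:
            blocked.append(step)
    return blocked


stuck_on = blocked_steps(PUBLIC, "foo")
```

In this model the public user is allowed to consume from "foo" but is refused when declaring and binding it, which matches what I saw after the upgrade.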

In any case, I opened up the permissions for the public user (this is a prototype system with no real users, remember).

Running out of memory with new persister

I decided to take the plunge and make sure the new persister fixed my memory issue before pursuing the permissions issue. This is roughly what I did to upgrade:

  1. Downloaded rabbitmq-public-umbrella
  2. Downloaded the new persister branch
  3. Replaced rabbitmq-server in rabbitmq-public-umbrella with the persister branch
  4. Compiled, installed, and then activated some plugins

I then created some queues, started up the Bugzilla producer, and sent thousands of messages through. RabbitMQ fell over again, as far as I can tell with the same problem. I deleted the whole data directory and let RabbitMQ recreate it. I then built up the exchanges, users, and permissions exactly as before. And it still ran out of memory.

Questions

  1. Are people successfully running the new persister for RabbitMQ?
  2. Do I need to explicitly turn on the new persister when using the new persister branch? If so, how? There are (understandably) no docs that I can find.
  3. Am I setting up the exchanges, queues, and vhosts wrong? As far as I can tell everything was working great before the OOM stuff and the 1.8 semantic changes.
  4. Is there a better way to structure what I want to do?
  5. Is my use-case not supported by RabbitMQ? That would be odd, as this seems like the exact use case that message brokers were made to solve. Do other brokers support what I want?