Live Gantt chart of the Mozilla build system
I built a live Gantt chart of the buildbot steps running in the Mozilla build system. You can see it at http://pulse.mozilla.org/gantt (note that it sometimes randomly disconnects; I'm working on a fix). Because the page is live, there may or may not be builds going on when you view it. Here is what it looks like with some activity:
It's also neat to zoom out to get a higher level overview:
It's pretty neat to see the assorted steps and how long they take. I personally have learned a bit more about what is going on under the hood just from watching the page stream data. Hopefully this tool (or something like it) can be used to identify long steps or additional places for pipelining so RelEng's already amazing turnaround time can be even better!
The ease with which this tool was built is also really exciting. I can't wait to get more data in Pulse to enable other types of tools!
Looking back at 2010, forward at 2011
2010 was an interesting year for me professionally. Inspired by similar lists online, I present what I did (or can remember at least):
- Released Mac OS X 10.6.2 and a bunch of other Apple-related stuff
- Left Apple and came to Mozilla in March
- Shipped out-of-process plugins in Firefox 3.6.4
- Got the idea for Pulse, started a prototype in my spare time
- Gave a talk about 3.6.4 and Pulse at the Mozilla summit
- Released Firefox 3.0.19
- Released Firefox 3.5.9 through 3.5.16
- Released Firefox 3.6.3 through 3.6.13
- Became a Bugzilla contributor and committer
- Released a Bugzilla extension for integration with a message broker (bugzilla-push)
- Created a GitHub service hook to integrate GitHub with a message broker like Pulse
- Released Firefox 4 beta 6 through beta 8
- Obtained commit access to Mozilla's repositories, though I haven't pushed anything yet
- Moved Pulse out of a prototype and into a Mozilla quarterly goal (which I believe was met)
- Took over Jeff's bztools project
- Wrote a MediaWiki extension to integrate wiki.mozilla.org with Pulse (to be released soon)
I'm pretty happy with what I have done so far and am looking forward to doing even more.
New technologies
It was also fun to play around with some hot new technologies (and some old but new to me). These are the new ones I became familiar with in 2010:
- RabbitMQ (and by proxy a little bit of Erlang)
- AMQP and STOMP
- Node.js and Socket.io
- git and GitHub
- CouchDB and Riak
- Mercurial
- orbited
- WebSockets
2011 Goals and Plans
2011 is shaping up to be awesome! This is the general plan for the year I am working off at the moment:
- I intend for the mechanics of a Firefox release to become 100% automated. This is critical and must happen for 2011. There are too many competent people babysitting scripts and communicating via email and IRC during releases. Luckily, Pulse was brought up in 2010 and will help enable this
- In general I intend to make Pulse a cornerstone of Mozilla's tools and processes, eliminating polling across the organization and adding information transparency. AMQP / enterprise messaging is the future for Mozilla's tools. The first higher-level tool I intend to convert once all the current instrumentation is rolled out is TinderBoxPushLog
- I will bring up a release management system like I had at Apple. It will become the single source of truth for releases and will be integrated with the build system via Pulse
- I want to standardize release and project management processes across all client software that Mozilla ships. This will be critical in 2011 as I am sure there will be many simultaneous releases
- I will fix this Bugzilla bug, which should help Mozilla's Bugzilla workflow immensely
- I will fix this Bugzilla bug, which should help anyone who gets bugmail (that's everyone, right?)
- I will fix this Bugzilla bug, so that Ehsan can stop including the functionality in his Firefox extension and I can fix a bug from 1999 (bragging rights, whoo!)
- I will release my finished MediaWiki extension that ties MediaWiki to a message broker. It's done, it just needs to be polished before I can attach my name to it and release it on GitHub
- I will publish a video walkthrough detailing how I think Mozilla should structure their repos and approach project and risk management in an accelerated release world
- Hopefully my proposal will be accepted and implemented. If not, it was an extremely useful mental exercise and will guide some small improvements in the future
- I plan to quickly tackle managing the quarterly Firefox releases and will try to make them as smooth as possible
- I'll release my top secret Mozilla-related project I've been working on in my "spare time" for the last couple of months. Probably not until the end of the year though
- I will take ownership of extension/plugin blacklisting if no one wants it. I've been resisting doing so in 2010, but I think it needs an owner and the likely owners are swamped. It can also benefit from some new processes and clear operating guidelines
- I really want to blog more about my experiences at Apple, comparing and contrasting with Mozilla. I feel I have a unique position to talk about the release technology and project management sides but haven't found time in 2010 to do so. Hopefully 2011 will be different and I'll blog before I forget everything
All three lists have a lot of Pulse/Pulse-related bits in them. This isn't by chance. Everything I set out to improve at Mozilla needed a system like Pulse and I needed that foundation before I could build higher-level tools and processes. Now that it is in place and people are starting to "get the religion", I hope I'm going to see a rapid increase in awesomeness. Special thanks to David Dahl and Margaret Leibovic for smiling, nodding, and feigning excitement when I excitedly showed them Pulse messages scrolling by in a terminal every five minutes for the past year. It has to be annoying sitting by me, but they both seemed to take it in stride...or perhaps it was the booze at their desks.
Another update on Pulse
With Firefox 4 beta 8 and Firefox 4 beta 3 for mobile out (whew!) it's time to give an update on pulse.mozilla.org.
View Pulse messages live from your browser!
I set up a page to show the messages flowing through the system. You can view it at http://pulse.mozilla.org/live. The page makes for a nifty (barebones) status dashboard of work going on around Mozilla.
The technology behind the page is fairly interesting. I am using Node.js to connect to RabbitMQ via AMQP and serve the messages over Socket.io (which uses long polling or WebSockets) to client code served by Django via WSGI to the browser. That's a lot of alphabet soup but it was surprisingly easy to set up. It's pretty exciting and fun to use the cutting edge technologies everyone is talking about.
A future blog post will detail exactly what I did, but you can see the Node.js code at http://hg.mozilla.org/users/clegnitto_mozilla.com/pulsewebsite/file/default/node/browser_amqp.js. Simple yet powerful.
General
- I no longer consider Pulse a prototype. I now consider it in beta and will be doing more evangelism to get people writing tools against the system. Geo in QA has a prototype system written against Pulse and has already suggested some great improvements
- The system moved to a new, beefier VM in the Phoenix data center. Hopefully the days of running out of disk space and memory are over!
- I have documented (with video!) how to go from our stock RHEL 5 VM image to a running Pulse instance. I will be writing it up and finishing up the video in the coming weeks
Website
- The website is finally in Mercurial. It is basically the old static site stuck into a Django template, but at least it lays the groundwork for future work
- Added the live view functionality as mentioned above
Scrapers
- I converted the scrapers/publisher shims from Celery to cron. This removes one moving piece, adds a lot more control, and I have yet to see the scrapers hang
- The new ftp scraper script has been checked in and is running on production
Messages
- There is now a heartbeat message sent every minute. This message lets people playing around with the system know their code is working and makes it so the web view doesn't have periods of inactivity. To see the messages you can use "PulseTestConsumer" from the python helper library or you can connect to the "org.mozilla.exchange.pulse.test" exchange via a standard AMQP client. There may be other messages sent through that exchange, so if you want to only listen for the heartbeat messages listen for "heartbeat"
- One of the RelEng buildbot masters is now publishing build messages into Pulse (see bug 614576). To see the messages you can use "BuildConsumer" from the python helper library or you can connect to the "org.mozilla.exchange.build" exchange via a standard AMQP client
- James Socol and Jeff Balogh have set up some of their Mozilla GitHub repos to publish commit events into Pulse using the GitHub service hook I created. To see the messages you can use "CodeConsumer" from the python helper library or you can connect to the "org.mozilla.exchange.code" exchange via a standard AMQP client. To only listen to GitHub messages filter on "github.#". If you listen for "#" you will also get messages from Mozilla's Mercurial repositories
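The "github.#" and "#" filters above are AMQP topic binding patterns: routing keys are split into words on ".", "*" matches exactly one word, and "#" matches zero or more words. Here is a minimal sketch of that matching rule in plain Python. The function name topic_matches and the "hg.mozilla.central.push" key are made up for illustration; in practice the broker does this matching for you when a queue is bound to the exchange.

```python
def topic_matches(pattern, routing_key):
    """Return True if an AMQP-style topic pattern matches a routing key.

    Words are separated by "."; "*" matches exactly one word and
    "#" matches zero or more words.
    """
    p_words = pattern.split(".")
    k_words = routing_key.split(".")

    def match(pi, ki):
        if pi == len(p_words):
            # Pattern exhausted: it matches only if the key is exhausted too.
            return ki == len(k_words)
        word = p_words[pi]
        if word == "#":
            # "#" may swallow zero or more words; try every split point.
            return any(match(pi + 1, k) for k in range(ki, len(k_words) + 1))
        if ki == len(k_words):
            return False
        if word == "*" or word == k_words[ki]:
            return match(pi + 1, ki + 1)
        return False

    return match(0, 0)


# Hypothetical keys, used only to exercise the function.
print(topic_matches("github.#", "github.push.LegNeato.bztools.master"))  # True
print(topic_matches("github.#", "hg.mozilla.central.push"))              # False
print(topic_matches("#", "heartbeat"))                                   # True
```

This is only meant to show why "github.#" narrows the stream to GitHub events while "#" receives everything published to the exchange.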
Code
- The python helper library now defaults to temporary/non-durable queues. This should make experimentation easier for folks and will lessen the resource requirements on the server
- The python helper library now specifies its requirements in such a way that easy_install and pip will automatically download necessary dependencies
- I have started to put example code into Mercurial so new users don't need to copy and paste from the website. The repository is at http://hg.mozilla.org/users/clegnitto_mozilla.com/pulsequickstart/. I intend to expand it a fair amount, add other language examples, etc.
There is also exciting work going on to instrument assorted systems so I can retire the shim/scraper scripts. I'll likely have more to report about that in the coming weeks though.
If anyone has graphic skills and would like to help me out with the website, a logo, or create a cool dashboard using the data flowing through Pulse, get in touch! I've been playing around with interesting ways to visualize the data and hope to have more to show in the coming quarter.
Lots of Pulse changes, out of prototype mode this week!
I logged off IRC, chat, and email Friday afternoon to push Pulse forward. I made a bunch of progress.
Changes
- New, beefier VM, all set up (bug 609956 and bug 614029)
- The new VM setup supports publishing and consuming messages via websockets / Socket.io
- The 'org.mozilla.exchange.hg' exchange has been renamed to 'org.mozilla.exchange.code'. This is because we'll be publishing messages from non-hg sources into it as well (like GitHub, using http://christian.legnitto.com/blog/2010/11/23/github-amqp-integration-service-hook-live/, as well as bitbucket, svn, etc). Please note this change will likely break your existing python scripts until you update the mozillapulse helper library!
- The mozillapulse python helper library has been updated. I added transparent support for the above exchange change. If your scripts are using HgConsumer they will automatically use the proper exchange so there is no need to change them. In the future you should migrate from HgConsumer to CodeConsumer though
- The (lame) website has been put into HG. It uses Django. Currently it is just one static template, but I intend to make it more dynamic very soon
- A new scraper / shim that polls ftp and sends interesting build release/generation events. I'll be documenting this in the coming week
- I finished up a MediaWiki extension so we can get wiki.mozilla.org events via Pulse. I'll be releasing it next week
Coming this week
- DNS switchover from the old VM to new VM. The switch will happen on Tuesday during scheduled IT downtime
- Documentation about the new ftp scraper / shim
- General improvements to the website and documentation (likely leveraging the websocket functionality to show live Pulse messages)
- Instructions (with video!) how to set up and configure the server-side of Pulse on a stock RHEL 5 VM
- Pulse will be marked as BETA rather than an unsupported prototype
Coming in the near future but likely not next week
- Rolling out extensions on assorted systems so that I can retire the shims/scrapers that poll
- Hooking up to a 2nd VM that listens to all Pulse messages in all exchanges and stores them in Redis or CouchDB
- Interesting (web?) visualizations to encourage others to write tools against Pulse
For more information, sign up for the mailing list.
GitHub AMQP integration service hook live!
As of last night you can now send AMQP messages to a message broker (like the one running on pulse.mozilla.org) for GitHub pushes and commits!
Here's how to set it up...
First, go to the admin area of one of your repositories:
Next click on "Service Hooks" on the left hand side:
After that, select the AMQP service hook:
Configure the hook to point at your server and it's done!
Once configured, the next time you push there will be messages sent via AMQP to your server from GitHub. Currently, it sends one overall push message containing all changeset info in the push as well as individual changeset messages.
Messages are sent for the push with the following routing key format:
"github.push.#{owner}.#{repo}.#{ref}"
where:
owner = payload['repository']['owner']['name']
repo = payload['repository']['name']
ref = payload['ref_name']
Messages are also sent for each commit in a push, with the following routing key format:
"github.commit.#{owner}.#{repo}.#{ref}.#{author}"
where:
author = commit['author']['email']
(other fields are the same as above)
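To make the two formats concrete, here is a small Python rendering of how the keys are assembled from the payload fields listed above. It is only an illustration and not the hook's actual code; the function names push_routing_key and commit_routing_key are made up, and the sample payload is trimmed to the fields the keys need.

```python
def push_routing_key(payload):
    """Build the per-push routing key from a GitHub push payload."""
    owner = payload["repository"]["owner"]["name"]
    repo = payload["repository"]["name"]
    ref = payload["ref_name"]
    return "github.push.%s.%s.%s" % (owner, repo, ref)


def commit_routing_key(payload, commit):
    """Build the per-commit routing key for one commit in a push."""
    owner = payload["repository"]["owner"]["name"]
    repo = payload["repository"]["name"]
    ref = payload["ref_name"]
    author = commit["author"]["email"]
    return "github.commit.%s.%s.%s.%s" % (owner, repo, ref, author)


# Sample data trimmed to just the fields used by the routing keys.
sample_payload = {
    "ref_name": "master",
    "repository": {"name": "bztools", "owner": {"name": "LegNeato"}},
}
sample_commit = {"author": {"email": "clegnitto@mozilla.com"}}

print(push_routing_key(sample_payload))
print(commit_routing_key(sample_payload, sample_commit))
```

One thing to keep in mind when binding: an email address contains dots, so the author part of a commit key spans more than one topic word. A binding such as "github.commit.LegNeato.bztools.master.#" is therefore safer than one that ends in a single "*".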
The message data is sent in JSON format.
Here's an example commit message (dumped from Python):
{u'_meta': {u'exchange': u'org.mozilla.exchange.pulse.test',
            u'routing_key': u'github.commit.LegNeato.bztools.master.clegnitto@mozilla.com'},
 u'payload': {u'author': {u'email': u'clegnitto@mozilla.com',
                          u'name': u'Christian Legnitto',
                          u'username': u'LegNeato'},
              u'files': {u'added': [],
                         u'modified': [u'README.rst'],
                         u'removed': []},
              u'id': u'4d69ae955e6f877000ecfe17def333b32973070b',
              u'message': u'Change readme to point to my repo (and a test of AMQP GitHub service hook)',
              u'timestamp': u'2010-11-22T15:16:26-08:00',
              u'url': u'https://github.com/LegNeato/bztools/commit/4d69ae955e6f877000ecfe17def333b32973070b'}}
And here's an example push message (dumped from Python):
{u'_meta': {u'exchange': u'org.mozilla.exchange.pulse.test',
            u'routing_key': u'github.push.LegNeato.bztools.master'},
 u'payload': {u'after': u'0ccf64aa593e96a19529b9c9a3b1e0098c626108',
              u'before': u'9aa20993159d5e714103abc6741b43feb371fc34',
              u'commits': [{u'author': {u'email': u'clegnitto@mozilla.com',
                                        u'name': u'Christian Legnitto',
                                        u'username': u'LegNeato'},
                            u'files': {u'added': [],
                                       u'modified': [u'bugzilla/models.py'],
                                       u'removed': []},
                            u'id': u'80539c359d22ca35f61c34edb810bfc9c0bef6a8',
                            u'message': u'Add support for keywords',
                            u'timestamp': u'2010-11-17T16:14:37-08:00',
                            u'url': u'https://github.com/LegNeato/bztools/commit/80539c359d22ca35f61c34edb810bfc9c0bef6a8'},
                           {u'author': {u'email': u'clegnitto@mozilla.com',
                                        u'name': u'Christian Legnitto',
                                        u'username': u'LegNeato'},
                            u'files': {u'added': [],
                                       u'modified': [u'README.rst'],
                                       u'removed': []},
                            u'id': u'4d69ae955e6f877000ecfe17def333b32973070b',
                            u'message': u'Change readme to point to my repo (and a test of AMQP GitHub service hook)',
                            u'timestamp': u'2010-11-22T15:16:26-08:00',
                            u'url': u'https://github.com/LegNeato/bztools/commit/4d69ae955e6f877000ecfe17def333b32973070b'},
                           {u'author': {u'email': u'clegnitto@mozilla.com',
                                        u'name': u'Christian Legnitto',
                                        u'username': u'LegNeato'},
                            u'files': {u'added': [],
                                       u'modified': [u'bugzilla/models.py'],
                                       u'removed': []},
                            u'id': u'0ccf64aa593e96a19529b9c9a3b1e0098c626108',
                            u'message': u'Add some string representations',
                            u'timestamp': u'2010-11-22T18:19:32-08:00',
                            u'url': u'https://github.com/LegNeato/bztools/commit/0ccf64aa593e96a19529b9c9a3b1e0098c626108'}],
              u'compare': u'https://github.com/LegNeato/bztools/compare/9aa2099...0ccf64a',
              u'forced': False,
              u'ref': u'refs/heads/master',
              u'ref_name': u'master',
              u'repository': {u'created_at': u'2010/11/15 14:45:56 -0800',
                              u'description': u'Models and scripts to access the Bugzilla REST API.',
                              u'fork': True,
                              u'forks': 0,
                              u'has_downloads': True,
                              u'has_issues': False,
                              u'has_wiki': True,
                              u'homepage': u'',
                              u'name': u'bztools',
                              u'open_issues': 0,
                              u'owner': {u'email': u'clegnitto@mozilla.com',
                                         u'name': u'LegNeato'},
                              u'private': False,
                              u'pushed_at': u'2010/11/22 19:17:25 -0800',
                              u'url': u'https://github.com/LegNeato/bztools',
                              u'watchers': 2}}}
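Since the body is plain JSON, a consumer only needs a JSON parser to do something useful with it. Here is a rough sketch that boils a push message down to a short summary. The helper name summarize_push is made up, and the sample is a trimmed-down stand-in for the push message above (commit ids abbreviated, most fields dropped), not real broker output.

```python
import json


def summarize_push(message):
    """Reduce a decoded push message to repo, ref, commit count and files."""
    payload = message["payload"]
    touched = set()
    for commit in payload["commits"]:
        files = commit["files"]
        for kind in ("added", "modified", "removed"):
            touched.update(files[kind])
    return {
        "repo": payload["repository"]["name"],
        "ref": payload["ref_name"],
        "commit_count": len(payload["commits"]),
        "files": sorted(touched),
    }


# Trimmed-down stand-in for the push message shown above.
raw = """
{"_meta": {"exchange": "org.mozilla.exchange.pulse.test",
           "routing_key": "github.push.LegNeato.bztools.master"},
 "payload": {"ref_name": "master",
             "repository": {"name": "bztools"},
             "commits": [
               {"id": "80539c3",
                "files": {"added": [], "modified": ["bugzilla/models.py"], "removed": []}},
               {"id": "4d69ae9",
                "files": {"added": [], "modified": ["README.rst"], "removed": []}},
               {"id": "0ccf64a",
                "files": {"added": [], "modified": ["bugzilla/models.py"], "removed": []}}
             ]}}
"""

summary = summarize_push(json.loads(raw))
print(summary["commit_count"], summary["files"])
```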
Now that this service exists Pulse can get messages about Mozilla checkins for projects hosted on GitHub, making Pulse the one-stop shop for real-time Mozilla data...once the Bugzilla extension, MediaWiki extension, and Mercurial extension are put into production of course.
Bugzilla activity visualization
Here's a cool visualization of bug activity on bugzilla.mozilla.org over the past 2 days:
Downloads:
http://people.mozilla.org/~clegnitto/videos/bugs.mp4
http://people.mozilla.org/~clegnitto/videos/bugs.webm
http://people.mozilla.org/~clegnitto/videos/bugs.ogv
And here's the same activity, zoomed in on the "Core" cluster/branch:
Downloads:
http://people.mozilla.org/~clegnitto/videos/bugs_core.mp4
http://people.mozilla.org/~clegnitto/videos/bugs_core.webm
http://people.mozilla.org/~clegnitto/videos/bugs_core.ogv
I sped up the videos a bit so that every 30 seconds in the video is equal to a day of activity in reality.
Background
I saw David Humphrey's post using gource to visualize source repositories. I looked at it and thought "That's neat, it would be cool to have it in real-time on the screens in the Mozilla offices!"
Implementation
Luckily gource supports reading a simple pipe-delimited format from stdin, which makes integration with outside tools trivial. I wrote a 20 line python script that uses my pet-project pulse to pump hg.mozilla.org push events into gource. It wasn't super exciting though, as there aren't a ton of pushes happening (even with the try repository).
I thought about it a little bit and realized that pulse also has Bugzilla data flowing through it. I decided it'd be really cool to (ab)use gource to visualize bug activity.
First, I determined the mappings from bug activity to repository activity:
- File added -> new bug ("bug.new" for the pulse routing key)
- File modified -> bug changed ("bug.changed.#" for the pulse routing key)
- File deleted -> closed bug (searching for certain states in "bug.changed" messages)
- Committer -> user creating or changing the bug
- File path -> /[Bug's product]/[Bug's component]/[Bug's id]
Constructing the "path" in this way makes bugs cluster in a coherent way. Realtime still wasn't super exciting (for pulse reasons I will not go into here), so I let the tool run for a couple days, dumped the resulting messages to a file, and pointed gource at it.
If you want to play around with it (gource is interactive), I've uploaded the data here. The user/pass is nospam (there are email addresses in the file so I didn't want to leave it wide open). I used these gource options to make the video:
gource --log-format custom --hide bloom,filenames --user-scale .5 -s 30 /path/to/data.txt
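For anyone who wants to try the same trick, here is a rough sketch of the mapping described above. Gource's custom log format is one line per event: a Unix timestamp, a user name, a type (A for added, M for modified, D for deleted), and a file path, separated by "|", with an optional colour as a last field. The function and argument names are made up for illustration; this is not the script I ran, and pulling these values out of a pulse message is left out because it depends on the message layout.

```python
# Bug activity -> gource change type, following the mapping above.
GOURCE_TYPES = {"new": "A", "changed": "M", "closed": "D"}


def _clean(field):
    # "|" is the column separator, so it must not appear inside a field.
    return str(field).replace("|", "_")


def to_gource_line(timestamp, user, action, product, component, bug_id):
    """Format one bug event as a line of gource's custom log format."""
    path = "/%s/%s/%s" % (_clean(product), _clean(component), _clean(bug_id))
    return "%d|%s|%s|%s" % (
        int(timestamp),
        _clean(user),
        GOURCE_TYPES[action],
        path,
    )


# Hypothetical event, just to show the shape of the output line.
print(to_gource_line(1290467786, "someone@example.com", "new",
                     "Core", "Networking", 123456))
```

Lines like this can be written to a file and replayed with the gource command above.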
Future plans
I would love for someone to write a visualization tool using canvas or WebGL. I don't have time to do it with all the Firefox release work. If you want to try to tackle this, I can provide ample help getting it hooked up to pulse's data stream though.
I also looked a bit at code swarm, as I think it would provide better visuals for bugs. Rather than committers I would focus on bugs, with the change types (cc added, comment added, fields changed, etc) as the different colored dots. If I get time I'll run the same data through code swarm and see what looks better. I won't be able to do realtime with code swarm though, as it uses an XML file format.
As an aside, this is why I am so excited about pulse. Having the data in an easily consumable stream unlocks the potential for tools we haven't even thought of, generally with minimal development work.
What’s up with Pulse?
Armen had asked about the current state of pulse on the mailing list and suggested I blog about it. There hasn't been much said publicly since the summit so I agreed it would be good to update everyone.
First, pulse has become a quarterly goal for Bob Moss' team! This is huge and means we'll get some talented people and additional momentum behind the project. They are tasked with moving it from a concept to something that can stand up on its own. I will still likely be heavily involved, though in the end I would love to just be a consumer of the system.
Recently we had a meeting to discuss the current state of pulse and how to hand it off. The slides from that discussion can be found on my people account, though I am not sure they make a lot of sense without my explanations.
Current state / happenings:
- I've turned the scrapers back on so there are real messages currently flowing through pulse. The two scrapers running are the Bugzilla API and HG webpage message scrapers (used by BugzillaConsumer and HgConsumer in the python helper library). Side note: I need to write a maintenance script so unacknowledged bugzilla messages in user queues don't fill up the disk space on the VM
- The bugzilla extension (bugzilla-push) I wrote enabling bugzilla to publish directly to pulse is done
- I rolled out bugzilla-push on landfill.bugzilla.org for testing. The bug tracking bugzilla.mozilla.org rollout is bug 58932
- Bugzilla-push is going through security review soon (bug 599979)
- My bugzilla refactoring patch that enables comment messages to be published into pulse finally got approved yesterday! (bug 590334)
- I wrote a quick and dirty hg hook to allow push and changeset messages to be published directly into pulse (hg-broker). The bug tracking hg.mozilla.org rollout is bug 603029
- I am currently writing a mediawiki extension so all wiki.mozilla.org changes can be published into pulse
- I have started to revamp the documentation/website on pulse.mozilla.org
- We've talked with the RabbitMQ guys a bit and may join WebDev in having them come in for a consult so I can voice my needs/concerns
- I've started playing around with elasticsearch so we can have storing and searching of all messages (doesn't work quite right yet)
I'll try to make it a habit to blog about pulse more. For the latest news feel free to join the mailing list.
Push notifications for Bugzilla!
I've had some downtime between Firefox releases and chose to work on a pet project on-and-off for the past week. I'm announcing it today as bugzilla-amqp.
What is bugzilla-amqp?
A server-side Bugzilla extension that sends messages to a message broker via AMQP whenever a Bugzilla object (bug, keyword, component, etc) is created or modified.
Why?
It enables push notifications for interesting events in Bugzilla! This is a big deal. Tools no longer have to poll the various APIs when dealing with bug data...instead they can sit back and get notified! Want to know when you are CC'd? Easy! Want to know when a new bug is written? No problem! Take a look at the quick demo video (webm, theora...warning, large!)
Because it talks AMQP, tools interested in the Bugzilla messages/events can be written in just about any language you want for any platform you want.
The impetus for writing this extension came from the desire to integrate Mozilla Pulse (running RabbitMQ) with bugzilla.mozilla.org, having push messages end-to-end.
Sounds awesome! I want this on bugzilla.mozilla.org now!
It won't be rolled out on bmo for a bit yet. All these need to happen:
- There are some features that need to be added first (like, uh, security)
- After that, because there is a fair amount of code (as far as Bugzilla extensions go), it will likely need to go through a security review
- Performance testing needs to happen so that it doesn't bring down bmo inadvertently
- The server running Mozilla Pulse needs to get beefier and the traffic expectations with IT have to be revisited (I promised them it was a prototype after all...)
I have filed bug 589322 to track putting the extension into production on bmo.
Ok, still sounds awesome...where do I get the code?
I've put it at http://github.com/LegNeato/bugzilla-amqp. Let me know if you use it and/or find any issues and feel free to fork away!
Are you some hardcore Bugzilla hacker?
Nope, I'm a Firefox release manager. The Bugzilla extension system is pretty easy...I highly suggest you take a look if you ever wished Bugzilla did something differently or wanted a feature added.
Mozilla Pulse and RabbitMQ
I did a lightning talk at the Mozilla Summit about my pet infrastructure project, Mozilla Pulse. I'll be talking about it in more depth in a future blog post. This post is more a call for help from message broker experts.
I've been running into issues with RabbitMQ (the Erlang message broker that runs on pulse). I griped a little on Twitter and got some responses, so I decided to write a more in-depth description of what I am running into. I'm not going to explain any message broker specific terminology, so feel free to skip this post if you don't know what I am talking about. None of this should be important if you just want to use pulse in the future.
The general idea of using a message broker at Mozilla is to make useful tools on top of infrastructure, with the infrastructure (producers) being loosely coupled from the tools (consumers). Because of this, I came up with this configuration for an initial prototype:
Exchanges
org.mozilla.exchange.bugzilla (topic)
- All Bugzilla messages are routed in here. Bugzilla is the producer, with permissions of ".*bugzilla" ".*bugzilla" ".*bugzilla". That is, the Bugzilla producer can do anything to the Bugzilla exchange
- The message routing key hierarchy looks like bug.added, bug.changed.[field], etc
- The plan was to add more, sticking logic in the producer (that is, bug.changed.resolution when the message data is CLOSED should be elevated to bug.closed instead, etc)
- The message rate is very high-volume for Mozilla's Bugzilla, as you can imagine
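The "elevation" idea in the list above could look something like the following on the producer side. This is only a sketch of the plan and not code from the actual producer; the ELEVATIONS table and the function name are made up, and the single rule simply mirrors the example given above.

```python
# (routing key, new field value) -> more specific routing key.
# Only the example from the list above is included here.
ELEVATIONS = {
    ("bug.changed.resolution", "CLOSED"): "bug.closed",
}


def elevate_routing_key(routing_key, new_value):
    """Return a more specific routing key when a known rule applies."""
    return ELEVATIONS.get((routing_key, new_value), routing_key)


print(elevate_routing_key("bug.changed.resolution", "CLOSED"))
print(elevate_routing_key("bug.changed.priority", "P1"))
```

Keeping the rules in a table means a consumer that only cares about closed bugs can bind to "bug.closed" without inspecting message bodies.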
org.mozilla.exchange.hg (topic)
- All hg.mozilla.org messages are routed in here. HG is the producer, with permissions of ".*hg" ".*hg" ".*hg". That is, the HG producer can do anything to the HG exchange
- The message routing key hierarchy looks like hg.mozilla.central.repo.[opened/closed], hg.releases.mozilla.1.9.2.[commit/push], etc
- The message rate is not that high-volume, though when watching all repositories it could be a bit bursty
org.mozilla.exchange.build (topic)
- All build.mozilla.org messages are routed in here. Buildbot is the producer, with permissions of ".*build" ".*build" ".*build". That is, the Buildbot producer can do anything to the build exchange
- This is currently experimental and the routing keys haven't been figured out to provide the most value
- Very high-volume, though less so than the Bugzilla exchange
Consumers
These were my general goals for consumers:
- Be as simple as possible so people can start playing with pulse, proving the idea and getting some momentum
- I do not want to be the bottleneck for experimentation, so no user accounts or administration tasks necessary to just consume messages
- Users writing consumers should not need to learn about any of the underlying message broker terminology or technology
- Users could be running consumers on their local machines, and when they reconnect all the messages they missed should be there waiting (they could clear the old messages or process them depending on their needs)
Because of those, I came up with the following plan:
- Create a user named public with a password of public and permissions of "" "" ".*", which as far as I know means the user can read from anything but not write or create. The public user can still write and create server-created resources, which means when it asks for the foo queue, the server will create it if it doesn't exist and public will then only have access to read from it
- Create a trivial shim library in python on top of carrot to abstract out the message broker bits and help Mozilla-specific consumers get up and running quickly
- Make sure people testing set a unique string for their applabel, which means their queue will be unique and message delivery will not fall back to round-robin between different people
So, seemed like a good plan, right? And it worked! Until...
Issues
Deleting unused queues
It became clear people (myself included) created some queues and then later changed to a different queue. The old queues were sitting there accumulating messages which would never be consumed. I went to delete the queues and.....rabbitmqctl doesn't have a delete queue command. Darn. Ok, I have the BQL plugin installed, so not a huge deal to pop in and delete them through that, but it seems odd this functionality is missing.
Running out of memory with old persister
There were some bugs in the Bugzilla producer which caused messages to be extremely throttled. I fixed them and immediately the broker ran out of memory and fell over. This was because there were 10 or so queues that weren't having messages actively consumed, each with ~1000 messages. I didn't see this in testing because all my testing consumers were running and consuming the messages that were sent without any buildup. Additionally, the server is running on a VM (it's a prototype after all) which doesn't have a bunch of memory to begin with.
I tried to connect to the queues with a python consumer (using carrot) to drain them, but everything just hung. I could not drain the queues and unblock the server, which meant I couldn't write an administration script that removed 500 messages out of any queue with > 500 un-acked messages.
Reading around, a lot of people are running into this problem. The good news is that the new persister is supposed to fix it, though it isn't quite done yet. It looks like the new persister is in QA and many people on the mailing lists are running it, so I decided to take the plunge on this prototype system.
Incompatibilities between RabbitMQ 1.7.x and 1.8.x
The prototype pulse system was running RabbitMQ 1.7.x and everything was working well (except for the out of memory bit above). To get the new persister, I had to update to 1.8 (as the latest persister branch is 1.8 based). I decided to upgrade to 1.8 release and make sure everything else still worked before adding the additional layer of pre-release code on top. This is what I did:
- Downloaded rabbitmq-public-umbrella
- Compiled, installed, and then activated some plugins
I deleted the old persister log, started the server, and immediately found an issue.
The public user couldn't seem to create queues anymore. Darn, that meant people wouldn't be able to use my shim lib. Reading around, it looked like it could be caused by having a 1.7.x data directory with 1.8.x, so I deleted the whole data directory and let RabbitMQ recreate it. I then built up the exchanges, users, and permissions exactly as before. The problem was still there.
So, it looks like the RabbitMQ change to the new AMQP semantics in 1.8 broke what I was doing. Apparently, it is no longer possible to have a read-only user create a queue. I guess this makes sense, though it was my (naive) understanding that automatic queue creation was built into the AMQP spec. That is, the read-only user is requesting it, and if it exists it is handed back to the user, otherwise the server creates it on their behalf. Perhaps this is a bug?
In any case, I opened up the permissions for the public user (this is a prototype system with no real users, remember).
Running out of memory with new persister
I decided to take the plunge and make sure the new persister fixed my memory issue before pursuing the permissions issue. This is roughly what I did to upgrade:
- Downloaded rabbitmq-public-umbrella
- Downloaded the new persister branch
- Replaced rabbitmq-server in rabbitmq-public-umbrella with the persister branch
- Compiled, installed, and then activated some plugins
I then created some queues, started up the Bugzilla producer, and sent thousands of messages through. RabbitMQ fell over again, as far as I can tell with the same problem. I deleted the whole data directory and let RabbitMQ recreate it. I then built up the exchanges, users, and permissions exactly as before. And it still ran out of memory.
Questions
- Are people successfully running the new persister for RabbitMQ?
- Do I need to explicitly turn on the new persister when using the new persister branch? If so, how? There are (understandably) no docs that I can find.
- Am I setting up the exchanges, queues, and vhosts wrong? As far as I can tell everything was working great before the OOM stuff and the 1.8 semantic changes.
- Is there a better way to structure what I want to do?
- Is my use-case not supported by RabbitMQ? That would be odd, as this seems like the exact use case that message brokers were made to solve. Do other brokers support what I want?




