Martin Albisetti's blog

31 Aug 2015

Developing and scaling Ubuntu One filesync, part 1

Now that we've open sourced the code for Ubuntu One filesync, I thought I'd highlight some of the interesting challenges we had while building and scaling the service to several million users.

The teams that built the service were roughly split into two: the foundations team, which was responsible for the lowest levels of the service (storage and retrieval of files, data model, client and server protocol for syncing), and the web team, focused on user-visible services (website to manage files, photos, music streaming, contacts and Android/iOS equivalent clients).
I joined the web team early on and stayed with it until we shut it down, so that's where a lot of my stories will be focused.

Today I'm going to focus on the challenge we faced when launching the Photos and Music streaming services. Given that by the time we launched them we had a few years of experience serving files at scale, our challenge turned out to be presenting and manipulating each user's metadata quickly, and showing it in appealing ways (browsing music by artist or genre, searching, and so on). Photos was a similar story: people tended to have many thousands of photos and songs, and we needed to extract the metadata, parse it, store it, and then present it back to users quickly in different ways. Easy, right? It is, until a certain scale 🙂
Our architecture for storing metadata at the time was about eight PostgreSQL master databases across which we sharded the metadata (essentially, your metadata lived on a different DB server depending on your user id), plus at least one read-only slave per shard. These were really beefy servers with a truckload of CPUs, more than 128GB of RAM and very fast disks (when reading this, remember it was 2009-2013; hardware specs seem tiny as time goes by!).

However, no matter how big these DB servers got, given how busy they were and how much metadata was stored (for years we didn't delete any metadata, so for every change to every file we duplicated the metadata), after a certain point we couldn't get a simple listing of a user's photos or songs (essentially, some of their files filtered by mimetype) in a reasonable time-frame (less than 5 seconds). As the service grew we added caches, indexes, optimized queries and code paths, but we quickly hit a performance wall that left us no choice but a much-feared major architectural change. I say much-feared because major architectural changes carry a lot of risk for running services that have low tolerance for outages or data loss: whenever you change something that's already running in a significant way, you're basically throwing out most of your previous optimizations. On top of that, as users we expect things to be fast and take it for granted. A 5-person team spending 6 months to make things as fast as you expect them to be isn't really something you can brag about in the middle of a race with many other companies to capture a growing market.
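To make the sharding scheme concrete, here's a minimal sketch of user-id based shard routing; the DSNs and helper name are hypothetical, not the actual Ubuntu One code:

    # Hypothetical sketch: route a user to their metadata shard.
    SHARD_DSNS = ["postgresql://shard%d.internal/metadata" % i
                  for i in range(8)]

    def shard_for_user(user_id):
        # All of a user's metadata lives on one shard, so any query
        # for that user only ever touches a single database.
        return SHARD_DSNS[user_id % len(SHARD_DSNS)]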
In the time since we had started the project, NoSQL had taken off and matured enough to be a viable alternative to SQL, and it seemed to fit many of our use cases much better (webscale!). After some research and prototyping, we decided to generate pre-computed views of each user's data in a NoSQL DB (Cassandra), and to do that by extending our existing architecture instead of revamping it completely. Given that our code was pretty well built into proper layers of responsibility, we hooked into the lowest layer of our code, database transactions, an async process that would send messages to a queue whenever new data was written or modified. This meant essentially duplicating the metadata we stored for each user, but trading storage for computing is usually a good trade-off to make, both in cost and performance. So now we had a firehose queue of every change that went on in the system, and we could build a separate piece of infrastructure whose sole focus would be to provide per-user metadata *fast* for any type of file, so we could build interesting and flexible user interfaces for people to consume their own content. The stated internal goals were: 1) fast responses (under 1 second), 2) less than 10 seconds between user action and UI update, and 3) complete isolation from existing infrastructure.
Here's a rough diagram of how the information flowed through the system:

U1 Diagram

It's a little bit scary when you look at it like that, but in essence it was pretty simple: write each relevant change that happens in the system to a temporary table in PG in the same transaction that it's written to the permanent table. That way you get transactional guarantees that you won't lose any data on that layer for free, and you use PG's built-in cache that keeps recently added records cheaply accessible.
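As an illustration, a write under this scheme might look like the following sketch (psycopg2; the table and column names are assumptions on my part, the real ones live under src/backends/txlog/):

    import psycopg2

    def rename_file(conn, user_id, node_id, new_name):
        # Using the connection as a context manager commits on success
        # and rolls back on error, so the permanent write and the
        # transaction-log row succeed or fail together.
        with conn:
            with conn.cursor() as cur:
                cur.execute(
                    "UPDATE node SET name = %s "
                    "WHERE owner_id = %s AND id = %s",
                    (new_name, user_id, node_id))
                # Same transaction: record the change for async consumers.
                cur.execute(
                    "INSERT INTO txlog (user_id, node_id, op, payload) "
                    "VALUES (%s, %s, 'rename', %s)",
                    (user_id, node_id, new_name))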
Then we built a bunch of workers that read those rows, parsed them, sent them to a persistent queue in RabbitMQ, and deleted them from the temporary PG table once confirmation came back that they were queued.
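A worker along those lines could look roughly like this (pika with publisher confirms plus psycopg2; the exchange, queue and table names are made up for the example):

    import json
    import pika
    import psycopg2

    def drain_txlog(pg_dsn, amqp_url, batch_size=100):
        conn = psycopg2.connect(pg_dsn)
        channel = pika.BlockingConnection(
            pika.URLParameters(amqp_url)).channel()
        # Publisher confirms: basic_publish raises if the broker
        # doesn't acknowledge the message.
        channel.confirm_delivery()
        with conn:
            with conn.cursor() as cur:
                cur.execute("SELECT id, user_id, op, payload FROM txlog "
                            "ORDER BY id LIMIT %s", (batch_size,))
                for row_id, user_id, op, payload in cur.fetchall():
                    channel.basic_publish(
                        exchange="u1-changes",
                        routing_key=op,  # e.g. "song.added", "photo.added"
                        body=json.dumps({"user_id": user_id, "op": op,
                                         "payload": payload}),
                        properties=pika.BasicProperties(delivery_mode=2))
                    # Broker confirmed; safe to drop the transient row.
                    cur.execute("DELETE FROM txlog WHERE id = %s",
                                (row_id,))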
Following that, we took advantage of RabbitMQ's exchange features to build different types of workers that processed the data differently depending on what it was (music was stored differently than photos, for example).
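For instance, with a topic exchange the routing can be wired up like this sketch (all names illustrative):

    import pika

    channel = pika.BlockingConnection(
        pika.ConnectionParameters("localhost")).channel()
    channel.exchange_declare(exchange="u1-changes",
                             exchange_type="topic", durable=True)

    # One durable queue per content type; the routing key decides which
    # workers see a message, so music and photo processing stay separate.
    for queue, pattern in [("music-workers", "song.*"),
                           ("photo-workers", "photo.*")]:
        channel.queue_declare(queue=queue, durable=True)
        channel.queue_bind(queue=queue, exchange="u1-changes",
                           routing_key=pattern)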
Once we completed all of this, accessing someone's photos was a quick and predictable read operation that would give us all their data back in an easy-to-parse format that fit in memory. Eventually we moved all the metadata accessed from the website and REST APIs to these new pre-computed views, and the result was a significant reduction in load on the main DB servers, while now getting predictable sub-second request times for all types of metadata in a horizontally scalable system (just add more workers and Cassandra nodes).
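Reading one of those views then becomes a single-partition query, something like this sketch using the cassandra-driver package (keyspace, table and columns are assumptions):

    from cassandra.cluster import Cluster

    session = Cluster(["cassandra1.internal"]).connect("u1_views")

    def photos_for_user(user_id):
        # One partition per user: a single fast read returns everything,
        # already shaped for the UI and small enough to fit in memory.
        return list(session.execute(
            "SELECT node_id, path, taken_at "
            "FROM photos_by_user WHERE user_id = %s", (user_id,)))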

All in all, it took about 6 months end-to-end, which included a prototype phase that used memcache as a key/value store.

You can see the code that wrote to and read from the temporary PG table if you branch the code and look under: src/backends/txlog/
The worker code, as well as the web UI, is not yet available, but it will be in the future once we finish cleaning it up. I decided to write this up and publish it now because I believe the value is more in the architecture than in the code itself 🙂

4 Apr 2014

On open sourcing Ubuntu One filesync

This week has been bitter-sweet. On the one hand, we announced that a project many of us had poured our hearts and minds into was going to be shut down. It's made many of us sad, and some of us haven't even figured out what to do with our files yet 🙂

On the other hand, we've been laser-focused on making Ubuntu on phones and tablets a success; our attention has moved to making sure we have a rock-solid, scalable, secure platform that's pleasant to use for developers and users alike. We just didn't have the time to continue racing against other companies whose only focus is file syncing, which was very frustrating as we saw a project we were proud of get left behind. It was hard to keep feeling proud of the service, so shutting it down felt like the right thing to do.

I am, however, very excited about open sourcing the server-side of the file syncing infrastructure. It’s a huge beast that contains many services and has scaled well into the millions of users.

We are proud of the code that is being released and in many ways we feel that the code itself was successful despite the business side of things not turning out the way we hoped for.

This will be a great opportunity for those of you who've been itching to have an open source service for personal cloud syncing at scale; the code comes battle-tested and with a wide array of features.

As usual, some people have taken this generous gesture “as an attempt to gain interest in a failing codebase”, which couldn’t be more wrong. The agenda here is to make Ubuntu for phones a runaway success, and in order to do that we need to double down on our efforts and focus on what matters right now.

Instead of storing away those tens of thousands of expensive man-hours of work in an internal repository somewhere, we've decided to share that work with the world and allow others to build on top of it and benefit from it.

It’s hard sometimes to see some people trying to make a career out of trying to make everything that Canonical does as inherently evil, although at the end of the day what matters is making open source available to the masses. That’s what we’ve been doing for a long time and that’s the only thing that will count in the end.


So in the coming months we're going to be cleaning things up a bit, trying to release the code in the best shape possible, and working out the details of how best to release it so it's useful for others.

All of us who worked on this project for so many years are looking forward to sharing it and look forward to seeing many open source personal cloud syncing services blossoming from it.

29 Jun 2011

Ubuntu One Files for Android released!

After a long and interesting journey, today we've released Ubuntu One Files for Android.

The app started being developed by Michał Karnicki as a Google Summer of Code project, and he did such a fantastic job at it that we hired him on full time and teamed him up with Chad Miller to release a fantastically polished app. It got immediately featured in the press!
It was built on top of our public APIs, documented here: https://one.ubuntu.com/developer/

Besides letting you access all your files stored in Ubuntu One, it has a very cool feature to auto-sync all the pictures on your phone, giving you an instant backup of them and a convenient place to share them!

I'm super proud of the work we put out.

Also, as with all the rest of our clients, it's open source and you can get it on Launchpad.

12 May 2011

Thunderbird will be default in Oneiric (11.10), maybe

A very healthy and civilised session about switching to Thunderbird by default just ended here at the Ubuntu Developer Summit, and the outcome was that if the Thunderbird developers manage to do some needed work (to be defined) by a certain time in our cycle (to be defined), we will ship Oneiric, and more importantly the 12.04 LTS, with Thunderbird by default.

The bits I can remember that need to be done are:
- Evolution data server integration
- Tighter integration with Unity
- Shrink the size of the overall application so it fits on the CD
- A good upgrade story
- Migration plan for Evolution users

We will also make sure it ships with integration with contacts in Ubuntu One, thanks to James Tait's head start with the Hedera project.

I'm a big fan of Thunderbird, so I'll be doing my best to help them achieve their goals  🙂

25 Apr 2011

Looking for a CSS/HTML guru to work on Ubuntu One

In the last few months, I've been lucky enough to be able to hire some exceptional people who were contributing to Ubuntu One in their free time. Every time someone comes in from the community, filled with excitement about being able to work on their pet project full time, my job gets that much better.
So, everyone say hello to James Tait and Michał Karnicki!

Now we're looking for a new team member to help us make the Ubuntu One website awesome. Someone who knows CSS and HTML inside out, cares deeply about doing things the best way possible and is passionate about their work.

If you're interested or know anyone who may, the job posting is up on Canonical's website.

7 Mar 2011

Ubuntu, Natty and Unity

I have to confess, after I found out we were shipping Unity in Ubuntu by default, I was nervous. I got asked many times what my feelings were, and I think I generally dodged the question. This was a pretty risky move, and we are still a few months away from finding out how well the risk pays off.
Given that a lot of the design behind Unity wasn't done in the open and hadn't had a long time to mature, I've been sceptical of whether we (as in, the Ubuntu project) could pull off such a massive change in such a short period of time and still have happy users.

I've been using Unity on and off on my netbook (which is my secondary computer), but while enjoying a long weekend I've spent the last few days using it a lot, and my feelings towards it have changed quite a bit.

I think it was the right decision. It feels like an overall improved experience, even with its current rough edges. Exactly what I think we need to win over a wider audience and have them fall head over heels in love with Ubuntu. Everything is starting to feel much more tightly integrated and purposeful, with some eye-candy sprinkled in a lot of the right places.
I'm really glad Canonical decided to invest so heavily in such a risky and insanely complicated task; Natty is probably one of the most exciting releases I can remember.

There are still a few key challenges ahead, most notably (to me) making the design process more open and inclusive while still delivering something that feels polished and not like a pile of consensus between people who have gotten good at arguing. The Ayatana community does seem to be slowly growing, though, so the future looks pretty bright. Getting the right balance between Canonical and a community around design feels like one of the hardest problems to solve; luckily, Canonical continues to hire the brightest and most enthusiastic minds around, so I'm sure it will eventually feel like a solved problem.

I think it's been almost 6 years since I landed in the Ubuntu world. I've done all kinds of things in the community, ranging from starting and building the Argentine LoCo to editing the Ubuntu Weekly Newsletter, to evaluating new Ubuntu members in the Americas region. With its ups and downs, great press and wild controversies, it still feels like the best place to be.

7 Oct 2010

Open source is awesome

I am ZOMG very tired from the exciting release for Ubuntu One, but I wanted to highlight a very pleasant surprise.
We launched Ubuntu One Music Streaming a week ago, and yesterday, while the app hadn't yet been publicly released, we got a patch that adds support for telling last.fm the music you are listening to from the Android app.

A big thank you to Scott Ferguson for being so awesome.

6 Oct 2010

Exciting changes in Ubuntu One

Matt Griffin has written a great blog post, so I'm just going to echo it:

After over a year’s worth of feedback from users like you and a clear view of where we want to take Ubuntu One in the future, we’ve just made some changes to the Ubuntu One service offering and pricing plans.

For starters, we will no longer offer the 50 GB plan to new subscribers. Everyone will get the basic plan and then have the option to add various ‘add-ons’ of services and storage as needed. But here are the details:

Ubuntu One Basic – available now
This is the same as the current free 2 GB option but with a new name. Users can continue to sync files, contacts, bookmarks and notes for free as part of our basic service and access the integrated Ubuntu One Music Store. We are also extending our platform support to include a Windows client, which will be available in Beta very soon.

Ubuntu One Mobile – available October 7th
Ubuntu One Mobile is our first example of a service that helps you do more with the content stored in your personal cloud. With Ubuntu One Mobile’s main feature – mobile music streaming – users can listen to any MP3 songs in their personal cloud (any owned MP3s, not just those purchased from the Ubuntu One Music Store) using our custom developed apps for iPhone and Android (coming soon to their respective marketplaces). These will be open source and available from Launchpad. Ubuntu One Mobile will also include the mobile contacts sync feature that was launched in Beta for the 10.04 release.

Ubuntu One Mobile is available for $3.99 (USD) per month or $39.99 (USD) per year. Users interested in this add-on can try the service free for 30 days. Ubuntu One Mobile will be the perfect companion to your morning exercise, daily commute, and weekend at the beach – we’re really excited to bring you this service!

Ubuntu One 20-Packs – available now
A 20-Pack is 20 GB of storage for files, contacts, notes, and bookmarks. Users will be able to add multiple 20-Packs at $2.99 (USD) per month or $29.99 (USD) per year each. If you start with Ubuntu One Basic (2 GB) and add 1 20-Pack (20 GB), you will have 22 GB of storage.

All add-ons are available for purchase in multiple currencies – USD, EUR and, recently added, GBP.

Users currently paying for the old 50 GB plan (including mobile contacts sync) can either keep their existing service or switch to the new plans structure to get more value from Ubuntu One at a lower price.

We know that you will enjoy these new add-ons as well as the performance enhancements we’ve made to Ubuntu One in recent months. If you have questions, our recently updated support area is a great place to start. There you’ll find a link to the current status of Ubuntu One services, a link to our frequently updated list of frequently asked questions, and a way to send us a direct message. As always, you can also ping the team on IRC (#ubuntuone in freenode). We welcome your questions, comments and suggestions.

30 Sep 2010

Ubuntu One Music Streaming public beta!

After a solid 6 months of work, music streaming is up for public testing!  \o/

Read the full announcement for all the details, and go see the wiki page on how to sign up.

24 Sep 2010

No, we are not infringing any licenses

We spend a lot of time making sure we're not violating any licenses, and usually work with upstream early on. Charlie Smotherman got confused about how we had implemented music streaming and filed a bug reporting a violation. I'd encourage anyone who even suspects there may be a license violation to report a bug or contact us as soon as possible, but maybe hold off on the inflammatory blog posts  😉
We've contacted him explaining all this, but he doesn't seem to have had a chance to update his claims, so I'm bringing it up now.

Nobody on the Ubuntu One team commented on any of his blog posts either. Ampache seems like a nice piece of software, and some people on the Ubuntu One team even use it. We chose to go with Subsonic clients (we are not using any of the server pieces, as they don't fit with our infrastructure) because the API seemed very nice, the existing clients were very nice to use, and all the upstream developers were friendly and happy to help us release the service.

I'm sorry if any feelings got hurt, but there's no need to lash out like that.

For future reference, the whole team hangs out in #ubuntuone on Freenode.