Here’s how Evernote moved 3 petabytes of data to Google’s cloud

Evernote decided last year that it wanted to move away from running its own data centers and start using the public cloud to operate its popular note-taking service. On Wednesday, it announced that the lion’s share of the work is done, save for some last user attachments.

The company signed up to work with Google, and as part of the migration process, the tech titan sent a team of engineers (in one case, bearing doughnuts) over to work with its customer on making sure the process was a success.

Evernote wanted to take advantage of the cloud to help with features based on machine learning that it has been developing. It also wanted to leverage the flexibility that comes from not having to run a data center.

The move is part of a broader trend of companies moving their workloads away from data centers that they own and increasingly using public cloud providers. While the transition required plenty of work and adaptation, Evernote credited Google for pitching in to help with the migration.

Why move to the cloud?

There was definitely plenty of work to do. Evernote’s backend was built on the assumption that its application would be running on the company’s twin California data centers, not in a public cloud. So why go through all the work?

Many of the key drivers behind the move will be familiar to cloud devotees. Evernote employees had to spend time maintaining the company’s data center, doing things like replacing hard drives, moving cables and evaluating new infrastructure options.

While those functions were key to maintaining the overall health and performance of the Evernote service, they weren’t providing additional value to customers, according to Ben McCormack, the company’s vice president of operations.

“We were just very realistic that with a team the size of Evernote’s operations team, we couldn’t compete with the level of maturity that the cloud providers have got…on provisioning, on management systems, et cetera,” McCormack said. “We were always going to be playing catch-up, and it’s just a crazy situation to be in.”

When Evernote employees thought about refreshing a data center, one of the key issues that they encountered is that they didn’t know what they would need from a data center in five years, McCormack said.

Evernote had several public cloud providers it could choose from, including Amazon Web Services and Microsoft Azure, which are both larger players in the public cloud market. But McCormack said the similarities between the company’s current focus and Google’s areas of expertise were important to the choice. Evernote houses a large amount of unstructured data, and the company is looking to do more with machine learning.

“You add those two together, Google is the leader in that space,” McCormack said. “So effectively, I would say, we were making a strategic decision and a strategic bet that the areas that are important to Evernote today, and the areas we think will be important in the future, are the same areas that Google excels in.”

Machine learning was a highlight of Google’s platform for Evernote CTO Anirban Kundu, who said that higher-level services offered by Google help provide the foundation for new and improved features. Evernote has been driving toward a set of new capabilities based on machine learning, and Google services like its Cloud Machine Learning API help with that.

While cost is often touted as a benefit of cloud migrations, McCormack said that it wasn’t a primary driver of Evernote’s migration. While the company will be getting some savings out of the move, he said that cost wasn’t a limitation for the transition.

The decision to go with Google over another provider like AWS or Azure was driven by the technology team at Evernote, according to Greg Chiemingo, the company’s senior director of communications. He said in an email that CEO Chris O’Neill, who was at Google for roughly a decade before joining Evernote, came in to help with negotiations after the decision was made.

How it happened

Once Evernote signed its contract with Google in October, the clock was ticking. McCormack said that the company wanted to get the migration done before the new year, when users looking to get their life on track hammer the service with a flurry of activity.

Before the start of the year, Evernote needed to migrate 5 billion notes and 5 billion attachments. Because of metadata, like thumbnail images, included with those attachments, McCormack said that the company had to migrate 12 billion attachment files. Not only that, but the team couldn’t lose any of the roughly 3 petabytes of data it had. Oh yeah, and the Evernote service needed to stay up the entire time.

McCormack said that one of the Evernote team’s initial considerations was figuring out what core parts of its application could be entirely lifted and shifted into Google’s cloud, and what components would need to be modified in some way as part of the transition.

Part of the transformation involved reworking the way that the Evernote service handled networking. It previously used UDP Multicast to handle part of its image recognition workflow, which worked well in the company’s own data center where it could control the network routers involved.

But that same technology wasn’t available in Google’s cloud. Kundu said Evernote had to rework its application to use a queue-based model leveraging Google’s Cloud Pub/Sub service, instead.
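
Evernote hasn’t published the details of that rework, but the general queue-based pattern on Google Cloud Pub/Sub can be sketched with the gcloud CLI as follows (the topic and subscription names here are hypothetical, purely for illustration):

# create a topic for recognition jobs and a pull subscription for the workers
gcloud pubsub topics create image-recognition-jobs
gcloud pubsub subscriptions create image-recognition-workers --topic=image-recognition-jobs

# a producer publishes a message; a worker pulls and acknowledges it
gcloud pubsub topics publish image-recognition-jobs --message="note-12345"
gcloud pubsub subscriptions pull image-recognition-workers --auto-ack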

Evernote couldn’t just migrate all of its user data over and then flip a switch directing traffic from its on-premises servers to Google’s cloud in one fell swoop. Instead, the company had to rearchitect its backend application to handle a staged migration with some data stored in different places.

The good news is that the transition didn’t require changes to the client. Kundu said that was key to the success of Evernote’s migration, because not all of the service’s users upgrade their software in a timely manner.

Evernote’s engagement with Google engineers was a pleasant surprise to McCormack. The team was available 24/7 to handle Evernote’s concerns remotely, and Google also sent a team of its engineers over to Evernote’s facilities to help with the migration.

Those Google employees were around to help troubleshoot any technical challenges Evernote was having with the move. That sort of engineer-to-engineer engagement is something Google says is a big part of its approach to service.

For one particularly important part of the migration, Google’s engineers came on a Sunday, bearing doughnuts for all in attendance. More than that, however, McCormack said that he was impressed with the engineers’ collaborative spirit.

“We had times when…we had written code to interface with Google Cloud Storage, we had [Google] engineers who were peer-reviewing that code, giving feedback and it genuinely felt like a partnership, which you very rarely see,” McCormack said. “Google wanted to see us be successful, and were willing to help across the boundaries to help us get there.”

In the end, it took roughly 70 days for the whole migration to take place from the signing of the contract to its final completion. The main part of the migration took place over a course of roughly 10 days in December, according to McCormack.

Lessons learned

If there was one thing Kundu and McCormack were crystal clear about, it’s that even the best-laid plans require a team that’s willing to adapt on the fly to a new environment. Evernote’s migration was a process of taking certain steps, evaluating what happened, and modifying the company’s approach in response to the situation they were presented with, even after doing extensive testing and simulation.

Furthermore, they also pointed out that work on a migration doesn’t stop once all the bytes are loaded into the cloud. Even with extensive testing, the Evernote team encountered new constraints working in Google’s environment once it was being used in production and bombarded with activity from live Evernote users.

For example, Google uses live migration techniques to move virtual machines from one host to another in order to apply patches and work around hardware issues. While that happens incredibly quickly, the Evernote service under full load had some problems with it, which required (and still requires) optimization.

Kundu said that Evernote had tested live migration prior to making the switch over to GCP, but that wasn’t enough.

When an application is put into production, user behavior and load on it might be different from test conditions, Kundu said. “And that’s where you have to be ready to handle those edge cases, and you have to realize that the day the migration happens or completes is not the day that you’re all done with the effort. You might see the problem in a month or whatever.”

Another key lesson, in McCormack’s opinion, is that the cloud is ready to handle any sort of workload. Evernote evaluated a migration roughly once every year, and it was only about 13 months ago that the company felt confident a cloud transition would be successful.

“Cloud has reached a maturity level and a breadth of features that means it’s unlikely that you’ll be unable to run in the cloud,” McCormack said.

That’s not to say it doesn’t require effort. While the cloud does provide benefits to Evernote that the company wasn’t going to get from running its own data center, they still had to cede control of their environment, and be willing to lose some of the telemetry they’re used to getting from a private data center.

Evernote’s engineers also did a lot of work on automating the transition. Moving users’ attachments over from the service’s on-premises infrastructure to Google Cloud Storage is handled by a pair of bespoke automated systems. The company used Puppet and Ansible for migrating the hundreds of shards holding user note data.
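
Evernote’s copy systems were bespoke, so the command below is only an illustrative sketch of the simplest way to bulk-copy a directory of attachment files into a Cloud Storage bucket with gsutil (the bucket and paths are hypothetical):

# parallel, recursive sync of a local attachment store into a GCS bucket
gsutil -m rsync -r /data/attachments gs://example-evernote-attachments/attachments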

The immediate benefits of a migration

One of the key benefits of Evernote’s move to Google’s cloud is the company’s ability to provide reduced latency and improved connection consistency to its international customers. Evernote’s backend isn’t running in a geographically distributed manner right now, but Google’s worldwide networking investments provide an improvement right away.

“We have seen page loading times reducing quite significantly across some parts of our application,” McCormack said. “I wouldn’t say it’s everywhere yet, but we are starting to see that benefit of the Google power and the Google reach in terms of bridging traffic over their global fiber network.”

Right now, the company is still in the process of migrating the last of its users’ attachments to GCP. When that’s done, however, the company will be able to tell its users that all the data they have in the service is encrypted at rest, thanks to the capabilities of Google’s cloud.

From an Evernote standpoint, the company’s engineers have increased freedom to get their work done using cloud services. Rather than having to deal with provisioning physical infrastructure to power new features, developers now have a whole menu of options when it comes to using new services for developing features.

“Essentially, any GCP functionality that exists, they’re allowed to access, play with — within constraints of budget, obviously — and be able to build against.”

In addition, the cloud provides the company with additional flexibility and peace of mind when it comes to backups, outages and failover.

What comes next?

Looking further out, the company is interested in taking advantage of some of Google’s existing and forthcoming services. Evernote is investigating how it can use Google Cloud Functions, which lets developers write snippets of code that then run in response to event triggers.
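
As a rough illustration of that model, deploying a function that fires on a Pub/Sub event with the gcloud CLI looks something like the sketch below; the function name, topic, runtime and source path are all hypothetical, not anything Evernote has announced:

# deploy a function that runs whenever a message arrives on the given topic
gcloud functions deploy process-new-note \
    --runtime=nodejs20 \
    --trigger-topic=note-created \
    --entry-point=processNote \
    --source=./functions/process-note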

Evernote is also alpha testing some Google Cloud Platform services that haven’t been released or revealed to the public yet. Kundu wouldn’t provide any details about those services.

In a similar vein, Kundu wouldn’t go into details about future Evernote functionality yet. However, he said that there are “a couple” of new features that have been enabled as a result of the migration.

Courtesy: www.cio.com


ChromeDriver Error : Unsupported major.minor version 52.0

Came across this error while trying to get the Jenkins-Selenium combination running on my machine:-

org/openqa/selenium/chrome/ChromeDriver : Unsupported major.minor version 52.0

Solution: I found that I had specified Selenium v3.0.1 in my pom.xml file, which was not a stable Selenium version. Reverting to the previous most stable Selenium version, v2.53.1, resolved my logjam.
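
For context, “Unsupported major.minor version 52.0” means a class compiled for Java 8 (class file version 52) is being loaded by an older JRE, which is why Selenium 3.x (built for Java 8) can fail on a Java 7 runtime. So an alternative fix is to upgrade the JDK/JRE that Maven and Jenkins use; a quick way to check which versions are in play:

java -version    # the JRE on the PATH
mvn -version     # also prints the JDK that Maven is running with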

Installation of MongoDB on Ubuntu

1. Import the public key used by the package management system.

The Ubuntu package management tools (i.e. dpkg and apt) ensure package consistency and authenticity by requiring that distributors sign packages with GPG keys. Issue the following command to import the MongoDB public GPG Key:

sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv EA312927

2. Create a list file for MongoDB.

Create the /etc/apt/sources.list.d/mongodb-org-3.2.list list file using the command appropriate for your version of Ubuntu:

Ubuntu 12.04

echo "deb http://repo.mongodb.org/apt/ubuntu precise/mongodb-org/3.2 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-3.2.list

Ubuntu 14.04

echo "deb http://repo.mongodb.org/apt/ubuntu trusty/mongodb-org/3.2 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-3.2.list

Ubuntu 16.04

echo "deb http://repo.mongodb.org/apt/ubuntu xenial/mongodb-org/3.2 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-3.2.list
3. Reload the local package database.

Issue the following command to reload the local package database:

sudo apt-get update
4. Install the MongoDB packages.

You can install either the latest stable version of MongoDB or a specific version of MongoDB.

Install the latest stable version of MongoDB.

Issue the following command:

sudo apt-get install -y mongodb-org

Install a specific release of MongoDB.

To install a specific release, you must specify each component package individually along with the version number, as in the following example:

sudo apt-get install -y mongodb-org=3.2.9 mongodb-org-server=3.2.9 mongodb-org-shell=3.2.9 mongodb-org-mongos=3.2.9 mongodb-org-tools=3.2.9

If you only install mongodb-org=3.2.9 and do not include the component packages, the latest version of each MongoDB package will be installed regardless of what version you specified.

Pin a specific version of MongoDB.

Although you can specify any available version of MongoDB, apt-get will upgrade the packages when a newer version becomes available. To prevent unintended upgrades, pin the package. To pin the version of MongoDB at the currently installed version, issue the following command sequence:

echo "mongodb-org hold" | sudo dpkg --set-selections
echo "mongodb-org-server hold" | sudo dpkg --set-selections
echo "mongodb-org-shell hold" | sudo dpkg --set-selections
echo "mongodb-org-mongos hold" | sudo dpkg --set-selections
echo "mongodb-org-tools hold" | sudo dpkg --set-selections
(Ubuntu 16.04-only) Create systemd service file

NOTE

Follow this step ONLY if you are running Ubuntu 16.04.

Create a new file at /lib/systemd/system/mongod.service with the following contents:

[Unit]
Description=High-performance, schema-free document-oriented database
After=network.target
Documentation=https://docs.mongodb.org/manual

[Service]
User=mongodb
Group=mongodb
ExecStart=/usr/bin/mongod --quiet --config /etc/mongod.conf

[Install]
WantedBy=multi-user.target
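
After creating the unit file on Ubuntu 16.04, reload systemd so it picks up the new unit and enable the service to start on boot (starting it is covered in the next section):

sudo systemctl daemon-reload
sudo systemctl enable mongod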

Run MongoDB Community Edition

The MongoDB instance stores its data files in /var/lib/mongodb and its log files in /var/log/mongodb by default, and runs using the mongodb user account. You can specify alternate log and data file directories in /etc/mongod.conf. See systemLog.path and storage.dbPath for additional information.

If you change the user that runs the MongoDB process, you must modify the access control rights to the /var/lib/mongodb and /var/log/mongodb directories to give this user access to these directories.

Start MongoDB.

Issue the following command to start mongod:

sudo service mongod start

Verify that MongoDB has started successfully

Verify that the mongod process has started successfully by checking the contents of the log file at /var/log/mongodb/mongod.log for a line reading

[initandlisten] waiting for connections on port <port>

where <port> is the port configured in /etc/mongod.conf, 27017 by default.

Stop MongoDB.

As needed, you can stop the mongod process by issuing the following command:

sudo service mongod stop
Restart MongoDB.

Issue the following command to restart mongod:

sudo service mongod restart
Begin using MongoDB.

To help you start using MongoDB, MongoDB provides Getting Started Guides in various driver editions. See Getting Started for the available editions.

Before deploying MongoDB in a production environment, consider the Production Notes document.

Later, if you are running a mongod instance manually in a terminal rather than as a service, you can stop it by pressing Control+C in the terminal where it is running.

 

Courtesy: MongoDB Website

Parameterised Scheduler Plugin

“VJ, I want you to run the automation job on all our three environments on a nightly basis. So that we can compare the results & find out any discrepancies across environments.”

My team lead said this to me one day, out of the blue. A bit of background on this – we have automation scripts used to test REST APIs on our cloud. These normally run on a single environment every night, mainly pre-prod. Now the requirement was to run the same set of scripts on all environments, i.e. staging/dev, pre-prod & prod.

This was a new and exciting task for me and I got on with it right away!

Till now, I only knew how to schedule a job periodically, as I had earlier written a blog post about it. But I did not know how to schedule the same job multiple times with different parameters.

I found exactly what I needed in the Parameterised Scheduler Plugin!!

So once you install this plugin, restart the Jenkins application for it to take effect. This is very important: I tried scheduling without a restart and the plugin did not work.

Here’s how you can schedule the same job multiple times throughout the day with different parameters:-

[Screenshot: Build periodically with parameters]

The % symbol separates the cron notation from the parameters (if any) that you want to give for your run. ‘env’ is my parameter, which my scripts pick up.
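
In text form, the schedule from that screenshot looks roughly like this (the times and the values of ‘env’ are illustrative):

0 1 * * * % env=staging
0 2 * * * % env=preprod
0 3 * * * % env=prod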

Another example of scheduling the same job with multiple parameters :-

[Screenshot: Scheduling with multiple parameters]

The parameters have to be separated by a ; for them to take effect. Here ‘env’ and ‘param’ are two params that my automation scripts accept.
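
Again in text form, a multi-parameter schedule would look something like this (the values are illustrative):

0 1 * * * % env=staging;param=smoke
0 2 * * * % env=prod;param=regression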

So there’s the solution. Hope this post was helpful to some of you.

I love Jenkins for its pluggability & wide range of solutions to commonly faced problems/requirements. Simply superb!

Alright guys.. Have fun… Enjoy 🙂

This is VJ signing off!

 

 

 

sudo /etc/init.d/jenkins start not working

I had installed Jenkins and builds were working fine.

Suddenly it stopped working, and there was nothing in the logs.

I tried starting it with the following command :-

sudo /etc/init.d/jenkins start

but it still wasn’t running:

root@localhost:$# service jenkins restart
 * Restarting Jenkins Continuous Integration Server jenkins [ OK ] 

root@localhost:$# service jenkins status
Jenkins Continuous Integration Server is not running

After some googling, I was able to get Jenkins running with the following command :-

java -Djava.awt.headless=true -jar /usr/share/jenkins/jenkins.war --webroot=/var/cache/jenkins/war --httpPort=8080

I’m still not able to figure out why the conventional Jenkins start command is not working. I’m on Ubuntu 14.04, so maybe there is some issue with the Jenkins-Linux combination. Getting it running is enough for the time being!
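
If you hit the same thing, a few checks that often narrow it down (the paths assume the default Ubuntu package layout):

sudo tail -n 50 /var/log/jenkins/jenkins.log   # Jenkins' own log, even when the init script reports nothing
sudo netstat -tulpn | grep 8080                # is another process already holding the HTTP port?
java -version                                  # the packaged Jenkins needs a compatible JRE on the PATH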

Infrastructure as code tops IT’s DevOps challenges

IT operations pros have some work to do to automate the infrastructure underpinning DevOps initiatives.

While cultural barriers are some of the most daunting DevOps challenges, IT operations practitioners say that capturing infrastructure as code is the most significant technical hurdle to supporting modern application development practices.

Even though configuration management tools such as Puppet and Chef that enable infrastructure as code have been on the market for years, the concept can still be difficult for some IT pros to grasp.

Not everyone has yet bought into the concept of taking a traditional rack-and-stack infrastructure, with IPs managed in Excel spreadsheets and automation done through ad hoc Bash scripts and Ruby code, and turning it into infrastructure as code, according to Pauly Comtois, vice president of DevOps for a multi-national media company.

“A lot of our customer organizations barely have operations automated in any way,” echoed Nirmal Mehta, senior lead technologist for the strategic innovation group at Booz Allen Hamilton Inc., a consulting firm based in McLean, Va., who works with government organizations to establish a DevOps culture.

“It’s 2016, and we should be able to automate those deployments,” he said. “Once you do that, you can start to use the exact same tools to manage the infrastructure that you use for your application code.”

A big reason why companies have been slow to automate their operations is that infrastructure as code work can be more easily discussed than done — legacy applications often weren’t designed with tools such as Chef or Puppet in mind.

Third-party software that runs on Windows isn’t conducive to automation via the command line, Comtois pointed out. “What makes that really technically challenging is when that piece of software also happens to be critical to the workflow of that organization, so I can’t just go in and rip it out and replace it with something else.”

These issues can be overcome, but “some transitions are more painful than others,” he said.

Security teams also have to be brought on board with managing infrastructure as code, according to Mehta.

“Infrastructure as code and configuration management make compliance a lot easier, but that also means that compliance is no longer a thing that you do once a year,” he said. “It gets enveloped in the DevOps process [just like] any piece of code needs to go through.”

For the foreseeable future, the majority of IT operations’ time will be spent transitioning manual processes into infrastructure as code, that is, automated steps that follow the same pipeline that application code does, according to Mehta.


Infrastructure as code benefits

So why go through the technical headaches to establish infrastructure as code?

According to experienced DevOps practitioners, it’s the only way to create an automated IT infrastructure that adequately supports automated application development testing and release cycles.

“In our environment, Jenkins makes many calls into Ansible to build stuff and deploy and configure it,” said Baron Schwartz, founder and CEO of VividCortex, a database monitoring SaaS provider based in Charlottesville, Va. “Whatever we want to be automated, we have CircleCI calling a Web service that pokes Jenkins, which runs Ansible — it sounds like a Rube Goldberg machine, but it works well.”

Even things the VividCortex team wants to kick off manually use a chat bot to call into Jenkins and kick off a build job with Ansible, Schwartz said.

Getting IT ops staffs used to the concept of infrastructure as code is key to securing their buy-in as DevOps is more broadly rolled out in an environment, according to Caedman Oakley, DevOps evangelist for Ooyala Inc., a video processing service headquartered in Mountain View, Calif.

“Operations doesn’t want to see things change unless [it] know[s] what controls are in place,” Oakley said. “Everything being written in a Chef recipe or in cookbooks means [operations] can see what the change was and … knows exactly who did the change and why it’s happening — and that actually is the greatest opener to adoption on the operations side.”

Ultimately infrastructure as code simplifies infrastructure management, Oakley said.

“Operations can just go manage the infrastructure now, and don’t have to worry about figuring out why one server is slightly different from another,” he said.  “You can just fire up an instance any way you want to.”

Beth Pariseau is senior news writer for TechTarget’s Data Center and Virtualization Media Group. Write to her at bpariseau@techtarget.com or follow @PariseauTT on Twitter.

 

Courtesy: http://searchitoperations.techtarget.com/news/450280797/Infrastructure-as-code-tops-ITs-DevOps-challenges