Error Handling – Core Design Decision

Error handling is one of the most critical parts of any piece of software.
We often under-engineer our implementations around it.
Handling a few generic error messages is the easy part.

But the harder questions are:
1. How can the software recover gracefully from these errors?
2. How can we keep the customer experience from degrading after an error?
3. How is the error logged and iterated on until there is an intelligent fix?

These are the core questions I ask myself when aiming for a clean error-handling implementation in software development.

#software #design #errorhandling #builditbetter

Mongo::Error::OperationFailure: Cursor not found

Lately, I’ve been running into this error while running my nightly automation scripts. In my experience with nagging errors, this was another of those annoying, inconsistent ones that had no concrete solution on the internet. This post is for the benefit of those fortunate people (unlike me) who will encounter this error in the future.

Some background on the task: I had to query every record in my MongoDB server one by one and compare it in real time with my API responses. I am using the ‘mongo’ Ruby driver gem to interact with the DB. The DB held about 350,000 (3.5 lakh) records, but while running the script I got this error at around the 350th iteration:

Mongo::Error::OperationFailure:
  Cursor not found, cursor id: 79727049273 (43)
/home/qaserver/.rvm/gems/ruby-2.3.0/gems/mongo-2.4.0/lib/mongo/operation/result.rb:256:in `validate!'
/home/qaserver/.rvm/gems/ruby-2.3.0/gems/mongo-2.4.0/lib/mongo/operation/executable.rb:37:in `block in execute'
/home/qaserver/.rvm/gems/ruby-2.3.0/gems/mongo-2.4.0/lib/mongo/server/connection_pool.rb:107:in `with_connection'
/home/qaserver/.rvm/gems/ruby-2.3.0/gems/mongo-2.4.0/lib/mongo/server.rb:242:in `with_connection'
/home/qaserver/.rvm/gems/ruby-2.3.0/gems/mongo-2.4.0/lib/mongo/operation/executable.rb:35:in `execute'
/home/qaserver/.rvm/gems/ruby-2.3.0/gems/mongo-2.4.0/lib/mongo/cursor.rb:188:in `block in get_more'
/home/qaserver/.rvm/gems/ruby-2.3.0/gems/mongo-2.4.0/lib/mongo/retryable.rb:51:in `read_with_retry'
/home/qaserver/.rvm/gems/ruby-2.3.0/gems/mongo-2.4.0/lib/mongo/cursor.rb:187:in `get_more'
/home/qaserver/.rvm/gems/ruby-2.3.0/gems/mongo-2.4.0/lib/mongo/cursor.rb:113:in `each'
/home/qaserver/.rvm/gems/ruby-2.3.0/gems/mongo-2.4.0/lib/mongo/collection/view/iterable.rb:44:in `each'
./spec/all_usecases_spec/rovi_ott_validation_spec/rovi_ott_links_validation_for_all_programs_spec.rb:66:in `block (2 levels) in <top (required)>'

The line that raises it, in mongo-2.4.0/lib/mongo/operation/result.rb:

254      # @since 2.0.0
255      def validate!
256        !successful? ? raise(Error::OperationFailure.new(parser.message)) : self
257      end

The crazy part was that this error was not at all consistent; it only happened occasionally, so I ignored it and re-ran my scripts. Then one day, out of nowhere, it became almost 100% reproducible, and finding a fix became a priority.

After some reading, I realised that this error has to do with the cursor MongoDB creates when you run a query. A query returns a single cursor, and the driver pulls the results from it in batches (the get_more calls in the stack trace above). My query ‘finds all’ documents, so one cursor stays open while the script loops through the whole DB.

MongoDB closes cursors that have been inactive for 10 minutes; this is known as the cursor timeout. Most likely, my script was spending so long comparing each record with the API response that the cursor sat idle between batches for longer than that. The server closed it, and the next get_more failed with "Cursor not found".

On further digging, I found that there is a way to disable this cursor timeout. The hard part was finding the option name in the Ruby driver I was using, ‘mongo’. Several Stack Overflow answers suggested options such as ‘:timeout => false’, which do not work with the 2.x driver, so I had to struggle my way to the answer.

After reading the Mongo Ruby Driver documentation thoroughly (its structure is quite confusing), I found my answer!

There is a query option called ‘no_cursor_timeout’ that disables this cursor timeout. Here’s how to use it:

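coll.find({ :date => { '$eq' => Date.today } }).no_cursor_timeout.each do |doc|
  # no_cursor_timeout asks the server not to close this cursor after
  # 10 minutes of inactivity, so slow per-document work does not kill it.

  ########## Code goes in here ###########

end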
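Another way to avoid the problem is to skip the long-lived cursor altogether. You can page through the collection by `_id` with short queries (keyset pagination), so no cursor is ever left idle for minutes. This is a different technique from the one I used above, and the code below is only a sketch. `each_in_pages`, `DOCS` and `IN_MEMORY_FETCH` are hypothetical names. The page fetcher is passed in as a lambda so the paging logic runs without a MongoDB server. The commented fetcher shows how it might be wired to the mongo gem, assuming a `coll` collection object like the one in the snippet above.

```ruby
# Hypothetical helper: walk a collection in pages keyed on _id, so that no
# single server-side cursor has to stay open (and idle) for a long time.
#
# fetch_page is any callable taking (last_id, limit) and returning an Array of
# hashes sorted by "_id" ascending, containing only ids greater than last_id
# (or starting from the beginning when last_id is nil).
def each_in_pages(fetch_page, limit: 100)
  last_id = nil
  loop do
    page = fetch_page.call(last_id, limit)
    break if page.empty?

    page.each { |doc| yield doc }
    last_id = page.last["_id"]
    break if page.size < limit # a short page means we reached the end
  end
end

# Assumed wiring for the mongo gem (not executed here):
#
#   fetch = lambda do |last_id, limit|
#     filter = last_id ? { _id: { "$gt" => last_id } } : {}
#     coll.find(filter).sort(_id: 1).limit(limit).to_a
#   end

# In-memory stand-in so the sketch runs without a database.
DOCS = (1..250).map { |i| { "_id" => i, "value" => i * 2 } }

IN_MEMORY_FETCH = lambda do |last_id, limit|
  DOCS.select { |d| last_id.nil? || d["_id"] > last_id }
      .sort_by { |d| d["_id"] }
      .first(limit)
end

seen = []
each_in_pages(IN_MEMORY_FETCH, limit: 100) { |doc| seen << doc["_id"] }
puts seen.size # => 250
```

Each page is a fresh query that completes quickly, so the 10-minute idle timeout never comes into play. The trade-off is one round trip per page and the need for a sortable, indexed key such as `_id`.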

Login loop issue on Ubuntu

I had an issue on Ubuntu 14.04 where logging in would cycle through several screens and end up back at the login page. I had hit the same issue before and resolved it with a friend’s help. This time I decided to fix it myself, and it went faster than I expected.

Here’s how I resolved it after trying a few solutions:

LightDM is the display manager that ships by default with 14.04. If you google LightDM, here’s what you find:

LightDM is an X display manager that aims to be lightweight, fast, extensible and multi-desktop. It uses various front-ends to draw login interfaces, also called Greeters.

In other words, this package manages the login screen. Replacing it was not a big deal for me, since all my work starts after login, so I decided to try another display manager. Several display managers work with Ubuntu; one of them is GDM. I ran the following command to remove LightDM and install GDM.

Ctrl + Alt + F1 switches to a text console, so you can run it even when no user is logged in.

sudo apt-get purge lightdm && sudo apt-get install gdm

This fixed my issue, and I can now log in to my machine without a problem. Case closed!

ChromeDriver Error : Unsupported major.minor version 52.0

I came across this error while trying to get a Jenkins + Selenium combination running on my machine:

org/openqa/selenium/chrome/ChromeDriver : Unsupported major.minor version 52.0

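Solution: I found that my pom.xml specified Selenium v3.0.1. "Unsupported major.minor version 52.0" means a class compiled for Java 8 (class-file version 52) is being loaded by an older JVM. Selenium 3.x is built for Java 8, so the JVM running my build must have been older. Reverting to the previous stable Selenium version, v2.53.1, which still runs on Java 7, resolved my logjam. Upgrading the JVM to Java 8 would have been the other way out.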
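The "52.0" in that message is the class-file format version stored in every .class header. From Java 5 onwards it maps directly to a Java release: subtract 44 from the major version, so 52 is Java 8 and 51 is Java 7. A minimal Ruby sketch for checking what a given class targets follows. `class_file_version` and `java_release_for` are hypothetical helper names, and the mapping is only applied to major versions 49 and above.

```ruby
# The first 8 bytes of a .class file are: magic (u4), minor (u2), major (u2),
# all big-endian, as defined by the JVM class-file format.
def class_file_version(bytes)
  magic, minor, major = bytes.byteslice(0, 8).unpack("Nnn")
  raise ArgumentError, "not a class file" unless magic == 0xCAFEBABE

  [major, minor]
end

# From Java 5 (major 49) onwards, the Java release number is major - 44.
def java_release_for(major)
  raise ArgumentError, "pre-Java 5 class file" if major < 49

  major - 44
end

# Synthetic header as produced by a Java 8 compiler (major 52, minor 0).
header = [0xCAFEBABE, 0, 52].pack("Nnn")
major, minor = class_file_version(header)
puts "#{major}.#{minor} -> Java #{java_release_for(major)}" # => 52.0 -> Java 8
```

On a real class you would pass the file header instead, for example `class_file_version(File.binread("ChromeDriver.class", 8))`. Compare the result with the output of `java -version` for the JVM your Jenkins job uses.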

sudo /etc/init.d/jenkins start not working

I had installed Jenkins and builds were working fine.

Then it suddenly stopped working, with nothing in the logs.

I tried starting it with the following command:

sudo /etc/init.d/jenkins start

but it still did not run:

root@localhost:$# service jenkins restart
 * Restarting Jenkins Continuous Integration Server jenkins [ OK ] 

root@localhost:$# service jenkins status
Jenkins Continuous Integration Server is not running

After some googling, I was able to run Jenkins directly with the following command:

java -Djava.awt.headless=true -jar /usr/share/jenkins/jenkins.war --webroot=/var/cache/jenkins/war --httpPort=8080

I still can’t work out why the conventional Jenkins start command doesn’t work. I’m on Ubuntu 14.04, so maybe there is an issue with this particular Jenkins and Linux combination. Getting it running is enough for the time being!

NameError: undefined local variable or method `null'

I bumped into this error while implementing API validation test cases with the Airborne tool:

NameError:
undefined local variable or method `null’ for #<RSpec::ExampleGroups::ApiProgramProgramid:0x000000030d2f20>
# ./spec/my_test_name.rb:9:in `block (2 levels) in <top (required)>'

Background of the issue:

In the ‘expect_json’ call, I had used the exact response of my API request as the expected data for future executions.

For example, if this was part of my API response:

{"id":17537987,"show_type":"SM","series_id":17537987,"season_program_id":null}

I had given this as part of my Airborne validation script:

expect_json({id:17537987,show_type:"SM",series_id:17537987,season_program_id:null})

While executing this script, I kept getting the error "NameError: undefined local variable or method `null'". It was really getting on my nerves, because I couldn’t proceed with my implementation, and we all know how deadlines work! hehe… Anyway…

After a fair bit of trial and error, I found the issue:

Ruby has no null; its equivalent is nil, and a bare `null` in Ruby code is just an undefined name, hence the NameError. I was unaware of this because this is the first time I’m working with Ruby, and I had started implementing with the (Ruby-based) Airborne tool without getting enough time to ramp up on the language. We techies are required to deliver fast results, you see.. hehe..
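Ruby’s standard json library shows the mapping directly. JSON `null` is parsed into `nil`, and `nil` is written back out as `null`. A short sketch, using a trimmed copy of the API response above:

```ruby
require "json"

# JSON text uses null; Ruby's json library parses it into nil.
response = JSON.parse('{"id":17537987,"show_type":"SM","season_program_id":null}')
p response["season_program_id"]      # => nil
p response["season_program_id"].nil? # => true

# Going the other way, nil is serialised back to JSON null.
puts JSON.generate({ "season_program_id" => nil }) # => {"season_program_id":null}

# A bare `null` in Ruby is just an unknown identifier, hence the NameError.
begin
  null
rescue NameError => e
  puts e.class # => NameError
end
```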

So finally, I gave this as my expected JSON value:

expect_json({id:17537987,show_type:"SM",series_id:17537987,season_program_id:nil})

And voila! The script worked!! 🙂

Hope this helps someone to resolve this specific issue.

Regards,

VJ


Jenkins: Error Building Binaries – Windows OS

Hi all,

Recently I came across an issue on Jenkins while building a project on Windows.

When I ran the build script locally on the Windows machine, the binaries built successfully, but the same build run through Jenkins failed with errors building the binaries. The Jenkins build was not picking up the machine’s local settings, including various Visual Studio 2008 configurations.

Solution: After some debugging, I found that the root cause was running the Jenkins slave as a Windows service. When I stopped the service and launched the slave through its .jnlp file instead, the build succeeded. This is most likely because a Windows service runs under a different account (LocalSystem by default) and does not see the logged-in user’s settings. Another lesson I’m starting to learn about Jenkins slaves on Windows is that it is best to launch them via JNLP rather than as a service.

Regards,

VJ

Selenium Error: StaleElementReferenceException

Most automation tools depend on the idea that the page has finished loading. With AJAX and Web 2.0, this has become a grey area: META tags can refresh the page, and JavaScript can update the DOM at regular intervals.

For Selenium, this means a StaleElementReferenceException can occur. It occurs if I find an element, the DOM gets updated, and then I try to interact with the element.

Actions like:

driver.findElement(By.id("foo")).click();

are not atomic. Writing it all on one line changes nothing; it is no different from:

By fooID = By.id("foo");
WebElement foo = driver.findElement(fooID);
foo.click();

If JavaScript updates the page between the findElement call and the click call, I’ll get a StaleElementReferenceException. This is not uncommon on modern web pages, but it will not happen consistently: the timing has to be just right for this bug to occur.

Generally speaking, if you know the page has JavaScript that automatically updates the DOM, you should assume a StaleElementReferenceException will occur. It might not occur while you are writing the test or running it on your local machine, but it will happen. Often it happens after you have 5000 test cases and haven’t touched this code for over a year. Like most developers, if it worked yesterday and stopped working today, you’ll look at what you changed recently and never find this bug.

So how do I handle it? I use the following click method:

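import org.openqa.selenium.By;
import org.openqa.selenium.StaleElementReferenceException;

// Assumes the enclosing class has a WebDriver field named `driver`.
public boolean retryingFindClick(By by) {
    boolean result = false;
    int attempts = 0;
    while (attempts < 2) {
        try {
            // Re-find the element on every attempt so a stale reference is replaced.
            driver.findElement(by).click();
            result = true;
            break;
        } catch (StaleElementReferenceException e) {
            // The DOM changed between findElement() and click(); try again.
        }
        attempts++;
    }
    return result;
}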

This attempts to find and click the element. If the DOM changes between the find and the click, it tries again, on the assumption that an immediate second attempt will succeed. If the DOM changes very rapidly, this will not work. At that point you need to get development to slow down the DOM changes, or build a custom solution for that particular project.

The method takes as input a locator for the element you want to click. It returns true as soon as a click succeeds, and false if the element was still stale after both attempts. Other exceptions, such as NoSuchElementException, are not caught and propagate to the caller.

Personally, I would argue this should always work: if the developers are refreshing the page that quickly, they are overloading the browser on the client machine.

Courtesy: http://darrellgrainger.blogspot.in/2012/06/staleelementexception.html
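For comparison, here is the same retry-on-stale idea as a generic Ruby helper, since Ruby is the language of the other scripts on this blog. `with_retries` is a hypothetical name. `FlakyError` stands in for the real exception so the sketch runs without a browser; with the selenium-webdriver gem you would pass `Selenium::WebDriver::Error::StaleElementReferenceError` instead. One design difference from the Java method: after the last attempt it re-raises instead of returning false, so the failure is not silently lost.

```ruby
# Stand-in for Selenium::WebDriver::Error::StaleElementReferenceError.
class FlakyError < StandardError; end

# Runs the block, retrying only on error_class, up to `attempts` tries in
# total. The final failure is re-raised so the caller still sees it.
def with_retries(error_class, attempts: 2)
  tries = 0
  begin
    tries += 1
    yield
  rescue error_class
    retry if tries < attempts
    raise
  end
end

# Simulate an element that goes stale once and then succeeds.
calls = 0
result = with_retries(FlakyError) do
  calls += 1
  raise FlakyError, "element is stale" if calls == 1
  :clicked
end
puts "#{result} after #{calls} attempts" # => clicked after 2 attempts
```

With a real driver, usage would look like `with_retries(Selenium::WebDriver::Error::StaleElementReferenceError) { driver.find_element(id: "foo").click }`.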

SVN Checkout Error in Jenkins : E200030 : BUSY

Hey all,

You might come across this error while setting up a fresh CI pipeline on Jenkins with SVN as the repository:

Caused by: org.tmatesoft.svn.core.SVNException: svn: E200030: BUSY


Solution :-

Check whether the root folder of the job contains a (hidden) .svn folder. Delete this folder and try the checkout again; this should solve your problem. The .svn folder holds the working copy’s metadata, which in SVN 1.7 and later is an SQLite database named wc.db. BUSY typically means that database was locked, for example by another SVN client or an interrupted checkout, so starting from a clean folder avoids the conflict.
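If you prefer to script the manual fix, for example as a pre-build step, deleting the folder takes a few lines of Ruby. This is a minimal sketch. `remove_svn_metadata` is a hypothetical helper name, and the workspace path in the usage note is only an example.

```ruby
require "fileutils"

# Hypothetical helper: remove the hidden .svn metadata folder from a job
# workspace so the next checkout starts from a clean working copy.
# Returns true if a .svn folder was found and removed, false otherwise.
def remove_svn_metadata(workspace)
  svn_dir = File.join(workspace, ".svn")
  return false unless File.directory?(svn_dir)

  FileUtils.rm_rf(svn_dir)
  true
end
```

Usage would be something like `remove_svn_metadata("/var/lib/jenkins/workspace/my-job")`. The Workspace Cleanup Plugin does a more thorough version of this by wiping the whole workspace.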

There is also a useful Jenkins plugin for this, which I now use in all my jobs: the ‘Workspace Cleanup Plugin’. More info is available at https://wiki.jenkins-ci.org/display/JENKINS/Workspace+Cleanup+Plugin

Hope this post helps!

Regards,

VJ

Coverity Error : failed to access wsdl … Idiotic me with a bump on my head…

This post is about a sickening error that literally stalled my task for a few days.

Problem: While querying the Coverity server using cov-manage-im, I got the following error in the stack trace:

“Failed to access wsdl”

The solution was even more sickening 😦!! Sometimes in IT you realise that attacking a problem from the same angle over and over is of no use… Take a step back > Have a coffee > Chill out > Listen to some music > Approach your problem from a different angle

Sickening solution: Update Coverity Static Analysis to the latest version!!! It turned out I had Coverity Static Analysis 5.1.x on my machine, while the Coverity server had been upgraded and needed version 6.5.3 on the client.

Conclusion: Sometimes the solution to a problem is simpler than we think. Approach problems holistically, and don’t keep hitting them from the same place… Sooner or later you will be the one with the bump on your head… 🙂 haha…

Regards,

VJ

PS: The weekend is here.. Time to partyyyyyyyy 😉