Welcome to It-Slav.Net blog
Peter Andersson
peter@it-slav.net

I've already got a female to worry about. Her name is the Enterprise.
-- Kirk, "The Corbomite Maneuver", stardate 1514.0

Background

When I was preparing a presentation about what op5 is doing and our contribution to the community, I went to ideas.nagios.org. Browsing the list of the biggest issues with Nagios, I found that op5 has packaged and solved them all in op5 Monitor. I encourage everyone to take a peek at the list and judge for themselves which platform to use for enterprise monitoring.

 

1. Nagios Clusters

op5 has developed Merlin (Module for Effortless and Redundant and Loadbalanced Infrastructure with Nagios). It makes it possible to build a redundant and/or load-balanced solution using Nagios. It also provides a scalable database backend for Ninja and Nagvis. Take a peek at the Merlin webpage.

 

2. Performance Graphing

op5 Monitor includes pnp4nagios, which graphs more or less any check that returns a numerical value.
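
As a rough illustration of what "a check that returns a numerical value" means: Nagios plugins can append performance data after a `|` in their output, in the form `label=value[UOM];warn;crit;min;max`, and graphing tools read those values. The Python sketch below parses such a string. The function name `parse_perfdata` and the sample plugin output are made up for illustration and are not taken from pnp4nagios.

```python
import re

# One perfdata item: label=value[UOM];warn;crit;min;max (trailing fields optional).
# Labels that contain spaces may be wrapped in single quotes.
_ITEM = re.compile(
    r"(?:'(?P<quoted>[^']+)'|(?P<plain>[^\s=';]+))"
    r"=(?P<value>-?\d+(?:\.\d+)?)(?P<uom>[^;\s\d]*)"
)


def parse_perfdata(plugin_output):
    """Return {label: (value, unit)} for the part after the first '|'."""
    if "|" not in plugin_output:
        return {}
    perfdata = plugin_output.split("|", 1)[1]
    result = {}
    for match in _ITEM.finditer(perfdata):
        label = match.group("quoted") or match.group("plain")
        result[label] = (float(match.group("value")), match.group("uom"))
    return result


# Invented sample output in the style of a ping check.
sample = "PING OK - rta 0.80ms, lost 0% | rta=0.80ms;100;500;0 pl=0%;20;60;0"
print(parse_perfdata(sample))
```

Only the value and unit are kept here; the warning and critical thresholds that follow each value are ignored to keep the sketch short.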

 

3. New Status Map

We provide a new status map called Automap, which is part of the Nagvis project. op5 has contributed to Nagvis by adding support for the Merlin database backend and Google Maps integration.

 

4. Better Interface

op5 has created a new PHP-based interface, called Ninja, that makes you proud of your monitoring solution. Ninja stands for Nagios is now Just Awesome; it looks so good that you can show it to your manager :-) Take a peek at the Ninja webpage.

We had a naming competition at op5. My suggestion to call it Yang, Yet Another Nagios GUI, did not get many votes, so the name became Ninja.

 

5. Configuration

Anyone who has tried to configure Nagios by hand has thought bad thoughts. It is cumbersome, and in the beginning it frustrates even a skilled Unix admin. op5 has developed a web-based configuration tool named Nacoma. It is included in op5 Monitor.

 

6. SLA reports

op5 has received a lot of feedback from customers, and one request was better reports. We have created two reports: an availability report and an SLA report. Both can be scheduled and look good compared to the reports in Nagios. They are included in Ninja.

 
7. More members in core team

Well, this is something that op5 cannot control. However, Andreas Ericsson, one of op5's core developers, is a member of the Nagios steering board.

 

8. SNMP Trap receiver

This is something op5 is working on right now. We have created a prototype together with a customer, and depending on the outcome of that project we will know better when it will be available on the market.

 

9. UI-Improvement

op5's solution to this issue is Ninja, a better, nicer, scalable, database-backed GUI for Nagios.

 

10. Nagios Dashboard

A new Tactical Overview with widgets, or Nagvis, all included in Ninja.

 


8 Responses to “Top 10 Nagios problems solved, by op5”

  1. Aldo Mett Says:

    Hi

    We have quite a large Nagios setup (~2k hosts, 20k services) with the native GUI, plus Merlin and the reports_module backend for custom data queries. Although Ninja seems to be the best frontend for Nagios so far, I still found some old Nagios annoyances and some completely new ones which are showstoppers for me now.
    Some problems may be specific to our system, but others could matter to any user. In random order:

    Same problems as native Nagios GUI:

    1. No custom variables shown in host and service details. We have a lot of metadata with hosts. To make it visible for NOC engineers I use a workaround: the data is copied and formatted as an HTML table into the notes field. But it messes up the Ninja extinfo page layout.

    2. No search by custom variables or any other fields. I really miss good search and filtering engine for any view. Regex search is a must.

    Ninja problems:

    3. Limited search results, why not full match with pagination?

    4. No basic auth. In case of central ldap directory and Apache mod_auth_ldap the login should be transparent (like Nagios, Cacti, Zabbix, etc).

    5. No host notes_url and action_url in service problems view. As problems view is the main page for NOC engineer, there must be quick links from host/service to external info (wiki, Cacti).

    6. In problems view, Ninja shows passive check icons on random hosts (or hosts which have some passive service problem) even if the host has active alive check.

    7. No pagination and keyword search/filter in reports and logs views. Alert history does not work for big logs at all (memory or timeout issue?)

    8. Slow response on any page. It may be a Mysql performance tuning issue but on same machine Nagios GUI is faster!

    Merlin problems:
    9. After a Nagios config reload (we have to do that several times an hour), host and service status updates take some time. Ninja is useless during this time.

    10. Merlin or reports_module causes check latency problems for Nagios core after several config reloads. It is a lot better than ndoutils but still annoying. I have to restart Nagios to get latency down again.

    11. reports_module compared to ndoutils had issues when I converted my reporting scripts. Some data is missing and some is useless overhead.

    I’m glad to discuss this or hear suggestions about what I have done wrong. Feel free to contact me directly if you want to sell me other options to solve my problems 😉

    regards,
    Aldo Mett

  2. peter Says:

    Hi Aldo

    Are you on the op5-users@lists.op5.com mailing list? If not, I really recommend that you subscribe to it: http://www.op5.org/community/mailing-lists/op5-users-mailing-lists
    Please enter any bug you find in op5's bug system, Mantis, at http://bugs.op5.com. If we do not know about the bugs, we cannot fix them.

    Generally Ninja is now in version 1.0 so we are very interested in your feedback to enhance Ninja. The goal for the current release has been to get something new that is good enough and now we are working on getting it better. We start with the low hanging fruits.

    A full-featured Merlin is planned for release in the beginning of November, so at this stage there are bugs and we are fully aware of that. However, we gladly take your comments and will do our best to fix them.

    I’ll answer some of the comments you have:
    1-Yes, in many cases custom variables contain passwords and so on, so neither Ninja nor Nagios shows them in the GUI.
    2-In the right hand corner there is a search field. If you want to search for a host use h:, service s:, hostgroup hg: and servicegroup sg:. If this is not good enough we really want you to enter them as an RFE, Request for enhancement, at http://bugs.op5.com
    3-In next version of Ninja it will be fixed https://bugs.op5.com/view.php?id=3330
    4-You can use Ldap if you want, see http://www.op5.org/community/mailing-lists/archive
    5-If this is true, then it is a bug. Report it at http://bugs.op5.com
    6-If this is true, then it is a bug. Report it at http://bugs.op5.com
    7-Good point, I have reported it https://bugs.op5.com/view.php?id=3703 and https://bugs.op5.com/view.php?id=3704
    8-Well, the DB adds extra load. In most cases Ninja is faster than the old GUI when it has to show a lot of data, as in “service detail”.
    9-Do you change your nagios config several times every hour? However, report this issue as a bug http://bugs.op5.com
    10-Report as a bug http://bugs.op5.com
    11-I do not understand. Please clarify
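
The prefixes in answer 2 can be pictured as a small mapping from prefix to object type. The Python sketch below only illustrates that idea and is not Ninja's actual code. The function name `parse_search`, the fallback to a host search when no known prefix is given, and the sample queries are all assumptions.

```python
# Prefixes mentioned in answer 2, mapped to the object type they select.
PREFIXES = {
    "h": "host",
    "s": "service",
    "hg": "hostgroup",
    "sg": "servicegroup",
}


def parse_search(query, default="host"):
    """Split 'prefix:term' into (object_type, term).

    Queries without a known prefix fall back to `default`. That fallback
    is an assumption of this sketch, not documented Ninja behaviour.
    """
    prefix, sep, term = query.partition(":")
    key = prefix.strip().lower()
    if sep and key in PREFIXES:
        return PREFIXES[key], term.strip()
    return default, query.strip()


# Invented sample queries.
for q in ("h:web01", "sg:databases", "plain text"):
    print(q, "->", parse_search(q))
```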

    I’ll try to get the merlin and ninja developers to answer your questions. But I really recommend you to report them in our bug tracking system

    It seems like you are an advanced user, so maybe you should consider contacting op5 to get a supported solution and help from the op5 partner ecosystem.

  3. Peter Östlin Says:

    Hi Aldo

    Another op5 employee named Peter getting involved 🙂 As Peter A said, we very much appreciate your input.

    About the issues:

    #5, I’m not able to reproduce. The action and notes url’s are available as icons for me when viewing the host/service problem pages. On the service detail page they are available as text links.

    #6. It is fixed in GIT and will be available in the next release: https://bugs.op5.com/view.php?id=3681

    #10 and #11. We are merging the reports module into merlin. This will hopefully improve performance.

    Cheers
    Peter Östlin
    Devel Manager, op5 AB

  4. Andreas Says:

    Hello Aldo.

    Regarding the showlog page, this is most likely due to a bug in the showlog program which caused it to sift through a LOT of logfiles even though it could figure out it only needed to open a few of them.

    Upgrading to latest reports-module from git should solve your problem.

    Also note that both showlog and import are part of Merlin as of next release, in which we scrap reports-module altogether. It will still be possible to use both modules simultaneously, but that would mean redundant data collection, and we will stop supporting the reports module.

  5. Aldo Mett Says:

    Hi again

    Thanks for the responses. I evaluated Ninja and Merlin in spring, just after the launch of the stable versions, which I had been waiting for since Netways Nagios Conference’08 and Nordicnagiosmeet’09 😉 Both products are well designed and promise to be a platform for developing custom widgets and whatever else we need for our highly dynamic and rapidly growing infrastructure. I planned to contact you this summer, but I have been too busy with other things and have not written down all the needs and ideas I have in mind.

    We are also considering other monitoring tools as alternatives to Nagios and Cacti. There are many products around, each with some impressive features or solutions, but so far no luck finding one that does all the good things at once. And we have a lot of history and legacy accumulated in our current tools during the company’s 7-year lifetime.

    Answering to your comments above:

    I don’t like mailing lists (too many e-mails) and your list archive is not searchable. Fortunately Google helps with that.

    #1 & 2, imho resource.cfg is for sensitive data. We have metadata which I’d like to use for filters and searches on any page (location, service, priority, owners etc). Some example cases when it could be helpful:
    a) imagine a major outage in one datacenter. Nagios problem page is a really big mess with hundreds of problems. Now we would like to hide low priority problems and concentrate on high priority or service level problems only.
    b) a sysadmin wants to view hosts or services or problems which are related to him or his team only
    etc
    And widgets should be easily configurable which fields we want to see on each view.
    (example case: several hosts report power failure on one feed, adding ‘rack’ column to problems page may lead to the broken fuse on one rack)
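
The use cases above amount to filtering and grouping problems by custom variables. The Python sketch below illustrates that under stated assumptions: Nagios custom variables are defined with a leading underscore, but the variable names `_PRIORITY` and `_RACK`, the sample problems, and the helper names `filter_by_vars` and `count_by_var` are all invented for the example. Neither Ninja nor Nagios is claimed to work this way.

```python
from collections import Counter

# Hypothetical open problems with invented custom variables.
PROBLEMS = [
    {"host": "db01", "service": "Power feed A", "vars": {"_PRIORITY": "high", "_RACK": "R12"}},
    {"host": "web07", "service": "Power feed A", "vars": {"_PRIORITY": "low", "_RACK": "R12"}},
    {"host": "mail02", "service": "Disk /var", "vars": {"_PRIORITY": "low", "_RACK": "R03"}},
]


def filter_by_vars(problems, **wanted):
    """Keep problems whose custom variables equal every given keyword,
    e.g. filter_by_vars(PROBLEMS, priority="high")."""
    return [
        p for p in problems
        if all(p["vars"].get("_" + name.upper()) == value
               for name, value in wanted.items())
    ]


def count_by_var(problems, name):
    """Count problems per value of one custom variable (the 'rack' column idea)."""
    return Counter(p["vars"].get("_" + name.upper(), "unknown") for p in problems)


# Case a): only high-priority problems.
print([p["host"] for p in filter_by_vars(PROBLEMS, priority="high")])
# Rack example: which rack has the most problems?
print(count_by_var(PROBLEMS, "rack").most_common(1))
```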

    Another annoying thing is related to Nagios logic. If a host is down, I don’t want to see all of that host’s problems in the service problems view. I even want an option to not check any services during a host outage. This is especially bad if we watch this view ordered by duration (fresher problems at top). Now think about case a) again 🙂

    Nagios object dependencies and other features are for reducing notification spam. But nothing helps against visual spam.
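
The "visual spam" point can be sketched as a display-side filter: drop service problems whose host is itself not UP, then sort the rest with the freshest problem first. This is a minimal Python illustration, not a feature of Nagios or Ninja. The function name `visible_service_problems`, the `duration_s` field, and the sample data are hypothetical.

```python
def visible_service_problems(service_problems, host_states):
    """Hide service problems on hosts that are not UP, freshest problem first.

    Hosts missing from `host_states` are treated as UP (an assumption).
    """
    kept = [
        p for p in service_problems
        if host_states.get(p["host"], "UP") == "UP"
    ]
    # Shortest duration first, i.e. the newest problems at the top.
    return sorted(kept, key=lambda p: p["duration_s"])


# Invented sample data: db01 is down, so its service problems are hidden.
HOST_STATES = {"db01": "DOWN", "web07": "UP", "mail02": "UP"}
SERVICE_PROBLEMS = [
    {"host": "db01", "service": "MySQL", "duration_s": 120},
    {"host": "db01", "service": "Disk /", "duration_s": 125},
    {"host": "mail02", "service": "SMTP", "duration_s": 3600},
    {"host": "web07", "service": "HTTP", "duration_s": 300},
]

for p in visible_service_problems(SERVICE_PROBLEMS, HOST_STATES):
    print(p["host"], p["service"], p["duration_s"])
```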

    #4 This LDAP module is not what I mean. I need Basic auth only (without the Ninja login page). Auth is already provided by the Apache module.

    #5, On service problems page (which is the main screen we use in NOC) there are only service action_url and notes_url icons on Actions column. But Nagios UI shows host url as well next to host name.

    #9, Some of our Nagios configuration files are generated automatically based on internal production and CMDB databases and during a normal workday there are many changes every hour. I have to choose a right update interval to satisfy operations needs and keep Nagios stability.

    #10,11, I’m looking forward to any new and better stable versions 😉

  6. Michal "Wolvverine" Panasiewicz Says:

    All of these "problems" are resolved in OMDdistro http://omdistro.org/

  7. peter Says:

    Some of them yes, but not all

    /Peter

  8. vipin Says:

    It was very helpful.
