Posts

Showing posts with the label devops

Template for IT Ops runbook

- https://github.com/SkeltonThatcher/run-book-template/blob/master/run-book-template.md
- https://www.atlassian.com/software/confluence/templates/devops-runbook
- https://medium.com/@shawnstafford/ops-runbook-16017fa78733

Performance Counters for .NET

A little dated by now.
- From Windows 2000 Resource Kit:
  - http://technet.microsoft.com/en-us/library/cc938609.aspx
  - http://technet.microsoft.com/en-us/library/cc940375.aspx
  - http://technet.microsoft.com/en-us/library/cc938586.aspx
  - http://technet.microsoft.com/en-us/library/bb734903.aspx
- For .NET:
  - http://www.symantec.com/business/support/index?page=content&id=HOWTO9722
- From Windows 2003:
  - http://technet.microsoft.com/en-us/library/cc779038(WS.10).aspx

Excerpts From “What Can DevOps Learn from Formula 1” Presentation

Full presentation available here. A list of salient points follows:
- How to define success?
  - Developers want agility and change
  - Operations want availability and stability
- F1 car lifecycle
  - Design -> Develop -> Test -> Deploy -> Support
  - Support (Operations) always needs to provide feedback to Design (Development)
  - Engineers need to work hand-in-hand with Operations
- Success must be measured
  - F1: telemetry and monitoring are required to deliver drivers’ results
  - IT systems: performance data collection and analysis, as well as performance monitoring, are required to deliver success
- Monitoring is critical in managing change
  - Need to constantly monitor and manage the impact of change
  - Need to provide feedback on where the car:
    - is fast: where things are done right
    - is slow: where to optimise and improve
    - failed: where to fix
- Three key aspects that impact application performance
  - Concurrency
  - Data volume
  - Resources
- Where does one find the real bottlenecks? Not of...

Troubleshooting Common .NET HTTP Connection Errors

The first step is to identify whether the error lies with the client, the server, or an intermediary. Most of the errors begin with “The underlying connection was closed: ”…
- Indications of client error
  - An unexpected error occurred on a send. Could be due to:
    - antivirus software installed on the client machine
- Indications of intermediary error
  - The remote name could not be resolved, or The proxy name could not be resolved. Could be due to:
    - DNS issue
    - inability to access the hosts file
  - Unable to connect to the remote server. Likely to have gotten through DNS but hit a connection glitch due to:
    - proxy
    - firewall
    - network authentication
- Indications of server or intermediary error (e.g. load balancer, proxy, etc.)
  - An unexpected error occurred on a receive. The server or intermediary unexpectedly closes the TCP connection. May be due to:
    - Server or intermediary timeout values set too low ( TODO : increase the client’s request timeout & also the server...
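As a rough illustration of the triage above, the message fragments can be put in a lookup table that suggests where to look first. This is only a sketch: the function name `likely_source` and the labels it returns are made up for this example, and the table covers only the messages listed above.

```python
# Message fragments and their likely source, taken from the list above.
# The labels ("client", "intermediary", ...) are illustrative only.
LIKELY_SOURCE = {
    "An unexpected error occurred on a send": "client",
    "The remote name could not be resolved": "intermediary",
    "The proxy name could not be resolved": "intermediary",
    "Unable to connect to the remote server": "intermediary",
    "An unexpected error occurred on a receive": "server or intermediary",
}


def likely_source(message: str) -> str:
    """Return a first guess at where an HTTP connection error originates."""
    lowered = message.lower()
    for fragment, source in LIKELY_SOURCE.items():
        if fragment.lower() in lowered:
            return source
    return "unknown"


print(likely_source(
    "The underlying connection was closed: "
    "An unexpected error occurred on a receive."
))
```

The result is only a starting point for investigation; the same message can have causes other than the ones listed.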

Best Practices for running ASP.NET on IIS 7

When should application pools be turned into web gardens? Web gardens should only be used if the application does not use in-process session variables but out-of-process ones (e.g. session state service or database session state). The reason is that a web garden has at least 2 worker processes, which do not share in-process (session) memory. Drivers for using web gardens are:
- Application makes long-running synchronous requests
- Application is low in availability and crashes often
- Application creates high CPU load on the worker process
Best Practices
- Systems Settings
  - Optimum paging file size setting:
    - 1.5x the RAM for 32-bit OS
    - system-managed for 64-bit OS
  - Disk queue length should always average less than 2
  - Processor queue length should be less than the number of processors
  - Network utilisation should be less than 50%
- Application Isolation Policy
  - Some applications should be deployed into their own application pool:
    - mission critical and should be high...
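The system-settings thresholds above can be written down as a small check. This is a minimal sketch: the function and parameter names are hypothetical, and the counter values are assumed to have been collected already (for example from Performance Monitor); nothing here reads them from the machine.

```python
from typing import List, Optional


def recommended_paging_file_mb(ram_mb: int, is_64_bit: bool) -> Optional[int]:
    """Return the paging file size in MB, or None for 'system-managed'."""
    if is_64_bit:
        return None  # 64-bit OS: leave the paging file system-managed
    return int(ram_mb * 1.5)  # 32-bit OS: 1.5x the RAM


def check_system_counters(avg_disk_queue: float,
                          processor_queue: float,
                          processor_count: int,
                          network_utilisation_pct: float) -> List[str]:
    """Return one warning per counter that breaks a threshold listed above."""
    warnings = []
    if avg_disk_queue >= 2:
        warnings.append("disk queue length averages 2 or more")
    if processor_queue >= processor_count:
        warnings.append("processor queue length is not below the processor count")
    if network_utilisation_pct >= 50:
        warnings.append("network utilisation is 50% or more")
    return warnings


# Hypothetical sample values for a 4-processor, 32-bit server with 4 GB of RAM.
print(recommended_paging_file_mb(4096, is_64_bit=False))
print(check_system_counters(0.8, 6, 4, 35.0))
```

An empty list means all three counters are within the suggested limits.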

Service Level Agreement (SLA) and Number of 9s

To commit to memory, SLA 9s and acceptable unscheduled downtime:

Availability %    Approximate downtime/ year
90                50,000 minutes (800 hours)
99                5,000 minutes (80 hours)
99.9              500 minutes (8 hours)
99.99             50 minutes (1 hour)
99.999            5 minutes
99.9999           0.5 minutes

In more detail, for reference:

Availability %           Downtime/ year    Downtime/ month*    Downtime/ week
90% (“one 9”)            36.5 days         72 hours            16.8 hours
95%                      18.25 days        36 hours            8.4 hours
98%                      7.30 days         14.4 hours          3.36 hours
99% (“two 9s”)           3.65 days         7.20 hours          1.68 hours
99.5%                    1.83 days         3.60 hours          50.4 minutes
99.8%                    17.52 hours       86.4 minutes        20.16 minutes
99.9% (“three 9s”)       8.76 hours        43.2 minutes        10.1 minutes
99.95%                   4.38 hours        21.6 minutes        5.04 minutes
99.99% (“four 9s”)       52.56 minutes     4.32 minutes        1.01 minutes
99.999% (“five 9s”)      5.26 minutes      25.9 seconds        6.05 seconds
99.9999% (“six 9s”)      31.5 seconds      2.59 seconds        0.605 seconds

* Assumes a 30-day month.
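The figures in the detailed table follow from one formula: downtime = period length x (100 - availability %) / 100. A minimal sketch, assuming a 365-day year, a 30-day month and a 7-day week; the function name is made up for this example:

```python
def allowed_downtime_seconds(availability_pct: float, period_days: float) -> float:
    """Return the unscheduled downtime budget, in seconds, for one period."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability must be between 0 and 100")
    period_seconds = period_days * 24 * 60 * 60
    return period_seconds * (100 - availability_pct) / 100


for pct in (99.0, 99.9, 99.99, 99.999):
    per_year = allowed_downtime_seconds(pct, 365)
    per_month = allowed_downtime_seconds(pct, 30)
    per_week = allowed_downtime_seconds(pct, 7)
    print(f"{pct}%: {per_year / 60:.2f} min/year, "
          f"{per_month / 60:.2f} min/month, {per_week / 60:.2f} min/week")
```

Changing the period lengths (for example, a 365.25-day year) shifts the results slightly, which is why published tables do not always agree to the last digit.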

System Deterioration

Machinery needs oiling after some time; building structures deteriorate due to natural forces; facades need to be repainted; the Golden Gate Bridge needs to be repainted; vehicles need to be maintained. What does that mean for systems? I’m inclined to believe that software systems, like all other things, need to be constantly maintained, oiled, and cleaned before they deteriorate. What does deterioration look like? Deterioration may appear in these forms:
- systems become progressively slower
- systems crash or become unavailable more often
- systems become more bloated (larger codebase, more storage space required, etc.)
The system was originally deployed and tested fine, so what went wrong? How does deterioration happen? It typically happens due to the following forces:
- the user base increased post-deployment to a number that was not intended or tested for
- smart users found ways to use the system that were not originally intended
- operation/ support team did not make it a point to upkeep t...

Oracle WebLogic Server States

With respect to the WebLogic server, there are several runtime states that are interesting. In addition, certain events and commands lead to transitions to other known states. I attempted to capture the state transitions and events in a UML state diagram for easy reference. States in brown are end-states while those in yellow are transitional.

Apache Web Server Troubleshooting

How to troubleshoot Apache Web Server? A few standard commands to know by heart:
- To confirm the modules that are or will be loaded in Apache: apachectl -M [-f <conf filename>]
- To check the virtual host load pattern: apachectl -S [-f <conf filename>]
- To test a httpd.conf file: apachectl -t [-f <conf filename>]
- To maintain the http server: apachectl start|stop|restart
- To start with a specified configuration file: apachectl -f <conf filename>
Create a minimalist httpd.conf. Something like:
    Listen 80
    ServerName myserver.com:80
    ServerAdmin admin@myserver.com
    #ServerRoot "."
    ServerRoot "/usr/local/apache2"
    User nobody
    Group nobody
    #DocumentRoot "..\wwwroot"
    DocumentRoot "/home/data/html"
    ErrorLog "logs/basic_apache.log"
    LogLevel debug
    #LoadModule log_config_module modules/mod_log_config.dll
    LoadModule log_config_module modules/mod_log_config.so
    <IfModule log_config_module>
    LogFormat "%h %l...
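For scripted checks, the three diagnostic commands above can be assembled programmatically. This is a small sketch that only builds and prints the command lines; the helper name `apachectl_command`, the check labels, and the sample configuration path are all hypothetical.

```python
import shlex
from typing import List, Optional

# Diagnostic flags from the command list above; the keys are made-up labels.
DIAGNOSTIC_FLAGS = {
    "modules": "-M",     # modules that are or will be loaded
    "vhosts": "-S",      # virtual host load pattern
    "configtest": "-t",  # test a httpd.conf file
}


def apachectl_command(check: str, conf: Optional[str] = None) -> List[str]:
    """Build the apachectl argument list for one of the diagnostic checks."""
    command = ["apachectl", DIAGNOSTIC_FLAGS[check]]
    if conf is not None:
        command += ["-f", conf]  # optional alternate configuration file
    return command


for check in DIAGNOSTIC_FLAGS:
    # "/tmp/basic_httpd.conf" is a placeholder path, not one from the post.
    print(shlex.join(apachectl_command(check, "/tmp/basic_httpd.conf")))
```

Assuming apachectl is on the PATH, the returned list can be passed to `subprocess.run(command, capture_output=True, text=True)` to execute the check and capture its output.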

Creating a new execution thread pool for servlet for WLS 7 and WLS 8

Step 1: You need to create a new Execute Queue (e.g. CriticalAppQueue)
- Click the name of the server instance where you will add the execute queue.
- Select the Monitoring -> General tab.
- Click the Monitor All Active Queues text link to display the execute queues that the selected server uses.
- Click the Configure Execute Queue text link to display the execute queues that you can modify.
- Click the Configure a New Execute Queue link.
- On the Execute Queue Configuration tab, modify the following attributes or accept the system defaults:
  - Queue Length: Always leave the Queue Length at the default value of 65536 entries. The Queue Length specifies the maximum number of simultaneous requests that the server can hold in the queue. The default of 65536 requests represents a very large number of requests; outstanding requests in the queue should rarely, if ever, reach this maximum value. If the maximum Queue Length is reached, WebLogic Server automatically doubles the size of the queue to...

Monitoring & Troubleshooting ASP.NET Applications on IIS

- Toolbox for the IIS server
  - net start
  - sc query
  - tasklist /svc
  - netstat -ano
  - wfetch/ wget/ curl
  - procmon/ filemon/ regmon
  - eventvwr
  - mmc.exe (reliability & performance monitoring)
    - process time & %
    - .NET CLR
    - ASP.NET
    - Requests Wait Time
    - Requests Current
    - Requests Queued
    - Requests Rejected
- Toolbox for the browser machine
  - ping
  - telnet/ portcheck
  - wfetch/ wget/ curl
When should application pools be turned into web gardens? Web gardens can only be used if the application does not use in-process session variables but out-of-process ones (e.g. session state service or database session state). Drivers for using web gardens are:
- Application makes long-running synchronous requests
- Application is low in availability and crashes often
- Application creates high CPU load on the worker process
Non-IIS Settings & Bottlenecks
- Optimum paging file size setting:
  - 1.5x the RAM for 32-bit OS
  - system-managed for 64-bit OS
- Disk queue length should always average les...