Blog Objective

This is a blog that attempts to make life easier by noting down the author's accrued knowledge and experiences.
The author has dealt with several IT projects (in Java EE and .NET) and is a specialist in system development.

29 September 2011

Constraints in the Form of Triangles, Squares …

Life is full of constraints. It seems like you can never simply "have your cake and eat it".

[Image: service triangle]


In Product Development, we have the following Project Triangle:

  • Good – quality of the delivered product
  • Fast – time taken to deliver the product
  • Cheap – cost of designing/ developing/ delivering the final product
We are told to pick any two!

In System Development, the Project Management Triangle is as follows:
  • Scope – the set of functionality for the system or the work to be performed
  • Schedule – time taken to complete the project
  • Cost – cost of implementing/ delivering the final system
We are told to pick any two!

A variant in System Development extends the Project Management Triangle into something more like a Square:
  • Scope – the set of functionality for the system
  • Schedule – time taken to complete the project
  • Cost – cost of implementing/ delivering the final system
  • Performance – the ability of the system to take load and be responsive
We are told to pick any three!

For distributed systems that share data among nodes, a similar triangle exists, known as the CAP theorem or Brewer’s Conjecture. The three constraints are:
  • Consistency – all distributed nodes are in sync and “see” the same data at the same time
  • Availability – every client’s request will receive a response, whether successful or not
  • Partition tolerance – system continues to operate despite
    • messages between nodes being arbitrarily dropped or delayed; or
    • nodes becoming unavailable (due to crashes)
We are told to pick any two!
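To make the trade-off a little more concrete, here is a minimal Java sketch of one replica deciding what to do with a write while it cannot reach its peer. The `Replica` class, its `Mode` enum and the boolean partition flag are hypothetical simplifications, not any real data store's API. In CP mode the replica refuses the write (giving up availability); in AP mode it accepts the write locally (giving up consistency until the partition heals).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified replica used only to illustrate the CAP trade-off.
class Replica {
    enum Mode { CP, AP }

    private final Mode mode;
    private final Map<String, String> data = new HashMap<>();
    private boolean partitioned; // true while the peer replica is unreachable

    Replica(Mode mode) { this.mode = mode; }

    void setPartitioned(boolean partitioned) { this.partitioned = partitioned; }

    /** Returns true if the write was accepted. */
    boolean write(String key, String value) {
        if (partitioned && mode == Mode.CP) {
            // Consistency chosen: refuse the write rather than let replicas diverge.
            return false;
        }
        // Availability chosen (or no partition): accept locally.
        // A real system would also replicate the write to its peer here.
        data.put(key, value);
        return true;
    }

    String read(String key) { return data.get(key); }

    public static void main(String[] args) {
        Replica cp = new Replica(Mode.CP);
        Replica ap = new Replica(Mode.AP);
        cp.setPartitioned(true);
        ap.setPartitioned(true);
        System.out.println("CP accepts write during partition: " + cp.write("x", "1"));
        System.out.println("AP accepts write during partition: " + ap.write("x", "1"));
    }
}
```

Real data stores rarely make this choice so bluntly; many let the balance between consistency and availability be tuned per operation.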

24 September 2011

System Deterioration

Machinery needs oiling after some time; building structures deteriorate due to natural forces; facades need to be repainted; the Golden Gate Bridge needs to be repainted; vehicles need to be maintained. What does that mean for systems?
I’m inclined to believe that software systems, like all other things, need to be constantly maintained, oiled and cleaned before they deteriorate.

What does deterioration look like?

Deterioration may appear in these forms:
  1. systems become slower progressively
  2. systems crash/ become unavailable more often
  3. systems become more bloated (larger codebase, more storage space required, etc.)
The system tested fine when it was first deployed, so what went wrong?

How does deterioration happen?

It typically happens due to the following forces:
  1. the user base grew post-deployment to a size that was not intended/ tested for
  2. smart users found ways to use the system that it was not originally intended for
  3. the operation/ support team did not make a point of maintaining the system
  4. “quick and dirty” changes made to the system resulted in technical debt
  5. data growth beyond expectation or unplanned data growth

What can be done to maintain the system?

The following could alleviate the situation:
  1. repay the technical debt (if any)
  2. plan and implement archival strategies for the file system and database
  3. consider data partitioning for huge tables
  4. review database indices to remove old and unused ones while creating new ones where necessary
  5. review the hardware capacity and resize as appropriate
  6. make use of multi-threading/ multi-processing where necessary (only if the team has that skill set)
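Item 2 (archival) usually boils down to moving records older than a retention period out of the live store. The Java sketch below shows the idea in memory only. The `Archiver` class, its `Event` type and the 365-day retention are hypothetical; against a real database this would typically be a scheduled batch job that copies old rows to archive tables (or swaps out old partitions) and then deletes them.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical in-memory archival job: moves records older than a cutoff
// from the live store into an archive store.
class Archiver {
    static final class Event {
        final long id;
        final LocalDate created;
        Event(long id, LocalDate created) { this.id = id; this.created = created; }
    }

    /** Moves events created before (today - retentionDays) into the archive; returns how many moved. */
    static int archiveOlderThan(List<Event> live, List<Event> archive, LocalDate today, int retentionDays) {
        LocalDate cutoff = today.minusDays(retentionDays);
        int moved = 0;
        for (Iterator<Event> it = live.iterator(); it.hasNext(); ) {
            Event e = it.next();
            if (e.created.isBefore(cutoff)) {
                archive.add(e); // copy to the archive first...
                it.remove();    // ...then remove from the live store
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        List<Event> live = new ArrayList<>();
        live.add(new Event(1, LocalDate.of(2009, 1, 15)));
        live.add(new Event(2, LocalDate.of(2011, 9, 1)));
        List<Event> archive = new ArrayList<>();
        int moved = archiveOlderThan(live, archive, LocalDate.of(2011, 9, 24), 365);
        System.out.println(moved + " archived, " + live.size() + " remain live");
    }
}
```

Copying before removing means a failure part-way through leaves a record in both stores rather than in neither; in a database the two steps would normally share a transaction.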

23 September 2011

Technical Debt

This wonderful metaphor by Ward Cunningham reminds us that doing things in a “quick and dirty” manner, or taking shortcuts, sets us up for a debt which, when not repaid promptly, incurs interest in the future.
When developing systems, enhancing existing ones, or even fixing bugs, we will typically arrive at a crossroads: should we take the time to do it right, or should we take shortcuts and deliver quickly?
It is generally in the interest of the business folks to roll out changes quickly, while the development folks would prefer to properly design, implement and test changes before delivery.
Fortunately, not every shortcoming counts as technical debt (e.g. code smells or design flaws).
Technical debt requires a deliberate decision to do something that is not sustainable in the long term but yields a short-term benefit, at the expense of the system's ongoing maintainability.
 

Examples of technical debt

Some common examples include:
  1. hardcoding values in source code
  2. postponement of documentation
  3. postponement of writing tests
  4. overusing TODO comments
  5. ignoring compiler warnings (e.g. deprecation) or code analysis warnings
  6. not merging code branches regularly in source code management
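For item 1, repayment usually means moving the value out of the code and into configuration, which also lets parameters such as maximum requests and timeouts be reviewed without a rebuild. A minimal sketch using `java.util.Properties`; the `AppConfig` class, the property keys and the defaults are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Debt: `int timeoutSeconds = 30;` buried in code needs a rebuild for every change.
// Repayment: read the value from configuration, keeping a sensible default.
class AppConfig {
    private final Properties props = new Properties();

    AppConfig(Reader source) {
        try {
            props.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException("Could not read configuration", e);
        }
    }

    /** Returns the integer value for key, or defaultValue if it is missing or malformed. */
    int getInt(String key, int defaultValue) {
        String raw = props.getProperty(key);
        if (raw == null) return defaultValue;
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            return defaultValue; // fall back rather than fail on a malformed value
        }
    }

    public static void main(String[] args) {
        // In production the Reader would wrap a file such as app.properties.
        AppConfig config = new AppConfig(new StringReader("db.timeout.seconds=45\n"));
        System.out.println(config.getInt("db.timeout.seconds", 30)); // value from configuration
        System.out.println(config.getInt("app.max.requests", 100));  // key absent, default used
    }
}
```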

Issues with technical debt

How does a technical debt haunt us? The following could be signs of debt (or interest, in financial terms):
  1. instability in the system
  2. increased maintenance costs
  3. feeble architecture affecting extensibility
  4. patching data takes up most of the time

Should one ever incur a technical debt?

There is no right or wrong answer!
The recommendation is that:
  • if the system is reasonably stable, incurring a technical debt is fine. However, this needs to be repaid promptly.
  • if the system is far from stable, do not incur further debt. Its current state may well be due to technical debt!

How does one pay down the debt?

A common excuse for not paying down technical debt is "if it ain't broke, don't fix it".
However, the recommended 3-pronged approach is:
  1. Track down the technical debt
  2. Stop incurring debt
  3. Start repayment
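Step 1 can be partly automated. One cheap signal, tying back to the TODO example earlier, is the number of TODO/FIXME markers per source file. The Java sketch below assumes the markers `TODO`, `FIXME` and `HACK` in `.java` sources; the `DebtScanner` class is hypothetical, and a static-analysis tool would give a much fuller picture.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

class DebtScanner {
    // Assumed set of markers treated as "debt"; adjust to the team's conventions.
    private static final Pattern MARKER = Pattern.compile("\\b(TODO|FIXME|HACK)\\b");

    /** Counts debt markers (case-sensitive) in one file's contents. */
    static int countMarkers(String source) {
        Matcher m = MARKER.matcher(source);
        int count = 0;
        while (m.find()) count++;
        return count;
    }

    /** Walks the given directory (default ".") and prints marker counts per .java file. */
    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        Map<Path, Integer> report = new TreeMap<>();
        try (Stream<Path> files = Files.walk(root)) {
            files.filter(Files::isRegularFile)
                 .filter(p -> p.toString().endsWith(".java"))
                 .forEach(p -> {
                     try {
                         int n = countMarkers(new String(Files.readAllBytes(p), StandardCharsets.UTF_8));
                         if (n > 0) report.put(p, n);
                     } catch (IOException e) {
                         System.err.println("Skipping " + p + ": " + e.getMessage());
                     }
                 });
        }
        report.forEach((path, n) -> System.out.println(n + "\t" + path));
    }
}
```

Running it periodically and comparing the totals gives a rough trend; a growing count suggests that step 2 is not happening.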
Repayment could take these forms:
  1. go back to make changes one would have done if time and resources had permitted
  2. tune the system (oil the machinery!)
    1. database tuning is the most common (e.g. archiving, indexing, partitioning, etc.)
    2. review configurable parameters (maximum requests, timeout, etc.)
Ideally, channel all resources into paying down the technical debt instead of building new features.
A more practical approach is to always reserve time to repay debts while continuing with business-as-usual (enhancements, bug-fixing, etc.).

22 September 2011

Logging Levels

Most popular logging frameworks (e.g. Log4J, SLF4J, Log4Net, EntLib) support several logging levels. A common question is: which logging levels are appropriate in which scenarios? The following lists some common log levels and their usage:
  • Fatal -
    • Service/ application/ process about to terminate.
    • Unable to proceed with normal operation.
    • Force a shutdown to prevent (further) data loss.
    • Examples:
      • System resources exhausted (e.g. out of memory, OOM)
  • Error -
    • Unhandled or unexpected system error has occurred.
    • Current operation/ thread to be aborted.
    • Issue needs to be fixed.
    • Otherwise, still able to continue normal operation (at least for other users/ sessions).
    • Examples:
      • database deadlock
      • Unexpected/ unhandled exception
      • Runtime errors
      • Transaction aborted
      • Can't create file
  • Warn -
    • Things are generally working fine.
    • Recoverable conditions.
    • Transient environmental conditions occurred.
    • Something happened (and was handled) that may turn into an error condition soon, but has yet to do so.
    • Examples:
      • number of available connections getting low
      • operation timeout
      • CPU utilisation reaches 80%
      • data unavailable; using cached/ default values
  • Info -
    • Useful and important information for logging.
    • No cause for concern.
    • A normal & expected event happened.
    • Useful in the running and management of the system.
    • Examples:
      • Successful initialisation
      • Starting/ stopping of services
      • Successful transactions
  • Debug -
    • Useful for troubleshooting in the production environment; otherwise, should only be enabled in development and testing environments.
    • Used by IT professionals (e.g. developers, system admins, etc.).
  • Trace -
    • Detailed tracing and logging.
    • Specifically for developers only.
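The levels above form an ordered scale, and a logger emits a message only when its level is at or above the configured threshold. The following self-contained sketch models that rule. The `SimpleLogger` class and its `Level` enum are hypothetical stand-ins: Log4J and Log4Net expose a similar hierarchy including FATAL, while SLF4J stops at ERROR, so real code should use the chosen framework's own API.

```java
import java.util.ArrayList;
import java.util.List;

class SimpleLogger {
    // Ordered from most verbose to most severe; enum order gives the ranking.
    enum Level { TRACE, DEBUG, INFO, WARN, ERROR, FATAL }

    private final Level threshold;
    private final List<String> output = new ArrayList<>();

    SimpleLogger(Level threshold) { this.threshold = threshold; }

    /** A message is logged only if its level is at or above the threshold. */
    boolean isEnabled(Level level) {
        return level.compareTo(threshold) >= 0;
    }

    void log(Level level, String message) {
        if (isEnabled(level)) {
            output.add(level + " " + message);
        }
    }

    List<String> output() { return output; }

    public static void main(String[] args) {
        // A typical production threshold: INFO and above.
        SimpleLogger log = new SimpleLogger(Level.INFO);
        log.log(Level.INFO, "Service started");                              // normal, expected event
        log.log(Level.WARN, "Data source unavailable; using cached values"); // handled, recoverable
        log.log(Level.ERROR, "Transaction aborted: deadlock detected");      // operation aborted
        log.log(Level.DEBUG, "Entering order processing");                   // suppressed at INFO
        log.output().forEach(System.out::println);
    }
}
```

Guarding expensive DEBUG/TRACE messages with a check like `isEnabled` (SLF4J and Log4J loggers provide `isDebugEnabled()`) avoids building strings that are then discarded.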

20 September 2011

Minimum (Lean) System Documentation

Let’s admit it: no form of system documentation is ever fully up-to-date. The moment we start producing it, it is out-of-date.

If that is the case, we should make do with minimum (perhaps lean) system documentation that is kept current. This raises the next question: how little is enough?

If I imagine myself taking over a system from someone else, I believe the minimum/ lean system documentation should contain the following:

Solution Design
  • description of the high-level process flow
  • description of the main modules/ services in the system
  • what are the architecturally significant use cases or main functions of the system?
  • what processes/ components make up the system? E.g.
    • Are there web-based applications?
    • Are there batch processes?
  • what are the databases in use?
    • What are the primary (entity) tables in use?
  • what output is generated by the system? E.g.
    • Is there printed output?
    • Is there output for system integration?
    • Are there messages (emails/ SMS) sent?
  • what systems are integrated and what integration patterns are used?
    • what are the triggers for interfacing?
  • what user roles are interacting with the system?
  • what are the batch jobs and their inter-dependencies?
  • what 3rd-party libraries/ frameworks (whether freeware, open-source or COTS) are in use?
    • what are they used for?
Deployment & Maintenance
  • what configuration files are used, and where are they located?
  • what are the folders (local/ mounted) in use?
  • where are the log files located?
  • what is the deployment environment (topology) like?

Oracle WebLogic Server States

A WebLogic server goes through several interesting runtime states, and certain events/ commands trigger transitions from one known state to another. I attempted to capture the state transitions and events in a UML state diagram for easy reference.
States in brown are end-states while those in yellow are transitional.

[Image: UML state diagram of WebLogic Server states and transitions]
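Since the diagram is an image, a textual version may be handy. The Java sketch below encodes a simplified subset of the lifecycle as an allowed-transition table. The state names are those reported by WebLogic Server, and the transitional flag mirrors the yellow/brown split above, but the transition table is an assumed simplification (several states and force-shutdown/restart paths are omitted) and should be checked against Oracle's server life cycle documentation for the version in use. The `ServerLifecycle` class is hypothetical.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

class ServerLifecycle {
    enum State {
        SHUTDOWN(false), STARTING(true), STANDBY(false), ADMIN(false), RESUMING(true),
        RUNNING(false), SUSPENDING(true), FORCE_SUSPENDING(true), SHUTTING_DOWN(true), FAILED(false);

        final boolean transitional; // true for the "yellow" states, false for the "brown" ones
        State(boolean transitional) { this.transitional = transitional; }
    }

    // Simplified, assumed transition table; not exhaustive.
    private static final Map<State, Set<State>> NEXT = new EnumMap<>(State.class);
    static {
        NEXT.put(State.SHUTDOWN, EnumSet.of(State.STARTING));
        NEXT.put(State.STARTING, EnumSet.of(State.STANDBY, State.FAILED));
        NEXT.put(State.STANDBY, EnumSet.of(State.ADMIN, State.SHUTTING_DOWN));
        NEXT.put(State.ADMIN, EnumSet.of(State.RESUMING, State.SHUTTING_DOWN));
        NEXT.put(State.RESUMING, EnumSet.of(State.RUNNING));
        NEXT.put(State.RUNNING, EnumSet.of(State.SUSPENDING, State.FORCE_SUSPENDING, State.FAILED));
        NEXT.put(State.SUSPENDING, EnumSet.of(State.ADMIN));
        NEXT.put(State.FORCE_SUSPENDING, EnumSet.of(State.ADMIN));
        NEXT.put(State.SHUTTING_DOWN, EnumSet.of(State.SHUTDOWN));
        NEXT.put(State.FAILED, EnumSet.noneOf(State.class)); // restart handling omitted
    }

    static boolean canTransition(State from, State to) {
        return NEXT.get(from).contains(to);
    }

    /** Checks that every consecutive pair in the path is an allowed transition. */
    static boolean isValidPath(State... path) {
        for (int i = 1; i < path.length; i++) {
            if (!canTransition(path[i - 1], path[i])) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Normal start-up, then a graceful suspend back to ADMIN.
        System.out.println(isValidPath(State.SHUTDOWN, State.STARTING, State.STANDBY,
                State.ADMIN, State.RESUMING, State.RUNNING, State.SUSPENDING, State.ADMIN));
        // Jumping straight from SHUTDOWN to RUNNING is not allowed in this table.
        System.out.println(canTransition(State.SHUTDOWN, State.RUNNING));
    }
}
```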

18 September 2011

Performance Testing

There are 3 main concerns that performance testing seeks to address.
These concerns form the acceptance criteria or metrics for the tests.
They are:
  1. User concern – Response Time
  2. Business concern – Throughput, e.g. requests/ sec; calls/ day
  3. System concern – Resource Utilisation (often overlooked), e.g. processor & memory utilisation; disk I/O; network I/O
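The user and business concerns translate into numbers computed from the raw test results. A minimal Java sketch, assuming response times were collected in milliseconds over a known test duration; the nearest-rank percentile used here is one common definition among several, and the `PerfSummary` class and sample values are hypothetical. Resource utilisation would come from OS or JVM monitoring rather than from the load generator's samples.

```java
import java.util.Arrays;

class PerfSummary {
    /** Nearest-rank percentile, p in (0, 100], of the given samples. */
    static long percentile(long[] samplesMs, double p) {
        long[] sorted = samplesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0); // 1-based rank
        return sorted[Math.max(rank, 1) - 1];
    }

    /** Completed requests per second over the whole test window. */
    static double throughputPerSec(int completedRequests, double durationSec) {
        return completedRequests / durationSec;
    }

    public static void main(String[] args) {
        long[] responseTimesMs = {120, 95, 300, 110, 105, 980, 130, 125, 115, 100};
        System.out.println("Average (ms): " + Arrays.stream(responseTimesMs).average().orElse(0));
        System.out.println("90th percentile (ms): " + percentile(responseTimesMs, 90));
        System.out.println("Throughput (req/s): " + throughputPerSec(responseTimesMs.length, 5.0));
    }
}
```

For these samples the average (218 ms) is pulled up by the single 980 ms outlier while the 90th percentile is 300 ms, which is why response-time criteria are often stated as percentiles.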

Purpose of Performance Testing

Depending on its scope, performance testing helps in assessing:

  1. release readiness
  2. infrastructure adequacy
  3. software performance
  4. performance tuning efficiency

Some Definitions


  • Performance Target = Performance Goals
  • Performance Requirements = Performance Thresholds = contractual obligations (e.g. SLAs) that cannot be compromised.
  • Workload = Stimulus applied. E.g. number of users, concurrent users, data volume, transaction volume.
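The difference between a target and a requirement shows up when results are evaluated: breaching a threshold fails the test, while missing a target only flags a concern. A minimal sketch; the `PerfCheck` class, its `Verdict` values and the 1 s / 2 s figures are hypothetical.

```java
class PerfCheck {
    enum Verdict { PASS, PASS_TARGET_MISSED, FAIL }

    /**
     * thresholdMs: contractual limit (SLA) that must not be exceeded.
     * targetMs: the goal the team aims for; normally stricter than the threshold.
     */
    static Verdict evaluate(long measuredMs, long targetMs, long thresholdMs) {
        if (measuredMs > thresholdMs) return Verdict.FAIL;           // requirement breached
        if (measuredMs > targetMs) return Verdict.PASS_TARGET_MISSED; // acceptable, goal missed
        return Verdict.PASS;
    }

    public static void main(String[] args) {
        long targetMs = 1000, thresholdMs = 2000;
        for (long measured : new long[] {800, 1500, 2500}) {
            System.out.println(measured + " ms -> " + evaluate(measured, targetMs, thresholdMs));
        }
    }
}
```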

Types of Performance Testing


  • Load Testing - testing within anticipated production load/ volume
  • Stress Testing - testing beyond expected production load/ volume. The objective is to reveal application bugs that only surface under load, e.g. synchronisation issues, race conditions, memory leaks.
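Synchronisation bugs are the classic example of defects that only surface under concurrent load. The Java sketch below (the `RaceDemo` class is hypothetical) runs the same increments through an unsynchronised counter and an `AtomicInteger`. Under contention the plain counter typically loses updates, something a single-user functional test would never reveal; how many are lost varies from run to run.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class RaceDemo {
    static int unsafeCount = 0;                        // ++ is a non-atomic read-modify-write
    static final AtomicInteger safeCount = new AtomicInteger();

    /** Runs `threads` workers, each incrementing both counters `perThread` times. */
    static void run(int threads, int perThread) {
        unsafeCount = 0;
        safeCount.set(0);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < perThread; i++) {
                    unsafeCount++;               // data race: concurrent updates can be lost
                    safeCount.incrementAndGet(); // atomic: no updates are lost
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
        }
    }

    public static void main(String[] args) {
        int threads = 8, perThread = 100_000;
        run(threads, perThread);
        System.out.println("expected: " + threads * perThread);
        System.out.println("atomic:   " + safeCount.get());
        System.out.println("unsafe:   " + unsafeCount + " (often lower)");
    }
}
```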

12 September 2011

Micromanagement

I just came across an interesting article regarding Dangers of Micromanagement. See here.
I've extracted an excerpt which I'm in total agreement with. The dangers are:
Less risk taking, less initiative, wasted resources. Employees will learn that a micromanager is going to direct in such detail that they will learn to wait for direction in what to do and how to do it. In a micromanagement environment, employees often end up waiting to execute; or they move forward only to be redirected by their manager, wasting valuable time and resources in the process.

Less innovation. When people are being told what to do, there is little to no room for creativity or new solutions. The value of diverse thinking is lost because there is a feeling that mistakes are not acceptable. Micromanagement does not facilitate a "continual improvement" mind-set.
My experiences and observations thus far have not differed from the above.
  1. I’ve worked for micro-managers.
  2. I’ve worked with micro-managers.
  3. I’ve worked in an end-user environment that micromanages our suppliers.
  4. I’ve also worked in a supplier environment that is being micromanaged.
Whether the relationship applies to manager-to-subordinate or client-to-supplier, the same dangers are prevalent.
In the same article, the countermeasures are:

1. A manager is a leader first, and an expert second. Serve as a coach to employees - cultivate their skills, growth, creative problem solving, etc. Become an expert at leading!
2. Focus on what, not how. What needs to be done? How will success be measured? Focus on results - not methodologies. Provide employees with the metrics and the parameters by which they must operate (e.g., timelines, decision-making authority, scope, resources). To help a team generate its own ideas, ask open-ended questions (not leading ones), and then be quiet and listen. Team members may know of a more efficient and affordable path to achieve a desired outcome.
3. Be clear about expectations! A common mistake is that managers believe they are being clear, when in reality, they are not. I've seen many managers develop an idea of what they want, but withhold that vital information from their team. A manager who can picture a successful outcome should share it.
4. Establish reporting tools and timelines. Delegation is about letting go - and that's not always easy. Proper delegation still requires a manager to provide leadership and mentorship. The delegated work needs to be tracked to provide two-way feedback. Identify the assignment's key tasks and milestones, and then determine when and how feedback should be given. This will give assurance that the project is on track and the opportunity to influence direction or decisions at critical points, rather than at random minor points along the way.
5. Provide context. People like to feel like they are contributing to something bigger than themselves. They need to understand why their assignment is critical. Context for the task allows employees to be more connected to the objective and remain motivated. Plus, when they understand the business context for the assignment, they make better decisions.
Is it time to break free from micromanagement?

Creating Scalable Systems

Some food for thought. Consider the following:
  1. Prefer BASE over ACID transactions
  2. Prefer asynchronous over synchronous transactions
  3. Keeping state is expensive
  4. Consider database sharding (highscalability, codefutures, Pros and Cons) by data, by transaction or by customer, but avoid premature optimisation
  5. Design the system for automated rollback
  6. Create isolated structures that share nothing, so that nothing crosses the swimlanes
  7. Design systems for failure
  8. Create idempotent services where possible
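Idempotency (item 8) is easy to illustrate: a retried request must not be applied twice. A minimal sketch, assuming each client request carries a unique, client-generated request ID; the `PaymentService` class and its in-memory map are hypothetical, and a real service would persist processed IDs (e.g. under a unique constraint).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical idempotent service: a request ID that has already been
// processed is acknowledged but not applied again.
class PaymentService {
    private final Map<String, Long> processed = new HashMap<>(); // requestId -> amount debited
    private long balance;

    PaymentService(long openingBalance) { this.balance = openingBalance; }

    /** Debits at most once per requestId and returns the resulting balance. */
    synchronized long debit(String requestId, long amount) {
        if (processed.containsKey(requestId)) {
            return balance; // duplicate, e.g. a client retry after a timeout: no second debit
        }
        balance -= amount;
        processed.put(requestId, amount);
        return balance;
    }

    public static void main(String[] args) {
        PaymentService svc = new PaymentService(500);
        System.out.println(svc.debit("req-1", 100)); // 400: first attempt
        System.out.println(svc.debit("req-1", 100)); // 400: retry of the same request
        System.out.println(svc.debit("req-2", 50));  // 350: a different request
    }
}
```

Because a retry has no extra effect, clients can safely retry on timeouts, which goes hand in hand with designing systems for failure (item 7).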
Database sharding requires changes in mindset:
  1. Tables may need to be denormalised to optimise sharding (as well as to work around cross-shard joins/ queries)
  2. Scale-out instead of scale-up
  3. Do away with replication where possible
Different sharding schemes are:
  1. Vertical partitioning – sometimes known as functional or feature partitioning where data relating to certain entities are grouped together. Different functions or features are put onto different shards.
  2. Range-based partitioning – data for a certain function/ feature/ entity is sharded using ranges (such range may be based on year, location, etc.)
  3. Hash-based partitioning – data for a certain function/ feature/ entity is sharded using a hash function (modulo operation)
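The range-based and hash-based schemes differ mainly in how a shard is chosen for a given key. A minimal Java sketch; the `ShardRouter` class, the shard names, the year-based ranges and the use of `Math.floorMod` on `String.hashCode()` are assumptions for illustration.

```java
import java.util.TreeMap;

class ShardRouter {
    /** Hash-based: shard index = hash(key) mod shardCount (floorMod avoids negative indices). */
    static int hashShard(String customerId, int shardCount) {
        return Math.floorMod(customerId.hashCode(), shardCount);
    }

    // Range-based: each entry maps the first year of a range to its shard.
    private static final TreeMap<Integer, String> YEAR_RANGES = new TreeMap<>();
    static {
        YEAR_RANGES.put(Integer.MIN_VALUE, "shard_archive"); // everything before 2010
        YEAR_RANGES.put(2010, "shard_2010_2011");
        YEAR_RANGES.put(2012, "shard_2012_onwards");
    }

    /** Finds the range whose start year is the greatest one not after the given year. */
    static String rangeShard(int year) {
        return YEAR_RANGES.floorEntry(year).getValue();
    }

    public static void main(String[] args) {
        System.out.println("customer C1001 -> shard " + hashShard("C1001", 4));
        System.out.println("order from 2011 -> " + rangeShard(2011));
    }
}
```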
Database sharding presents a number of issues:
  1. Data needs to be rebalanced from time-to-time
  2. Joining data from multiple shards (cross-shard join) is expensive
  3. Referential integrity is now an issue since referential data may now be in a different database
  4. Sharding is relatively new; there is little body of knowledge and limited support
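Issue 1 is especially painful with hash-based (modulo) partitioning, because changing the number of shards remaps most keys. The small sketch below counts how many keys would move when going from 4 to 5 shards; the sequential numeric keys and the `RebalanceCost` class are hypothetical, and techniques such as consistent hashing exist precisely to reduce this movement.

```java
class RebalanceCost {
    /** Counts keys 0..keyCount-1 whose shard changes when the shard count goes from `before` to `after`. */
    static int movedKeys(int keyCount, int before, int after) {
        int moved = 0;
        for (int key = 0; key < keyCount; key++) {
            if (Math.floorMod(key, before) != Math.floorMod(key, after)) moved++;
        }
        return moved;
    }

    public static void main(String[] args) {
        int keys = 100_000;
        int moved = movedKeys(keys, 4, 5);
        System.out.printf("%d of %d keys (%.0f%%) move when going from 4 to 5 shards%n",
                moved, keys, 100.0 * moved / keys);
    }
}
```

For 4 to 5 shards, 80% of the keys end up on a different shard, so almost the whole data set has to be migrated.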

Notes on Corporate Strategies


Scenario Planning

  1. Consider STEP factors:
    • Sociological
    • Technological
    • Economical
    • Political
  2. Perform Impact Analysis (Impact vs. Probability as the axes of a graph)
  3. Construct scenarios

Porter’s 5 Forces

  1. rivalry
  2. barrier to entry/ exit
  3. substitutes
  4. bargaining power of suppliers
  5. bargaining power of buyers

Balanced Scorecard

  1. financial perspective – revenue, costs, profits, EPS
  2. customer perspective – responsiveness, base customers, CRM, complaints, branding
  3. internal business perspective
  4. innovation & learning perspective

SWOT analysis

  1. Strengths
  2. Weaknesses
  3. Opportunities
  4. Threats