Hello mainframers!
Have you ever thought of creating a performance health check list for yourself? A list full of items that help you get the best performance in your environment. I am pretty sure most of you have, either written down or kept in mind and applied. I tried both: writing it down and actually applying it... I am still trying to keep it updated with the latest enhancements, though. Today I want to share my view of this process, and I would love to hear yours too.
I started working on it in 2009, when I was planning my PDB (RMF Monitor III database) and adding automation and alerts to my environment. The reason was that I realized performance troubleshooting took a lot of time, and every time I dealt with it I was repeating the same actions. As the data and the number of resources you need to check increase, automation and alert mechanisms become much more important. This is actually one of the points that our EWCP Project Leader Norman Hollander shared in the excellent presentation he gave in the last znextgen monthly call. He talked about how performance management is changing and answered the question "Is it Worth It?". Linda Mooney, znextgen project manager, uploaded it to the website. Here is the link to Norman's well-prepared presentation, which is useful not only for znextgen folks but for us as well (http://www.share.org/p/do/si/topic=39&source=7). I would like to share more about this subject in future articles, in addition to Norman's view. It makes us think again about why it is so important to work on performance improvements and how we need to adapt to the latest changes in the market. Linda recently told us that Norman will give another one in the next znextgen monthly call. We are looking forward to it… It will be about SLAs and the basics of performance, and we will certainly gain something from that presentation as well.
Today, I want to share the list that I created, in general terms. First of all, what are our resources? CPU, memory, and I/O. This is the top group. If you are dealing not only with MVS but also with subsystems like DB2, CICS, and MQ, your first group will be MVS-DB2-CICS-MQ-WAS-IMS etc., whatever is suitable for you, because there will be many tuning parameters specific to those software products… So there are several lists. Let's assume MVS only, so z/OS is the main item. I also divided the list into categories: DESIGN (Feature/Function Implementation), ROT, and AWARENESS.
We do several different kinds of performance tasks: tasks for performance troubleshooting, tasks for improving the performance of the environment, and tasks for improving performance management (getting the most benefit from the monitoring products, alerts, and software that we use to get performance data, and from the methods we use)… The list I am talking about here relates to performance troubleshooting and improvement. We can also have another list for improving performance management. This second list includes all the resources and actions we need to implement in order to do much better performance management. For example: getting ready for the new SMF30 field that shows the TCB with the highest CPU time in an address space. Other examples: starting to collect SMF113 records, or starting to collect the new LPAR interrupt delay time in SMF records, etc... That is another topic. But today let's go back to our original performance check list.
DESIGN
Here are some of the items in my DESIGN list... These are mostly new features that are recommended to implement, although some may depend on your environment's needs... "Using the latest HW, HW microcode, and z/OS version as much as possible" is my default one…
1. HyperPAV
2. zHPF
3. MIDAW
4. HiperDispatch (now default)
5. Flash Express
6. IRD (if you need)
7. Blocked Workload Support
8. WLM Managed Initiators
9. zIIPs & zAAPs (zAAP on zIIP)
10. Large Page Support
11. I/O Priority Manager (WLM Mode/HW only mode)
12. Group Capacity Limit, hard cap, soft cap…
13. WLM Resource Group Usage To Protect Loved Ones
14. HCA3 (new InfiniBand CF connection)
15. FICON Express8, SSDs
... and many other features that you see in zEC12 or a new z/OS version.
This list might be familiar to those of you who have seen the SHARE MVS survey, which our colleagues and Cheryl Watson did a great job preparing. It also shows that many nice features and functions have been invented on the mainframe platform to improve the performance of our workloads.
These not only improve your workload's response time but also decrease your CPU cost. For example: if you migrate from ICBs to InfiniBand, or even from InfiniBand HCA2 to HCA3, the response time of each of your sync requests will improve. While your workload waits for a sync CF request to complete, the CPU spins. If you improve your CF sync request response time, you also use fewer CPU cycles! I find this a nice example of "achieving two targets with one shot". Every improvement you make will affect CPU, memory, or I/O. But these are all tied to each other, just as ASes that share the same resources are. An improvement in one resource lets others use it more and reduces their delay.
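To make the CF example concrete, here is a rough back-of-the-envelope sketch. It assumes the CPU is busy for the whole service time of each sync request; the request rate and the service times are made-up numbers for illustration only, not measurements from any real link type, and the function name is hypothetical.

```python
def cf_sync_spin_cpu_seconds(sync_requests_per_sec, service_time_us, interval_sec=1.0):
    """Approximate CPU seconds spent spinning on sync CF requests in an interval.

    Assumes the requesting CPU spins for the full service time of every
    synchronous request (a simplification for illustration).
    """
    return sync_requests_per_sec * interval_sec * service_time_us / 1_000_000


# Hypothetical numbers: 20,000 sync requests/sec, service time improved 15us -> 10us
before = cf_sync_spin_cpu_seconds(20_000, 15.0)
after = cf_sync_spin_cpu_seconds(20_000, 10.0)
print(f"before: {before:.2f} CPU-sec/sec, after: {after:.2f}, saved: {before - after:.2f}")
```

With your own request rates and service times (for example from your CF activity reports) you can plug in real values and get a first idea of how many CPU cycles a faster link might give back.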
These are the well-known ones, and there are some more detailed ones as well. For example: ARCH level and TUNE parameters in compilers, and the improvements in every z/OS release to z/OS BCP, WLM, DFSMS, JES2/JES3…
ROT
In the Rule of Thumb (ROT) part, I have general ROT values for different items. These values sometimes differ from one expert to another, but they are nearly the same in general. Whichever you choose, they save you from basic performance problems…
1. Channel utilization > 30% is not good
2. Device queuing intensity > 300 is not good
3. A WLM service class PI > 3 is not good
4. Keep a difference of more than 5 between the velocity goals of different service classes
5. CF subchannel busy conditions should not be above 10% of all requests
6. False lock contention in CF structures should not be above 0.1% of requests
... and many more...
There are several performance monitor products on the market, and they have exception mechanisms where you can also change the threshold values. I used the products' interfaces and some coding to create my own additional exception rules…
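Just to give an idea of what such additional exception rules can look like, here is a minimal sketch in Python. The threshold values are the ones from the ROT list above; the metric names, the input dictionary, and the function names are my own assumptions for illustration and do not come from any monitor product.

```python
# ROT thresholds from the list above. Metric names are hypothetical.
ROT_LIMITS = {
    "channel_util_pct": 30.0,
    "device_queuing_intensity": 300.0,
    "wlm_pi": 3.0,
    "cf_subchannel_busy_pct": 10.0,
    "cf_false_contention_pct": 0.1,
}
MIN_VELOCITY_GAP = 5  # velocity goals should differ by more than this


def check_rot(sample):
    """Return one message per metric in the sample that exceeds its ROT limit."""
    exceptions = []
    for name, limit in ROT_LIMITS.items():
        value = sample.get(name)  # metrics missing from the sample are skipped
        if value is not None and value > limit:
            exceptions.append(f"{name}={value} exceeds ROT limit {limit}")
    return exceptions


def check_velocity_gaps(goals):
    """Flag neighbouring service classes (sorted by goal) whose goals are too close."""
    ordered = sorted(goals.items(), key=lambda kv: kv[1])
    return [
        f"{a} ({va}) and {b} ({vb}) are too close"
        for (a, va), (b, vb) in zip(ordered, ordered[1:])
        if vb - va <= MIN_VELOCITY_GAP
    ]


# Hypothetical sample values, e.g. extracted from your own performance database
sample = {"channel_util_pct": 42.0, "wlm_pi": 1.2, "cf_false_contention_pct": 0.05}
goals = {"ONLINE_HI": 60, "ONLINE_MED": 57, "BATCH": 30}

for message in check_rot(sample) + check_velocity_gaps(goals):
    print(message)
```

Keeping the limits in one table makes it easy to adjust them when you prefer another expert's ROT values.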
AWARENESS
The Awareness part consists of rules and best practices that we should keep in mind. Some of these are actually triggers for the alerts we use, or triggers for additional exception values. It also contains rules that affect how we implement solutions: separating some important datasets from others, placing CF structures according to the systems that use them in order to benefit from ICs, etc...
These also depend much more on our environment... Some are about tracking abnormal resource usage. In order to recognize our abnormal values, we need to know the normal ones. How do you know your normal values? For example: do you know your master AS's CPU usage, or IOSAS's CPU usage, when systems are working fine? Or do we need to check trend data every time there is a problem? That is another big subject that I would like to talk more about in the future.
As another example of awareness: CEC utilization above 95% (?) triggers me to do a much more careful analysis. Any abnormal value for the CPU/memory usage of my system ASes will trigger me to check them further or to take an automatic action.
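One simple way to turn "know your normal values" into an automatic trigger is sketched below. This is only an assumption of how it could be done, not the method I use in my products: it summarizes history as a mean and standard deviation and flags readings more than k standard deviations away. The sample numbers, the function names, and the choice of k = 3 are all hypothetical.

```python
from statistics import mean, stdev


def build_baseline(history):
    """Summarize 'normal' as the mean and sample standard deviation of past readings."""
    return mean(history), stdev(history)  # stdev needs at least two readings


def is_abnormal(value, baseline, k=3.0):
    """True when the value is more than k standard deviations from the baseline mean."""
    avg, sd = baseline
    return abs(value - avg) > k * sd


# Hypothetical CPU % readings of the master AS while systems were working fine
master_cpu_pct = [1.8, 2.1, 2.0, 1.9, 2.2, 2.0, 1.7, 2.3]
baseline = build_baseline(master_cpu_pct)

print(is_abnormal(2.1, baseline))  # a reading close to the usual values
print(is_abnormal(6.5, baseline))  # a reading far above the usual values
```

In practice you would probably keep separate baselines per shift or per day of week, since "normal" at month-end batch time is not the same as normal on a quiet afternoon.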
There are other items in the Awareness part that relate to staying up to date, learning, and increasing knowledge: knowing where to find more sources of knowledge. Mine are: following SHARE sessions and presentations, IBM WSC techdocs (which is my default web page), APAR tracking through the IBM Resource Link web page's notification mechanism, and the IBM Redbooks web page's notification mechanism… In addition to all of the above, we also need to consider doing the best capacity planning we can, using tools for it such as zPCR, and collecting the right usage data to help the capacity planning process. Bad capacity planning causes performance problems, as we all know.
I wonder how great it would be if we started a SHAREd document and every one of us added an update to the list… We don't need to go too far. Most of the items are in our minds. To combine them and check whether we missed anything, all the information we need is in our SHARE conference sessions; for the upcoming ones and new experiences, we need to wait for the next great sessions at SHARE in Boston. You can check the MVS program (MVS Core, EWCP, and Storage sessions) and write down the hints, tips, and new features and functions you remember from the sessions you attended, as well as from the ones you could not attend. You can add a comment to each feature about what other customers saw as a performance benefit. It is so nice that the SHARE website has a search engine. Here is the link: http://www.share.org/p/do/se/topicid[]=50&dp=6m I personally thank those SHARE colleagues who did amazing work categorizing all those documents and creating this search panel. Please give it a try. It also searches session abstracts.
Looking forward to your feedback...
Till next time, stay tuned!