DB2: explanation of monitoring scripts

 Tailored Brands

Custom Monitoring Scripts

November 8, 2022

304 South 8th Street, Suite 201,

Colorado Springs, CO 80905

[Office] 888-685-3101 • [fax] 719-685-3400

www.XTIVIA.com

www.Virtual-DBA.com

© 2022 XTIVIA

Confidential Page 2 of 23

Notices

Copyright © 2022, XTIVIA Inc.

This document contains information proprietary to XTIVIA and the Client for which it was produced. The information, whether in the form of text, schematics, tables, drawings or illustrations, business proposals, must not be copied, reproduced, stored, or transmitted in any form, without the prior written consent of XTIVIA.

All material in this document is to be considered confidential to XTIVIA and must not be disclosed in any form without the prior written consent of XTIVIA.

All rights reserved. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission from XTIVIA, except as necessary for review by Client to which it was delivered.

XTIVIA Roles & Contact Information

Role    Name    Telephone    Email

DBA Author Title

Marc Petros, Db2 DBA 719-309-1124

mpetros@xtivia.com

Business Development

Document History

Version   Action              Action Date
1.00      Original Document   November 8, 2022

Table of Contents

Notices

XTIVIA Roles & Contact Information

Document History

Table of Contents

How to Use This Document

Directory and File Setup

Directories

Files

Permissions and Ownership

Script Basics

Script Types

Script Arguments

Alert Responses

Script Layout and Functions

Script Temp Files

Script Log Files

Standby Databases

User Accounts

Database Backups

Database Configuration Backups

Diagnostic Log Monitoring

Database Availability

HADR Monitoring

Crash Monitoring

Transaction Log Monitoring

File System Monitoring

CPU Monitoring

Memory Monitoring

Swap Monitoring

Host Uptime Monitoring

How to Use This Document

This document provides details on all of the scripts used to monitor the Db2 databases for Tailored Brands. The information provided can be used to understand how the scripts function and what customizations are allowed. While this document can be read start to finish, it is intended to answer questions as they arise. It is expected that you will only need to read those sections that can help answer questions.

Directory and File Setup

This section describes the standard layout of the scripts’ directories and files. Follow this structure exactly, because the scripts use absolute paths to these directories and files.

Directories

The following directories house the files associated with these scripts. All paths start at the admin_scripts directory. As long as admin_scripts and the directories inside it are kept the same, admin_scripts itself can be located anywhere. Each script stores the path where it is located in a variable, and the subdirectories are hard-coded relative to admin_scripts.
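The path handling described above can be sketched as follows. This is a minimal illustration in POSIX shell syntax (which also runs under ksh), not the code from the actual scripts; the function name and the variable names SCRIPT_DIR, OUTPUT_DIR, CRON_DIR, and TIMER_DIR are assumptions.

```shell
# Hypothetical helper: derive the admin_scripts sub-directories from the
# directory that holds the running script. All names are illustrative.
resolve_paths() {
    script_path=$1
    # Absolute path of the directory containing the script
    SCRIPT_DIR=$(cd "$(dirname -- "$script_path")" && pwd)
    # Sub-directories are fixed relative to that directory
    OUTPUT_DIR=$SCRIPT_DIR/output
    CRON_DIR=$OUTPUT_DIR/cron_files
    TIMER_DIR=$OUTPUT_DIR/timers
}

# A script would typically call this once at start-up with its own path:
# resolve_paths "$0"
```

Because everything is derived from the script's own location, moving the whole admin_scripts tree does not require editing any path.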

admin_scripts/

This directory contains all of the scripts used to monitor a Db2 database and its host. It may also contain administration scripts for performing specific DBA tasks. It is advised to not execute any of these scripts if you do NOT know what they are for.

If you need to know what a script does or how to use it, run it with ‘-h’ or ‘help’ to display its usage. With rare exceptions, scripts are run like so:

./<script> <instance> <database>

The script will fail if the instance name and database name are not listed in the correct order. The script will also fail if you do not include both parameters.

admin_scripts/output/

This directory contains all of the log files created by the scripts stored in the admin_scripts directory. There are typically two types of files that can be found here.

• The first is a history file. These files contain historical data collected from certain scripts that use the data as part of their alert detection logic.

• The second type of file found here is the output from scripts that detect an issue and send an alert.

A script that does not find a problem deletes its output file on exit. Each Db2 monitor script is written to clean up after itself, so no separate script for pruning old output files is needed. A variable in each script determines when an output file is old enough to delete. This age can be adjusted in the scripts.cfg file found in the admin_scripts directory.

admin_scripts/output/cron_files/

This directory contains files from each scheduled script. The purpose of these files is for debugging. The files are overwritten by each execution of a script. A file should exist for each script and monitored instance.

admin_scripts/output/timers/

This directory contains files that are used by the scripts found in admin_scripts. Most of these files will be empty, although some may contain data. Each script uses the creation/modification date of its timer file to determine when an alert should be sent, which prevents the scripts from sending too many alerts over a given period of time. Deleting these files has no impact on the scripts’ performance; each script simply creates a new timer file as needed. Most scripts use a default timer of 30 minutes.
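The timer check can be pictured with the sketch below. It is an assumption about how such a check might be written, not the scripts' actual code; the function name and the example file name are hypothetical, and it relies on the `-mmin` test of `find`, which is widely available but not part of POSIX.

```shell
# Hypothetical helper: decide whether enough time has passed since the last
# alert. The timer file's modification time marks when that alert was sent.
alert_allowed() {
    timer_file=$1
    wait_minutes=${2:-30}    # 30 minutes is the default named in the text

    # No timer file means no alert has been recorded yet.
    [ -e "$timer_file" ] || return 0

    # find prints the path only if it was modified more than wait_minutes ago.
    [ -n "$(find "$timer_file" -mmin +"$wait_minutes" 2>/dev/null)" ]
}

# Usage: send the alert, then reset the timer by touching the file.
# if alert_allowed "output/timers/db2.htbt_mon.example.timer" 30; then
#     touch "output/timers/db2.htbt_mon.example.timer"
# fi
```

This also shows why deleting a timer file is harmless: a missing file is treated the same as an expired timer.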

Files

This section outlines the files used by the scripts and users working on the scripts.

Readme.txt

This file contains information regarding the directories and files within that directory. The text is generic and covers all important directories and knowledge needed to understand the basics.

#########################################################################

#

# Last Updated: 2022-09-14

#

# This Readme file is for the Db2 monitoring scripts used

# by Tailored Brands.

#

# Instead of creating a separate readme file for each

# instance, with only the info for that instance, this file

# contains the necessary details for all instances where a

# readme file is useful.

#

# Each instance is denoted by the name of the directory

# containing this file. Feel free to skip to that section

# as needed.

#

#########################################################################

./admin_scripts

This directory contains all of the scripts used to monitor a Db2

database and its host. It may also contain administration scripts

for performing specific DBA tasks. It is advised to not execute

any of these scripts if you do NOT know what they are for.

PLEASE DO NOT MAKE ANY CHANGES TO THESE SCRIPTS.

There is a configuration file named scripts.cfg that is used to

set the default values for key variables in each script.

You MAY make changes to that file. Each variable is listed

with its description for easy reference.

If you need to know how to run any of the monitoring scripts, you

can access the help info by running the script with either '-h' or

'help'. The other way is to view the script. The help info will

be at the top of the file's contents.

./admin_scripts/output

This directory contains all of the log files created by the

scripts stored in the admin_scripts directory.

There are typically two types of files that can be found here.

- The first is a history file. These files contain historical

data collected from certain scripts that use the data as part

of their alert detection logic.

- The second type of file found here is the output from scripts

that detect an issue and send an alert.

Any executed scripts that do not find a problem to alert will

delete the file on exit.

Each Db2 monitor script is written to clean up after itself.

There is no separate script for pruning old output files since

one isn't needed. There is a variable in the scripts that is

used by the scripts to determine when an output file is ready

for deletion. This age can be adjusted in the scripts.cfg file

found in the admin_scripts directory.

./admin_scripts/output/timers

This directory contains files that are used by the scripts

found in admin_scripts. Most of these files will be empty,

however, some may contain data.

The creation/modification date of these files is used by

their associated script for determining when an alert should

be sent for a positive result. This prevents the scripts from

sending too many alerts over a given period of time.

Deleting these files has no impact on the scripts performance.

Each script will simply create a new timer file as needed

during their next execution that finds a positive result.

./admin_scripts/output/cron_files

This directory contains files from each scheduled script.

The files are overwritten by each execution of a script.

A file should exist for each script and monitored instance.

The purpose of these files is for debugging.

scripts.cfg

The scripts.cfg file contains all of the variables that control key script behavior. Each script sources this file when it runs. If you need to adjust how a script works, edit this file rather than the script. All of the scripts are written to run successfully without this file; it exists to make tuning easier and to prevent problems caused by erroneous edits to the scripts themselves.

The file is made of a list of variables. These variables are grouped together by the script that uses them. The very first group of variables is the “Universal variables”. These variables are the defaults used by all of the scripts. They include:

• Hostname

• Number of days to keep script log files.

• The email address for warnings.

• The email address for pages.

You will notice that there are entries for all of these variables, except the host, for each individual script. The logic for these variables is as follows:

1. If the scripts.cfg file exists, source it.

2. If the variable for the specific script being run is set, use it.

3. If the variable for the specific script being run is NOT set, use the universal default variable.

4. If neither the variable for the specific script nor the universal default variable is set, use the default from inside the script.

Step 4 applies under any of the following conditions:

• The scripts.cfg file was not found or sourced.

• The variables within the scripts.cfg file are empty.

• The scripts.cfg file has a specific script variable that is empty and there is no universal default variable listed.
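The fallback order above can be sketched as follows. This is an illustration only: the helper name, the variable names DIAG_WARN_EMAIL and WARN_EMAIL, the CFG_FILE path, and the example address are all assumptions, not names taken from the configuration file.

```shell
# Hypothetical helper: return the first non-empty value among the
# script-specific setting, the universal setting, and the built-in default.
pick_setting() {
    if [ -n "$1" ]; then
        printf '%s\n' "$1"
    elif [ -n "$2" ]; then
        printf '%s\n' "$2"
    else
        printf '%s\n' "$3"
    fi
}

# Step 1: source the configuration file if it exists (path is an assumption).
CFG_FILE=${CFG_FILE:-./scripts.cfg}
if [ -r "$CFG_FILE" ]; then
    . "$CFG_FILE"
fi

# Steps 2-4: both variable names below are made up for this example.
warn_to=$(pick_setting "${DIAG_WARN_EMAIL:-}" "${WARN_EMAIL:-}" "dba@example.com")
```

Because the built-in default is always the last argument, the script still has a usable value when the configuration file is missing or empty.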

Permissions and Ownership

All of the files in the admin_scripts directory that are part of monitoring should be owned by db2admin, with the group set to db2iadm1 and permissions of rwxrwxr-x (775). Output files created by the monitoring scripts take their permissions from the account that generates them.
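A quick way to verify the mode of a file is sketched below. The helper name is hypothetical, and the commented commands that apply the ownership and mode assume that the monitoring scripts end in .ksh and that you are running as a user permitted to change ownership.

```shell
# Hypothetical helper: report whether a file carries the rwxrwxr-x (775) mode.
has_mode_775() {
    # The first 10 characters of `ls -ld` are the file type and permission bits.
    mode=$(ls -ld "$1" | cut -c1-10)
    case $mode in
        ?rwxrwxr-x) return 0 ;;
        *)          return 1 ;;
    esac
}

# Applying the documented ownership and mode:
# chown db2admin:db2iadm1 admin_scripts/*.ksh
# chmod 775 admin_scripts/*.ksh
```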

diag_ignore.txt

This file contains the patterns that the diagnostic monitor script uses to determine if a targeted entry type should be ignored. Please take special care in choosing the patterns to record in this file. If a pattern can occur multiple times within a single entry, it can cause a suppression of otherwise valid alerting. There is a heading in this file that outlines the necessary points to consider. Empty lines are okay within the file and lines can be commented out by starting the line with a ‘#’ character.

#

# READ ME !!!

#

# This file contains patterns used by the diagnostic monitor script

# for ignoring specific entries.

#

# There should be 1 and only 1 pattern per line.

# Each pattern can contain multiple words.

# When possible do not use patterns that contain special characters

# "\/[]{}`~^&*()

# Avoid patterns containing quotation marks.

# To verify which characters are okay, please reference the read

# command's man page.

#

# The script ignores lines in this file that are empty or start with a #

# character

# just like this comment section.

#

# When selecting a pattern please verify that the pattern is specific

# enough that it is only found in the offending diagnostic entries AND

# that the pattern only occurs 1 time in each offending diagnostic

# entry.

#

# Patterns that occur multiple times in a single entry can suppress

# valid alerts.

# This occurs because the db2.diag_mon.ksh script counts the number of

# lines containing

# the pattern and subtracts this amount from the total number of

# entries found.

#

# As an example, if the db2.diag_mon.ksh script finds 2 entries worthy

# of alerting,

# but 1 of the entries should be ignored, and that entry contains the

# pattern twice,

# the script will not send an alert ( 2 - 2 = 0 ).

#
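The arithmetic in the heading above can be reproduced with a small sketch. The function name and the sample diagnostic text are invented for illustration; the real script's search expressions are not shown in this document.

```shell
# Sketch of the counting described in the file header: entries found minus
# lines that contain an ignore pattern. The sample text is invented.
remaining_alerts() {
    diag_text=$1
    pattern=$2
    # Count entries by their level line (grep -c exits non-zero on no match).
    found=$(printf '%s\n' "$diag_text" | grep -c 'LEVEL: Severe' || true)
    # Count every line containing the pattern, treated as a fixed string.
    ignored=$(printf '%s\n' "$diag_text" | grep -c -F -e "$pattern" || true)
    echo $((found - ignored))
}

sample='entry 1 LEVEL: Severe
MESSAGE : disk full on /db2data
entry 2 LEVEL: Severe
MESSAGE : lock timeout, retrying
DATA #1 : lock timeout detail'

remaining_alerts "$sample" 'lock timeout'             # prints 0: the real alert is lost
remaining_alerts "$sample" 'lock timeout, retrying'   # prints 1: only entry 2 is ignored
```

The first pattern appears on two lines of the entry to be ignored, so it cancels the unrelated "disk full" entry as well. The second, more specific pattern appears once and leaves that alert intact.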

Script Basics

The following pages describe each of the scripts in use. This section describes the areas that are common across all of the scripts. The scripts are written in Korn shell (ksh), which Db2 requires and which is therefore available on every Unix/Linux host where Db2 is installed.

Script Types

The type of script can be either reactive or predictive. Reactive scripts alert when a problem has occurred and predictive scripts monitor thresholds and will alert when a problem could occur.

Script Arguments

Most scripts are run by calling the script with the instance name and database name. Any scripts that allow for multiple arguments will have those options outlined.

Alert Responses

Many of the scripts alert for problems that can be complex to troubleshoot and/or resolve. Providing a comprehensive set of steps for each script is beyond the scope of this document. That said, where possible the alert provides a starting point for addressing the problem in question, and these responses are included in the script’s description in this document.

Each script’s description will also contain a 3am response in red under the response heading. This is intended to provide the least amount of work required to address the problem during the most inconvenient times. Unfortunately, not all alerts have easy responses and sometimes the response is digging in to resolve the problem then and there.

Script Layout and Functions

The format for each script is identical across almost all of the scripts. At the beginning of the script is a notes section that details what the script does and how. It will also include any information on the privileges or permissions required by the user account that is running the script.

After the notes section, there are the functions used within the script. The primary functions used in these scripts are as follows:

• endScript -- This function is called when the script is ready to exit, and it handles all formatting and messaging for the alerts and any log files.

• checkDb2Errors -- This function is used to check Db2 commands for errors. Special handling of Db2 exit codes is necessary because Db2 does not follow the traditional convention in which any nonzero exit code is a failure.

o 0 - 1 = Success

o 2 = Warning

o 4+ = Failure

• pageWithError -- This function is used to check commands for errors. Any exit code greater than 0 is a failed command. Commands checked by this function will send a page alert.

• warnWithError -- This function is used to check commands for errors. Any exit code greater than 0 is a failed command. Commands checked by this function will send a warning alert.

Some scripts may use error detection methods without calling a function. In some cases, an error might lead to directly calling the endScript function.
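The exit-code rule used for Db2 commands can be sketched as a small classifier. This is not the body of checkDb2Errors; the function name is hypothetical, and treating an exit code of 3 as a warning is an assumption, since the list above does not mention it.

```shell
# Sketch of the exit-code rule: 0-1 success, 2 warning, 4 or higher failure.
# The function name and the handling of 3 are assumptions.
classify_db2_rc() {
    rc=$1
    if [ "$rc" -le 1 ]; then
        echo success
    elif [ "$rc" -lt 4 ]; then
        echo warning
    else
        echo failure
    fi
}

# Typical use after a Db2 command:
# db2 "connect to SAMPLE" > /tmp/out.$$ 2>&1
# status=$(classify_db2_rc $?)
```

By contrast, a check in the style of pageWithError or warnWithError would treat every nonzero code as a failure.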

Script Initialization Process

When a script is executed, it will go through an initialization process. This process involves the following steps:

1. Sets the environment for trap conditions and pipe failures.

2. Sets variables that are hardcoded.

3. Verifies paths, files, and directories.

4. Sources the scripts.cfg file.

5. Sets any variables not found or set in the scripts.cfg file to their defaults.

6. Verifies the required parameters passed to the script.

7. Finds and sources the db2profile for the instance the script is running against.

8. Checks the HADR role of the database.

Some scripts may have fewer initialization steps than listed above (e.g., OS level scripts don’t need the db2profile sourced).
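As an illustration of the parameter verification step, the sketch below accepts either a help request or exactly two parameters. The function name, return codes, and messages are assumptions made for this example.

```shell
# Hypothetical sketch of parameter verification. Returns 0 when an instance
# and database were supplied, 1 for a help request, 2 for bad usage.
check_params() {
    case ${1:-} in
        -h|help)
            echo "Usage: <script> <instance> <database>"
            return 1 ;;
    esac
    if [ $# -ne 2 ]; then
        echo "Error: instance name and database name are both required" >&2
        return 2
    fi
    # Order matters: instance first, database second.
    INSTANCE=$1
    DATABASE=$2
}
```

A call such as `check_params db2inst1 SAMPLE` (both names are examples) sets INSTANCE and DATABASE for the remaining steps.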

Script Temp Files

The scripts may create and use temporary files to complete various tasks. These files are deleted by the endScript function when the script exits. These files should be created in the /tmp directory. One such use is an output file. This file captures the stdout and stderr from each command executed. If a command fails, the information from this output file is then copied to the Additional Information section of the log file that is used when sending out an alert. Each command will overwrite the file’s contents from the previous command.
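The output-file pattern can be sketched as follows. The file names and function names are hypothetical; the point is only that each command overwrites the same temp file and that its contents are copied to the log when a command fails.

```shell
# Hypothetical sketch: every command writes to the same temp file, so the
# file always holds the output of the most recent command only.
OUT_FILE=/tmp/monitor_demo.$$.out
LOG_FILE=/tmp/monitor_demo.$$.log

run_logged() {
    rc=0
    "$@" > "$OUT_FILE" 2>&1 || rc=$?
    if [ "$rc" -ne 0 ]; then
        # Keep the failing command's output for the alert.
        {
            echo "Additional Information:"
            cat "$OUT_FILE"
        } >> "$LOG_FILE"
    fi
    return "$rc"
}

# Remove the temp file on exit, as endScript is described as doing.
cleanup_temp() { rm -f "$OUT_FILE"; }
```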

Script Log Files

The log files created by each script will display several lines of information intended to aid in a user’s ability to assess the situation and begin their work correcting any problems. The details include the date and time of the alert, what host and database the alerts occurred on, the nature of the alert, and a response. The response provides a brief narrative of steps to take to resolve the issue. Below these fields is the Additional Information section which will provide various forms of information based on the circumstances and coding of the script and condition.

When the script completes its execution, it deletes the log file if no problems were detected. When an error was detected, the log file is kept so that you can retrieve its information later. These log files are deleted once their age exceeds the number of days specified by a variable that can be customized in the scripts.cfg file.
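Age-based cleanup of this kind is commonly done with `find`. The sketch below is one possible form, not the scripts' own code; the function name, its parameters, and the file name pattern in the usage line are assumptions.

```shell
# Hypothetical sketch: delete files matching a name pattern that are older
# than a retention period given in days.
prune_old_logs() {
    log_dir=$1
    name_pattern=$2
    keep_days=$3
    # -mtime +N matches files last modified more than N full days ago.
    find "$log_dir" -type f -name "$name_pattern" -mtime +"$keep_days" -exec rm -f {} +
}

# Usage (directory, pattern, and retention are examples):
# prune_old_logs admin_scripts/output 'db2.diag_mon.*' 7
```

Restricting the match by name keeps one script from deleting another script's history or log files.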

Standby Databases

Each script checks the database’s HADR role. Because connections are not allowed against standby databases, scripts that require a connection only run against a connectable database (i.e., standard or primary). When such a script finds a standby database, it exits as a successful execution. Scheduling these scripts against a standby database still has an advantage: they are ready to begin active monitoring as soon as a failover occurs.
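The role check might look like the sketch below. Both function names are hypothetical. The get_hadr_role helper assumes a sourced db2profile and reads the "HADR database role" line from the database configuration output; the actual scripts may obtain the role differently.

```shell
# Hypothetical sketch: decide whether a script that needs a connection
# should proceed, given the database's HADR role.
is_connectable() {
    case $1 in
        STANDARD|PRIMARY) return 0 ;;
        *)                return 1 ;;
    esac
}

# One way to read the role (requires a sourced db2profile):
get_hadr_role() {
    db2 get db cfg for "$1" | awk -F'= ' '/HADR database role/ { print $2 }'
}

# role=$(get_hadr_role SAMPLE)
# is_connectable "$role" || exit 0   # standby: exit as a successful run
```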

User Accounts

For a script to succeed, the user account running it must have the necessary privileges for the instance, database, directories, and files. As a rule, it is best to run any script whose name begins with “db2” from the instance owner user account. OS-level scripts (those whose name begins with “os”) can be run by just about any user account; however, they were developed with the db2admin user account in mind, so that account is recommended for OS scripts.

Database Backups

This script does not perform any monitoring. Instead, this script will take either an online (the default) or offline backup. The script will also remove old archive logs and/or backup images if directed. Should problems occur during operation, the script will send an alert.

Name

db2.backup_db.ksh

Type

Reactive -- This script alerts for problems that have occurred.

Settings

This script uses the following options:

Option  Argument        Purpose
-d      database name   Name of the database to back up
-h                      Displays script usage
-i      instance name   Name of the instance (needed for sourcing the db2profile)
-l      /absolute path  Location where the backup image will be stored
-o                      Take an offline backup
-p                      Prune old, unneeded archive logs
-q                      Disable sending alerts (quiet)
-r                      Delete old backups

The location option (l) sets where the backup is saved. The script also uses this location when deleting old backups, so if you call the script with an uncommon location, it will only try to delete the backups in that location.

When an offline backup is selected (o), the script forces off all connections, quiesces the database, and deactivates it. Quiescing is not strictly required, but it helps ensure that applications do not successfully reconnect. Immediately before starting the backup, the script forces off all applications once more. This is necessary for the offline backup to succeed in environments where applications automatically try to reconnect.
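The offline sequence can be written out as a dry run that only prints commands. The exact commands used by db2.backup_db.ksh are not shown in this document, so the list below is an assumption built from standard Db2 CLP commands; the function name, database name, and instance name are examples.

```shell
# Hypothetical dry-run: print the Db2 commands for the offline sequence
# described above instead of executing them.
offline_backup_plan() {
    db=$1
    location=$2
    cat <<EOF
db2 force applications all
db2 connect to $db
db2 quiesce database immediate force connections
db2 connect reset
db2 deactivate database $db
db2 force applications all
db2 backup database $db to $location
EOF
}

offline_backup_plan SAMPLE /usr/opt/app/dbdump/db2inst1
```

A quiesced database normally needs to be unquiesced after the backup before applications can reconnect; that step is omitted here because the description above does not cover it.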

The option for deleting old archive logs (p) uses the location specified by the LOGARCHMETH1 database configuration parameter. With this option, the script builds a list of archive logs that are safe to delete by comparing each archive log file against two limits: the oldest log file associated with the backup retention variable and the first active log file. For example, if the backup retention is set to 5 days, the script deletes archive logs that are more than 5 days old.

The quiet option (q) suppresses alerts. It is useful when the script is run manually, because the user taking the backup does not bother other technicians who would otherwise receive an alert upon failure. It is assumed that if you are running the script quietly, you are available to address issues and/or are running a backup whose failure has no consequences.

The option for deleting old backups (r) searches the directory set by the location option. When the location option is not set, the default location for backups is used:

/usr/opt/app/dbdump/<instance>/

The options for the instance and database are REQUIRED for running this script. All others are optional. No special order is needed for using the options. The configuration file will allow you to customize the following variables:

• The hostname.

• The age, in days, to keep old archive logs.

• The age, in days, to keep old backup images.

• What email to use to send warnings.

• What email to use to send pages.

• How long, in days, to keep log files from this script.

• The default location for storing backup images.
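Option handling of this kind is usually written with `getopts`. The sketch below covers the options listed for the backup script; it is not the script's own code, and the function name, variable names, return codes, and usage text are assumptions.

```shell
# Hypothetical sketch of option parsing for the backup script's options.
parse_backup_opts() {
    DB_NAME='' INSTANCE='' LOCATION=''
    OFFLINE=0 PRUNE_LOGS=0 QUIET=0 REMOVE_OLD=0
    OPTIND=1
    while getopts "d:hi:l:opqr" opt; do
        case $opt in
            d) DB_NAME=$OPTARG ;;
            i) INSTANCE=$OPTARG ;;
            l) LOCATION=$OPTARG ;;
            o) OFFLINE=1 ;;
            p) PRUNE_LOGS=1 ;;
            q) QUIET=1 ;;
            r) REMOVE_OLD=1 ;;
            h) echo "Usage: db2.backup_db.ksh -i <instance> -d <database> [-l <path>] [-o] [-p] [-q] [-r]"
               return 1 ;;
            *) return 2 ;;
        esac
    done
    # Instance and database are required; all other options are optional.
    if [ -z "$INSTANCE" ] || [ -z "$DB_NAME" ]; then
        echo "Error: -i and -d are required" >&2
        return 2
    fi
}
```

With this approach the options can appear in any order. For example, `parse_backup_opts -d SAMPLE -i db2inst1 -p -r` requests an online backup followed by pruning of old archive logs and old backup images (the database and instance names are examples).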

Script Logic

This script opens with the typical initiation. Depending on the type of backup the script will prepare the environment and begin the backup process. When finished, it will clean up old archive logs and backup images as directed. If a backup fails, an alert will be sent.

Steps:

1. Initiate the script.

2. If taking an offline backup, force off connections.

3. Take a backup of the database (online/offline).

4. Check the backup for errors and alert if needed.

5. If the backup succeeds, continue.

6. Delete old archive logs (if directed to do so).

7. Delete old backups (if directed to do so).

8. Alert if necessary.

Response

The severity of alerts from this script depends on the importance of the database. Production databases should be attended to immediately, as a failed backup can compromise disaster recovery (DR); lower environments can probably wait until business hours.

3am response -- If you have to attend to this in the middle of the night, just try running the script again from the instance owner account or take a manual backup.

Database Configuration Backups

This script collects key information from a database that is not otherwise captured in a traditional database backup, namely database configuration settings. The information is collected in individual files, which are then tarred at the end. This script only warns for errors. The recommended schedule for this script is once a week.

Name

db2.cfg_backup.ksh

Type

Reactive -- This script alerts for problems that have occurred.

Settings

This script requires 2 parameters: instance name and database name (in that order). You may also use the parameter “help” instead, and the script will display usage information. The options for the instance and database are REQUIRED for running this script. The configuration file will allow you to customize the following variables:

• The age, in days, to keep old tar files.

• What email to use to send warnings.

• What email to use to send pages.

• How long, in days, to keep log files from this script.

• The location for storing the collected files.

Script Logic

This script opens with the typical initiation. Afterward, it runs through a series of commands to collect key configuration details. Instance level details are always collected. Database level details are only collected for databases that are NOT standby databases. Each command is checked for success/failure based on its exit code. If errors are detected, then it sends an alert.

Steps:

1. Initiate the script.

2. Collect instance level details.

3. Check if database is standby.

4. If the database is NOT a standby, collect database level details.

5. Check if any commands failed.

6. Alert if necessary.

Response

The output from this script should provide a good starting point for troubleshooting problems. Since this script does not monitor database performance, problems will be due to the success or failure of running commands. Privileges and permissions are the typical cause of problems with the commands in the script.

3am response -- This script should not require any after-hours support.

Diagnostic Log Monitoring

This script monitors the diagnostic log for two types of messages: severe and critical. If either type is found in the diagnostic log for the selected period, the script sends an alert. An ignore file allows patterns to be used to ignore specific entries.

Name

db2.diag_mon.ksh

Type

Reactive -- This script alerts for problems that have occurred.

Settings

This script requires 2 parameters: instance name and database name (in that order). You may also use the parameter “help” instead, and the script will display usage information. The options for the instance and database are REQUIRED for running this script. The configuration file will allow you to customize the following variables:

• What email to use to send warnings.

• What email to use to send pages.

• How long, in days, to keep log files from this script.

• How long to wait, in minutes, before sending another alert.

• The file containing patterns for alerts to ignore.

Script Logic

This script opens with the typical initiation. Afterward, it sets the timestamps for reviewing logs and then pulls the diagnostic logs for that period and searches for the keywords severe and critical. Each instance of the keywords is counted. At the end, if the count is greater than zero, the script will send an alert.

Steps:

1. Initiate the script.

2. Set timestamps for period to review.

3. Pull diagnostic logs for the period.

4. Search and count targeted entry types.

5. Alert if necessary.
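The review window in steps 2 and 3 can be illustrated with the sketch below, which builds a db2diag command line as text rather than running it. The function names are hypothetical, and the use of the `-time` and `-level` options is an assumption about how the script pulls the logs; check the option spelling against the db2diag reference for your Db2 version.

```shell
# Hypothetical sketch: build a db2diag command for a review window.
# Timestamps use the YYYY-MM-DD-hh.mm.ss form.
diag_stamp_now() {
    date '+%Y-%m-%d-%H.%M.%S'
}

build_diag_cmd() {
    start=$1
    end=$2
    echo "db2diag -time $start:$end -level Severe,Critical"
}

build_diag_cmd 2022-11-08-02.30.00 2022-11-08-03.00.00
# prints: db2diag -time 2022-11-08-02.30.00:2022-11-08-03.00.00 -level Severe,Critical
```

Keeping the end of one window as the start of the next avoids both gaps and double counting between scheduled runs.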

Response

Alerts from this script require investigation. The output from the script will provide a command that you can run to view the entries that the script found. These entries will help you investigate the problem, though they may only highlight the effects of the problem. On occasion, the issue may require IBM’s help to diagnose.

3am response -- Verify that the database is up and healthy. Investigate locking and general performance metrics. Resolve issues as needed.

Database Availability

This script checks database availability by connecting to the database. If a connection attempt fails, the script will send an alert. The script will not try to connect to a standby database.

Name

db2.htbt_mon.ksh

Type

Reactive -- This script alerts for problems that have occurred.

Settings

This script requires 2 parameters: instance name and database name (in that order). You may also use the parameter “help” instead, and the script will display usage information. The options for the instance and database are REQUIRED for running this script. The configuration file will allow you to customize the following variables:

• What email to use to send warnings.

• What email to use to send pages.

• How long, in days, to keep log files from this script.

• How long to wait, in minutes, before sending another alert.

Script Logic

This script opens with the typical initiation. It then checks the role of the database and continues only if the database is not a standby. Next, it checks whether any previous iterations of the script are still running while the database is active, which indicates a hung database. It then attempts to connect to the database. If the connection fails, it sends an alert.

Steps:

1. Initiate the script.

2. Check if previous executions are still running.

3. Connect to the database.

4. Alert if necessary.

Response

The additional information section of the output can provide a good indicator of the nature of the problem. Some investigation may be necessary in order to determine the extent of the problem and whether or not the problem is impacting the application.

3am response -- Log in and try connecting to the database. Check for active connections to ensure that the application is okay. This alert deserves investigation even after hours.

HADR Monitoring

This script monitors the HADR cluster and alerts for conditions that impact database availability, failovers, and connection status. If a failover occurs, the script will send an alert. The script monitors from the perspective of the standby in order to provide alerts that are as meaningful as possible.

Name

db2.hadr_mon.ksh

Type

Reactive -- This script alerts for problems that have occurred.

Settings

This script requires 2 parameters: instance name and database name (in that order). You may also use the parameter “help” instead, and the script will display usage information. The options for the instance and database are REQUIRED for running this script. The configuration file will allow you to customize the following variables:

• What email to use to send warnings.

• What email to use to send pages.

• How long, in days, to keep log files from this script.

• How long to wait, in minutes, before sending another alert.

Script Logic

This script opens with the typical initiation. It then pulls the current status and compares conditions to those of the previous iteration. It alerts as needed.

Steps:

1. Initiate the script.

2. Verify the database HADR role.

3. Check if the role has recently changed.

4. Check HADR connection status.

5. Check HADR flags.

6. Alert if necessary.
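
The comparison against the previous iteration can be sketched as follows. This Python sketch assumes the status is available as "KEY = VALUE" lines with fields named HADR_ROLE, HADR_CONNECT_STATUS, and HADR_FLAGS (the layout printed by db2pd -hadr); the document does not state which command the script uses, and the function names and the alert_flags parameter are hypothetical.

```python
def parse_hadr_status(text):
    """Parse 'KEY = VALUE' lines into a dict; lines without '=' are ignored."""
    status = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip():
            status[key.strip().upper()] = value.strip()
    return status


def hadr_findings(current, previous_role, alert_flags):
    """Return a list of human-readable problems found in the current status.

    current       -- dict produced by parse_hadr_status
    previous_role -- role recorded by the previous iteration, or None
    alert_flags   -- set of flag names that should raise an alert (site-specific choice)
    """
    findings = []
    role = current.get("HADR_ROLE", "UNKNOWN")
    if previous_role and role != previous_role:
        findings.append("role changed from %s to %s" % (previous_role, role))
    connect_status = current.get("HADR_CONNECT_STATUS", "UNKNOWN")
    if connect_status != "CONNECTED":
        findings.append("connection status is %s" % connect_status)
    raised = sorted(set(current.get("HADR_FLAGS", "").split()) & set(alert_flags))
    if raised:
        findings.append("flags set: %s" % " ".join(raised))
    return findings
```

An empty list means nothing needs to be reported. Passing the alerting flags in explicitly keeps informational flags from triggering alerts.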

Response

The additional information provided by this script gives you the complete HADR status details. This information may not show you the cause of the problem. Investigation is still needed.

3am response -- Log in and verify the health of the primary database. Check that the application has access. Resolve issues with the primary as needed. Afterward, you can investigate any other types of problems with the cluster.

Crash Monitoring

This script checks for the existence of files created by Db2 in response to a serious problem or crash. These files are known as trap files. The script checks for files that have been created over a previous period of time. It alerts as needed.

Name

db2.trap_mon.ksh

Type

Reactive -- This script alerts for problems that have occurred.

Settings

This script requires 2 parameters: instance name and database name (in that order). You may also use the parameter “help” instead, and the script will display usage information. The options for the instance and database are REQUIRED for running this script. The configuration file will allow you to customize the following variables:

• What email to use to send warnings.

• What email to use to send pages.

• How long, in days, to keep log files from this script.

• How long to wait, in minutes, before sending another alert.

• How far back in time to look for files.

• What type of alert to send (page vs. warn).

Script Logic

This script opens with the typical initiation. It then passes the span variable to the find command to look for files created or modified within that span. It alerts as needed.

Steps:

1. Initiate the script.

2. Look for trap files modified within the time span.

3. Alert if necessary.
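
The file search can be sketched in Python as a stand-in for the find command the script uses. The directory argument, the span expressed in minutes, and the "*.trap.txt" file name pattern are assumptions for illustration; adjust them to match the diagnostic path and naming on your system.

```python
import fnmatch
import os
import time


def recent_trap_files(diag_dir, span_minutes, pattern="*.trap.txt", now=None):
    """Return sorted paths under diag_dir matching pattern and modified within the span."""
    now = time.time() if now is None else now
    cutoff = now - span_minutes * 60
    hits = []
    for root, _dirs, files in os.walk(diag_dir):
        for name in fnmatch.filter(files, pattern):
            path = os.path.join(root, name)
            # Keep only files whose modification time falls inside the span.
            if os.path.getmtime(path) >= cutoff:
                hits.append(path)
    return sorted(hits)
```

A non-empty result would correspond to the condition that triggers the alert.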

Response

The additional information provided by this script only gives a basic description of how to respond. Investigation is required in order to address this alert. The additional information will also provide a Db2 command that you can use to review the diagnostic log for help with troubleshooting.

3am response -- Verify that the database is active and healthy. Investigate locking and general performance metrics. Resolve issues as needed.

Transaction Log Monitoring

This script monitors secondary log usage. Secondary logs are only used after all available primary logs have been consumed, so secondary log utilization is indicative of log saturation. If all logs (primary and secondary) are consumed, applications will begin failing with log full errors. If secondary log utilization is greater than the threshold, an alert is sent.

Name

db2.tx_mon.ksh

Type

Predictive -- This script alerts for impending problems.

Settings

This script requires 2 parameters: instance name and database name (in that order). You may also use the parameter “help” instead, and the script will display usage information. The options for the instance and database are REQUIRED for running this script. The configuration file will allow you to customize the following variables:

• What email to use to send warnings.

• What email to use to send pages.

• How long, in days, to keep log files from this script.

• How long to wait, in minutes, before sending another alert.

• Threshold, as a percent, of secondary log usage.

Script Logic

This script opens with the typical initiation. It checks to see if any secondary logs are being used. If secondary logs are used, it then collects details on the oldest connection holding logs and alerts as needed.

Steps:

1. Initiate the script.

2. Check secondary log usage.

3. Collect information on oldest connection.

4. Alert if necessary.
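
The threshold check can be sketched as a percentage calculation. This Python sketch assumes utilization is the number of secondary logs currently allocated divided by the configured LOGSECOND value; the document does not state how the script derives the figure, and the function names are hypothetical.

```python
def secondary_log_pct(sec_logs_allocated, logsecond):
    """Percent of configured secondary logs currently allocated.

    Returns None when logsecond is 0 (no secondary logs) or -1 (infinite
    logging), because a percentage is not meaningful in those cases.
    """
    if logsecond <= 0:
        return None
    return 100.0 * sec_logs_allocated / logsecond


def tx_log_alert(sec_logs_allocated, logsecond, threshold_pct):
    """Return True when secondary log utilization exceeds the threshold."""
    pct = secondary_log_pct(sec_logs_allocated, logsecond)
    return pct is not None and pct > threshold_pct
```

For example, 5 allocated secondary logs out of 20 configured is 25 percent, which would exceed a 20 percent threshold.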

Response

The additional information provides the details from the application that is holding the oldest log. This is usually the culprit. You will want to review that information to determine the status of the connection and if it is acceptable to force it off.

3am response -- Verify secondary log utilization and force off the oldest connection if possible.

File System Monitoring

This script monitors file systems that impact Db2 operations against 2 different thresholds. If a file system’s utilization is greater than a threshold, an alert is sent. A file system reaching maximum capacity can cause an outage, depending on how Db2 uses the file system in question.

Name

os.capacity_mon.ksh

Type

Predictive -- This script alerts for impending problems.

Settings

Because this script runs at the level of the OS, no parameters are required. You may use the parameter “help”, and the script will display usage information. The configuration file will allow you to customize the following variables:

• What email to use to send warnings.

• What email to use to send pages.

• How long, in days, to keep log files from this script.

• How long to wait, in minutes, before sending another alert.

• Warning threshold, as a percent of file system utilization.

• Paging threshold, as a percent of file system utilization.

Script Logic

This script opens with the typical initiation. It collects file system utilization using df and then checks the values for targeted file systems against the paging threshold first and then the warning threshold. It alerts if necessary.

Steps:

1. Initiate the script.

2. Collect file system utilization.

3. Check each targeted file system against page threshold.

4. Check each targeted file system against warn threshold.

5. Alert if necessary.
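
The two-threshold check can be sketched as follows. This Python sketch assumes df output in the POSIX "df -P" layout (one line per file system, with the capacity percentage in the fifth column and the mount point last); the function names and the list of targeted mount points are hypothetical.

```python
def parse_df(df_output):
    """Map mount point -> percent used, skipping the header and malformed lines."""
    usage = {}
    for line in df_output.splitlines()[1:]:
        fields = line.split()
        if len(fields) < 6 or not fields[4].endswith("%"):
            continue
        usage[" ".join(fields[5:])] = int(fields[4].rstrip("%"))
    return usage


def classify_filesystems(usage, targets, warn_pct, page_pct):
    """Check each targeted mount against the page threshold first, then warn."""
    alerts = {}
    for mount in targets:
        pct = usage.get(mount)
        if pct is None:
            continue  # targeted file system not present in the df output
        if pct > page_pct:
            alerts[mount] = "page"
        elif pct > warn_pct:
            alerts[mount] = "warn"
    return alerts
```

Checking the paging threshold first ensures that a file system over both thresholds produces a page rather than a warning.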

Response

The additional information section of the output will display the full results collected from the df command. Review ALL file systems to ensure that there aren’t multiple file systems filling up. Address the problem(s) based on how the file system(s) is/are being used.

3am response -- Same as the response outlined above.

CPU Monitoring

This script monitors the CPU load averages reported by the uptime command. If the load average is greater than the processing ability of the host, the script collects details from top and sends an alert.

Name

os.cpu_mon.ksh

Type

Reactive -- This script alerts for problems that have occurred.

Settings

Because this script runs at the level of the OS, no parameters are required. You may use the parameter “help”, and the script will display usage information. The configuration file will allow you to customize the following variables:

• What email to use to send warnings.

• What email to use to send pages.

• How long, in days, to keep log files from this script.

• How long to wait, in minutes, before sending another alert.

• The number of cores on the host.

Script Logic

This script opens with the typical initiation. It collects the number of cores and the load averages. The number of cores is used to determine the threshold. If both the 15-minute and 1-minute averages are greater than the threshold, a paging alert is sent. If both the 5-minute and 1-minute averages are greater than the threshold, a warning alert is sent.

Steps:

1. Initiate the script.

2. Collect number of cores.

3. Set the threshold.

4. Collect load averages.

5. Check load average combinations against threshold.

6. Alert if necessary.
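
The load average combinations can be sketched as follows. This Python sketch assumes Linux-style uptime output ending in "load average: 1-minute, 5-minute, 15-minute" and assumes the threshold equals the number of cores; the document says only that the core count is used to determine the threshold, and the function names are hypothetical.

```python
import re


def parse_load_averages(uptime_output):
    """Return the (1-minute, 5-minute, 15-minute) load averages as floats."""
    match = re.search(
        r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)", uptime_output
    )
    if not match:
        raise ValueError("no load averages found in uptime output")
    return tuple(float(value) for value in match.groups())


def classify_load(cores, load1, load5, load15):
    """Return "page", "warn", or None for one set of load averages."""
    threshold = float(cores)  # assumption: one unit of load per core
    if load15 > threshold and load1 > threshold:
        return "page"   # sustained and still ongoing
    if load5 > threshold and load1 > threshold:
        return "warn"   # shorter-lived but still ongoing
    return None
```

Requiring the 1-minute average in both combinations means an alert is sent only while the load is still high, not after it has already subsided.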

Response

The additional information section will contain details from top. These can be used to start your investigation. Due to the complexity and breadth of CPU usage as it affects performance, a more detailed description for investigation is beyond the scope of this document.

3am response -- Verify that the database is active and healthy. Investigate locking and general performance metrics. Resolve issues as needed.

Memory Monitoring

This script monitors the utilization of memory on the host. Insufficient memory can degrade database performance. The amount of memory used is compared against the paging and warning thresholds. If usage is greater than one of the thresholds, an alert is sent.

Name

os.memory_mon.ksh

Type

Predictive -- This script alerts for impending problems.

Settings

Because this script runs at the level of the OS, no parameters are required. You may use the parameter “help”, and the script will display usage information. The configuration file will allow you to customize the following variables:

• What email to use to send warnings.

• What email to use to send pages.

• How long, in days, to keep log files from this script.

• How long to wait, in minutes, before sending another alert.

• Warning threshold, as a percent of memory utilization.

• Paging threshold, as a percent of memory utilization.

Script Logic

This script opens with the typical initiation. It collects memory usage from /proc/meminfo and compares the utilization against the paging threshold first and then the warning threshold. Alerts are sent if necessary.

Steps:

1. Initiate the script.

2. Collect memory usage.

3. Check usage against the page threshold.

4. Check usage against the warn threshold.

5. Alert if necessary.
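
The memory check can be sketched as follows. This Python sketch assumes utilization is computed from the MemTotal and MemAvailable fields of /proc/meminfo; the document does not say which fields the script reads, and the function names are hypothetical.

```python
def parse_meminfo(text):
    """Map /proc/meminfo field name -> integer value (kB for most fields)."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def memory_used_pct(info):
    """Percent of memory in use, treating MemAvailable as the unused portion."""
    total = info["MemTotal"]
    return 100.0 * (total - info["MemAvailable"]) / total


def classify_usage(pct, warn_pct, page_pct):
    """Check the paging threshold first, then the warning threshold."""
    if pct > page_pct:
        return "page"
    if pct > warn_pct:
        return "warn"
    return None
```

On a live Linux host the input text would come from reading /proc/meminfo, for example with open("/proc/meminfo").read().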

Response

The additional information section includes the output from top. This information can be useful for investigating problems. You will want to log in and determine the problem and any needed resolution.

3am response -- Verify that the database is active and healthy. Investigate locking and general performance metrics. Resolve issues as needed.

Swap Monitoring

This script monitors the usage of swap. High swap utilization may mean that the memory on the system is overloaded. Because swap resides on physical disk, its usage can cause performance degradation. The amount of swap used is compared against the paging and warning thresholds. If usage is greater than one of the thresholds, an alert is sent.

Name

os.swap_mon.ksh

Type

Predictive -- This script alerts for impending problems.

Settings

Because this script runs at the level of the OS, no parameters are required. You may use the parameter “help”, and the script will display usage information. The configuration file will allow you to customize the following variables:

• What email to use to send warnings.

• What email to use to send pages.

• How long, in days, to keep log files from this script.

• How long to wait, in minutes, before sending another alert.

• Warning threshold, as a percent of swap utilization.

• Paging threshold, as a percent of swap utilization.

Script Logic

This script opens with the typical initiation. It collects swap usage from /proc/meminfo and compares the utilization against the paging threshold first and then the warning threshold. Alerts are sent if necessary.

Steps:

1. Initiate the script.

2. Collect swap usage.

3. Check usage against the page threshold.

4. Check usage against the warn threshold.

5. Alert if necessary.
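
The swap check can be sketched as follows. This Python sketch assumes swap utilization is derived from the SwapTotal and SwapFree fields of /proc/meminfo and treats a host with no swap configured as 0 percent used; both choices and the function names are assumptions for illustration.

```python
def swap_used_pct(meminfo_text):
    """Percent of swap in use, computed from SwapTotal and SwapFree."""
    values = {}
    for line in meminfo_text.splitlines():
        name, _sep, rest = line.partition(":")
        parts = rest.split()
        if name in ("SwapTotal", "SwapFree") and parts:
            values[name] = int(parts[0])
    total = values.get("SwapTotal", 0)
    if total == 0:
        return 0.0  # no swap configured, so nothing can be in use
    return 100.0 * (total - values["SwapFree"]) / total


def swap_alert_level(meminfo_text, warn_pct, page_pct):
    """Check the paging threshold first, then the warning threshold."""
    pct = swap_used_pct(meminfo_text)
    if pct > page_pct:
        return "page"
    if pct > warn_pct:
        return "warn"
    return None
```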

Response

The additional information section includes the output from top. This information can be useful for investigating problems. You will want to log in and determine the problem and any needed resolution.

3am response -- Verify that the database is active and healthy. Investigate locking and general performance metrics. Resolve issues as needed.

Host Uptime Monitoring

This script checks how long the host has been up. Because an outage can occur from multiple causes and may not provide an opportunity to alert upon shutting down, this script alerts when the host comes back online. The script can be set to page or warn as needed. Because the cause of the outage may have consequences for Db2, it is necessary to verify that the databases are healthy after startup.

Name

os.uptime_mon.ksh

Type

Reactive -- This script alerts for problems that have occurred.

Settings

Because this script runs at the level of the OS, no parameters are required. You may use the parameter “help”, and the script will display usage information. The configuration file will allow you to customize the following variables:

• What email to use to send warnings.

• What email to use to send pages.

• How long, in days, to keep log files from this script.

• How long to wait, in minutes, before sending another alert.

• What response is needed for an alert (page vs. warn).

Script Logic

This script monitors the uptime of the host using the uptime command. If the host uptime is less than 15 minutes, it will send an alert. The type of alert will depend on the response setting in the script.cfg file.

Steps:

1. Initiate the script.

2. Collect uptime.

3. Compare uptime to current time.

4. Alert if necessary.
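
The 15-minute comparison can be sketched as follows. Note that this Python sketch reads the Linux /proc/uptime format (seconds since boot as the first field) rather than parsing the output of the uptime command that the script uses, because the former is simpler to parse reliably; the function names are hypothetical.

```python
def uptime_seconds(proc_uptime_text):
    """Return seconds since boot from /proc/uptime-style text ("<up> <idle>")."""
    return float(proc_uptime_text.split()[0])


def recently_rebooted(proc_uptime_text, window_minutes=15):
    """Return True when the host has been up for less than the window."""
    return uptime_seconds(proc_uptime_text) < window_minutes * 60
```

On a live Linux host the input text would come from reading /proc/uptime. A True result corresponds to the condition that triggers the alert.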

Response

Because the outage can have a variety of causes, the additional information section will not be useful for this alert. You will want to log in and confirm that Db2 is running; then you can investigate the cause.

3am response -- Verify that the database is active and healthy. Investigate locking and general performance metrics. Resolve issues as needed. Check if maintenance is occurring.
