Many installations are still running IMS because IMS is able to position the next output operation to a dataset at the correct place after an abend followed by a restart. The typical situation is a program that reads and updates data in DB2 while at the same time writing to a dataset. I assume that all installations ensure data integrity and concurrency for DB2 programs updating data by issuing COMMIT regularly. Unfortunately, MVS has no built-in feature to COMMIT datasets, and this is where IMS takes over in many installations. IBM has developed a component called RRS (Resource Recovery Services) which can handle COMMIT on datasets and thereby replace IMS in this case. A few installations have developed their own tools to avoid using IMS, RRS, or other products.
For a long time I had wondered how difficult it would be to build a simple restart facility for datasets instead of using various expensive products. My requirement was a job that, if it failed, could be restarted with exactly the same JCL it was originally submitted with. Such jobs are normally the best kind to build for execution in a production environment. My job consisted of the following components:
This approach worked perfectly. My only slight annoyance was having to write a program to perform the copying. The copying may equally well be done with DFSORT or a similar product; there are many ways to do it, and how you do it is entirely up to you.
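To illustrate the idea behind the copying step, here is a minimal sketch in Python (the real program would of course be COBOL, assembler, or a DFSORT step). The function name `copy_committed_records` and the list-based stand-ins for datasets are my own assumptions, not the author's actual program:

```python
def copy_committed_records(old_output, new_output, committed_count):
    """Copy only the first committed_count records -- the portion of the
    old output dataset covered by the last successful COMMIT, as recorded
    in the restart table -- into a fresh output dataset.  Everything
    written after the last COMMIT is deliberately dropped, so the
    restarted job can safely reproduce it."""
    copied = 0
    for record in old_output:
        if copied == committed_count:
            break              # stop at the commit point; ignore the tail
        new_output.append(record)
        copied += 1
    return copied
```

On restart, the job would run this copy step first, then let the processing program resume writing after record `committed_count`.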
There is one important detail you must be aware of. When the processing program breaks down, you must try to gain control so you can close the output dataset. If you do not gain control, there is a (very small) risk that MVS does not flush all records to disk, and in that situation your output dataset might end up containing fewer records than your restart table indicates. It is therefore important that the copying procedure can detect such a situation and terminate further processing.
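That safety check can be sketched as follows, again in Python with hypothetical names, assuming the restart table supplies the committed record count and the record count of the old output can be obtained before copying:

```python
def safe_restart_count(records_on_disk, committed_count):
    """Before copying, verify that the old output dataset holds at least
    as many records as the restart table says were committed.  If MVS
    never flushed its buffers after the abend, the dataset may be short;
    in that case refuse an automatic restart and demand manual recovery."""
    if records_on_disk < committed_count:
        raise RuntimeError(
            "output dataset is short: %d records on disk, %d recorded "
            "as committed -- manual recovery needed"
            % (records_on_disk, committed_count))
    return committed_count
```

A restart job would call this once and abend (here: raise) before touching anything if the counts disagree.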
The sketched approach for restart may of course also be used for programs not using DB2. You then have to find another place to store the information about how much processing has been completed.