Useful Perl Scripts I have written

A forum for discussing DataStage<sup>®</sup> basics. If you're not sure where your question goes, start here.

Moderators: chulett, rschirm, roy

jdmiceli
Premium Member
Premium Member
Posts: 309
Joined: Wed Feb 22, 2006 10:03 am
Location: Urbandale, IA

Useful Perl Scripts I have written

Post by jdmiceli »

Greetings to everyone in the DataStage community,

Every now and then we see requests for utilities that will work with .DSX files for replacing values, parameters or whatever, as well as concatenators and splitters. I have written several utilities in Perl that I use all the time and that actually started being used by other teams in the company I work for because they are easy to use.

They are usually run on Windows, though I did write some of them on AIX Unix before moving the files to Windows. They should run anywhere. All of the programs run non-destructively, meaning they create new files with the corrected data as opposed to overwriting your original file. This way, just in case there is a problem, you have not lost your source and you don't have to rename it to get it back into the original name.

All the programs have the full documentation for usage and purpose in the header and should be pretty clear. If you have any questions, please feel free to post them and I will do my best to answer. You are free to use these, though please bear in mind I accept no responsibility for anything that may come out of your use of them. If you improve the scripts in any way that will make it better for all of us, please either post it or send the changes to me so I can incorporate them into the originals.

If other people have scripts they would like to post, maybe we can just keep this thread going and people could save this thread in their favorites to be able to find it anytime they needed to. Just a thought.

I will post the scripts in separate messages with explanations on usage for each one following this.

Bestest!
Last edited by jdmiceli on Fri Sep 28, 2007 11:13 am, edited 1 time in total.
Bestest!

John Miceli
System Specialist, MCP, MCDBA
Berkley Technology Services


"Good Morning. This is God. I will be handling all your problems today. I will not need your help. So have a great day!"
jdmiceli
Premium Member
Premium Member
Posts: 309
Joined: Wed Feb 22, 2006 10:03 am
Location: Urbandale, IA

Perl code to split jobs out

Post by jdmiceli »

This first script is a job splitter. It will take a single file in DataStage export format (generally .dsx though it doesn't really matter as long as the format is there) and split it into individual jobs that can be loaded. It will provide a proper header and everything needed to correctly import to DataStage.

It not only splits out job definitions, but also all other aspects of the process, such as routines, containers (that part still needs testing so maybe you can let me know if it works), tabledefs, etc. The documentation in the header is pretty self explanatory.

To fire it off from the command line, type perl DSX_Cutter.pl filename

Here is the code:

Code: Select all

#!/usr/local/bin/perl -w

###################################################################################################################
#
# Documentation Header
# 
###################################################################################################################
# Script Name: DSX_Cutter.pl
# Author     : John Miceli
# Create Date: 04/26/2007
#
# Purpose: This script is meant to easily take a single .dsx file as it was exported from DataStage and split it 
# 		into it's component parts.  There will be one job, routine, transform, etc., per file.  The file
# 		will be named after the Identifier for each DSRECORD within a section or each DSJOB with all 
# 		appropriate subrecords attached to it.
#
# Test results: 
#
# Operating command: perl DSX_Cutter.pl filename.dsx
#		  
# Output     : Individual files with the appropriate header for importing into DataStage for each component.
#
# Caveats    : I don't claim to be a Perl maven.  I'm sure someone smarter than me could have done this with
# 		fewer lines of code.  However, it works quickly and efficiently.  It is not complicated at all,
# 		hence the resulting performance.  Part of keeping it simple is that I may be missing some of the
# 		fluff found in lots of Perl code that probably should be there depending on your level of
# 		paranoia ;-)
#
#	Any and all input/improvements to the script are welcome.  Either email me at 'jmiceli@wrberkley.com' or
#	message 'jdmiceli' on DSXchange.com and I will get back to you if I can.
#
#
###################################################################################################################
#
# Disclaimer: This script is provided AS IS and I take no responsibility for it's use or misuse.  It's use is completely
#       the responsibility of YOU the user and is at your own risk.  I do not claim perfection, nor do I warrantee or
#       guarantee the operation of the script.  It appears to work and I am using it and that's all the further I
#       am willing to go!!
#       	
###################################################################################################################
#
# Code Block
# 
###################################################################################################################
use strict;
use warnings;


#  define scalars
my $dsx;  		# name of the file to be processed
my $value;    		# holds name of parameters file
my $line;      		# value for line for comparison
my @work;		# array to hold working rows
my @header;		# array to hold the header rows
my @jobs;		# array to hold the job rows
my @transforms;		# array to hold the transform rows
my @routines;		# array to hold the routines rows
my @tabledefs;		# array to hold the tabledefs rows
my @stagetypes;		# array to hold the stagetypes rows
my @datatypes;		# array to hold the datatypes rows
my @containers;		# array to hold the shared container rows
my @section;		# array to hold information for sectional parts of the dsx (transforms, routines, etc.)
my $element;		# element of the array being processed
my $tag;		# tag holder
my $on;			# off/on flag 
my $dsname;		# name to be assigned to the new file 
my $cnt;		# generic count for checking things
my $rowchk;		# counter for catching begin and end tags right next to each other
my $in_section;		# flag showing a routine file is being worked on


#
# initialize some things if needed
# 
$on = 0;
$in_section = 0;

#
# collect the filename from the input argument and create the working filename
# 
$dsx = "$ARGV[0]";

 open (OLD, "< $dsx")||die "Unable to open $dsx for reading!\n";

 while ($line = <OLD>)
 {
	chomp $line;

	push(@work, "$line\n");
}

close OLD;

# now that it is all in memory, parse out the sections into their own arrays
# for processing
foreach $element (@work)
{
	chomp $element;

	# determine which section we are in and flag it accordingly
	if ($element =~ /BEGIN HEADER/)
	{
		$on = 1;  ## flag for HEADER records
	}	
	elsif ($element =~ /BEGIN DSJOB/)
	{
		$on = 2;  ## flag for DSROUTINES records
	}
	elsif ($element =~ /BEGIN DSROUTINES/)
	{
		$on = 3;  ## flag for DSROUTINES records
	}
	elsif ($element =~ /BEGIN DSTRANSFORMS/)
	{
		$on = 4;  ## flag for DSTRANSFORMS records
	}
	elsif ($element =~ /BEGIN DSTABLEDEFS/)
	{
		$on = 5;  ## flag for DSTABLEDEFS records
	}
	elsif ($element =~ /BEGIN DSSTAGETYPES/)
	{
		$on = 6;  ## flag for DSSTAGETYPES records
	}
	elsif ($element =~ /BEGIN DSDATATYPES/)
	{
		$on = 7;  ## flag for DSDATATYPES records
	}
	elsif ($element =~ /BEGIN DSSHAREDCONTAINER/)
	{
		$on = 8;  ## flag for DSDATATYPES records
	}

	# separate out each section to a name array for it
	if ($on == 1)
	{
		push(@header, "$element\n");
		if ($element =~ /END HEADER/)
		{
			$on = 0;
		}
	}

	if ($on == 2)
	{
		push(@jobs, "$element\n");
		if ($element =~ /END DSJOB/)
		{
			$on = 0;
		}
	}
	
	if ($on == 3)
	{
		push(@routines, "$element\n");
		if ($element =~ /END DSROUTINES/)
		{
			$on = 0;
		}
	}
	
	if ($on == 4)
	{
		push(@transforms, "$element\n");
		if ($element =~ /END DSTRANSFORMS/)
		{
			$on = 0;
		}
	}
	
	if ($on == 5)
	{
		push(@tabledefs, "$element\n");
		if ($element =~ /END DSTABLEDEFS/)
		{
			$on = 0;
		}
	}
	
	if ($on == 6)
	{
		push(@stagetypes, "$element\n");
		if ($element =~ /END DSSTAGETYPES/)
		{
			$on = 0;
		}
	}
	
	if ($on == 7)
	{
		push(@datatypes, "$element\n");
		if ($element =~ /END DSDATATYPES/)
		{
			$on = 0;
		}
	}
	
	if ($on == 8)
	{
		push(@containers, "$element\n");
		if ($element =~ /END DSSHAREDCONTAINER/)
		{
			$on = 0;
		}
	}
	
}

##############################################################################################################
# process the jobs into their own individual files
##############################################################################################################

# DSJOB section
foreach $element(@jobs)
{
	chomp $element;
	push(@section,"$element\n");
	
	if (($element =~ /BEGIN DSJOB/) and ($in_section == 0))
	{
		# flag the jobs section as being active
		$in_section = 1;
	}
	elsif ($in_section == 1)
	{
		# check to see if we are at the end of the job
		if ($element =~ /END DSJOB/)
		{
			$cnt = scalar(@section);
			if ($cnt > 2)
			{
				# reset the in_section counter
				$in_section = 0;
				
				# extract the name of the file using the Identifier
				$value = $section[1];
				chomp $value;   # remove the end of line stuff
				$value =~ s/Identifier//g;
				$value =~ s/"//g;   # handle quotes
				$value =~ s/ //g;   # handle spaces
				$value =~ s/([\.])/_/g; # handle periods if they exist
				$value =~ s/([\\\\])/_/g; # handle double back slashes if they exist
				$value =~ s/([\/\/])/_/g; # handle double forward slashes if they exist
				$value =~ s/([\\])/_/g; # handle back slashes if they exist
				$value =~ s/([\/])/_/g; # handle forward slashes if they exist
				$value =~ s/([:])/_/g;  # handle colons if they exist

				# name the file
				$dsname = "DSJOB_" . "$value.dsx";
				print "Creating job file for $dsname\n";

				open (NEW, "> $dsname")|| die "Unable to create $dsname file!";

				# output the header to the file
				foreach $line(@header)
				{
					chomp $line;
					print NEW "$line\n";
				}
				
				# output the lines of this particular job
				print NEW "BEGIN DSJOB\n";
				foreach $line(@section)
				{
					next if (($line =~ /BEGIN DSJOB/) or ($line =~ /END DSJOB/));
					chomp $line;
					print NEW "$line\n";
				}
				print NEW "END DSJOB\n";
				
				# when done pushing the lines out, close the file
				close NEW;
			}

			# if the count was 2 or less or has been processed, then ditch what is in the array
			@section = ();
			$value = '';
		}
	}
}

# DSTRANSFORMS section
$cnt = 0;

# get rid of dupes and empty pairs
@work= ();
foreach $element(@transforms)
{
	chomp $element;
	next if (($element =~ /BEGIN DSTRANSFORMS/) or ($element =~ /END DSTRANSFORMS/));
	push(@work, "$element\n");
}
@transforms = ();
@transforms = @work;
@work = ();

foreach $element(@transforms)
{
	chomp $element;
	push(@section,"$element\n");
	
	if (($element =~ /BEGIN DSRECORD/) and ($in_section == 0))
	{
		# flag the jobs section as being active
		$in_section = 1;
	}
	elsif ($in_section == 1)
	{
		# check to see if we are at the end of the job
		if ($element =~ /END DSRECORD/)
		{
			$cnt = scalar(@section);
			if ($cnt > 2)
			{
				# reset the in_section counter
				$in_section = 0;
				
				# extract the name of the file using the Identifier
				# the identifier could be at position 1 or 2 depending on 
				# where we are in the process
				
				# finding the first identifier - this is needed if there is a trucation of
				# some sort. 
				#
				# NEED TO MAKE THIS MORE ROBUST!!!!
				if ($section[1] =~ /Identifier/)
				{
					$value = $section[1];
				}
				elsif ($section[2] =~ /Identifier/)
				{
					$value = $section[2];
				}
				elsif ($section[3] =~ /Identifier/)
				{
					$value = $section[3];
				}
				elsif ($section[4] =~ /Identifier/)
				{
					$value = $section[4];
				}
				elsif ($section[5] =~ /Identifier/)
				{
					$value = $section[5];
				}

				chomp $value;   # remove the end of line stuff
				$value =~ s/Identifier//g;
				$value =~ s/"//g;   # handle quotes
				$value =~ s/ //g;   # handle spaces
				$value =~ s/([\.])/_/g; # handle periods if they exist
				$value =~ s/([\\\\])/_/g; # handle double back slashes if they exist
				$value =~ s/([\/\/])/_/g; # handle double forward slashes if they exist
				$value =~ s/([\\])/_/g; # handle back slashes if they exist
				$value =~ s/([\/])/_/g; # handle forward slashes if they exist
				$value =~ s/([:])/_/g;  # handle colons if they exist

				# name the file
				$dsname = "DSTRANSFORMS_" . "$value.dsx";
				print "Creating transform file for $dsname\n";

				open (NEW, "> $dsname")|| die "Unable to create $dsname file!";

				# output the header to the file
				foreach $line(@header)
				{
					chomp $line;
					print NEW "$line\n";
				}
				
				# output the lines of this particular job
				print NEW "BEGIN DSTRANSFORMS\n";
				foreach $line(@section)
				{
					chomp $line;
					print NEW "$line\n";
				}
				print NEW "END DSTRANSFORMS\n";
				
				# when done pushing the lines out, close the file
				close NEW;
			}

			# if the count was 2 or less or has been processed, then ditch what is in the array
			@section = ();
			$value = '';
		}
	}
}

# DSROUTINES section
# 
# get rid of dupes and empty pairs
@work= ();
foreach $element(@routines)
{
	chomp $element;
	next if (($element =~ /BEGIN DSROUTINES/) or ($element =~ /END DSROUTINES/));
	push(@work, "$element\n");
}
@routines = ();
@routines = @work;
@work = ();

foreach $element(@routines)
{
	chomp $element;
	push(@section,"$element\n");
	
	if (($element =~ /BEGIN DSRECORD/) and ($in_section == 0))
	{
		# flag the jobs section as being active
		$in_section = 1;
	}
	elsif ($in_section == 1)
	{
		# check to see if we are at the end of the job
		if ($element =~ /END DSUBINARY/)
		{
			$cnt = scalar(@section);
			if ($cnt > 2)
			{
				# reset the in_section counter
				$in_section = 0;
				
				# extract the name of the file using the Identifier
				# the identifier could be at position 1 or 2 depending on 
				# where we are in the process
				
				# finding the first identifier - this is needed if there is a trucation of
				# some sort. 
				#
				# NEED TO MAKE THIS MORE ROBUST!!!!
				if ($section[1] =~ /Identifier/)
				{
					$value = $section[1];
				}
				elsif ($section[2] =~ /Identifier/)
				{
					$value = $section[2];
				}
				elsif ($section[3] =~ /Identifier/)
				{
					$value = $section[3];
				}
				elsif ($section[4] =~ /Identifier/)
				{
					$value = $section[4];
				}
				elsif ($section[5] =~ /Identifier/)
				{
					$value = $section[5];
				}

				chomp $value;   # remove the end of line stuff
				$value =~ s/Identifier//g;
				$value =~ s/"//g;   # handle quotes
				$value =~ s/ //g;   # handle spaces
				$value =~ s/([\.])/_/g; # handle periods if they exist
				$value =~ s/([\\\\])/_/g; # handle double back slashes if they exist
				$value =~ s/([\/\/])/_/g; # handle double forward slashes if they exist
				$value =~ s/([\\])/_/g; # handle back slashes if they exist
				$value =~ s/([\/])/_/g; # handle forward slashes if they exist
				$value =~ s/([:])/_/g;  # handle colons if they exist

				# name the file
				$dsname = "DSROUTINES_" . "$value.dsx";
				print "Creating routine file for $dsname\n";

				open (NEW, "> $dsname")|| die "Unable to create $dsname file!";

				# output the header to the file
				foreach $line(@header)
				{
					chomp $line;
					print NEW "$line\n";
				}
				
				# output the lines of this particular job
				print NEW "BEGIN DSROUTINES\n";
				foreach $line(@section)
				{
					# print "$line\n" if (($line =~ /BEGIN DSROUTINES/) or ($line =~ /END DSROUTINES/));
					chomp $line;
					print NEW "$line\n";
				}
				print NEW "END DSROUTINES\n";
				
				# when done pushing the lines out, close the file
				close NEW;
			}

			# if the count was 2 or less or has been processed, then ditch what is in the array
			@section = ();
			$value = '';
		}
	}
}

# DSTABLEDEFS section
#
# get rid of dupes and empty pairs
@work= ();
foreach $element(@tabledefs)
{
	chomp $element;
	next if (($element =~ /BEGIN DSTABLEDEFS/) or ($element =~ /END DSTABLEDEFS/));
	push(@work, "$element\n");
}
@tabledefs = ();
@tabledefs = @work;
@work = ();

foreach $element(@tabledefs)
{
	chomp $element;
	push(@section,"$element\n");
	
	if (($element =~ /BEGIN DSRECORD/) and ($in_section == 0))
	{
		# flag the jobs section as being active
		$in_section = 1;
	}
	elsif ($in_section == 1)
	{
		# check to see if we are at the end of the job
		if ($element =~ /END DSRECORD/)
		{
			$cnt = scalar(@section);
			if ($cnt > 2)
			{
				# reset the in_section counter
				$in_section = 0;
				
				# extract the name of the file using the Identifier
				# the identifier could be at position 1 or 2 depending on 
				# where we are in the process
				
				# finding the first identifier - this is needed if there is a trucation of
				# some sort. 
				#
				# NEED TO MAKE THIS MORE ROBUST!!!!
				if ($section[1] =~ /Identifier/)
				{
					$value = $section[1];
				}
				elsif ($section[2] =~ /Identifier/)
				{
					$value = $section[2];
				}
				elsif ($section[3] =~ /Identifier/)
				{
					$value = $section[3];
				}
				elsif ($section[4] =~ /Identifier/)
				{
					$value = $section[4];
				}
				elsif ($section[5] =~ /Identifier/)
				{
					$value = $section[5];
				}

				chomp $value;   # remove the end of line stuff
				$value =~ s/Identifier//g;
				$value =~ s/"//g;   # handle quotes
				$value =~ s/ //g;   # handle spaces
				$value =~ s/([\.])/_/g; # handle periods if they exist
				$value =~ s/([\\\\])/_/g; # handle double back slashes if they exist
				$value =~ s/([\/\/])/_/g; # handle double forward slashes if they exist
				$value =~ s/([\\])/_/g; # handle back slashes if they exist
				$value =~ s/([\/])/_/g; # handle forward slashes if they exist
				$value =~ s/([:])/_/g;  # handle colons if they exist

				# name the file
				$dsname = "DSTABLEDEFS_" . "$value.dsx";
				print "Creating tabledef file for $dsname\n";

				open (NEW, "> $dsname")|| die "Unable to create $dsname file!";

				# output the header to the file
				foreach $line(@header)
				{
					chomp $line;
					print NEW "$line\n";
				}
				
				# output the lines of this particular job
				print NEW "BEGIN DSTABLEDEFS\n";
				foreach $line(@section)
				{
					chomp $line;
					print NEW "$line\n";
				}
				print NEW "END DSTABLEDEFS\n";
				
				# when done pushing the lines out, close the file
				close NEW;
			}

			# if the count was 2 or less or has been processed, then ditch what is in the array
			@section = ();
			$value = '';
		}
	}
}

# DSSTAGETYPES section
#
# get rid of dupes and empty pairs
@work= ();
foreach $element(@stagetypes)
{
	chomp $element;
	next if (($element =~ /BEGIN DSSTAGETYPES/) or ($element =~ /END DSSTAGETYPES/));
	push(@work, "$element\n");
}
@stagetypes = ();
@stagetypes = @work;
@work = ();

foreach $element(@stagetypes)
{
	chomp $element;
	push(@section,"$element\n");
	
	if (($element =~ /BEGIN DSRECORD/) and ($in_section == 0))
	{
		# flag the jobs section as being active
		$in_section = 1;
	}
	elsif ($in_section == 1)
	{
		# check to see if we are at the end of the job
		if ($element =~ /END DSRECORD/)
		{
			$cnt = scalar(@section);
			if ($cnt > 2)
			{
				# reset the in_section counter
				$in_section = 0;
				
				# extract the name of the file using the Identifier
				# the identifier could be at position 1 or 2 depending on 
				# where we are in the process
				
				# finding the first identifier - this is needed if there is a trucation of
				# some sort. 
				#
				# NEED TO MAKE THIS MORE ROBUST!!!!
				if ($section[1] =~ /Identifier/)
				{
					$value = $section[1];
				}
				elsif ($section[2] =~ /Identifier/)
				{
					$value = $section[2];
				}
				elsif ($section[3] =~ /Identifier/)
				{
					$value = $section[3];
				}
				elsif ($section[4] =~ /Identifier/)
				{
					$value = $section[4];
				}
				elsif ($section[5] =~ /Identifier/)
				{
					$value = $section[5];
				}

				chomp $value;   # remove the end of line stuff
				$value =~ s/Identifier//g;
				$value =~ s/"//g;   # handle quotes
				$value =~ s/ //g;   # handle spaces
				$value =~ s/([\.])/_/g; # handle periods if they exist
				$value =~ s/([\\\\])/_/g; # handle double back slashes if they exist
				$value =~ s/([\/\/])/_/g; # handle double forward slashes if they exist
				$value =~ s/([\\])/_/g; # handle back slashes if they exist
				$value =~ s/([\/])/_/g; # handle forward slashes if they exist
				$value =~ s/([:])/_/g;  # handle colons if they exist

				# name the file
				$dsname = "DSSTAGETYPES_" . "$value.dsx";
				print "Creating stagetypes file for $dsname\n";

				open (NEW, "> $dsname")|| die "Unable to create $dsname file!";

				# output the header to the file
				foreach $line(@header)
				{
					chomp $line;
					print NEW "$line\n";
				}
				
				# output the lines of this particular job
				print NEW "BEGIN DSSTAGETYPES\n";
				foreach $line(@section)
				{
					next if (($line =~ /BEGIN DSSTAGETYPES/) or ($line =~ /END DSSTAGETYPES/));
					chomp $line;
					print NEW "$line\n";
				}
				print NEW "END DSSTAGETYPES\n";
				
				# when done pushing the lines out, close the file
				close NEW;
			}

			# if the count was 2 or less or has been processed, then ditch what is in the array
			@section = ();
			$value = '';
		}
	}
}

# DSDATATYPES section
#
# get rid of dupes and empty pairs
@work= ();
foreach $element(@datatypes)
{
	chomp $element;
	next if (($element =~ /BEGIN DSDATATYPES/) or ($element =~ /END DSDATATYPES/));
	push(@work, "$element\n");
}
@datatypes = ();
@datatypes = @work;
@work = ();

foreach $element(@datatypes)
{
	chomp $element;
	push(@section,"$element\n");
	
	if (($element =~ /BEGIN DSRECORD/) and ($in_section == 0))
	{
		# flag the jobs section as being active
		$in_section = 1;
	}
	elsif ($in_section == 1)
	{
		# check to see if we are at the end of the job
		if ($element =~ /END DSRECORD/)
		{
			$cnt = scalar(@section);
			if ($cnt > 2)
			{
				# reset the in_section counter
				$in_section = 0;
				
				# extract the name of the file using the Identifier
				# the identifier could be at position 1 or 2 depending on 
				# where we are in the process
				
				# finding the first identifier - this is needed if there is a trucation of
				# some sort. 
				#
				# NEED TO MAKE THIS MORE ROBUST!!!!
				if ($section[1] =~ /Identifier/)
				{
					$value = $section[1];
				}
				elsif ($section[2] =~ /Identifier/)
				{
					$value = $section[2];
				}
				elsif ($section[3] =~ /Identifier/)
				{
					$value = $section[3];
				}
				elsif ($section[4] =~ /Identifier/)
				{
					$value = $section[4];
				}
				elsif ($section[5] =~ /Identifier/)
				{
					$value = $section[5];
				}

				chomp $value;   # remove the end of line stuff
				$value =~ s/Identifier//g;
				$value =~ s/"//g;   # handle quotes
				$value =~ s/ //g;   # handle spaces
				$value =~ s/([\.])/_/g; # handle periods if they exist
				$value =~ s/([\\\\])/_/g; # handle double back slashes if they exist
				$value =~ s/([\/\/])/_/g; # handle double forward slashes if they exist
				$value =~ s/([\\])/_/g; # handle back slashes if they exist
				$value =~ s/([\/])/_/g; # handle forward slashes if they exist
				$value =~ s/([:])/_/g;  # handle colons if they exist

				# name the file
				$dsname = "DSDATATYPES_" . "$value.dsx";
				print "Creating datatypes file for $dsname\n";

				open (NEW, "> $dsname")|| die "Unable to create $dsname file!";

				# output the header to the file
				foreach $line(@header)
				{
					chomp $line;
					print NEW "$line\n";
				}
				
				# output the lines of this particular job
				print NEW "BEGIN DSDATATYPES\n";
				foreach $line(@section)
				{
					next if (($line =~ /BEGIN DSDATATYPES/) or ($line =~ /END DSDATATYPES/));
					chomp $line;
					print NEW "$line\n";
				}
				print NEW "END DSDATATYPES\n";
				
				# when done pushing the lines out, close the file
				close NEW;
			}

			# if the count was 2 or less or has been processed, then ditch what is in the array
			@section = ();
			$value = '';
		}
	}
}

# DSSHAREDCONTAINER section
#
# get rid of dupes and empty pairs
@work= ();
foreach $element(@containers)
{
	chomp $element;
	next if (($element =~ /BEGIN DSSHAREDCONTAINER/) or ($element =~ /END DSSHAREDCONTAINER/));
	push(@work, "$element\n");
}
@containers = ();
@containers = @work;
@work = ();

foreach $element(@containers)
{
	chomp $element;
	push(@section,"$element\n");
	
	if (($element =~ /BEGIN DSSHAREDCONTAINER/) and ($in_section == 0))
	{
		# flag the jobs section as being active
		$in_section = 1;
	}
	elsif ($in_section == 1)
	{
		# check to see if we are at the end of the job
		if ($element =~ /END DSSHAREDCONTAINER/)
		{
			$cnt = scalar(@section);
			if ($cnt > 2)
			{
				# reset the in_section counter
				$in_section = 0;
				
				# extract the name of the file using the Identifier
				# the identifier could be at position 1 or 2 depending on 
				# where we are in the process
				
				# finding the first identifier - this is needed if there is a trucation of
				# some sort. 
				#
				# NEED TO MAKE THIS MORE ROBUST!!!!
				if ($section[1] =~ /Identifier/)
				{
					$value = $section[1];
				}
				elsif ($section[2] =~ /Identifier/)
				{
					$value = $section[2];
				}
				elsif ($section[3] =~ /Identifier/)
				{
					$value = $section[3];
				}
				elsif ($section[4] =~ /Identifier/)
				{
					$value = $section[4];
				}
				elsif ($section[5] =~ /Identifier/)
				{
					$value = $section[5];
				}

				chomp $value;   # remove the end of line stuff
				$value =~ s/Identifier//g;
				$value =~ s/"//g;   # handle quotes
				$value =~ s/ //g;   # handle spaces
				$value =~ s/([\.])/_/g; # handle periods if they exist
				$value =~ s/([\\\\])/_/g; # handle double back slashes if they exist
				$value =~ s/([\/\/])/_/g; # handle double forward slashes if they exist
				$value =~ s/([\\])/_/g; # handle back slashes if they exist
				$value =~ s/([\/])/_/g; # handle forward slashes if they exist
				$value =~ s/([:])/_/g;  # handle colons if they exist

				# name the file
				$dsname = "DSSHAREDCONTAINER_" . "$value.dsx";
				print "Creating shared container file for $dsname\n";

				open (NEW, "> $dsname")|| die "Unable to create $dsname file!";

				# output the header to the file
				foreach $line(@header)
				{
					chomp $line;
					print NEW "$line\n";
				}
				
				# output the lines of this particular job
				print NEW "BEGIN DSSHAREDCONTAINER\n";
				foreach $line(@section)
				{
					next if (($line =~ /BEGIN DSSHAREDCONTAINER/) or ($line =~ /END DSSHAREDCONTAINER/));
					chomp $line;
					print NEW "$line\n";
				}
				print NEW "END DSSHAREDCONTAINER\n";
				
				# when done pushing the lines out, close the file
				close NEW;
			}

			# if the count was 2 or less or has been processed, then ditch what is in the array
			@section = ();
			$value = '';
		}
	}
}


Bestest!
Bestest!

John Miceli
System Specialist, MCP, MCDBA
Berkley Technology Services


"Good Morning. This is God. I will be handling all your problems today. I will not need your help. So have a great day!"
jdmiceli
Premium Member
Premium Member
Posts: 309
Joined: Wed Feb 22, 2006 10:03 am
Location: Urbandale, IA

Job concatenator

Post by jdmiceli »

Many times there are tons of jobs that need to be moved from one server to another or fixed at the file level and then re-imported. If you have ever done an import, you know it only seems to do one file at a time (unless I'm just brain dead and haven't found a way to do it! :shock: )

This very easy to use program will take all DSX files and make them one file called 'all.txt'. Then you could just import that one file and life is good.

To use this one just drop the program into the parent directory of all the files you want to concatenate. Then either double-click the program from the Windows Explorer or type perl DSX_Concatenator.pl at the command line.

PLEASE NOTE: This particular program traverses any directories below the parent directory and pulls those files in as well! I did this because our Release Management team often times has many directories worth of files to pull in and I wanted to make it easier for them!

I could add a switch to turn that functionality off if anyone needs it.

Code: Select all

#!/usr/local/bin/perl -w

###############################################################################
# Program name: DSX_Concatenator.pl
#
# Purpose: This program is meant to take a directory structure (Windows only
# 	   for now) and parse through the entire tree finding any files with
# 	   a .dsx extension.  It will then take all these files and concatenate
# 	   them into a single file at the starting directory level called 
# 	   all.txt for import into DataStage.
#
# Usage: Put this program in the parent directory for the project and type:
# 	   DSX_Concatenator.pl
#
# 	 The program will then traverse the directory tree and any files
# 	 with the extension .dsx will be added to a single file named all.txt
#
# History:
# 	Developer	Date	   	Note
# 	---------       ---------- 	--------------
# 	jmiceli		06-07-2007 	Creation date.
#
#
###############################################################################
 use File::Find;

 print "Building directory and file tree...\n\n\n\n";

 open (NEW, "> all.txt") || die "could not create file\n";

 find(\&cleanup, ".");

 foreach $list(@raw)
 {
	 chomp $list;

	 $list =~ s/\//\\/g;

	 push(@files,"$list\n");
 }

 # print "@files\n";

 foreach $list(@files)
 {
	 chomp $list;
	 print "working on $list\n";

	 open (OLD, "< $list") || die "could not open file for reading";

	 foreach $line(<OLD>)
	 {
		 chomp $line;
		 # print "line: $line\n";

		 print NEW "$line\n";
	 }
	 close OLD;
 }

 close NEW;


 # Subroutine that determines whether we matched the file extensions.
 sub cleanup {
   if (/\.dsx$/)
   {
	    push(@raw, "$File::Find::name\n");
   }

 }

Bestest!

John Miceli
System Specialist, MCP, MCDBA
Berkley Technology Services


"Good Morning. This is God. I will be handling all your problems today. I will not need your help. So have a great day!"
jdmiceli
Premium Member
Premium Member
Posts: 309
Joined: Wed Feb 22, 2006 10:03 am
Location: Urbandale, IA

sorting script

Post by jdmiceli »

I also do a lot of table name extraction to a file or whatever and then use VI or Vim to do substitutions to add the rest of what I want in front of and behind the names. Sometimes I forgot to order things, so I wrote this little sorting script with the option of sorting ascending or descending. This is only designed to handle files with one word per line. It should work with numbers as well, though I'm not sure if it will order them numerically or lexically.

From the command line just type perl sorter.pl filename (A or D).

Code: Select all

#!/usr/local/bin/perl -w
###################################################################################################################
# jdm
#  Script: Sorter.pl
#  Purpose: This script will take a file with one word per line and sort it in alphabetical order, either 
#  		ascending or descending order.  It will do it in a non-destructive manner by creating a separate file.
#  		
#  Usage: perl Sorter.pl origfile <A or D>  (ie: perl Sorter.pl filename A)
# 
#  	origfile  =  name of the original file
#
#  Files created: sorted_origfile
#
#  History:	Date		Developer	Comments
#  		06/06/2007	jmiceli		original creation of the program
#
###################################################################################################################  	

use strict;
use warnings;

# define variables
my $oldfile;
my $fixfile;
my $order;
my $line;
my @array;
my @sorted;
my $string;


# get the parameters and create the filenames
$oldfile = "$ARGV[0]";
$order = "$ARGV[1]";

print "order: $order\n";


$fixfile = "sorted_" . $oldfile;

print "sorted: $fixfile\n";
# open files for processing
open (OLD, "< $oldfile")|| die "Could not open $oldfile for parsing!\n";
open (NEW, "> $fixfile")|| die "Could not open $fixfile for parsing!\n";

while ($line = <OLD>)
{
	chomp $line;

	push(@array, "$line\n");
}

if ($order eq 'A')
{
	@sorted = sort(@array);
}
else
{
	@sorted = reverse sort(@array);
}

foreach $line(@sorted)
{
	chomp $line;

	print NEW "$line\n";
}


close OLD;
close NEW;

Bestest!

John Miceli
System Specialist, MCP, MCDBA
Berkley Technology Services


"Good Morning. This is God. I will be handling all your problems today. I will not need your help. So have a great day!"
jdmiceli
Premium Member
Premium Member
Posts: 309
Joined: Wed Feb 22, 2006 10:03 am
Location: Urbandale, IA

unix to dos converter

Post by jdmiceli »

This perl utility is a simple unix to dos converter. I wish I could take complete credit for it, but it is a pretty basic item and most of the code is lifted from a book (though I don't remember right off which one to give proper credit). Just know I am not claiming it solely as my own, though I did add the file backup part of the script. :lol:

Usage is a simple perl unix2dos.pl directoryname.

Code: Select all

#!/usr/bin/perl -w

# A script to convert Unix files to Windoze line termination format.
# WARNING: THIS SCRIPT DOES NOT CHECK FOR PRE-EXISTING
#		FILE, USE WITH CAUTION

# usage: unix2dos <DIRECTORY>
#  Here is the single line version of this for a file: 
#		perl -p -i.bak -e 's/\n/\r\n/;' filename

$file = shift @ARGV || die "No filename passed in!\n$!\n";

$| = 1;
$linesFixed = 0;

foreach( $file ) 
{	
	print "$_\t";
	open(INPUT, "<$_");
	rename( $_, "$_.bak") || die "Unable to rename $_\n$!\n";
	open(OUTPUT, ">$_");
	while(<INPUT>)
	{
		if ( s/\n/\r\n/ )
		{
			$linesFixed++;
		}
		print OUTPUT;
	}
}
continue
{
	print "($linesFixed)\n";
	$linesFixed = 0;
	close INPUT;
	close OUTPUT;
}

Last edited by jdmiceli on Mon Mar 23, 2009 8:36 am, edited 2 times in total.
Bestest!

John Miceli
System Specialist, MCP, MCDBA
Berkley Technology Services


"Good Morning. This is God. I will be handling all your problems today. I will not need your help. So have a great day!"
jdmiceli
Premium Member
Premium Member
Posts: 309
Joined: Wed Feb 22, 2006 10:03 am
Location: Urbandale, IA

dos to unix perl script

Post by jdmiceli »

Here is the last scripts opposite, turning dos files to unix terminations.

Usage is perl dos2unix.pl directoryname.

Code: Select all


#!/usr/bin/perl -w

# A script to convert Windoze files to Unix line termination format.
# WARNING: THIS SCRIPT DOES NOT CHECK FOR PRE-EXISTING
#		FILE, USE WITH CAUTION

# usage: dos2unix <DIRECTORY>
#  Here is the single line version of this for a file: 
#		perl -p -i.bak -e 's/\r\n/\n/;' filename

$file = shift @ARGV || die "No filename passed in!\n$!\n";

$| = 1;
$linesFixed = 0;

foreach( $file ) 
{	
	print "$_\t";
	open(INPUT, "<$_");
	rename( $_, "$_.bak") || die "Unable to rename $_\n$!\n";
	open(OUTPUT, ">$_");
	while(<INPUT>)
	{
		if ( s/\r\n/\n/ )
		{
			$linesFixed++;
		}
		print OUTPUT;
	}
}
continue
{
	print "($linesFixed)\n";
	$linesFixed = 0;
	close INPUT;
	close OUTPUT;
}

Last edited by jdmiceli on Mon Mar 23, 2009 8:36 am, edited 1 time in total.
Bestest!

John Miceli
System Specialist, MCP, MCDBA
Berkley Technology Services


"Good Morning. This is God. I will be handling all your problems today. I will not need your help. So have a great day!"
jdmiceli
Premium Member
Premium Member
Posts: 309
Joined: Wed Feb 22, 2006 10:03 am
Location: Urbandale, IA

Change parameter block completely in export file

Post by jdmiceli »

This perl script is great when it comes time for a team like Release Management to release your new code to a different environment. This may be needed to change your default parameters for all jobs in the export file (unless, of course, you are using one of a several job control methods where you pass them all in). This is useful if you ever have to debug a problem on the production system because you won't have to go into the Parameters section to change all the Dev/Integration/testing settings that may be there.

This one does take a little prep work though. What you do is create a job for each of your environments that has the correct parameters set up. There will be no stages in these jobs. Save them somewhere so you can access them easily.

When you are ready to replace your parameter blocks, export all the jobs needed into one file, then export the template you made into it's own separate file (just the template for the environment these will be going to). Use VI or your editor of choice to edit the template file and remove everything above and below the parameter block (line 1 of the file down to and including the line 'Parameters "CParameters" ' for the top part and everything from the line reading 'NULLIndicatorPosition "0" ' to the bottom of the file). Save the file after removing as instructed.

Now, you are ready to run the script.
Type perl ReplaceParameters.pl filename.dsx param_filename. This will create a new file called Final_filneme.dsx, which will have the new parameters in place.

Used in conjunction with the other utilities above, our source control and release management software (StarTeam), this process works out pretty slick. Once it is done the first time, generally you don't have to change much, so error potential drops.

Here is the code:

Code: Select all

#!/usr/local/bin/perl -w

###################################################################################################################
#
# Documentation Header
# 
###################################################################################################################
# Script Name: ReplaceParameterBlock.pl
# Author     : John Miceli
# Create Date: 10/10/2006
#
# Purpose: This script is meant to quickly and easily replace parameter blocks in a DataStage export file (.dsx)
#          with the contents of file prepared in advance with the correct parameters.  This is a non-destructive
#          process that will not affect your original export file.  Instead, the process will create two additional
#          files (see Output section).
#
# Test results: I tested this script against an export file consisting of 212 extract jobs.  The file size was
#		just under 18.2 megabytes in size.  The script correctly processed the file in 6 seconds.
#
# Operating command: perl ReplaceParameters.pl filename.dsx parm_filename
# 		1. filename.dsx - this is the name of the export file (in .dsx export file)
# 		2. parm_filename - this is the name of the file that contains the correct parameters you are 
# 				    wishing to place in the export file
#
# Output     : 1. Original file (filename.dsx) - will remain untouched and is used for reading purposes only
# 	       2. changed_filename.dsx - will contain the original script minus the parameter blocks.  These 
# 	          			will be replaced by a place holder called 'PARAMETERS_GO_HERE'.
# 	       3. final_filename.dsx - will contain the finished product, ready to be imported
#
# Caveats    : I don't claim to be a Perl maven.  I'm sure someone smarter than me could have done this with
# 		fewer lines of code.  However, it works quickly and efficiently.  It is not complicated at all,
# 		hence the resulting performance.  Part of keeping it simple is that I may be missing some of the
# 		fluff found in lots of Perl code that probably should be there depending on your level of
# 		paranoia ;-)
#		
#		Assumptions made:
# 		I have assumed the user will include both parameters and that the correct filenames will be
# 		in the correct place.  If a filename is missing, then the script will simply puke.  Call
# 		me lazy, but I'm telling you how to use it, so just do it that way and don't do it any other;-).
# 		I have also assumed that the user is DataStage proficient enough to do the following (this 
# 		section also happens to be the instructions for using this script):
# 		1. Create a DS job that has nothing but the correct parameters defined
# 		2. Export the single parameter job to a .dsx file
# 		3. Use VI or editor of choice to remove everything above and below the parameter block 
# 		   (line 1 of the file down to and including the line 'Parameters "CParameters" ' for the top
# 		   part and everything from the line reading 'NULLIndicatorPosition "0" ' to the bottom of the
# 		   file).
# 		4. Save the leftover parameter block as some filename you can modify and use (I used 'parms').
# 		5. Export any number of jobs into an export file (.dsx format) that need the new parameters (I'll
# 		    refer to this as 'filename.dsx').
# 		6. Put this script into your path somewhere (if you haven't already done it) or into a working
# 		   directory you have access to.  Place the other two files in a working directory and issue the
# 		   following command (without the single quotes): 'perl ReplaceParameters.pl filename.dsx parms'
# 		   This will fire off the script.  When it is done, your new parameters will be in a .dsx file
# 		   prepended with 'final_' and will have your original filename tacked on the end 
# 		   (ie.: final_filename.dsx).
#
# 		!!!NOTE!!!   
# 		If there were any previously used parameters that had different names than this current parameter
# 		   scheme, you will need to fix these (they are referenced in different places in each stage).  
# 		   I have distributed with this script another called 'FixParameters.pl' which will allow 
# 		   you to replace parameters in the original file with new ones you manually map either one 
# 		   at a time or with a map file you will need to create.  See the header documentation in that file 
# 		   for the format of that mapping file.   
#
#	Any and all input/improvements to the script are welcome.  Either email me at 'jmiceli@wrberkley.com' or
#	message 'jdmiceli' on DSXchange.com and I will get back to you if I can.
#
#
###################################################################################################################
#
# Disclaimer: This script is provided AS IS and I take no responsibility for it's use or misuse.  It's use is completely
#       the responsibility of YOU the user and is at your own risk.  I do not claim perfection, nor do I warrantee or
#       guarantee the operation of the script.  It appears to work and I am using it and that's all the further I
#       am willing to go!!
#       	
###################################################################################################################
#
# Code Block
# 
###################################################################################################################

use strict;
use warnings;


#  define scalars
my $filename;  		# name of the file to be processed
my $parms;    		# holds name of parameters file
my $out_flag;  		# flag that will determine if something gets printed
my $changed_filename; 	# this will be the working file
my $final_file; 	# final file name for finished merged document 
my $line;      		# value for line for comparison
my $metaflag;		# flag for overriding the $out_flag if there is meta data to put in place
my @meta;		# array for collecting MetaBag rows
my $metel;		# individual element holder for the array elements

#
# collect the filename from the input argument and create the working filename
# 
$filename = "$ARGV[0]";
$parms = "$ARGV[1]";
$changed_filename = "changed_$filename";
$final_file = "final_$filename";

 open (OLD, "< $filename")||die "Unable to open $filename for reading!\n";
 open (NEW, "> $changed_filename")|| die "Unable to open $changed_filename for writing!\n";

#
# initialize out flag to normal mode: 0=normal line, 1=skip line
$out_flag = 0;
$metaflag = 0;

 while ($line = <OLD>)
 {
	chomp $line;

	# 
	#  Pre-empt the parameters section and load into an array for MetaBag stuff
	#
	if (($line =~ /MetaBag/ or $metaflag == 1) and ($out_flag == 1))
	{
		push (@meta, "$line");
		$metaflag = 1;

		if ($line =~ /END DSSUBRECORD/)
		{
			push (@meta, "$line");
			$metaflag = 0;
		}
	}
	
	#
	# see if the line is the beginning of a parameters section
	# 
	if ($line =~ /CParameters/)
	{
		print NEW "$line\n";
		$out_flag = 1;
	} 
	#
	# see if it is the end of the parameters section
	# 
	elsif ($line =~ /NULLIndicatorPosition/ and $metaflag == 0) 
	{
		print NEW "PARAMETERS_GO_HERE\n";

		foreach $metel (@meta)
		{
			unless ($metel =~ /END DSSUBRECORD/)
			{
				print NEW "$metel\n";	
			}
			

		}
		$metaflag = 0;
		$out_flag = 0;
		@meta = ();
		print NEW "      END DSSUBRECORD\n";
	}

	#
	# final check
	#
	if ($out_flag == 0 && $metaflag == 0)
	{
		print NEW "$line\n";
	}
 }

close OLD;
close NEW;

#
# reopen the changed file and insert new parameters at 'PARAMETERS_GO_HERE' placeholder
# 
open (OLD, "< $changed_filename");
open (NEXT, "> $final_file");

while ($line = <OLD>)
{
	chomp $line;

	if ($line =~ /PARAMETERS_GO_HERE/)
	{
		open (CHANGED, "< $parms");

		while (<CHANGED>)
		{
			chomp;
			print NEXT "$_\n";
		}
		close CHANGED;
	} else
	{
		print NEXT "$line\n";
	}
}

close OLD;
close NEXT;
Bestest!

John Miceli
System Specialist, MCP, MCDBA
Berkley Technology Services


"Good Morning. This is God. I will be handling all your problems today. I will not need your help. So have a great day!"
jdmiceli
Premium Member
Premium Member
Posts: 309
Joined: Wed Feb 22, 2006 10:03 am
Location: Urbandale, IA

Pivot script for text file

Post by jdmiceli »

This script will take a text file and pivot the values on the first field. It assumes the header is the first row of the file and that the first field in each row is the key.

All you have to do is put the program in the directory where your file is (or vice versa) and pass it the name of the file, the name of the file you want created to hold the pivoted data and the delimiter you used in the file. When specifying the delimiter, put it in double quotes and use standard pattern matching notation ("\t" = tab, "\n" newline, etc.).

I created a test file with ten values per key and 15400 rows in the file. This script processed it in about three seconds. Hopefully it will help you!

Code: Select all


#!/usr/local/bin/perl

##########################################################################
#  program name: Pivot.pl
#  Author: John Miceli
#  Date: 10/05/2007
#
#  Purpose: To take a flat file with header row and pivot the values
#           based on the first value in a line being the key to the row.
#
#  Usage: perl Pivot.pl sourcefile targetfile delimiter
#       ie: perl Pivot.pl route.txt p_route.txt "\t" (specifies tab)
##########################################################################

use strict;
use warnings;

my ($inputfile, $outputfile, $delimiter, $cnt, $line, $elementcnt);
my ($element, $next, $key, $field, $value);
my (@array, @processing, @headers, @hproc);
my (%positions);

#initialize stuff
$cnt = 0;
$elementcnt = 0;
$next = 0;

$inputfile = "$ARGV[0]";
$outputfile = "$ARGV[1]";
$delimiter = "$ARGV[2]";

open(IN, "< $inputfile") || die "Could not open file $inputfile for reading.\n";
open(OUT, "> $outputfile") || die "Could not open file $outputfile for reading.\n";

# print the headers to the output file using tab delimiter
print OUT "KEY\tFIELD\tVALUE\n";

while ($line = <IN>)
{
    chomp $line;
    $cnt++;

    if ($cnt == 1)
    {
        @headers = split(/$delimiter/, $line);
        $elementcnt = (scalar(@headers)-1);  # get count of headers in file

        foreach $element(@headers)
        {
            if ($element =~ /KEY/)
            {
                next;
            }
            $positions{$next} = $element;
            $next++;
        }

        # get rid of the delimiter elements in the array
        foreach $element(@headers)
        {
            chomp $element;

            if ($element =~ /$delimiter/)
            {
                # do nothing
            }
            else
            {
                push(@hproc,$element);
            }
        }
        next;
    }

    @array = split(/$delimiter/, $line);

    # get rid of the delimiter elements in the array
    foreach $element(@array)
    {
        chomp $element;

        if ($element =~ /$delimiter/)
        {
            # do nothing
        }
        else
        {
            push(@processing,$element);
        }
    }
    $key = shift(@processing);

    # now start building the pivot
    for (my $i=0; $i<= ($elementcnt-1); $i++)
    {
        $field = $positions{$i};
        $value = shift(@processing);
        print OUT "$key\t$field\t$value\n";
    }

}

close IN;
close OUT;

Bestest,
Bestest!

John Miceli
System Specialist, MCP, MCDBA
Berkley Technology Services


"Good Morning. This is God. I will be handling all your problems today. I will not need your help. So have a great day!"
jdmiceli
Premium Member
Premium Member
Posts: 309
Joined: Wed Feb 22, 2006 10:03 am
Location: Urbandale, IA

Post by jdmiceli »

Here is a sample batch clean up script. Right now, I have a process that runs each night for 18 different companies using the same DataStage jobs. I pass in parameters that identify the companies and the batches they are on, but the same thing happens for all of them.

Since I am moving a fair amount of data each night, I need to be a good housekeeper to prevent my sysadmins from getting mad at me about disk space.

So, this script will take a batch number, the path for your project, the project name and a company identifier to run. See the header documentation for a sample.

This is coded to work in a Unix environment and I have only tested it there. I just didn't have time to make it work for both Windoze and Unix. It also assumes some things such as Ken Bland's directory structure, though it shouldn't matter since there is little hard coding of directory names (excepting 'archive' and 'scripts'). Anywho, don't know if it will help, but feel free to use or modify to suit your needs.

Code: Select all

#!/usr/bin/perl -w
################################################################################
#       program name: BatchCleanup.pl
#       Author      : John Miceli
#       Create Date : 10/24/2007
#       Purpose     : This program is designed to clean up files created during the
#                       running of whatever jobs are used in a batch.  It will
#                       create a tarball holding all the files, zip it and then
#                       go back through and delete the files for the batch.
#                       The script will also delete any backup files older than
#                       5 days as this should be long enough Veritas to backup
#                       the files.
#       Usage       : The program uses the following syntax to run:
#          perl BatchCleanup.pl <BatchNum> <"project_path"> <"project name"> <corp_id>
#                       where
#                       BatchNum = the number of the current batch
#                       project_path = fully qualified path through the version number
#                       project_name = name of the project
#                       corp_id = the company number for this run
#       ie: perl BatchCleanup.pl 12345 "/datastage/bts/dev/ldr/r1" "ldr" 30
################################################################################

use strict;
use File::Find;
no warnings 'File::Find';

my $currentfile;            # holds current file to examine
my $tarcommand;             # Process any command line parameters
my $path;
my $tempdir;
my $nextdir;
my $entry;
my $file;
my $gzipcmd;
my @files;
my @dirs;

# process any command line arguments when this script is called.
my $batch       = $ARGV[0]; # the batch that is being processed
my $root        = $ARGV[1]; # the project path (/datastage/dev/bts/xxx/rX)
my $base        = $ARGV[2]; # basename of the project alirR1, ldr, etc.
my $corp_id     = $ARGV[3]; # company number to avoid stepping on other processes

print "batch: $batch\n";
print "root: $root\n";
print "batch: $batch\n";
print "root: $root\n";
print "Base: $base\n";
print "CorpID: $corp_id\n";

# if the path passed in is not absolute, make it so
if (substr($root,0,1) =~ /0-9a-zA-Z/)
{
    $path = '/' . "$root";
}
else
{
    $path = $root;
}

# set some filenames and paths
my $outfile = $corp_id . "_" . "tmpprocessing" . "_" . $batch . ".txt";
my $archivedir = "$path/archive/";
my $tarfile = $corp_id . "_" . $base . "_" . $batch . ".tar";
my $delbatch = $batch - 5;

# clear the tar files older than 5 batches
chdir($archivedir) || die "Could not change directories to $archivedir\n";
opendir (A,$archivedir);
while (defined($file = readdir(A)))
{
    next if $file =~ /^\.\.?$/;

    if (($file <= $delbatch) and ($file =~ /gz/))  # ditch gzip extentions
    {
        unlink($file);
    }
}
closedir(A);
chdir($path);


# this anonymous sub will get the file listings for all directories and files
find sub {push(@files,($File::Find::name, -d && '/', "\n"))}, $path;

# output list to file for processing
open (F, "> $archivedir$outfile") || die "Unable to open $outfile for writing\n";

my $cnt = 0;

# now process each line keeping only the things that are pertinent to the batch
foreach $entry(@files)
{
    chomp $entry;

    # get rid of any entries for the archive directory
    next if ($entry =~ /archive/);
    next if ($entry =~ /scripts/);

    if ($entry =~ /$batch/)
    {
        print "$entry\n";
        print F "$entry\n";
        $cnt++;
    }
}

# if there are no rows in the file, then exit
if ($cnt < 1)
{
        print "No files to process!\n";
        exit;
}

$tarcommand = "tar cvRfL $tarfile $outfile";
$gzipcmd = "gzip -v -9 $tarfile";

print "Tarring up the files for this batch...\n";
# my $tarstat = system($tarcommand);

print "Zipping tar file for this batch...\n";
# my $zipstat = system($gzipcmd);

# Now we need to go back through and remove the files that were just backed up
chdir $root || die "Could not move to the root directory\n";

opendir (D, "$root");
@dirs = grep !/^\.\.?$/, readdir(D);
closedir D;

foreach $entry(@dirs)
{
    chomp $entry;
    next if ($entry eq "archive");

    $nextdir = "$root/$entry";
    chdir "$nextdir" || die "Unable to change to $nextdir\n";

    opendir (D,$nextdir) || die "Could not open $nextdir for reading!\n";
    while (defined($file = readdir(D) ))
    {
        next if $file =~ /^\.\.?$/;

        if ($file =~ /$batch/)
        {
                print "Removing $nextdir/$file and all contents\n";
                my $deldir = system("rm -r $file");
                next;
        }

    }
}
Bestest!

John Miceli
System Specialist, MCP, MCDBA
Berkley Technology Services


"Good Morning. This is God. I will be handling all your problems today. I will not need your help. So have a great day!"
jdmiceli
Premium Member
Premium Member
Posts: 309
Joined: Wed Feb 22, 2006 10:03 am
Location: Urbandale, IA

Post by jdmiceli »

This will allow the user to change anything in a file, item by item. It says it is for parameters, but if it can be identified, it can be changed. It is also not really limited to .dsx files, but that is its primary purpose.

Once again, follow the instructions in the header to use this utility.

Code: Select all


#!/usr/local/bin/perl -w

###################################################################################################################
#
# Documentation Header
# 
###################################################################################################################
# Script Name: FixIndividualParameters.pl
# Author     : John Miceli
# Create Date: 10/10/2006
#
# Purpose: This script is meant to easily replace parameters that have the wrong parameter name.
# 	    This will usually occur after using the Perl script 'ReplaceParameters.pl' to replace large blocks
# 	    of parameters in an export file from DataStage.  This script could also be used just to change 
# 	    default values for a set of parameters as opposed to just parameter names.  It's pretty flexible 
# 	    since it is just doing pattern matching and replacements.  
#
# 	    This script is non-destructive and will not affect your original file in any way.  A new file will
# 	    be created that prepends 'Fixed_' in front of the original filename.  Technically, this program
# 	    will work on any file, not just a .dsx file.  It really doesn't care.  I'm only using the export
# 	    for the example here because that is what I originally wrote the program for making mass changes
# 	    to export files for DS.  
#
# 	    The number of rows processed will print by 10,000 row counts, spitting out a final figure at 
# 	    the end.
#
# 	    NOTE!!!: I wrote this to be as platform independent as possible, so it won't care if you are on
# 	    Unix or Windows.  This is intentional and I realize by doing so, I passed on using Perl features
# 	    that could have simplified it greatly and probably sped things up tremendously by processing the 
# 	    entire file at once by using backticks and 'sed'.  However, 'sed' isn't available on all systems
# 	    (pronounced Windoze) so that type of pattern matching and replacing is not available without 3rd
# 	    party tools.  Maybe the version of the MKS Toolkit that comes with DataStage would be available 
# 	    for that use, but I'm ignorant of it's use.  Feel free to enlighten me and I'll rewrite this.
#
# Test results: I have run several tests with this script thus far on an export file with 716,844 lines in it.
# 		Running the program in 'I' mode processed the file in 8 seconds.  Running the program in 'B'
# 		mode with a batch_file containing 3 pairs to substitute took 35 seconds.  A batch_file containing
# 		XX pairs processed the test export file in XX seconds.  Your results may vary.
#
# Operating command: perl FixParameters.pl 'I' filename.dsx 'old_parm' 'new_parm' (Individual mode)
# 			or
# 		     perl FixParameters.pl 'B' filename.dsx batch_filename (Batch mode)
# 		     
# 		1. operation mode ('I' or 'B') - there are two modes of operation:
# 			Individual mode ('I') - allows you to pass in a pair of parameters, the first is the
# 						old parameter and the second is the new parameter.  The program
# 						will replace any instances of the old with the new
# 			Batch mode ('B') - allows you to prepare a file ahead of time that contains all the
# 					    'old=>new' parameter pairs to be processed.  The program will then 
# 					    load the file into a hash and process the pairs until finished.
# 		2. filename.dsx - this is the name of the export file (.dsx extension) to be processed
# 		3. batch_filename - this is the name of the file that contains pairs of 'old=>new' parameter
# 					values to be processed.  This file is prepared ahead of time and
# 					follows the same general format as insert values into a 
# 					hash (associative array).  The only difference is that you don't put 
# 					in the hash assignment as you would in Perl. 
# 					 
# 					An example would be as follows:
# 						"SourceUserID" => "SrcUserID"
# 						"SourceSystem" => "SrcDSN"
# 						"SourcePassword" => "SrcPwd"
# 					"E:\\datastage\\Dev\\LDR\\r1\\hash\\" => "//datastage//dev//ldr//r1//hash//"
#					"E:\\datastage\\Dev\\common\\r1\\hash\\" => "//datastage//dev//common//r1//hash//"

#		  Some rules to follow when creating the batch_file:
#		  	A. Type exactly as it will be found in a 'whole word'.  As above for directory structures 
#		  	   and if there are no quotes around the term, don't put them there.
#		  	B. The program is pretty literal.  It is smart enough to handle fixing the '\' or '/'
#		  	   characters for matching purposes, so you don't need to change anything in that respect.
#		  	   I'm certain there is something I've missed, so code will have to be changed for those
#		  	   odd pattern matching items that could be in a DS field.
#		  	C. One pair per line with no ending comma or punctuation.  
#		  	D. When typing in the ' => ' make sure it is exactly like that: one space, the =>, and one space.
#		  That's basically it for the rules for that file.
#		  
# Output     : 1. Original file (ie: filename.dsx) - will remain untouched and is used for reading purposes only
# 	       2. Fixed_filename.dsx - will contain the original script with the changed parameter names.
# 	       3. If the process ever breaks (nah, couldn't happen!), you may see a file called 'ff.tmp'.  This is 
# 	          a temp file that is used during the process, and then removed when the rename is done over the 
# 	          top of the Fixed_filename.
#
# Caveats    : I don't claim to be a Perl maven.  I'm sure someone smarter than me could have done this with
# 		fewer lines of code.  However, it works quickly and efficiently.  It is not complicated at all,
# 		hence the resulting performance.  Part of keeping it simple is that I may be missing some of the
# 		fluff found in lots of Perl code that probably should be there depending on your level of
# 		paranoia ;-)
#		
#		Assumptions made:
# 		I have assumed the user will include both parameters and that the correct filenames will be
# 		in the correct place.  If a filename is missing, then the script will simply puke.  Call
# 		me lazy, but I'm telling you how to use it, so just do it that way and don't do it any other ;-).
# 		I have also assumed that the user is DataStage proficient enough to do the following (this 
# 		section also happens to be the instructions for using this script):
# 		1. Once the 'ReplaceParameters.pl' script has been run to get the parameter block in the correct
# 		   places, then it is time to take care of individual problems.
# 		2. This script could also be used just to change default values for a set of parameters as
# 		   opposed to just parameter names.  It's pretty flexible and is just doing pattern matching
# 		   and replaces.
# 		3. Either prepare a file in advance with the pairs of 'old=>new' items to change or map the
# 		   few items you need to fix and run in your mode of choice. An example is given in Item 3 of
# 		   the Operating Command section above.  There is no limit to the number of pairs that
# 		   can be processed and the program doesn't really care if they are parameters or not.
# 		4. Run the program using the commands listed above.  You will find your changes in a file
# 		   that prepends "Fixed_" in front of the original filename.  This process will not affect the
# 		   original file in any way.  It is only used for non-destructive read.
#
#
#	Any and all input/improvements to the script are welcome.  Either email me at 'jmiceli@wrberkley.com' or
#	message 'jdmiceli' on DSXchange.com and I will get back to you if I can.
#
#
###################################################################################################################
#
# Disclaimer: This script is provided AS IS and I take no responsibility for it's use or misuse.  It's use is completely
#       the responsibility of YOU the user and is at your own risk.  I do not claim perfection, nor do I warrantee or
#       guarantee the operation of the script.  It appears to work and I am using it and that's all the further I
#       am willing to go!!
#       	
###################################################################################################################
#
# Code Block
# 
###################################################################################################################
use strict;
use warnings;

#  define scalars
my $filename;  		# name of the file to be processed
my $parms;    		# holds name of parameters file
my $fixed_filename; 	# this will be the working file
my $line;      		# value for line for comparison
my %parms;		# hash for holding the file contents for fixing
my $mode;		# mode for running. Values: 'I'=individual pairs, 'B'=batch processing using file for pairs
my $oldparm; 		# holds old value
my $newparm;		# holds new value

#
# collect the filename from the input argument and create the working filename
# 
$mode = "$ARGV[0]";
if ($mode eq "'I'")
{
	print "Mode is 'I'\n";
} 
elsif ($mode eq "'B'")
{
	print "Mode is 'B'\n";
} 
else
{
	die "Incorrect mode! Must be I or B in single quotes.\n";
}

$filename = "$ARGV[1]";
$fixed_filename = "Fixed_$filename";

if ($mode=~ /'I'/)
{
	$oldparm = "$ARGV[2]";
	$newparm = "$ARGV[3]";

	#
	# check to see if there are slashes of either flavor in the scalar, then put in the escape for it
	#
	if ($oldparm =~ /\\/ || $oldparm =~ /\/\//)
	{
		$oldparm =~ s#//#////#g;
		$oldparm =~ s#\\#\\\\#g;
	}
	$parms{"$oldparm"} = "$newparm";
}

if ($mode=~ /'B'/)
{
	$parms = "$ARGV[2]";
}


#
# see if anything was passed into $parms and add the values to the $str scalar
# 
 
 if (defined ($parms))
 {
	open(PARMS, "< $parms");
	
	while ($line = <PARMS>)
	{
		chomp $line;
		
		($oldparm, $newparm) = split (/ => /, $line);
		#
		# check to see if there are slashes of either flavor in the scalar, then put in the escape for it
		#
		if ($oldparm =~ /\\/ || $oldparm=~ /\/\//)
		{
			$oldparm =~ s#//#////#g;
			$oldparm =~ s#\\#\\\\#g;
		}
		
		$parms{$oldparm} = $newparm;
		# print "Key: $oldparm, Value: $newparm\n";  # uncomment to see the output for debugging
	
	}
	close PARMS;	
}

#
# time to go through the input file and create the backup file
#

open(OLD, "<$filename");
open(NEW, ">$fixed_filename");

while ($line = <OLD>)
{
	chomp $line;
	print NEW "$line\n";
}

close OLD;
close NEW;

#
# make the mods to $fixed_filename
#
my $tmp = ("ff.tmp");
my $cnt = 0;

open(TMP, "> $tmp");
open(FIX, "< $fixed_filename");

while ($line = <FIX>)
{
	chomp $line;
	$cnt++;
	study $line;  				  # for longer comparisons and replaces, this should help

	while (($oldparm, $newparm) = each %parms)
	{
		$line =~ s/$oldparm/$newparm/g; # used different delimiters since slashes can be in the parms
		#print "Key: $oldparm, Value: $newparm\n";  # uncomment for debugging purposes
	}

	if ($cnt % 10000 == 0)
	{
		print "$cnt lines processed.\n";
	}
	
	print TMP "$line\n";
}

close TMP;
close FIX;

print "Total of $cnt lines processed.\n";

rename("$tmp", "$fixed_filename");

Bestest!

John Miceli
System Specialist, MCP, MCDBA
Berkley Technology Services


"Good Morning. This is God. I will be handling all your problems today. I will not need your help. So have a great day!"
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Please edit your post and wrap all of the code in

Code: Select all

[/b] tags (as John did) - this will preserve the formatting.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Prade
Participant
Posts: 3
Joined: Mon Jun 01, 2009 9:44 am

DSX file creation

Post by Prade »

I am trying to create an automation tool which can create Data Stage jobs outside Data Stage. This tool will be used to generate similar kinds of mappings quickly. Is there a possibility to generate the .dsx file or the XML file outside Data stage thorugh some scripting techniques. The dsx or the xml file generated will be pulled into data stage and the job is ready.
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

It is easier to create a template job and export then have your process modify the DSX. Rip out the column names and table names and add in the new ones. I have a DataStage job which does this and it works great. Been using it for years.
Mamu Kim
throbinson
Charter Member
Charter Member
Posts: 299
Joined: Wed Nov 13, 2002 5:38 pm
Location: USA

Post by throbinson »

I'd do all that in a Excel workbook. Create the mappings and template and spit it out as a dsx framework file and import into Datastage. Give the whole process a cool sounding name like Quicktrack or something similiar.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Or FasterTrack. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
Post Reply