Hi,
The scenario is like this
file1.csv
Customer_id,customer_name,parts_name,vehicle_name,parts_no
file2.csv
Customer_id,customer_name,vehicle_name,parts_no
Here in both the files, parts_no is the key field. Two files come from external systems. Both the files are dynamic and columns can be swapped for each run. Sometimes one field - parts_name can be missing in the file.
How can we handle this and proceed with our jobs.
1) Bringing both the files into a proper order
2) In case one field is not coming in either of the file reject the file saying improper format.
Can this be done exclsuively in datastage itself.
Please suggest.
Comparing two dynamic data files with header using datastage
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 15
- Joined: Thu May 31, 2007 3:36 am
I'm not sure you can exclusively handle this through Datastage alone, instead I'd look to check the file format through scripting and produce a relevant schema file to use on input to the initial Datastage job reading the source in. Look at schema files and partial schemas.
Mark Winter
<i>Nothing appeases a troubled mind more than <b>good</b> music</i>
<i>Nothing appeases a troubled mind more than <b>good</b> music</i>
-
- Participant
- Posts: 15
- Joined: Thu May 31, 2007 3:36 am
Yes, I too thought of a similar fashion. Thought would have a reference_file.csv and compare the two data files - file1.csv and file2.csv.
Change the files into the format of reference_file.csv and proceed with our jobs.
But in case of one column not coming in the input file, i.e., parts_name is not coming in the file, I have to return the file back to the users, which I am not sure how to handle it.
file1.csv
Customer_id,customer_name,parts_name,vehicle_name,parts_no
1,abc,wheel,cycle,12
2,def,screw,car,15
file2.csv
Customer_id,customer_name,vehicle_name,parts_no
1,abc,cycle,12
2,frt,car,20
reference_file.csv
Customer_id
customer_name
parts_no
parts_name
vehicle_name
Here in file2.csv, the parts_name column is missing. Need some help to see how this can be handled in script. The file should be rejected.
Thanks.
Change the files into the format of reference_file.csv and proceed with our jobs.
But in case of one column not coming in the input file, i.e., parts_name is not coming in the file, I have to return the file back to the users, which I am not sure how to handle it.
Code: Select all
#!/bin/bash
DATAFILE=${1:-data.txt}
COLUMNFILE=${2:-list.txt}
awk -F, -v colsFile="$COLUMNFILE" '
BEGIN {
j=1
while ((getline < colsFile) > 0) {
col[j++] = $1
}
n=j-1;
close(colsFile)
for (i=1; i<=n; i++) s[col[i]]=i
}
NR==1 {
for (f=1; f<=NF; f++)
if ($f in s) c[s[$f]]=f
next
}
{ sep=""
for (f=1; f<=n; f++) {
printf("%c%s",sep,$c[f])
sep=FS
}
print ""
}
' "$DATAFILE"
Customer_id,customer_name,parts_name,vehicle_name,parts_no
1,abc,wheel,cycle,12
2,def,screw,car,15
file2.csv
Customer_id,customer_name,vehicle_name,parts_no
1,abc,cycle,12
2,frt,car,20
reference_file.csv
Customer_id
customer_name
parts_no
parts_name
vehicle_name
Here in file2.csv, the parts_name column is missing. Need some help to see how this can be handled in script. The file should be rejected.
Thanks.
For the handling part, unless I am misunderstanding something, it seems relatively simple. Does the required/expected number of columns not remain the same? If so, a count of the columns in the header can be checked. A failure to match can result in the process exiting as a failure, which you can handle as necessary to communicate this back to the user.
Mark Winter
<i>Nothing appeases a troubled mind more than <b>good</b> music</i>
<i>Nothing appeases a troubled mind more than <b>good</b> music</i>
-
- Participant
- Posts: 15
- Joined: Thu May 31, 2007 3:36 am