
ETL Modeling suggestions

Posted: Fri May 18, 2018 10:42 am
by jackson.eyton
Hi, I just wanted to get some pointers or opinions on what you guys use for relaying ETL instructions from the data architect. Currently we have these set up in a spreadsheet, and it's not the easiest thing to navigate and/or understand. I'm investigating FastTrack, but I'm not sold on it. Our modeler works from source to target, whereas on the ETL side we work from the target backwards. I'd love to hear any suggestions.

Posted: Fri May 18, 2018 11:12 am
by qt_ky
I once worked in a shop with an ETL Architect who relied on a combination of Word and Excel documents. I have not worked at any place that has embraced FastTrack. Personally, I prefer logical source to target (reading from left to right) and always felt that documenting the other way was backwards, especially because it doesn't match how you actually implement the DataStage jobs. Where I am now we just rely on the whiteboard to sketch out ETL job designs. It's much more productive! If someone needs a formal document, we can simply go into the job(s) in Designer and create a job report. Boom! Document generated.

Posted: Fri May 18, 2018 11:56 am
by chulett
Back when we had an ETL Architect and a ton of new development going on, we also relied on "Source to Target" mapping spreadsheets to officially document the process. Most of them weren't really needed by the developers, as we had templates that covered most of our scenarios. I mention them because they were the "backwards" kind and actually mapped Target to Source in spite of their names... which worked for us. :)

Posted: Fri May 18, 2018 12:54 pm
by jackson.eyton
Yeah, we're doing a combination of both methods. Our architect models a particular table from our old database into the new model. That table from the old world will populate several different fields in many different destination tables. So, from his perspective, he adds a source to the sheet and writes his version of the ETL definitions for each column in that source table. On the ETL side, once the modeling is done for a given change/addition, we filter the sheet by the destination entity and add in the appropriate sources and unions to populate that destination.

I have two issues with the way we do it now. First, since it's a spreadsheet, I cannot amend the ETL rules the modeler wrote to reflect what we actually had to do without checking out the file, which is often in use. This leads to the changes being forgotten, and over time we end up with ETL rules that only semi-apply. So when I, the newer guy, attempt to work off the sheet, I am often confused by discrepancies between the ETL job and the modeling worksheet. Second, it is REALLY difficult to get a big-picture view of source to target. We're playing with creating a list view in SharePoint to at least allow more dynamic multi-party alterations, but I feel like there must be a better way to do this.
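One possible angle, sketched below with hypothetical column names (source_table, source_column, target_table, target_column, rule): keep the mapping in a plain CSV under version control and generate the target-centric view from it, instead of maintaining both orientations by hand.

    import csv
    from collections import defaultdict

    # Group the mapping rows by destination table so the ETL side
    # can read the sheet target-first.
    by_target = defaultdict(list)
    with open("mapping.csv", newline="") as f:
        for row in csv.DictReader(f):
            by_target[row["target_table"]].append(row)

    # Print a target-centric view: each destination table followed by
    # the source columns and rules that feed it.
    for target, rows in sorted(by_target.items()):
        print(target)
        for r in rows:
            print(f"  {r['target_column']:30} <- {r['source_table']}.{r['source_column']}  ({r['rule']})")

With the CSV in version control, ETL-side amendments would show up as diffs instead of silently drifting away from the jobs.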

Posted: Sat May 19, 2018 7:57 pm
by ray.wurlod
FastTrack is OK, though limited to "standard" operations.

For simple jobs, the job design can be generated directly from FastTrack. For more complex jobs the same is true, but the generated job will need to be tweaked by an ETL developer.

We also make use of the fact that, after a business analyst profiles the data and adds notes (in Information Analyzer), the analysis results are published into XMETA so that, when the ETL developer imports the shared table definition into the DataStage project, those notes are available.

Posted: Mon May 21, 2018 11:49 am
by eostic
The job generation piece of FastTrack can be challenging to reconcile and isn't used much, but one nice part about FastTrack is that it can be used to see lineage before you have coded anything in DataStage (or any other tooling or language, for that matter). I've seen sites use FastTrack for conceptual mapping that would ultimately yield Java code, among other things... It's aging, but it still works well for less technical users who want drag-and-drop of the metadata names for relational tables that are already in the repository.

Ernie

Posted: Wed May 23, 2018 8:15 am
by jackson.eyton
Hmm, I went through one of IBM's training/demo documents for FastTrack and tried to replicate one of our existing jobs. I could not get the lookup entered correctly: we use a joined-source lookup, meaning our lookup query is a join of multiple classification tables, and the join in FastTrack did not appear to support multiple sources.
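For what it's worth, one workaround is to pre-join the classification tables into a single flat reference table first, so the lookup itself becomes single-source again. A rough sketch of the idea in pandas, with made-up table and column names:

    import pandas as pd

    # Hypothetical classification tables; in practice these would be
    # read from the source database.
    colors = pd.DataFrame({"class_id": [1, 2], "color": ["red", "blue"]})
    sizes = pd.DataFrame({"class_id": [1, 2], "size": ["S", "L"]})

    # Pre-join the classification tables into one flat reference table...
    lookup = colors.merge(sizes, on="class_id", how="inner")

    # ...so the main lookup is a single-source join again.
    facts = pd.DataFrame({"fact_id": [10, 11], "class_id": [2, 1]})
    result = facts.merge(lookup, on="class_id", how="left")
    print(result)

In a job, the equivalent would be a join (or a joined SQL select) feeding the lookup's reference input.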

Posted: Wed May 23, 2018 2:11 pm
by UCDI
We use FastTrack for mapping columns and the rules that define the mapping. Nothing else, but it's been great for just that one task of keeping the mapping public, up to date, centralized, etc. The tool is terribly slow, but that is becoming the norm; opening a file in Excel under Office 365 takes 10-20 times longer than in previous versions (and on a faster PC!), too.

But for us, modeling and architecture instructions are different from the mapping instructions. When the database models change, we get an email communication and a public HTML map of the model that we can study.

I'll be honest, and you will laugh... I export our FastTrack mappings to Excel, and from that to a CSV file, which I then open in an actual high-performance, awesome tool... called... Notepad++.
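If anyone wants to skip the manual hop, that export chain can be scripted. A small sketch using openpyxl, assuming the FastTrack export is a plain .xlsx with the mapping on the first sheet (file names are made up):

    import csv
    from openpyxl import load_workbook

    # Open the exported mapping workbook read-only and take the first sheet.
    wb = load_workbook("fasttrack_export.xlsx", read_only=True)
    ws = wb.worksheets[0]

    # Dump every row to CSV for diffing and searching in a text editor.
    with open("mapping.csv", "w", newline="") as f:
        writer = csv.writer(f)
        for row in ws.iter_rows(values_only=True):
            writer.writerow(row)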

Posted: Thu May 24, 2018 6:31 am
by eostic
:D Notepad++... great tool, and sometimes even it is too slow compared to good ol' regular Notepad.

Posted: Thu May 24, 2018 7:43 am
by jackson.eyton
HAHAHA, OK, I only laughed because I use N++ all the time, just not for mapping... yet. UCDI, you mentioned HTML maps of the model; is that something your architect puts out via Data Architect?

Posted: Thu May 24, 2018 11:15 am
by UCDI
I shamefully do not know what tool generates our HTML models. I can find out. It's *probably* an IBM tool, since we are heavily invested there, but it could be something else entirely.

Unrelated...
Regular Notepad chugs on large files. The GNU version of it does not, so I use that one sometimes. N++'s default features and performance are well balanced, but some plugins slow it way down.

Posted: Thu May 24, 2018 11:33 am
by chulett
More unrelated... we use UltraEdit here.