Monday, November 3, 2014

Files as data sources - Part 2 - Keeping track

In the last post I spoke about how to time your job with the file's arrival. In this post I will talk about how keep track of your files by keeping some basic metadata, and also how to relate the data in your target destination back to the original file.

Here again is the IsFileAvailable function I am using to wait for the files:

$L_FileFound = wait_for_file($P_FileDirectory || $P_FileName, $P_TimeOut,$P_Interval , -1, $L_FileNames, $L_FileListSize, ',');

if ($L_FileFound = 1)
begin
   $L_Counter = 1;
   WHILE ($L_Counter <= $L_FileListSize)
   begin
      FileNew($P_FileDirectory, word_ext($L_FileNames,$L_Counter,','));
      $L_Counter = $L_Counter + 1;
   end
end

Return $L_FileFound;

Once I have found a file, or files, I then run a routine to step through each of the file names. I use the word_ext function to choose which file name in the list I want to access, and then a counter to step through the list.

For each file name I then call a function I wrote called FileNew:

#Get time the file was created
$L_FileDateTime = get_file_attibute($P_FullFilePathAndName, 'date_created')

#Get the file type eg .txt .csv .pipe
$L_Type = word_ext($L_FileName,2,'.');

#Get new file id
$L_File_ID = sql('Datawarehouse','Select max(FILE_ID) from SYS_FILE') + 1

#Insert a new record into the SYS_FILENAME table
SQL('Datawarehouse','INSERT INTO SYS_FILE (FILE_ID, DI_FILENAME, FILE_DATETIME, FILE_PROCESS_DATETIME, TYPE, STATUS, AUDIT_ID)
VALUES ([$L_File_ID]{$P_FullFilePathAndName}, to_date({$L_FileDateTime}, \'YYYY/MM/DD hh24:mi:ss\'), sysdate,  {$L_Type},  \'STARTED\', [$G_AuditID])');

Return $L_File_ID;

What I am doing here is firstly gathering some metadata about the file such as the date it was created and the type of file it is, and then I am inserting that data in a table called SYS_FILE which is where I keep track of all the files processed. I also assign a file_id by adding 1 to the max file_id currently in the table.

I set the status of the file to STARTED, to signify that we have started processing this file. Later on I will either set the file to COMPLETE or ERROR.

Here is what the data in the file looks like:


You could always expand the number of columns in the SYS_FILE table to store all sorts of additional metadata about the file such as its source, how it was delivered or who is responsible for it.

Now that I have created a record in the SYS_FILE table for a file, how do I link that to the records that are coming through from the file?

First of all you need to go into your file object and set the source information on the file as follows:


There are similar options on XML and Excel files too.

What this does is it adds an extra column at the end of your file that will store the file name along with its path. This is particularly important if you are receiving and processing multiple files at once because DS will batch process all the files through at the same time. So lets say you are looking for File* and the files received are File1, File2, and File3. DS will process all 3 files together at once. It will include the file name that a row of data came from in the DI_FILENAME column, and we will use that information to tie the data back to the specific row we have created for that file.

In order to tie up the row of data with the correct SYS_FILE entry, we use thelookup_ext function as follows:


So now that we have our file and have processed it, we need to decide what to do with the file. I'll be covering archiving and updating the SYS_FILE table to show the completed files in the next post.


No comments:

Post a Comment