oracle DB workflows
At work, we currently have a couple workflows. They both:
1) grab zipped XML files from a FTP server through a proxy
2) check the names against a list of files already processed
3) unzip the unprocessed files
4) parse XML
5) put XML values into a flat file
6) move flat file to some remote network storage space (so Oracle can use it for its external tables)
We use a mixture of Linux, Java, the Oracle DB scheduler, bash script, and this unwieldly thing I want to kill. Back when we use to beam info directly to the DB from the Linux boxes, I might've heard an argument. As it sits now, it's shredding XML and flattening the values. I don't need or want it.
We process, at most, tens of gigs at once. Any Oracle folk out there (like catt) have a standard tool for this kind of thing? I'm leaning towards getting rid of Kettle/Kitchen and its shitty design tool and converting everything into Python using proxyhandler in urllib2 for (S)FTP and cElementTree for XML parsing.
Edited By Malcolm on 1413818684
1) grab zipped XML files from a FTP server through a proxy
2) check the names against a list of files already processed
3) unzip the unprocessed files
4) parse XML
5) put XML values into a flat file
6) move flat file to some remote network storage space (so Oracle can use it for its external tables)
We use a mixture of Linux, Java, the Oracle DB scheduler, bash script, and this unwieldly thing I want to kill. Back when we use to beam info directly to the DB from the Linux boxes, I might've heard an argument. As it sits now, it's shredding XML and flattening the values. I don't need or want it.
We process, at most, tens of gigs at once. Any Oracle folk out there (like catt) have a standard tool for this kind of thing? I'm leaning towards getting rid of Kettle/Kitchen and its shitty design tool and converting everything into Python using proxyhandler in urllib2 for (S)FTP and cElementTree for XML parsing.
Edited By Malcolm on 1413818684
Diogenes of Sinope: "It is not that I am mad, it is only that my head is different from yours."
Arnold Judas Rimmer, BSC, SSC: "Better dead than smeg."
Arnold Judas Rimmer, BSC, SSC: "Better dead than smeg."
Funnily enough, we're evaluating Pentaho right now.
But everything you describe there should be really easy in Pentaho. I wrote a similar script in about 60 minutes that does everything you describe except shredding the XML.
Maybe pay for the real version of Pentaho, instead of community.
But everything you describe there should be really easy in Pentaho. I wrote a similar script in about 60 minutes that does everything you describe except shredding the XML.
Maybe pay for the real version of Pentaho, instead of community.
It's not me, it's someone else.
I'm sure it's easy if I can figure out the annoying visual designer.
Hah. I've got nothing except the XML parsing.
I wrote a similar script in about 60 minutes that does everything you describe except shredding the XML.
Hah. I've got nothing except the XML parsing.
Diogenes of Sinope: "It is not that I am mad, it is only that my head is different from yours."
Arnold Judas Rimmer, BSC, SSC: "Better dead than smeg."
Arnold Judas Rimmer, BSC, SSC: "Better dead than smeg."
We used to pipe the values into the DB with SQL loader. It would occasionally crash during data loads with no explanation, probably due to the shitty JDBC lib it uses. Our local Oracle guru said to use files and external tables. Data set is maybe 20M entities (in the largest set) 10-20ish gigs over all 3 data sets.
EDIT: Worst thing about Pentaho:
We run the Kettle/Kitchen jobs on our Linux box. We design the config files on a Windows box. Pentaho will swap in non-linux friendly escape values during the design. When uploading the config files back to linux, they explode. So I have to un-convert those values back to linux-friendly mode.
Edited By Malcolm on 1413826236
EDIT: Worst thing about Pentaho:
We run the Kettle/Kitchen jobs on our Linux box. We design the config files on a Windows box. Pentaho will swap in non-linux friendly escape values during the design. When uploading the config files back to linux, they explode. So I have to un-convert those values back to linux-friendly mode.
Edited By Malcolm on 1413826236
Diogenes of Sinope: "It is not that I am mad, it is only that my head is different from yours."
Arnold Judas Rimmer, BSC, SSC: "Better dead than smeg."
Arnold Judas Rimmer, BSC, SSC: "Better dead than smeg."
Your dev and prod environments should be the same. You should not develop and deploy on separate OSes.
HAHAHAHAHA. You obviously work in a very different place.
Not aware that sqlldr uses JDBC, and is exceptionally reliable.
It's kitchen that uses sqlldr to bulk load data to oracle. From a few things I've read, sqlldr + kitchen + JDBC = occasional sockets randomly closing during the upload to DB.
Diogenes of Sinope: "It is not that I am mad, it is only that my head is different from yours."
Arnold Judas Rimmer, BSC, SSC: "Better dead than smeg."
Arnold Judas Rimmer, BSC, SSC: "Better dead than smeg."
TheCatt wrote:Malcolm wrote:Your dev and prod environments should be the same. You should not develop and deploy on separate OSes.
HAHAHAHAHA. You obviously work in a very different place.
VMs are cheap and easy.
This isn't a dev v. prod difference. Every environment, from dev to prod has linux and windows components running to the Oracle DB.
I'm leaning towards python if it can handle the retroactive work. We have a metric fuck-ton of memory on those servers that sits idle 99-100% of the time ... because that's how it's been "optimized."
Edited By Malcolm on 1413850922
Diogenes of Sinope: "It is not that I am mad, it is only that my head is different from yours."
Arnold Judas Rimmer, BSC, SSC: "Better dead than smeg."
Arnold Judas Rimmer, BSC, SSC: "Better dead than smeg."
Malcolm wrote:This isn't a dev v. prod difference. Every environment, from dev to prod has linux and windows components running to the Oracle DB.
That statement does not square with this one
We run the Kettle/Kitchen jobs on our Linux box. We design the config files on a Windows box.
Pentaho is a very good tool for converting XML to flat files. Perhaps you should consider different developers?
Edited By TheCatt on 1413851691
It's not me, it's someone else.
Every environment has a linux and windows component. Due to the shitty GUI designer required for config file work (I'm sure as shit not doing it by hand) and the fact that our linux boxes at work are strictly command line, I'm forced into that crap. Every environment has a linux component and a windows component. Our DB workflows are a goddamn mess.
Diogenes of Sinope: "It is not that I am mad, it is only that my head is different from yours."
Arnold Judas Rimmer, BSC, SSC: "Better dead than smeg."
Arnold Judas Rimmer, BSC, SSC: "Better dead than smeg."