oracle DB workflows

Malcolm
Posts: 32040
Joined: Fri May 21, 2004 1:04 pm
Location: Minneapolis

Post by Malcolm »

At work, we currently have a couple workflows. They both:

1) grab zipped XML files from a FTP server through a proxy
2) check the names against a list of files already processed
3) unzip the unprocessed files
4) parse XML
5) put XML values into a flat file
6) move flat file to some remote network storage space (so Oracle can use it for its external tables)

We use a mixture of Linux, Java, the Oracle DB scheduler, bash script, and this unwieldly thing I want to kill. Back when we use to beam info directly to the DB from the Linux boxes, I might've heard an argument. As it sits now, it's shredding XML and flattening the values. I don't need or want it.

We process, at most, tens of gigs at once. Any Oracle folk out there (like catt) have a standard tool for this kind of thing? I'm leaning towards getting rid of Kettle/Kitchen and its shitty design tool and converting everything into Python using proxyhandler in urllib2 for (S)FTP and cElementTree for XML parsing.




Edited By Malcolm on 1413818684
Diogenes of Sinope: "It is not that I am mad, it is only that my head is different from yours."
Arnold Judas Rimmer, BSC, SSC: "Better dead than smeg."
TheCatt
Site Admin
Posts: 53976
Joined: Thu May 20, 2004 11:15 pm
Location: Cary, NC

Post by TheCatt »

Funnily enough, we're evaluating Pentaho right now.

But everything you describe there should be really easy in Pentaho. I wrote a similar script in about 60 minutes that does everything you describe except shredding the XML.

Maybe pay for the real version of Pentaho, instead of community.
It's not me, it's someone else.
Malcolm
Posts: 32040
Joined: Fri May 21, 2004 1:04 pm
Location: Minneapolis

Post by Malcolm »

I'm sure it's easy if I can figure out the annoying visual designer.
I wrote a similar script in about 60 minutes that does everything you describe except shredding the XML.

Hah. I've got nothing except the XML parsing.
Diogenes of Sinope: "It is not that I am mad, it is only that my head is different from yours."
Arnold Judas Rimmer, BSC, SSC: "Better dead than smeg."
TheCatt
Site Admin
Posts: 53976
Joined: Thu May 20, 2004 11:15 pm
Location: Cary, NC

Post by TheCatt »

I would say something about external table use being dumb... but I assume that the data set is small, and therefore not a performance problem?
It's not me, it's someone else.
Malcolm
Posts: 32040
Joined: Fri May 21, 2004 1:04 pm
Location: Minneapolis

Post by Malcolm »

We used to pipe the values into the DB with SQL loader. It would occasionally crash during data loads with no explanation, probably due to the shitty JDBC lib it uses. Our local Oracle guru said to use files and external tables. Data set is maybe 20M entities (in the largest set) 10-20ish gigs over all 3 data sets.

EDIT: Worst thing about Pentaho:
We run the Kettle/Kitchen jobs on our Linux box. We design the config files on a Windows box. Pentaho will swap in non-linux friendly escape values during the design. When uploading the config files back to linux, they explode. So I have to un-convert those values back to linux-friendly mode.




Edited By Malcolm on 1413826236
Diogenes of Sinope: "It is not that I am mad, it is only that my head is different from yours."
Arnold Judas Rimmer, BSC, SSC: "Better dead than smeg."
TheCatt
Site Admin
Posts: 53976
Joined: Thu May 20, 2004 11:15 pm
Location: Cary, NC

Post by TheCatt »

Your dev and prod environments should be the same. You should not develop and deploy on separate OSes.

Not aware that sqlldr uses JDBC, and is exceptionally reliable. I question your Oracle person. I've never had problems with it, using it for more than a decade.
It's not me, it's someone else.
Malcolm
Posts: 32040
Joined: Fri May 21, 2004 1:04 pm
Location: Minneapolis

Post by Malcolm »

Your dev and prod environments should be the same. You should not develop and deploy on separate OSes.

HAHAHAHAHA. You obviously work in a very different place.

Not aware that sqlldr uses JDBC, and is exceptionally reliable.

It's kitchen that uses sqlldr to bulk load data to oracle. From a few things I've read, sqlldr + kitchen + JDBC = occasional sockets randomly closing during the upload to DB.
Diogenes of Sinope: "It is not that I am mad, it is only that my head is different from yours."
Arnold Judas Rimmer, BSC, SSC: "Better dead than smeg."
TheCatt
Site Admin
Posts: 53976
Joined: Thu May 20, 2004 11:15 pm
Location: Cary, NC

Post by TheCatt »

I've not used sqlldr via Pentaho so far, but why not just call it as a script with parameters?
It's not me, it's someone else.
TheCatt
Site Admin
Posts: 53976
Joined: Thu May 20, 2004 11:15 pm
Location: Cary, NC

Post by TheCatt »

Malcolm wrote:
Your dev and prod environments should be the same. You should not develop and deploy on separate OSes.
HAHAHAHAHA. You obviously work in a very different place.
VMs are cheap and easy.
It's not me, it's someone else.
GORDON
Site Admin
Posts: 54559
Joined: Sun Jun 06, 2004 10:43 pm
Location: DTManistan
Contact:

Post by GORDON »

Image
"Be bold, and mighty forces will come to your aid."
TheCatt
Site Admin
Posts: 53976
Joined: Thu May 20, 2004 11:15 pm
Location: Cary, NC

Post by TheCatt »

I was expecting you to post "Your MOM is cheap and easy"
It's not me, it's someone else.
GORDON
Site Admin
Posts: 54559
Joined: Sun Jun 06, 2004 10:43 pm
Location: DTManistan
Contact:

Post by GORDON »

I fell asleep before I could think of it.
"Be bold, and mighty forces will come to your aid."
Malcolm
Posts: 32040
Joined: Fri May 21, 2004 1:04 pm
Location: Minneapolis

Post by Malcolm »

TheCatt wrote:
Malcolm wrote:
Your dev and prod environments should be the same. You should not develop and deploy on separate OSes.

HAHAHAHAHA. You obviously work in a very different place.

VMs are cheap and easy.

This isn't a dev v. prod difference. Every environment, from dev to prod has linux and windows components running to the Oracle DB.

I'm leaning towards python if it can handle the retroactive work. We have a metric fuck-ton of memory on those servers that sits idle 99-100% of the time ... because that's how it's been "optimized."




Edited By Malcolm on 1413850922
Diogenes of Sinope: "It is not that I am mad, it is only that my head is different from yours."
Arnold Judas Rimmer, BSC, SSC: "Better dead than smeg."
Malcolm
Posts: 32040
Joined: Fri May 21, 2004 1:04 pm
Location: Minneapolis

Post by Malcolm »

GORDON wrote:Image
No, that's the image for AS/400 discussions.
Diogenes of Sinope: "It is not that I am mad, it is only that my head is different from yours."
Arnold Judas Rimmer, BSC, SSC: "Better dead than smeg."
GORDON
Site Admin
Posts: 54559
Joined: Sun Jun 06, 2004 10:43 pm
Location: DTManistan
Contact:

Post by GORDON »

Malcolm wrote:
GORDON wrote:Image
No, that's the image for AS/400 discussions.
Which, out of good taste, I do not initiate.
"Be bold, and mighty forces will come to your aid."
TheCatt
Site Admin
Posts: 53976
Joined: Thu May 20, 2004 11:15 pm
Location: Cary, NC

Post by TheCatt »

Malcolm wrote:This isn't a dev v. prod difference. Every environment, from dev to prod has linux and windows components running to the Oracle DB.

That statement does not square with this one
We run the Kettle/Kitchen jobs on our Linux box. We design the config files on a Windows box.

Pentaho is a very good tool for converting XML to flat files. Perhaps you should consider different developers?




Edited By TheCatt on 1413851691
It's not me, it's someone else.
Malcolm
Posts: 32040
Joined: Fri May 21, 2004 1:04 pm
Location: Minneapolis

Post by Malcolm »

Every environment has a linux and windows component. Due to the shitty GUI designer required for config file work (I'm sure as shit not doing it by hand) and the fact that our linux boxes at work are strictly command line, I'm forced into that crap. Every environment has a linux component and a windows component. Our DB workflows are a goddamn mess.
Diogenes of Sinope: "It is not that I am mad, it is only that my head is different from yours."
Arnold Judas Rimmer, BSC, SSC: "Better dead than smeg."
TheCatt
Site Admin
Posts: 53976
Joined: Thu May 20, 2004 11:15 pm
Location: Cary, NC

Post by TheCatt »

Every environment has Windows + Linux, yet you design on Windows, and run on Linux.
It's not me, it's someone else.
GORDON
Site Admin
Posts: 54559
Joined: Sun Jun 06, 2004 10:43 pm
Location: DTManistan
Contact:

Post by GORDON »

Every time you guys say "pentaho" I think of this:

Image
"Be bold, and mighty forces will come to your aid."
TheCatt
Site Admin
Posts: 53976
Joined: Thu May 20, 2004 11:15 pm
Location: Cary, NC

Post by TheCatt »

That would be PEN-ta-ho(es). The software is pen-TAH-ho. For reasons unknown to me.
It's not me, it's someone else.
Post Reply