Python
Back off the road. Looking for something to do at my cube other than reddit.
This sucker called out to me. The first couple of chapters have been straightforward enough; I still haven't moved past what I'd already gotten from Python meetups and conferences.
I just need to fully dust off my Python skills, figure out the new libraries, and then find a worthy dataset.
Python
Machine learning? Yawn. You will come to hate the phrases "training data" and "vector machine."
Diogenes of Sinope: "It is not that I am mad, it is only that my head is different from yours."
Arnold Judas Rimmer, BSC, SSC: "Better dead than smeg."
Python
In Frisco dollars?
Diogenes of Sinope: "It is not that I am mad, it is only that my head is different from yours."
Arnold Judas Rimmer, BSC, SSC: "Better dead than smeg."
Python
I wrote my first real Python program for professional use today. Just a simple script validating the number of exhibits and page numbers (and therefore individual TIFFs) in a trial database.
The program our techs were using took hours to run.
Mine takes a few minutes! (a few hours to write)
6k documents, 480k TIFFs.
Python
I should've subcontracted out the shitty custom file compare job I had to you.
Diogenes of Sinope: "It is not that I am mad, it is only that my head is different from yours."
Arnold Judas Rimmer, BSC, SSC: "Better dead than smeg."
Python
I wrote my first python script. It logs into a client-certificate secured website, and then gets a list of available files. I then hard-coded one file to download.
Need to re-write it so that it will parse the list of files, and then download all of them.
But not bad for 60 minutes of work... which was really a lot of googling.
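For anyone following along, here is a rough sketch of what that rewrite (parse the list, then download everything) could look like using only the standard library. The post doesn't say which libraries the original script used, so the helper names `make_opener` and `download_all`, the certificate paths, and the URL layout are all made up for illustration.

```python
import os
import posixpath
import ssl
import urllib.parse
import urllib.request


def make_opener(cert_file, key_file):
    """Build a URL opener that presents a client certificate (paths are placeholders)."""
    ctx = ssl.create_default_context()
    ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return urllib.request.build_opener(urllib.request.HTTPSHandler(context=ctx))


def download_all(opener, urls, dest_dir):
    """Download every URL into dest_dir and return the list of saved paths."""
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for url in urls:
        # Use the last path segment of the URL as the local file name.
        name = posixpath.basename(urllib.parse.urlparse(url).path)
        if not name:
            continue  # skip links that do not point at a file
        target = os.path.join(dest_dir, name)
        with opener.open(url) as resp, open(target, "wb") as out:
            out.write(resp.read())
        saved.append(target)
    return saved
```

Something like `download_all(make_opener("client.pem", "client.key"), urls, "downloads")` would then replace the hard-coded single download, with `urls` coming from whatever parses the file list.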
It's not me, it's someone else.
Python
I did a little more python, got more familiar with scripts, and some of the (many, many) modules out there.
I was asked by an energy trading firm to write a program in python as part of a job test/recruitment thing. I sent them a version today... and have to fly up to Cambridge, MA to see about an interview later this month.
It's not me, it's someone else.
Python
TheCatt wrote: I did a little more python, got more familiar with scripts, and some of the (many, many) modules out there.
I was asked by an energy trading firm to write a program in python as part of a job test/recruitment thing. I sent them a version today... and have to fly up to Cambridge, MA to see about an interview later this month
Killer! That sounds awesome. Are you adding it to your contracting side-hustle or thinking about switching?
Can you tell me more about the program they asked you to write as a test?
Last edited by Troy on Sat Mar 31, 2018 1:51 pm, edited 1 time in total.
Python
They are part of my existing side-hustle. They wanted to hire me 3 years ago, but I refused to move to Boston. They hired me to write a trading system for them, which I did. They also hired someone else to be their day-to-day guy. That guy quit, so they're looking. Since they've worked with me for 3 years now, they are open to me working remote, and visiting 1 week/month. So this would be switching.
They are trying to collect data that's in an XML format from a certain publisher. The files are published as links on an HTML page. There are two types of files, one is published every 15 minutes, the other published every 60 minutes.
Program 1 (one for 15 minute data, one for 60 minute data)
1 - Turns out the requests library is very handy for downloading HTML.
2 - To identify all links, I just used xpath against the HTML to identify all a/href tags.
3 - I then iterate over the collection, and verify that each link is a file I want (some links are irrelevant)
4 - If it's relevant, I extract the filename, and use scandir to look for that in history. If it doesn't exist, I download it.
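A minimal sketch of the Program 1 steps above, for illustration only. To keep it runnable without third-party packages it swaps in the stdlib `html.parser` for the xpath query and leaves the actual download out; the `.xml` relevance rule, the helper names, and the history directory are assumptions, not details from the real program.

```python
import os
import posixpath
import urllib.parse
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag (stand-in for the xpath query)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html, base_url):
    """Return absolute URLs for every link found in the page."""
    parser = LinkCollector()
    parser.feed(html)
    return [urllib.parse.urljoin(base_url, href) for href in parser.links]


def is_relevant(url):
    # Assumption: the wanted files are the ones whose path ends in .xml.
    return urllib.parse.urlparse(url).path.lower().endswith(".xml")


def missing_files(urls, history_dir):
    """Return (filename, url) pairs for relevant files not yet in history_dir."""
    with os.scandir(history_dir) as entries:
        have = {entry.name for entry in entries if entry.is_file()}
    todo = []
    for url in urls:
        name = posixpath.basename(urllib.parse.urlparse(url).path)
        if is_relevant(url) and name not in have:
            todo.append((name, url))
    return todo
```

Each pair returned by `missing_files` would then be fetched (with requests, that is roughly `requests.get(url).content`) and written into the history directory, so the next run skips it.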
Program 2 (one for 15 minute, one for 60 minute)
1 - Takes a given XML file.
2 - Looks for the relevant elements in each XML node that I care about (about 1000 for 1 file, 11000 for the 2nd file)
3 - I load these into a tuple.
4 - I load the tuples into an array.
5 - Every 1,000 rows, I send the data to the DB using fast_executemany (orders of magnitude faster than single-line inserts)
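Roughly, the Program 2 steps could be sketched like this. It uses `sqlite3` as a stand-in database so it runs anywhere; `fast_executemany` is a pyodbc cursor attribute, so with pyodbc the only change to the batching would be setting it on the cursor. The element names (`reading`, `node_id`, `timestamp`, `price`) and the table layout are invented for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET

BATCH_SIZE = 1000


def iter_rows(xml_text):
    """Yield one tuple per <reading> element (element names are made up)."""
    root = ET.fromstring(xml_text)
    for node in root.iter("reading"):
        yield (
            node.findtext("node_id"),
            node.findtext("timestamp"),
            float(node.findtext("price")),
        )


def load(conn, rows, batch_size=BATCH_SIZE):
    """Insert rows in batches and return the number of rows written."""
    sql = "INSERT INTO readings (node_id, ts, price) VALUES (?, ?, ?)"
    cur = conn.cursor()
    # With pyodbc, setting cur.fast_executemany = True here enables the bulk path.
    batch, total = [], 0
    for row in rows:
        batch.append(row)
        if len(batch) >= batch_size:
            cur.executemany(sql, batch)
            total += len(batch)
            batch = []
    if batch:  # flush the final partial batch
        cur.executemany(sql, batch)
        total += len(batch)
    conn.commit()
    return total
```

Batching keeps memory flat on the 11,000-row file while still sending the database large chunks instead of one insert per row.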
There's a lot to do in terms of error-handling, email notifications, robustness, etc. but this was basically a test to see if I could do one of their main needs.
It's not me, it's someone else.
Python
I'm going up there tomorrow. They also want me to present about AWS, and what will make their life easier in AWS. They tend to use a lot of python, Excel, client-server style apps. So that could be a bit trickier.
It's not me, it's someone else.
-
- Posts: 8056
- Joined: Thu May 20, 2004 7:32 pm
Python
Glue or Batch? Spark on EMR? I dunno.
Python
Glue is terrible. Do you actually use it? I had a half day with our AWS account reps, and I told them if they mentioned Glue they had to buy lunch. Of course, we dialed in some AWS experts later, and the first one mentioned Glue, so we get 'free lunch'.
They only use S3 (and only a little), EC2, and RDS.
So the ones I think would be relevant to them, or could be:
Lambda – Run code for short periods of time, serverlessly.
Scheduled operations
Triggered operations (new file in S3)
S3 – Storage (Standard-IA, Glacier)
RDS – Aurora or open-source engines ($) – Serverless (auto-scaling RDBMS)
Boto3 – AWS API via python
Neptune – Graph DB representation of power networks
Redshift – OLAP DB for analytical workloads.
EMR – Spark?
Step Functions – (State machines)
SageMaker – Guided/Automated Machine Learning.
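To make the "triggered operations (new file in S3)" item concrete, here is a bare-bones sketch of a Lambda handler reacting to an S3 object-created notification. The function only pulls the bucket and key out of the event; what happens to the file afterwards is a placeholder, and the return shape is just something convenient for the example.

```python
import urllib.parse


def handler(event, context):
    """Lambda entry point for an S3 'object created' notification."""
    new_files = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in the event payload.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        new_files.append((bucket, key))
        # Placeholder: fetch and parse the file here, e.g. with boto3's
        # s3.get_object(Bucket=bucket, Key=key).
    return {"processed": new_files}
```

The same handler shape works for the scheduled case too, except the event then carries no S3 records and the function would have to go and look for new files itself.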
It's not me, it's someone else.
Python
I have not used Glue. I was maybe also going to recommend SageMaker, since it'll do some management of training jobs and also stand up a REST endpoint, but it's pretty specialized unless you want to roll your own Docker image.
Python
Have you used SageMaker? The docs promise to bring ML to people who don't know ML. Truth? I haven't used it at all yet.
It's not me, it's someone else.
Python
I don’t think it’s gonna do that. I’ve done some tutorials and read some docs. It’s basically a Docker container that will run a Jupyter notebook and also launch training jobs and host a web server.
It might be more accurate to say it manages some of the plumbing of ML, so that if you do understand ML you can actually do ML instead of the plumbing which is normally about 90% of what you have to do to get ML in production. And not even sure how good a job it does of that.