Don't even know the term for what I'm trying to make

Don't even know the term for what I'm trying to make
0

#1

I’m usually pretty good at teaching myself something new, but I’m stuck at the beginning because I don’t know the term for what I’m trying to make.

Basically, I have an Excel sheet that uses a web query to pull information from sites, update automatically, calculate a bunch of stuff, and then gets displayed through something like OneDrive. I want to cut Excel out of the picture and create something from scratch that does this.

What on earth am I trying to make??

Is this an API? A web app? A perpetual motion machine?

I feel like I can start researching how to do this as soon as I learn what “this” is. Of course, I’d need to gut through learning whatever language is best (any advice there would also be appreciated), but I just need help moving in the right direction.

Thanks in advance…first post here!


#2

Well essentially, what you are making is a program!

Without knowing the exact needs you have, it sounds like what you really want to build is a web scraper that outputs a report to either a PDF or webpage?

I got started making stuff like this with Python (Scrapy and Beautiful Soup are good webscraping modules) to automate parts of my job like you describe :slight_smile:

Nowadays I’d likely do it with JavaScript and having it spit out printable webpages, but only because that’s what I’ve been using most recently. X-ray is a good npm package for webscraping.


#3

Definitely a perpetual motion machine. Good luck.


#4

Yea, i think JacksonBates gave some good advice - It all depends what info your pulling - how and where you want it displayed - how do you want it to work - to make something that will fit your needs. You can’t just cut excel out of the equation - that’s where you get the info - you would need to replace it with something. Im not familiar with onedrive so im not sure how that works - is it just displaying the spreadsheet? (.xls i think). How does what you have work - does it constantly search for new info and update - does it search when you open it in onedrive? do you need to hit a button. --perhaps the best way for us to help get you on a better track would be to tell us the desired result you want


#5

It might help us provide you with better information if you could elaborate on the following (if you are at liberty to):

  1. What information are you scraping from the web?
  2. What kind of processing does it go through?
  3. What format do you need to present the final product in?

#6

Sounds to me more like OP’s current solution gets all the data from the sites, not the Excel sheet. But yeah, more info needed.

Also, Excel now uses the XML-based format .xlsx, not the binary format .xls. Probably completely irrelevant here, but… The more you know! (I’m a bit of a file format nerd.)


#7

well, perhaps its excel 2003


#8

Ah okay, now we’re moving. Let me try to compile, and thanks for the help so far:

  1. When I say “cut Excel out,” I do mean “replace it with whatever it is im making.” I just don’t know what I’m making, but…

  2. “Web scraping program” appears to be the term most associated with pulling the information. Obviously, I’ll need to program this and that will be the next battle, but conceptually speaking, after I pull that information, I’m stuck where it would store it while it does its calculations. Does this happen online? Backend?

  3. Silly of me, but I didn’t consider how it would output. I took that for granted from Excel. Probably displaying something that looked like a spreadsheet would work.

But now for the specifics…

I used a web query to pull MLB’s probable pitchers and schedules from their website (I think they have an API portion of the website, if not, I take it from another website - it’s been a year and I forget). I format these to my liking in Excel, combine them with statistics I pulled from other sites, and basically made a new spreadsheet.

One of the main problems here is that I need to keep Excel open throughout this process. I’d like it to run in the background of a website and only spit out what I need. Hopefully that’s clearer.


#9

Ah, yes, that’s clearer. I’d suggest you look up libraries for generating Excel files (can use front-end JavaScript if processing required isn’t too intense). Then see if DropBox has an API for uploading stuff if you want to automate that part as well.


#10

If you’re getting the data from an API, it’s not web scraping. Web scraping is when you extract the data from the HTML code of a website. If you get the data from an API the data is already formatted in a standard format, e.g. JSON or XML, and you can use it directly.

I’m just wondering how much work replacing Excel is going to be. If the formulas are simple, it shouldn’t be too hard. However it’s possible to get Excel to do really complex calculations on lots of data - that’s what it’s designed to do -, so if that’s your case, doing all those calculations yourself, accurately, might be difficult. I am unfamiliar with baseball so I have no idea how complex those calculations are likely to be.

I’ve spent a lot of time trying to answer this question. I find it very hard to answer because it’s not clear why you want Excel out of the picture. Are you trying to produce a file that you can upload to OneDrive, just without opening Excel, for your own needs, or are you going one step further and publishing it on the web? Or maybe you are already publishing the data on the web, updating the data manually, and you want to streamline the process of updating the data? Your technical choices will be different and so will the time you will need to invest into learning and producing your solution.

I see three main options, depending on different assumptions:

Option 1: a script that runs in the terminal and produces a file (no web development)
Assumptions: you want a program that either runs all the time on your computer or is launched automatically at a given interval. All it does is grab the data from the web API, performs calculations and produces a file that maybe gets uploaded to OneDrive or some other place. The point is to always have the calculated results available for the latest raw data, without needing to have Excel running all the time. You don’t care about publishing on the web.

Here what you need is a language that allows you to do calculations easily. I wouldn’t recommend Javascript for this as it has weird quirks when dealing with numbers, leading to bugs. Python is friendly to beginners and has the NumPy library which is good for calculations and widely used by scientists. Just make sure you choose between Python 2 and Python 3 and stick to it, as they are not compatible (see https://wiki.python.org/moin/Python2orPython3/).

Option 2: a purely front-end web app
Assumptions: The reason you want to get rid of Excel is to make your calculated data available to other people through a web page. The data needs to be always up to date. The calculations are not very time-consuming and you don’t mind making the calculation code public. You don’t mind if the data is only available to browsers running Javascript.

I suppose, since you asked here, that you plan to publish your results on a web page.

If at all possible I’d design this as a front-end project. What you talked about (a script running on the back-end) requires a lot of programming experience and it will be a long time before it’s up and running and reasonably solid.

Doing this as a front-end project means you write a web page that uses Javascript on the user’s computer to query the API, do the calculations and display the results. It might not be viable if the calculations are very long or require a lot of processing power. If it can all be done within 10 or even 20 seconds on the average computer, it might be worth it, provided you show the user that progress is happening while you calculate.

The main hurdle I can see here is implementing the calculations efficiently and bug-free. Note that Excel is provided in binary code specific to your processor and is optimised for fast calculations. Javascript would be at least 10 times slower and has weird quirks when you need precision in calculations. So you’d need to read up on that. But it’s the only language you can use in the browser.

Option 3: do the calculations on the back-end
Assumptions: You want to make your calculated data available on the web. You either can’t or won’t perform the calculations in the browser.

Now you’re stuck learning both front-end for the web page and back-end for the data.

If you have your own server machine, you can choose to implement your calculation script in any language, but you have to handle the DevOps responsibilities. If instead you host your web server on someone else’s machines (AWS, Digital Ocean, Heroku…), you will be limited by what they allow you to do. Check carefully the limits and the prices. As far as I can tell, the more freedom you have, the more expensive it gets. Also the price goes up the more data you have and the more pages you serve. It’s possible that different languages are priced differently, I’m not sure, but some languages will give you more choice than others in terms of hosting, e.g. PHP seems available everywhere but other languages are only supported by some providers.

You need to implement:

  • a script that will get your data, similar to the one in option 1. It can write to a file or to a database (a database is more work to maintain and you need to pay extra to host it, updating a file will cause access conflicts which you’ll need to deal with).

  • a web server program that will serve pages in response to a browser request. It will access the data from the file/database and produce HTML that gets sent back to the client.

Possibly the web server could include the code that gets the data from the API and run it asynchronously. I’m not sure what’s best.

Note that, if storing data in a file, there are likely to be annoying issues of conflicting access when the web server wants to read it while the update script is writing to it. If you have a database, you don’t have to deal with this but it’s yet another thing you’ll have to learn to use correctly.

There are other options too. For instance you could implement your own API and have your web page query it. But honestly that’s just complicating the matter. Going the server route is not an easy option and if you’re just starting web development, it’s really a long term goal.

If you hesitate between options 2 and 3, you can start with option 2 knowing that you can always implement your server with Node and reuse the calculation code.


#11

I’m wondering how to get around the need for a back-end. There might be a way to hack it. I haven’t tried it.

I’m thinking if you store the calculated data into a JSON file, you could write a simple front-end application that accesses the file through AJAX. You would need a variant of the script in option 1 to do the calculations, create the JSON file and upload it to your web host with the rest of your web site files.

It sounds like a rather crappy way to solve the problem but it might be enough for your needs, at least in the short term.


#12

Okay, sfiquet, that was an excellent, awesome response that really got me thinking. It highlighted a bunch of potential issues and it’s why I knew I was unclear as to what I was actually trying to accomplish. Let me rephrase and respond to your points, and thanks so much for the help, so far.

1. Option 1: a script that runs in the terminal and produces a file (no web development)

  • I started tinkering with Python and I can see a way where this would ultimately be helpful. But, it is somewhat similar to my Excel issue. I don’t want to use Excel because I don’t want to keep a computer always on and operating this program. I want it happening, for lack of a better term, “in the cloud.” That’s really the reason I want to replace Excel.

2. Option 2: a purely front-end web app

  • This one actually does make sense and I think what you’re saying is that the user would basically push a button and the most up-to-date results would calculate at the moment and then appear. That’s not too bad. But, to your next point, these calculations are insanely complex. I don’t mind converting them to another language (let’s say Python because I’m moving in that direction) manually, but I believe you when you say it is a lot to process. That probably makes this option not possible.

Option 3: do the calculations on the back-end

  • This was originally how I expected to attack my problem, but I didn’t know what and where the “back-end” was. I’m assuming that’s why I would need to get into something like AWS to handle and hold the information. This option seems like the best fit for me, but definitely the most difficult.

Also, to your point about the difference between the API and web scraping, I now understand that you are right. Most of these sites provide an API that output the information I need in JSON. My question would be, what language or type of program do I make to receive this JSON information? That’s where I currently use Excel, and that’s what I want to create to use in Excel’s place.

To lay out my idea a bit clearer, I currently have an Excel file that takes a bunch of players’ statistics from a bunch of different sources (mainly APIs), weather from a weather API, and schedules from MLB’s API. I then have the Excel file combine the relevant information on a single sheet and do some complex calculations that produce a few numbers I display. This works well except I always need to keep Excel running on the host computer and Excel doesn’t always love pulling information without a random error. My goal is to eliminate the need for a host computer to be on all the time, and I assume that means I will create my own “web app” or “program in the cloud” to do the same things Excel is currently doing. Maybe that helps clear up the goal.

Again, thanks so much for the help!


#13

Thanks for your reply, it’s clearer now.

Option 1:
Your issue is that you don’t want your computer always on, doing calculations. Fair enough. You’re not the only one: When I google “how to run a script continuously in the cloud” quite a few similar questions turn up.

Python Anywhere
For the option of running a script continuously in the cloud without running a web server at the same time, I don’t know what would be involved. As far as I know most cloud offers are geared toward hosting websites (but my knowledge is limited so take it with a pinch of salt). Maybe look into PythonAnywhere if using Python. It has scheduled tasks (see below) and an ‘Always On Task’ feature in beta for paid accounts.

Raspberry Pi
An alternative could be running your script on a Raspberry Pi instead of your computer. In case you’re not familiar with it, it’s a super-cheap, super-small computer that typically runs some version of Linux. It was initially meant for kids but it’s been adopted with enthusiasm by hobbyists. I just read an article saying that it’s not a great idea to use it as a web server because your ISP will not like it, but for something private (like, say, continuously getting data and running calculations) it does a great job. There’s tons of information on the web explaining how to set up a Raspberry Pi and it might even be possible to find a club of enthusiasts next to you.

Scheduled tasks
Otherwise I have a question, since you consider important to continually run the program: How often do you expect to pull the data and run a new calculation? I was thinking maybe a few times a day, but maybe you are really thinking continually, i.e. do the process again as soon as it’s finished?

I’m asking because if the update doesn’t need to take place all the time, you could use the operating system to schedule script runs at certain times. You could look into cron jobs if you host on a unix-like platform. I believe that’s what PythonAnywhere is doing with its scheduled tasks. Cron tasks are also an possibility for option 3.

Option 2:
To reiterate, this option would only work with the calculations being implemented in Javascript. To be fair, I don’t think the performance would be worse in Javascript than in Python. Both are relatively slow languages, compared to something like C.

BTW my claim that Javascript would be 10 times slower than Excel is probably wrong. It came from a half-remembered difference in the time allowed to solve programming challenges in various languages on HackerRank but I remembered wrong. You are actually given 10 seconds for Javascript or Python and 2 seconds for C and C++, so it would only be 5 times slower (if the given ratio is typical).

Option 3
You say this sounds like the best fit for you. It might be if your motivation is to create a website and option 2 is not possible. It is definitely overkill if you just want to run your calculations remotely.

Getting JSON data from an API:
To answer your question, any language with internet libraries will do. They will not necessarily behave the same way though:

  • In Javascript the requests are asynchronous, which means that you send a request to the API and then your program can do something else while it waits for the response. When the response arrives back, a function you provided with your request is called and deals with the data. It’s a special way of programming that needs some getting used to but it means that your website is able to stay reactive to user input while waiting for data to come back. It also means that you can fire several API requests to various sources in one go, and then they can be dealt with in parallel by their respective servers before coming back to you.

  • In Python you usually use urllib or urllib2 (See https://stackoverflow.com/questions/12965203/how-to-get-json-from-webpage-into-python-script for examples). Those libraries are synchronous, meaning once you send your request, everything stops until the response gets back. It’s easier to program but it’s less flexible. It’s not a big deal for a script but you wouldn’t want your web server to stop responding for 2 seconds while waiting for an API response. From a cursory google search there are ways to do asynchronous requests in Python but I’m not familiar with them.

  • I don’t know much about using other languages for web development. They probably have their own way of doing things. I hear Java uses threads a lot (asynchronous), not sure how it implements API requests.


I’ve been re-reading all your posts on this question and it seems that what you really want and need is option 1, just not on your machine. You seem open to web development but you never mention needing it. So if I were you, I’d just focus on option 1. There’s enough work to keep you busy for a while. You don’t need to get distracted by all the web development stuff, not for now at least.

Once it’s up and running, you can always decide whether creating a web server around your program is a good use of your time.