How to get data from web in python using curl?
In bash when I used
myscript.sh
file="/tmp/vipin/kk.txt"
curl -L "myabcurlx=10&id-11.com" > $file
cat $file
Running ./myscript.sh gives me the output below:
./myscript.sh
1,2,33abc
2,54fdd,fddg3
3,fffff,gfr54
When I tried to fetch the same data using Python, I used the code below:
mypython.py
command = curl + ' -L ' + 'myabcurlx=10&id-11.com'
output = subprocess.Popen(command, shell=True, stdout=subprocess.PIPE).stdout.read().decode('ascii')
print(output)
Running python mypython.py throws an error. Can you please point out what is wrong with my code?
python mypython.py
Error :
/bin/sh: line 1: &id=11: command not found
Wrong Parameter
Meanwhile, why are you trying to build a command line and run it with shell=True? Why not just build a list and run it directly? Among many other benefits, that means you don’t need to worry about getting the arguments properly quoted. – abarnert
Aug 20 at 15:53
Moreover, why use an external tool such as curl when Python has a built-in HTTP client library (though you'll probably want to use the third-party convenience library requests)? – tripleee
Aug 20 at 16:00
2 Answers
command = curl + ' -L ' + 'myabcurlx=10&id-11.com'
Print out what this string is, or just think about it. Assuming that curl is the string 'curl' or '/usr/bin/curl' or something, you get:
curl -L myabcurlx=10&id-11.com
That’s obviously not the same thing you typed at the shell. Most importantly, that last argument is not quoted, and it has a & in the middle of it, which means that what you’re actually asking it to do is to run curl in the background and then run some other program that doesn’t exist, as if you’d done this:
curl -L myabcurlx=10 &
id-11.com
Obviously you could manually include quotes in the string:
command = curl + ' -L ' + '"myabcurlx=10&id-11.com"'
… but that won’t work if the string is, say, a variable rather than a literal in your source—especially if that variable might have quote characters within it.
The shlex module has helpers for quoting things properly.
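As a rough sketch of what that quoting looks like, the snippet below uses shlex.quote on the placeholder URL from the question (it is not a real endpoint) and then uses shlex.split to show how a POSIX shell would break the resulting string into words. The literal 'curl' command name is an assumption, since the question never shows how curl is defined.

```python
import shlex

# Placeholder URL copied from the question; not a real endpoint.
url = 'myabcurlx=10&id-11.com'

# shlex.quote wraps the value so that a POSIX shell treats it as one word,
# even though it contains the shell metacharacter '&'.
command = 'curl -L ' + shlex.quote(url)
print(command)

# shlex.split parses the string roughly the way the shell would split it,
# which is a quick way to check that the URL stays in one piece.
print(shlex.split(command))
```

This only helps if you really need a single command string; it does not remove the other costs of going through the shell.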
But the easiest thing to do is just not try to build a command line in the first place. You aren’t using any shell features here, so why add the extra headaches, performance costs, problems with the shell getting in the way of your output and retcode, and possible security issues for no benefit?
Make the arguments a list rather than a string:
command = [curl, '-L', 'myabcurlx=10&id-11.com']
… and leave off the shell=True.
And it just works. No need to get spaces and quotes and escapes right.
Well, it still won’t work, because Popen doesn’t return output; it’s a constructor for a Popen object. But that’s a whole separate problem, which should be easy to solve if you read the docs.
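For illustration only, here is one way the list form and the output capture could fit together, using subprocess.run instead of Popen. So that the sketch runs without curl or a network connection, a child Python process that prints its first argument stands in for curl; swapping in something like ['curl', '-L', url] is an assumption left to the reader.

```python
import subprocess
import sys

# Placeholder URL copied from the question; not a real endpoint.
url = 'myabcurlx=10&id-11.com'

# Stand-in for the curl command: a child Python process that echoes argv[1].
# Each list element is passed to the program as-is, so the '&' is never
# interpreted by a shell.
command = [sys.executable, '-c', 'import sys; print(sys.argv[1])', url]

# run() waits for the process to finish; check=True raises
# CalledProcessError if the exit status is nonzero.
result = subprocess.run(command, stdout=subprocess.PIPE, check=True)
output = result.stdout.decode('ascii')
print(output)
```

The captured text lives on result.stdout, and result.returncode holds the exit status, so nothing has to be read from a pipe by hand.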
But for this case, an even better solution is to use the Python bindings to libcurl instead of calling the command-line tool. Or, even better, since you’re not using any of the complicated features of curl in the first place, just use requests to make the same request. Either way, you get a response object as a Python object with useful attributes like text and headers and request.headers that you can’t get from a command line tool except by parsing its output as a giant string.
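To give a feel for working with a response object rather than a string, here is a minimal sketch. It deliberately uses the standard library's urllib.request instead of requests, so that it runs without installing anything, and it talks to a throwaway local server defined in the same snippet. The handler class, the query string, and the response body are all hypothetical stand-ins, not the real service from the question.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the real service; returns CSV-like text."""

    def do_GET(self):
        body = b'1,2,33abc\n2,54fdd,fddg3\n'
        self.send_response(200)
        self.send_header('Content-Type', 'text/plain; charset=ascii')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


# Port 0 asks the OS for any free port.
server = HTTPServer(('127.0.0.1', 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Hypothetical query string; the '&' needs no quoting because no shell is involved.
url = 'http://127.0.0.1:%d/?x=10&id=11' % server.server_address[1]

# An empty ProxyHandler bypasses any system proxy for this local demo.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
with opener.open(url) as response:
    status = response.status
    content_type = response.headers['Content-Type']
    text = response.read().decode('ascii')

server.shutdown()
server.server_close()

print(status, content_type)
print(text)
```

With requests the client side is shorter still, but the idea is the same: the status, headers, and body arrive as separate attributes instead of one block of text to parse.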
Probably use run or check_output rather than Popen though. – tripleee
Aug 20 at 16:06
@tripleee Yes, either that, or he has to call communicate. But fixing all of the problems, even the ones he didn’t ask for, with copy-pastable code often leads to people (future readers, not just the OP) just copy-pasting and ignoring the whole last paragraph, while fixing the problem only to immediately run into another one they didn’t know to ask about might make them consider that maybe using pycurl or requests really is easier. – abarnert
Aug 20 at 16:30
import subprocess

fileName = "/tmp/vipin/kk.txt"
# Write curl's output straight into the file; the list form needs no shell quoting.
with open(fileName, "w") as f:
    subprocess.run(["curl", "-L", "myabcurlx=10&id-11.com"], stdout=f)
print(fileName)
recommended approaches:
Thank you, it worked for me after adding parentheses around fileName. Without them I got the error "SyntaxError: Missing parentheses in call to 'print'". I also needed to remove the double quotes from curl and define it separately; otherwise it is taken as a string.
– VIPIN KUMAR
Aug 20 at 18:21
Giving people Python 2-specific answers when they’re using Python 3, especially novices, is a bad idea. It’s not just the print; you’re also recommending urllib2, which no longer exists, and subprocess.call, which is deprecated. – abarnert
Aug 20 at 18:46
@abarnert thanks for the feedback!
– Ryan Zebian
Aug 20 at 19:08
Your command line has quotes around the second argument. Your Python string does not.
– abarnert
Aug 20 at 15:52