(Msg. 1) Posted: Wed Jun 17, 2009 6:19 am
Post subject: Wget request from Linux-PHP [expert] ?
Archived from groups: comp>os>linux>development>apps, others

Hi,
I work in an administrative department where I have to retrieve
information on 1,500 companies from a state website every day.
Doing the job manually takes a couple of hours, so I am trying to
automate the process and save effort.
I am trying to wget the data from a Linux/PHP server. Do you know how to
retrieve information when the wget request is this complicated?
E.g. the KOUGLOFF company.
The state website is: http://www.infogreffe.fr/infogreffe/reset.do
The SIREN number to enter is: 448676973
Doing the search manually returns the information about KOUGLOFF.
Once I have a session ID, I can copy and paste this link to retrieve
the info:
http://www.infogreffe.fr/infogreffe/newRechercheEntreprise.xml?siren=448676973
I wrote a Unix application that is ready to retrieve my 1,500
companies every day using repeated wget calls.
The problem is that, as I do not have a session ID, my server cannot wget
the file. I tried adding a session ID from a parallel session to the URL, but
that does not lead me to the content:
http://www.infogreffe.fr/infogreffe/newRechercheEntreprise.xml?siren=4...76973&b
From your knowledge and experience: do you know how I could retrieve
the information?
Adding a session ID to the URL did not work from a Linux
script.
Thank you very much for any help, solution or advice.
Norman.

(Msg. 2) Posted: Wed Jun 17, 2009 9:25 am
Hi,
thanks: so I am now carefully reading up on the curl command instead of
the wget command.
curl can be used at least from the Linux command line and from PHP. In my
initial opinion, the Linux command works much better. The website
listed above does not seem to use cookies.
Access to the KOUGLOFF company is granted through session
recognition. The principle seems to be: no session, no
access to the company details.
The problem is that how sessions work with curl under Linux is not
easy to find in the documentation: http://linux.about.com/od/commands/l/blcmdl1_curl.htm
There is information about cookies but nothing about
sessions.
curl_init(), curl_setopt(), curl_exec() and curl_close() seem to be
available only in PHP.
Thank you very much for any help, working solutions or advice.
Norman.

(Msg. 3) Posted: Wed Jun 17, 2009 11:20 am
Pseudonyme wrote:
> Hi,
>
Hello Norman,
> I work in an administrative department where I have to retrieve
> information on 1,500 companies from a state website every day.
> Doing the job manually takes a couple of hours, so I am trying to
> automate the process and save effort.
Makes sense.
>
> I am trying to wget the data from a Linux/PHP server. Do you know how to
> retrieve information when the wget request is this complicated?
Why use wget if you are on PHP?
Why not simply use file() or file_get_contents() and feed it a URL?
http://nl3.php.net/manual/en/function.file.php
>
> E.g. the KOUGLOFF company.
>
> The state website is: http://www.infogreffe.fr/infogreffe/reset.do
> The SIREN number to enter is: 448676973
> Doing the search manually returns the information about KOUGLOFF.
>
> Once I have a session ID, I can copy and paste this link to retrieve
> the info:
> http://www.infogreffe.fr/infogreffe/newRechercheEntreprise.xml?siren=448676973
>
>
> I wrote a Unix application that is ready to retrieve my 1,500
> companies every day using repeated wget calls.
>
> The problem is that, as I do not have a session ID, my server cannot wget
> the file. I tried adding a session ID from a parallel session to the URL, but
> that does not lead me to the content:
> http://www.infogreffe.fr/infogreffe/newRechercheEntreprise.xml?siren=4...76973&b
>
OK, that makes sense.
You must first log in, so that you have an active session.
After that you can get the info.
A few remarks:
1) Judging by the length of the session ID, this is not a standard
PHP-generated session ID, which would be shorter.
2) Maybe they only accept the session ID via a cookie instead of GET.
If I were you I would start by figuring out how the session ID is
transferred to you. Is it in a cookie? In a form? (Apparently it is not
meant to be in the URL.)
Maybe consider using cURL instead of wget or file() as I suggested above.
http://nl3.php.net/manual/en/book.curl.php
Using cURL, you can add cookies that contain the session ID, and also mimic
POSTs reliably.
Regards,
Erwin Moller
--
"There are two ways of constructing a software design: One way is to
make it so simple that there are obviously no deficiencies, and the
other way is to make it so complicated that there are no obvious
deficiencies. The first method is far more difficult."
-- C.A.R. Hoare

(Msg. 4) Posted: Wed Jun 17, 2009 11:20 am
On 06/17/2009 02:19 PM, Pseudonyme wrote:
> Hi,
>
> I work in an administrative department where I have to retrieve
> information on 1,500 companies from a state website every day.
> Doing the job manually takes a couple of hours, so I am trying to
> automate the process and save effort.
>
> I am trying to wget the data from a Linux/PHP server. Do you know how to
> retrieve information when the wget request is this complicated?
>
> E.g. the KOUGLOFF company.
>
> The state website is: http://www.infogreffe.fr/infogreffe/reset.do
> The SIREN number to enter is: 448676973
> Doing the search manually returns the information about KOUGLOFF.
>
> Once I have a session ID, I can copy and paste this link to retrieve
> the info:
> http://www.infogreffe.fr/infogreffe/newRechercheEntreprise.xml?siren=448676973
>
Looking at the HTTP headers (http://pastebin.com/f8ebfee), retrieving
the above page involves 2 redirects and 1 cookie.
>
> I wrote a Unix application that is ready to retrieve my 1,500
> companies every day using repeated wget calls.
>
> The problem is that, as I do not have a session ID, my server cannot wget
> the file. I tried adding a session ID from a parallel session to the URL, but
> that does not lead me to the content:
> http://www.infogreffe.fr/infogreffe/newRechercheEntreprise.xml?siren=4...76973&b
>
If wget is giving you problems, then you can either try to fix them
or use another API. Something like libcurl will handle your network
I/O for you and provide an API to read and set the HTTP headers.
> From your knowledge and experience: do you know how I could retrieve
> the information? Adding a session ID to the URL did not work from a
> Linux script.
I would look into using libcurl based on what you've said and the
headers. It'd be a simple matter to parse the cookies and session ID to
use in subsequent requests.
>
> Thank you very much for any help, solution or advice.
>
> Norman.
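
The libcurl suggestion above can also be tried from the shell, since the curl command-line tool offers the same cookie handling. What follows is a minimal sketch, assuming the site sets its session cookie when the entry page (reset.do) is visited and accepts that cookie on the search URL. The function names and the cookie jar path are made up for illustration, and whether infogreffe really responds this way is untested.

```shell
#!/bin/sh
# Sketch only: function names and the jar path are hypothetical.
BASE='http://www.infogreffe.fr/infogreffe'
JAR="${TMPDIR:-/tmp}/infogreffe_cookies.txt"

# Build the search URL for one SIREN number.
company_url() {
    printf '%s/newRechercheEntreprise.xml?siren=%s\n' "$BASE" "$1"
}

# Step 1: visit the entry page so the server can set its session cookie.
# -c saves received cookies to the jar, -L follows the redirects.
open_session() {
    curl -s -L -c "$JAR" -o /dev/null "$BASE/reset.do"
}

# Step 2: send the saved cookies back (-b) with the actual request,
# and keep the jar up to date (-c) in case the server refreshes them.
fetch_company() {
    curl -s -L -b "$JAR" -c "$JAR" "$(company_url "$1")"
}

# Usage (performs real network requests, so it is left commented out):
#   open_session && fetch_company 448676973 > kougloff.xml
```

Keeping the two steps separate means one session could be opened once and then reused for many SIREN numbers, provided the server keeps the session alive that long.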
(Msg. 5) Posted: Thu Jun 18, 2009 1:02 am
Hi,
To retrieve the content, I will not use wget, file() or
file_get_contents(). I believe curl from a Unix script is more
effective.
The point is that curl has to open a session to access the
detailed content.
Opening a session with the Unix curl command is
not covered in the documentation.
Do you know how to open a session using the Unix curl command?
Norman

(Msg. 6) Posted: Thu Jun 18, 2009 3:21 am
Pseudonyme wrote:
> Hi,
> thanks: so I am now carefully reading up on the curl command instead of
> the wget command.
>
> curl can be used at least from the Linux command line and from PHP. In my
> initial opinion, the Linux command works much better. The website
> listed above does not seem to use cookies.
> Access to the KOUGLOFF company is granted through session
> recognition. The principle seems to be: no session, no
> access to the company details.
>
> The problem is that how sessions work with curl under Linux is not
> easy to find in the documentation: http://linux.about.com/od/commands/l/blcmdl1_curl.htm
> There is information about cookies but nothing about
> sessions.
>
> curl_init(), curl_setopt(), curl_exec() and curl_close() seem to be
> available only in PHP.
>
> Thank you very much for any help, working solutions or advice.
>
> Norman.
>
What part of my previous answer exactly did you not understand?
Erwin Moller
(Msg. 7) Posted: Thu Jun 18, 2009 5:44 am
You feel like you are beating your head against a wall? Hehe. This
group is PHP-based, so it's OK to assume we are talking about PHP.
Some people just don't understand what is going on and they expect you
to do their work for them. It's not hard. Your answer was perfect and
if he/she does not get it, let them go somewhere else.

(Msg. 8) Posted: Thu Jun 18, 2009 6:48 am
Pseudonyme wrote:
> Hi,
> To retrieve the content, I will not use wget, file() or
> file_get_contents(). I believe curl from a Unix script is more
> effective.
> The point is that curl has to open a session to access the
> detailed content.
> Opening a session with the Unix curl command is
> not covered in the documentation.
> Do you know how to open a session using the Unix curl command?
> Norman
HTTP is a stateless protocol - there is no such thing as a session in
the protocol. The only "session" is that which is defined by the server
software. Therefore, there is no special API for establishing a session.
For this session to work, the server sends a session id to the client,
and the client responds with this session id on each request. The
session id may be stored on the client in a cookie (the most common
case), or it may be a parameter in the URI (generally when the client
does not support cookies). cURL can handle cookies just fine; if
instead it's part of the URI you'll have to parse the page containing
the link to get the session id.
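
The two delivery mechanisms described above can be illustrated offline. The sketch below works on made-up sample data: the cookie name JSESSIONID, the header text, the link, and both ID values are assumptions for illustration and are not taken from the real site. It only shows where a session ID would sit in each case and how a script might pull it out.

```shell
#!/bin/sh
# Made-up sample data: the cookie name JSESSIONID and both values are
# assumptions for illustration, not taken from the real site.
headers='HTTP/1.1 302 Found
Set-Cookie: JSESSIONID=ABC123DEF456; Path=/infogreffe
Location: /infogreffe/index.do'

page='<a href="/infogreffe/index.do;jsessionid=XYZ789?lang=fr">Search</a>'

# Case 1: session ID delivered in a cookie (curl -c/-b would handle this
# automatically; the sed call just shows where the value lives).
cookie_id=$(printf '%s\n' "$headers" |
    sed -n 's/^Set-Cookie: JSESSIONID=\([^;]*\).*/\1/p')

# Case 2: session ID embedded in a link, so it must be parsed from the page.
uri_id=$(printf '%s\n' "$page" |
    sed -n 's/.*;jsessionid=\([^?"]*\).*/\1/p')

echo "cookie: $cookie_id"
echo "uri:    $uri_id"
```

In the cookie case a script normally does not need to parse anything, because curl's cookie jar options store and resend the value; the parsing step matters mainly when the ID travels in the URI.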
--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex.RemoveThis@attglobal.net
==================

(Msg. 9) Posted: Thu Jun 18, 2009 8:59 am
(Msg. 10) Posted: Thu Jun 18, 2009 10:25 am
Danny Wilkerson wrote:
> You feel like you are beating your head against a wall? Hehe. This
> group is PHP-based, so it's OK to assume we are talking about PHP.
> Some people just don't understand what is going on and they expect you
> to do their work for them. It's not hard. Your answer was perfect and
> if he/she does not get it, let them go somewhere else.
No, I'll try to help those who are interested in learning. The OP
obviously is not familiar with how sessions work, which is quite common.
Most of the time this can be more or less ignored because PHP does most of
the session handling behind the scenes. However, when you start trying
to do what the OP is talking about, it requires a somewhat better
understanding of how sessions work.
--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex DeleteThis @attglobal.net
==================

(Msg. 11) Posted: Thu Jun 18, 2009 1:20 pm
(Msg. 12) Posted: Thu Jun 18, 2009 1:20 pm
Pseudonyme wrote:
> Hi all, about the curl command:
> 1) It works fine and properly fetches the content of straightforward,
> openly accessible websites, for example:
>> curl 'http://www.abca.com' | more
>
>
> 2) Retrieving the content for the KOUGLOF company properly
>
> from here: http://tinyurl.com/lyhsmh
>
> SIREN : 453786980
>
>
> That is a major problem. We tried sending the SIREN as POST data:
>
>> curl -F "siren=453786980" http://www.infogreffe.fr/infogreffe/newRechercheEntreprise.xml | more
>> curl -F "siren=@453786980" http://www.infogreffe.fr/infogreffe/newRechercheEntreprise.xml | more
>> curl -d "siren=453786980" http://www.infogreffe.fr/infogreffe/newRechercheEntreprise.xml?siren=453786980
>
> No content at all can be retrieved, even though we carefully read each
> of your messages three times, as well as all the official Unix curl
> documentation.
>
> This one suffers from the same transparency problem:
> http://avis-situation-sirene.insee.fr/avisitu/IdentificationListeSiret...?bSubmi
>
> The problem is that I have to present my initial findings to my
> supervisor, and I do not see how to make progress and get the answer.
> Thank you for any help or working solutions,
> Norman
>
>
That's because the site is using JavaScript on the button clicks. You
will have to emulate the JavaScript to get it to work - and this is
likely to fail if they change the page and/or JavaScript.
I'm with Tauno on this one - the site is obviously designed to prevent
what you're trying to do. I would recommend you contact the company to
see if there is another way to retrieve the data.
--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex.RemoveThis@attglobal.net
==================