Forum Discussion
manny213
Dec 16, 2024 (Brass Contributor)
download data from web
Hi everyone, I need to download a bunch of files from a website: https://www.finra.org/finra-data/browse-catalog/equity-short-interest/files The address doesn't show the filters that need to be app...
LainRobertson
Dec 17, 2024 (Silver Contributor)
Hi manny213,
The safest, most reliable approach would be to retrieve the data using the API supplied by FINRA:
- Home | FINRA API Developer Center
- Documentation | FINRA API Developer Center
- Catalog of Datasets | FINRA API Developer Center
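As a rough illustration of what the API route could look like, here is a minimal sketch. Everything in it is an assumption to check against the documentation linked above: the helper name New-FinraApiRequest is hypothetical, the endpoint pattern (https://api.finra.org/data/group/<group>/name/<dataset>) and the bearer-token header are written from memory, and the group and dataset names are placeholders to look up in the catalog.

```powershell
# Hypothetical helper: builds the parameters for an API call as a hashtable,
# so they can be inspected first and then splatted onto Invoke-RestMethod.
# ASSUMPTION: the endpoint pattern and bearer-token header below are not taken
# from this thread; verify both against the FINRA API documentation.
function New-FinraApiRequest {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)] [string] $Group,   # dataset group from the catalog
        [Parameter(Mandatory)] [string] $Name,    # dataset name from the catalog
        [string] $AccessToken                     # optional OAuth access token
    )

    $headers = @{ Accept = 'application/json' };
    if ($AccessToken) {
        $headers['Authorization'] = "Bearer $AccessToken";
    }

    @{
        Method  = 'Get';
        Uri     = "https://api.finra.org/data/group/$($Group)/name/$($Name)";
        Headers = $headers;
    }
}

# Usage with placeholder values (nothing is sent until Invoke-RestMethod runs):
# $request = New-FinraApiRequest -Group '<group>' -Name '<dataset>' -AccessToken $token;
# $rows = Invoke-RestMethod @request -ErrorAction:Stop;
```

Building the request separately from sending it makes it easy to print the URI and headers and compare them with the documentation before any call is made.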
I've included an alternative hack below for fetching the files you mentioned. It works today, but there are many reasons it could fail in the future, so I don't recommend this approach.
Get-FinraFiles.ps1
[cmdletbinding()]
param()
# Request URI.
$uri = 'https://www.finra.org/views/ajax?_wrapper_format=drupal_ajax&custom_month%5Bmonth%5D=any&custom_year%5Byear%5D=any'+
'&view_name=transparency_services&view_display_id=equity_short_interest_biweekly&view_args=&view_path=%2Fnode%2F336166'+
'&view_base_path=&view_dom_id=6df270b0453b3e8d5fc32607f5a986f1d230dcb33864626710bc37237fa9aec2&pager_element=0&_drupal_ajax=1'+
'&ajax_page_state%5Btheme%5D=finra_bootstrap_sass&ajax_page_state%5Btheme_token%5D='+
'&ajax_page_state%5Blibraries%5D=addtoany%2Faddtoany.front%2Cbetter_exposed_filters%2Fauto_submit%2Cbetter_exposed_filters'+
'%2Fgeneral%2Cblazy%2Fbio.ajax%2Cbootstrap_barrio%2Fbreadcrumb%2Cbootstrap_barrio%2Fform%2Cbootstrap_barrio%2Fgesta_opensans'+
'%2Cbootstrap_barrio%2Fglobal-styling%2Cchosen%2Fdrupal.chosen%2Cchosen_lib%2Fchosen.css%2Cfinra_bootstrap_sass'+
'%2Fapp-dynamic-reporting%2Cfinra_bootstrap_sass%2Fback-button-handler%2Cfinra_bootstrap_sass%2Fcookie-classification'+
'%2Cfinra_bootstrap_sass%2Fgesta%2Cfinra_bootstrap_sass%2Fglobal-styling%2Cfinra_bootstrap_sass%2FglossaryViews'+
'%2Cfinra_bootstrap_sass%2Fopensans%2Cfontawesome%2Ffontawesome.webfonts%2Cfontawesome%2Ffontawesome.webfonts.shim'+
'%2Cparagraphs%2Fdrupal.paragraphs.unpublished%2Csuperfish%2Fsuperfish%2Csuperfish%2Fsuperfish_hoverintent%2Csuperfish'+
'%2Fsuperfish_supersubs%2Csuperfish%2Fsuperfish_supposition%2Csystem%2Fbase%2Cviews%2Fviews.ajax%2Cviews%2Fviews.module';
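# Destination folder (created if it does not already exist).
$destination = "D:\Data\Temp\Forum\finra";
$null = New-Item -Path $destination -ItemType Directory -Force;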
# Invoke the web call.
$data = Invoke-RestMethod -Method Get -Uri $uri -UseBasicParsing -ErrorAction:Stop;
#region Use BITS to download the files.
# Non-greedy pattern, so that several links on one line are matched separately.
$bitsFiles = [regex]::Matches($data[2].data, 'https[^"\s<>]+?\.csv', [System.Text.RegularExpressions.RegexOptions]::IgnoreCase).Value | ForEach-Object {
    # The file name is the last segment of the URL.
    $filename = $_.Split("/")[-1];
    [PSCustomObject] @{
        Source = $_;
        Destination = "$destination\$filename";
    }
};
$bitsJobName = "forumExample";
$bitsFiles | Start-BitsTransfer -DisplayName $bitsJobName -ErrorAction:Stop;
Get-BitsTransfer -Name $bitsJobName | Remove-BitsTransfer;
#endregion
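To see what the link-extraction step produces without calling the website or BITS, the same idea can be run against a small sample. This is only a sketch: the HTML fragment, the example.com URLs and the C:\Temp\finra folder are made-up placeholders, and the pattern is a non-greedy variation of the one in the script.

```powershell
# Hypothetical sample standing in for the HTML fragment held in $data[2].data.
$sampleHtml = '<a href="https://example.com/files/shrt20241115.csv">Nov 15</a> ' +
              '<a href="https://example.com/files/shrt20241129.csv">Nov 29</a>';

# Placeholder destination folder; nothing is written to disk in this sketch.
$destination = 'C:\Temp\finra';

# Non-greedy pattern: stop at the first ".csv" and never cross quotes,
# whitespace or angle brackets, so each link becomes its own match.
$pattern = 'https[^"\s<>]+?\.csv';
$options = [System.Text.RegularExpressions.RegexOptions]::IgnoreCase;

$bitsFiles = [regex]::Matches($sampleHtml, $pattern, $options) | ForEach-Object {
    # The file name is the last "/"-separated segment of the URL.
    $filename = $_.Value.Split('/')[-1];

    [PSCustomObject] @{
        Source      = $_.Value;
        Destination = "$destination\$filename";
    }
};

# Show the Source/Destination pairs that would be piped to Start-BitsTransfer.
$bitsFiles | Format-Table -AutoSize;
```

Both sample links sit on one line on purpose. A greedy pattern such as "https.*\.csv" would capture them as a single match running from the first "https" to the last ".csv", whereas the non-greedy form yields one object per file.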
Cheers,
Lain
- manny213, Dec 17, 2024 (Brass Contributor)
Thank you LainRobertson
I didn't know about that API. I will take a look. I agree that the API is a better approach than scraping.
I ran the script and it downloaded the files. Thank you!!