Forum Discussion

manny213
Brass Contributor
Dec 16, 2024
Solved

download data from web

Hi everyone

I need to download a bunch of files from a website:

https://www.finra.org/finra-data/browse-catalog/equity-short-interest/files

The address doesn't show the filters that need to be applied. If you go to that website and select 'Any' for both Month and Year, you will see all the files. Can someone help me create a PowerShell script to download all the files to a local folder on my machine?

Thank you

cc: LainRobertson 

  • LainRobertson
    Silver Contributor

    Hi manny213 ,

     

    The safest, most reliable approach would be to retrieve the data using the API supplied by FINRA.

     

     

    I've included an alternative hack below for fetching the files you mentioned. It works today, but it depends on the site's internal AJAX request, so there are many reasons it could fail in the future, and I don't recommend this approach.

     

    Get-FinraFiles.ps1

    [cmdletbinding()]
    param()
    
    # Request URI.
    $uri = 'https://www.finra.org/views/ajax?_wrapper_format=drupal_ajax&custom_month%5Bmonth%5D=any&custom_year%5Byear%5D=any'+
        '&view_name=transparency_services&view_display_id=equity_short_interest_biweekly&view_args=&view_path=%2Fnode%2F336166'+
        '&view_base_path=&view_dom_id=6df270b0453b3e8d5fc32607f5a986f1d230dcb33864626710bc37237fa9aec2&pager_element=0&_drupal_ajax=1'+
        '&ajax_page_state%5Btheme%5D=finra_bootstrap_sass&ajax_page_state%5Btheme_token%5D='+
        '&ajax_page_state%5Blibraries%5D=addtoany%2Faddtoany.front%2Cbetter_exposed_filters%2Fauto_submit%2Cbetter_exposed_filters'+
        '%2Fgeneral%2Cblazy%2Fbio.ajax%2Cbootstrap_barrio%2Fbreadcrumb%2Cbootstrap_barrio%2Fform%2Cbootstrap_barrio%2Fgesta_opensans'+
        '%2Cbootstrap_barrio%2Fglobal-styling%2Cchosen%2Fdrupal.chosen%2Cchosen_lib%2Fchosen.css%2Cfinra_bootstrap_sass'+
        '%2Fapp-dynamic-reporting%2Cfinra_bootstrap_sass%2Fback-button-handler%2Cfinra_bootstrap_sass%2Fcookie-classification'+
        '%2Cfinra_bootstrap_sass%2Fgesta%2Cfinra_bootstrap_sass%2Fglobal-styling%2Cfinra_bootstrap_sass%2FglossaryViews'+
        '%2Cfinra_bootstrap_sass%2Fopensans%2Cfontawesome%2Ffontawesome.webfonts%2Cfontawesome%2Ffontawesome.webfonts.shim'+
        '%2Cparagraphs%2Fdrupal.paragraphs.unpublished%2Csuperfish%2Fsuperfish%2Csuperfish%2Fsuperfish_hoverintent%2Csuperfish'+
        '%2Fsuperfish_supersubs%2Csuperfish%2Fsuperfish_supposition%2Csystem%2Fbase%2Cviews%2Fviews.ajax%2Cviews%2Fviews.module';
    
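    # Destination folder. Create it if it is missing, since BITS will not create it.
    $destination = "D:\Data\Temp\Forum\finra";
    $null = New-Item -Path $destination -ItemType Directory -Force;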
    
    # Invoke the web call.
    $data = Invoke-RestMethod -Method Get -Uri $uri -UseBasicParsing -ErrorAction:Stop;
    
    #region Use BITS to download the files.
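    # $data[2] is the response element holding the rendered file list; the index may change if the site changes.
    # The pattern stops at the first quote, whitespace or angle bracket so each link is matched on its own.
    $bitsFiles = [regex]::Matches($data[2].data, 'https://[^"\s<>]+?\.csv', [System.Text.RegularExpressions.RegexOptions]::IgnoreCase).Value | Select-Object -Unique | ForEach-Object {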
        $parts = $_.Split("/");
        $filename = $parts[$parts.Length - 1];
    
        [PSCustomObject] @{
            Source = $_;
            Destination = "$destination\$filename";
        }
    };
    
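    $bitsJobName = "forumExample";
    try {
        $bitsFiles | Start-BitsTransfer -DisplayName $bitsJobName -ErrorAction:Stop;
    }
    finally {
        # Remove any job left behind by a failed or interrupted transfer.
        Get-BitsTransfer -Name $bitsJobName -ErrorAction:SilentlyContinue | Remove-BitsTransfer;
    }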
    #endregion
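
    The link-extraction step can be tried offline. The sketch below is only an illustration: the HTML fragment is made up, and Get-CsvLink is a hypothetical helper name, not part of the script above. It shows why the pattern matters when two links share a line: a greedy `https.*\.csv` runs from the first link to the end of the last one, whereas a pattern that stops at the closing quote returns each link separately.

    ```powershell
    # Made-up HTML fragment (an assumption) standing in for the markup the site returns.
    $sampleHtml = '<a href="https://cdn.finra.org/equity/otcmarket/biweekly/shrt20241129.csv">a</a> ' +
                  '<a href="https://cdn.finra.org/equity/otcmarket/biweekly/shrt20241115.csv">b</a>'

    # Hypothetical helper: returns each distinct .csv link found in an HTML string.
    function Get-CsvLink {
        param([Parameter(Mandatory)][string] $Html)

        # Stop at the first quote, whitespace or angle bracket so links are not merged.
        $pattern = 'https://[^"\s<>]+?\.csv'
        $options = [System.Text.RegularExpressions.RegexOptions]::IgnoreCase

        [regex]::Matches($Html, $pattern, $options) |
            ForEach-Object { $_.Value } |
            Select-Object -Unique
    }

    # Greedy pattern: a single match spanning both links.
    $greedy  = @([regex]::Matches($sampleHtml, 'https.*\.csv') | ForEach-Object { $_.Value })

    # Bounded pattern: one match per link.
    $bounded = @(Get-CsvLink -Html $sampleHtml)

    "Greedy matches:  $($greedy.Count)"    # 1
    "Bounded matches: $($bounded.Count)"   # 2
    ```

    If the real response puts every link on its own line, both patterns give the same result; the bounded one is simply less dependent on how the markup happens to be laid out.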

     

    Cheers,

    Lain

    • manny213
      Brass Contributor

      Thank you LainRobertson 

      I didn't know about that API. I will take a look. I agree that using the API is a better approach than scraping.

      I ran the script and it downloaded the files. Thank you!! 

  • manny213
    Brass Contributor

    I took a look at the website. There is a file path to use:

    https://cdn.finra.org/equity/otcmarket/biweekly/shrt20241129.csv
    https://cdn.finra.org/equity/otcmarket/biweekly/shrt20241115.csv

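    so the files follow the pattern https://cdn.finra.org/equity/otcmarket/biweekly/shrt*.csv, where * is the report date written as yyyyMMdd. A web server won't expand the wildcard, so each file still has to be requested by its full name.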
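
    A minimal sketch of building those URLs from dates, assuming the naming scheme seen in the two links above holds for the other files. Get-ShortInterestUrl is a hypothetical helper name, and the list of dates has to come from somewhere else (for example the file list on the website), because the pattern alone doesn't say which dates exist.

    ```powershell
    # Hypothetical helper: builds the CDN URL for one report date.
    function Get-ShortInterestUrl {
        param([Parameter(Mandatory)][datetime] $Date)

        # File names embed the date as yyyyMMdd, as in shrt20241129.csv.
        $stamp = $Date.ToString('yyyyMMdd', [System.Globalization.CultureInfo]::InvariantCulture)
        'https://cdn.finra.org/equity/otcmarket/biweekly/shrt{0}.csv' -f $stamp
    }

    # The two dates taken from the links quoted above.
    $urls = @('2024-11-29', '2024-11-15') | ForEach-Object { Get-ShortInterestUrl -Date $_ }
    $urls

    # To download, something along these lines could follow ($destination is an assumed, existing folder):
    # foreach ($url in $urls) {
    #     Invoke-WebRequest -Uri $url -OutFile (Join-Path $destination (Split-Path $url -Leaf))
    # }
    ```

    A date with no published file should simply fail with an HTTP error, so the download loop would need to tolerate misses if the dates are guessed rather than read from the site.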

     

     
