Query an Athena database over a URL with AWS Lambda

This tutorial will show you how to query an Athena database over a url by using AWS Lambda. This will allow you to easily query your database using JavaScript or a server-side language such as Python or PHP. In other words, it will show you how to turn your Athena database into a Web API.

It assumes you have already imported data into an Athena database. If you haven't done that, please follow the tutorial: Load a CSV file into AWS Athena for SQL Analysis first.

This tutorial will assume that you followed that tutorial and have an Athena database with the table name of worldcities. If your table name differs, adjust the following code accordingly.

Let's get started:

  1. Login to your AWS Account.

  2. Open Lambda and click "Create Function"

  3. Give the function a name of query_worldcities with a Runtime of Python 3.9.

  4. Under "Advanced Settings" click "Enable function URL", choose and "Auth type" of NONE. This means anyone with the url will be able to access your database. You wouldn't want to use these settings for sensitive data, but they simplify things for our purposes.

  5. Click "Create Function".

  6. Click "Configuration" and then "Function URL" to get the url where you can view the output from the Lambda function. It should look something like: https://y75op6ymubgwsi4dtbi5rw4hq40iylrb.lambda-url.us-east-1.on.aws/

  7. Enable CORS. We want the url to be accessible to JavaScript calls from any webpage. So, Click "Edit" and then check "Configure cross-origin resource sharing (CORS)" and set Allow origin to be "*".

  8. Go to that url, you should see a welcome message from the default Lambda function: We now need to change the Lambda function so that going to this url will query the database.

  9. We now need to give Lambda permission to query Athena. Click "Configuration" --> "Permssions" and then "Edit" under "Execution Role".

  10. You'll see that a role has already been created for you. Click the link to view that role in the IAM console.

  11. Under the role, click "Add permissions" and "Create inline policy":

  12. Choose JSON and paste in the following:
  13. {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "athena:*",
                "Resource": "*"
            },
            {
                "Effect": "Allow",
                "Action": "s3:*",
                "Resource": [
                    "arn:aws:s3:::simplemaps-example-athena/*",
                    "arn:aws:s3:::simplemaps-example-athena"
                ]
            },
            {
                "Effect": "Allow",
                "Action": "glue:*",
                "Resource": "*"
            }
        ]
    }
    

    where simplemaps-example-athena is the name of the bucket that contains your Athena queries. Note, we need to grant access to both the bucket and the contents of the bucket. Review the policy. Name the policy query_worldcities_policy and click "Create policy". You've now given the Lambda function the access it needs to query Athena and update the associated S3 bucket.

  14. Go back to Lambda and open your function.
  15. Go to "Configuration" and "General Configuration" and set the "Timeout" to be 30 seconds. This will prevent any longer queries from timing out.
  16. Go to the "Code" tab and replace the lambda_function file with the following code, save and deploy:
  17. import boto3
    import time
    import json
    import re
    import csv
    ### Copyright 2022, Simplemaps.com, https://simplemaps.com, Released under MIT license - https://opensource.org/licenses/MIT ### 
    
    def query_athena(params):
          
      def athena_to_s3(session, params, max_execution = 20):
          
          def single_query(client, params):
            QueryString = params["query"]
            QueryExecutionContext = {'Database': params['database']}  
            ResultConfiguration = {'OutputLocation': 's3://' + params['bucket'] +'/' + params['path']}
            response = client.start_query_execution(QueryString=QueryString, QueryExecutionContext=QueryExecutionContext, ResultConfiguration=ResultConfiguration)
            return response
      
          client = session.client('athena', region_name=params["region"])
          execution = single_query(client, params)
          execution_id = execution['QueryExecutionId']
          state = 'RUNNING'
    
          while (max_execution > 0 and state in ['RUNNING', 'QUEUED']):
              max_execution = max_execution - 1
              response = client.get_query_execution(QueryExecutionId = execution_id)
              if 'QueryExecution' in response and \
                      'Status' in response['QueryExecution'] and \
                      'State' in response['QueryExecution']['Status']:
                  state = response['QueryExecution']['Status']['State']
                  if state == 'FAILED':
                      error_reason = response['QueryExecution']['Status']['StateChangeReason']
                      return False
                  elif state == 'SUCCEEDED':
                      s3_path = response['QueryExecution']['ResultConfiguration']['OutputLocation']
                      filename = re.findall('.*\/(.*)', s3_path)[0]
                      return filename
              time.sleep(1)     
          return False    
          
      def cleanup(session, params): #deletes all files in path
          s3 = session.resource('s3')
          my_bucket = s3.Bucket(params['bucket'])
          for item in my_bucket.objects.filter(Prefix=params['path']):
              item.delete()    
    
      session = boto3.Session()
      filename = athena_to_s3(session, params)
      key =  '{path}{filename}'.format(path=params['path'], filename=filename)
      s3 = session.resource('s3')    
      obj = s3.Object(params['bucket'], key).get()
      obj_lines = [x.decode("utf-8") for x in obj["Body"].iter_lines()]
      cleanup(session, params)    
      output = [x for x in csv.DictReader(obj_lines)]
      return output
    
    def get_query_from_url(params, event): #formulate query from url
        url_parameters = event.get('queryStringParameters')
        query = params['query'] 
        if url_parameters:
          url_query, field, value, operator = url_parameters.get('query'), url_parameters.get('field'), url_parameters.get('value'), url_parameters.get('operator', '=')
          field2, value2, operator2 = url_parameters.get('field2'), url_parameters.get('value2'), url_parameters.get('operator2', '=')
          limit = url_parameters.get('limit', '10')
          orderby = url_parameters.get('orderby', '')
          orderby = ' order by {}'.format(orderby) if orderby else ''
          if url_query:
            query = url_parameters.get('query')
          elif field and value:
            if field2 and value2:
              query = "select * from {} where {}{}'{}' and {}{}'{}'{} limit {};".format(params['table'], field, operator, value, field2, operator2, value2, orderby, limit)
            else:
              query = "select * from {} where {}{}'{}'{} limit {};".format(params['table'], field, operator, value, orderby, limit)
          
          #convert any string values into numbers
          pattern = r"(.*)([<>]=?)'([\d\.]*)'(.*)"
          replacement = r"\1\2\3\4"
          query = re.sub(pattern, replacement, query)
          query = re.sub(pattern, replacement, query)
             
        return query
    
    def lambda_handler(event, context):
     
        params = {
            "region": "us-east-1",
            "database": "default",
            "table": "worldcities",
            "bucket": "simplemaps-example-athena",
            "path": "query/",
            "query": "select * from worldcities limit 1;" #default query
        }
        
        params['query'] = get_query_from_url(params, event)
    
        output = query_athena(params)
    
        return {
            'statusCode': 200,
            'body':json.dumps(output, ensure_ascii=False)  
        }
    

    Download lambda_function.py

    The full details of the script are outside of the scope of this tutorial, but once we minimize a couple of functions, the structure becomes clear:

    • The function get_query_from_url converts the url into an SQL compatible query.
    • The function query_athena runs that query against the Athena database and returns the results as a JSON list.
    • The function lambda_handler outputs the results based on your provided inputs.
  18. You can query the database using an SQL query that is passed through the url parameter query.

    So, if your query is:
    select * from worldcities where iso2='US' limit 1;

    You must escape it to make it safe for urls:
    select%20%2A%20from%20worldcities%20where%20iso2%3D%27US%27%20limit%201%3B

    and then append it to your url like this:
    ?query=select%20%2A%20from%20worldcities%20where%20iso2%3D%27US%27%20limit%201%3B

    And you'll get the following result in the browser:

    [{"city": "New York", "city_ascii": "New York", "lat": "40.6943", "lng": "-73.9249", "country": "United States", "iso2": "US", "iso3": "USA", "admin_name": "New York", "capital": "", "population": "18713220", "id": "1840034016"}]
    

  19. Alternatively, you can query the database using the following parameters:
    • field (the first field's name)
    • value (the first field's desired value)
    • operator (= by default, > or <)
    • field2 (the second field's name)
    • value2 (the second field's value)
    • operator2 (= by default, > or <)
    • orderby (e.g. population desc)
    • limit (e.g. 1)

    Example:
    ?field=iso2&value=US&field2=city&value2=Chicago&limit=1

    This method is less powerful than query, but it is simpler and easier to read.

  20. If you'd like to query over AJAX, here is a simple example:
  21. var query = "select * from worldcities where iso2='US' and admin_name='Kansas' limit 1;";
    var base_url = "https://y75op6ymubgwsi4dtbi5rw4hq40iylrb.lambda-url.us-east-1.on.aws/?query=";
    var url = base_url + encodeURIComponent(query);
    
    //Assumes you have jQuery installed
    $.ajax({
      url: url,
      type: 'get',
      success: function(response){
        console.log(response)
      }
    });
    
  22. I hope you've found this tutorial helpful. Feel free to try it with any of our data products. If you have trouble, feel free to contact us.

Home (Maps) | Database License | Privacy | Database Releases | Database Order Lookup | Resources | Contact | FAQs
SimpleMaps.com is a product of Pareto Software, LLC. © 2010-2024.