Author: Greg Reboul


How to build an API proxy leveraging the Twitter search API?

If you have not read the intro to API Gateway, you can familiarize yourself with it here.

In this post, we will put ourselves in an API developer’s shoes and create our first API. For this example, I have chosen the Twitter API: the goal is to handle its authentication and comply with its usage limits. I also think every API developer has an old, dusty personal project linked to it. Personally, I have a few of them.

Apigee concepts

Main concepts

At the highest level, you will find endpoints: the proxy endpoint is the frontend that clients call, and the target endpoint represents the service backing your API. You control the behavior of these endpoints by defining flows. Each endpoint has three types of flows, executed in this order: PreFlow, ConditionalFlow(s), and PostFlow. Apigee flows are sequences of atomic operations (policies) that manipulate or assess the request and response messages. Policies are represented as blocks in the UI, and you can drag and drop them onto a flow. All of this configuration is stored in XML files.
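
As a rough illustration of that ordering, a proxy endpoint using all three flow types could be laid out as in the sketch below. The policy name Some-Policy, the flow name, the condition, and the /example base path are hypothetical placeholders, not part of the proxy built in this post.

```xml
<ProxyEndpoint name="default">
    <!-- PreFlow: always executed first -->
    <PreFlow name="PreFlow">
        <Request>
            <Step>
                <!-- hypothetical policy name, must match a file in policies/ -->
                <Name>Some-Policy</Name>
            </Step>
        </Request>
        <Response/>
    </PreFlow>
    <!-- Conditional flows: executed only when their condition matches -->
    <Flows>
        <Flow name="search">
            <Condition>(proxy.pathsuffix MatchesPath "/search") and (request.verb = "GET")</Condition>
            <Request/>
            <Response/>
        </Flow>
    </Flows>
    <!-- PostFlow: always executed last -->
    <PostFlow name="PostFlow">
        <Request/>
        <Response/>
    </PostFlow>
    <HTTPProxyConnection>
        <BasePath>/example</BasePath>
    </HTTPProxyConnection>
    <RouteRule name="default">
        <TargetEndpoint>default</TargetEndpoint>
    </RouteRule>
</ProxyEndpoint>
```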

Build and deployment overview

After using the UI for a bit, I found it quicker and more convenient to edit the XML files directly, then package and upload them using a very handy tool called apigeecli.

Apigee proxy package structure

$ tree apiproxy/
apiproxy/
├── policies                   <- atomic operations
│   ├── Cleanup-payload.xml
│   ├── EncodeCreds.xml
│   ├── Format-Response.xml
│   ├── Get-Bearer-Token.xml
│   ├── LoadCreds.xml
│   ├── Set-Bearer-token.xml
│   └── Spike-Arrest-1.xml
├── proxies
│   └── default.xml            <- flows for the frontend endpoint
├── resources
├── targets
│   └── default.xml            <- flows for the backend endpoint
└── twitter.xml                <- every element above is referenced here

This folder has to be zipped and then uploaded. If the upload goes well, we can then deploy the new revision.

Let’s get started

At first, I was a bit swamped with the debugging. When my proxy is not syntactically correct, finding the relevant error message is tedious. So far the best method I have found is to get the logs from the runtime pod, as it usually yields a Java stack trace containing the message.

My initial plan was to implement two flows: the happy path, when the bearer token is found in the KV store, and the other one, where we have to query the OAuth endpoint to obtain this token.

Apigee Twitter

To use the Twitter API, you need to register as a developer. You will obtain an API key and a secret, which allow you to request a bearer token that you can then use to query the Twitter API.

Two things to keep in mind here. First, Twitter is very clear about what counts as authorized use of its services, so please respect the guidelines. Second, the number of queries is capped, and your account can be disabled if you are suspected of abusing the system.

Skeleton

Let’s try to build a simple proxy to the Twitter search endpoint.

The minimum required is three files: a proxy spec, a target spec, and the main description.

  • twitter/apiproxy/targets/default.xml:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<TargetEndpoint name="default">
    <PreFlow name="PreFlow" />
    <Flows/>
    <PostFlow name="PostFlow" />
    <HTTPTargetConnection>
        <URL>https://api.twitter.com/1.1/search/tweets.json</URL>
    </HTTPTargetConnection>
</TargetEndpoint>
  • twitter/apiproxy/proxies/default.xml:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ProxyEndpoint name="default">
    <PreFlow name="PreFlow" />
    <Flows/>
    <PostFlow name="PostFlow" />
    <HTTPProxyConnection>
        <BasePath>/twitter</BasePath>
    </HTTPProxyConnection>
    <RouteRule name="default">
        <TargetEndpoint>default</TargetEndpoint>
    </RouteRule>
</ProxyEndpoint>
  • twitter/apiproxy/twitter.xml:
<APIProxy name="twitter">
    <BasePaths>/twitter</BasePaths>
    <Policies />
    <ProxyEndpoints>
        <ProxyEndpoint>default</ProxyEndpoint>
    </ProxyEndpoints>
    <TargetEndpoints>
        <TargetEndpoint>default</TargetEndpoint>
    </TargetEndpoints>
</APIProxy>

To deploy these files, we will have to create a bundle as described above, upload it, and deploy it. Two environment variables $APIGEEORG and $APIGEEENV are initialized respectively with our Apigee organization and environment names.

# Get a new access token from a service account and cache it
$ apigeecli token cache ~/service_account_key.json
# Store the org name
$ APIGEEORG=$(grep org: hybrid-files/overrides/overrides.yaml | awk '{print $2}')
# Store the env name
$ APIGEEENV=$(grep env: hybrid-files/overrides/overrides.yaml | awk '{print $3}')

Keep in mind that the apigeecli token has a limited TTL. If you start getting authentication errors, cache a new token with the command above.

$ zip twitter.zip -r apiproxy
  adding: apiproxy/ (stored 0%)
  adding: apiproxy/twitter.xml (deflated 56%)
  adding: apiproxy/proxies/ (stored 0%)
  adding: apiproxy/proxies/default.xml (deflated 49%)
  adding: apiproxy/targets/ (stored 0%)
  adding: apiproxy/targets/default.xml (deflated 38%)

$ apigeecli -o $APIGEEORG apis import -f twitter.zip
{
    "basepaths": [
        "/twitter"
    ],
    "configurationVersion": {
        "majorVersion": 4
    },
    "createdAt": "1589905066756",
    "entityMetaDataAsProperties": {
        "lastModifiedAt": "1589905066756",
        "bundle_type": "zip",
        "subType": "Proxy",
        "createdAt": "1589905066756"
    },
    "lastModifiedAt": "1589905066756",
    "name": "twitter",
    "proxies": [
        "default"
    ],
    "proxyEndpoints": [
        "default"
    ],
    "resourceFiles": {},
    "targetEndpoints": [
        "default"
    ],
    "targets": [
        "default"
    ],
    "type": "Application",
    "revision": "1"
}

$ apigeecli -o $APIGEEORG apis deploy -n twitter -r -v 1  -e $APIGEEENV
{
    "environment": "test",
    "apiProxy": "twitter",
    "revision": "1",
    "deployStartTime": "1589905288393",
    "basePath": "/"
}

Now let’s test it in a browser:

Basic Proxy

Perfect! This error is expected as we have not used any type of authentication to talk to Twitter.

Add in Authentication

For the first iteration, we will just add a hard-coded authorization header. To do so, we will use a policy that manipulates the HTTP request, code name: AssignMessage.

To get this bearer token:

$ curl -X POST https://api.twitter.com/oauth2/token --user '<twitter_api_key>:<twitter_api_secret>' -d 'grant_type=client_credentials'

Each policy is a simple XML file. AssignMessage can also modify or delete any part of the request or response.

  • twitter/apiproxy/policies/Authorization.xml:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<AssignMessage async="false" continueOnError="false" enabled="true" name="Authorization">
    <DisplayName>Authorization</DisplayName>
    <Properties/>
    <Set>
        <Headers>
            <Header name="host">api.twitter.com</Header>
            <Header name="accept-encoding">gzip,deflate</Header>
            <Header name="Authorization">Bearer ****</Header>
        </Headers>
    </Set>
    <IgnoreUnresolvedVariables>true</IgnoreUnresolvedVariables>
    <AssignTo createNew="false" transport="http" type="request"/>
</AssignMessage>

To reference it, the API definition needs this block added:

<Policies>
    <Policy>Authorization</Policy>
</Policies>

And in the proxy definition:

    <PreFlow name="PreFlow">
        <Request>
            <Step>
                <Name>Authorization</Name>
            </Step>
        </Request>
    </PreFlow>

Testing

After following the same deployment steps (zip, upload, deploy), this is what I get… it works!

$ curl -I "https://runtime.arctiqgreg.team.arctiq.ca/twitter?q=test"
HTTP/2 200
cache-control: no-cache, no-store, must-revalidate, pre-check=0, post-check=0
content-disposition: attachment; filename=json.json
content-length: 69157
content-type: application/json;charset=utf-8

Now we want to get rid of this content-disposition header. Another AssignMessage will do the job, this time using its Remove element.

  • twitter/apiproxy/policies/FormatResponse.xml:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<AssignMessage async="false" continueOnError="false" enabled="true" name="FormatResponse">
    <DisplayName>FormatResponse</DisplayName>
    <Properties/>
    <Remove>
        <Headers>
            <Header name="content-disposition"></Header>
        </Headers>
    </Remove>
    <IgnoreUnresolvedVariables>true</IgnoreUnresolvedVariables>
    <AssignTo createNew="false" transport="http" type="response"/>
</AssignMessage>

This policy affects the response coming back from the target. After declaring it in the API definition, we need to add it to the PreFlow of the response.
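
For reference, attaching FormatResponse to the response side of the PreFlow could look like the sketch below. It assumes the step goes in twitter/apiproxy/proxies/default.xml, next to the Authorization step added earlier, and that FormatResponse has been declared in the Policies block of twitter.xml.

```xml
<PreFlow name="PreFlow">
    <Request>
        <!-- request side: unchanged from the previous step -->
        <Step>
            <Name>Authorization</Name>
        </Step>
    </Request>
    <Response>
        <!-- response side: strip the content-disposition header -->
        <Step>
            <Name>FormatResponse</Name>
        </Step>
    </Response>
</PreFlow>
```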

Now we can see the result in a browser without the forced download.

Adding other types of policies

Our next addition is a rate limiter that will prevent us from exceeding the number of queries allowed by Twitter, code name: SpikeArrest. The Twitter Search API limit is 180 requests per 15 minutes, or 12 requests per minute on average, so with 10 requests per minute we stay within bounds.

  • twitter/apiproxy/policies/RateLimit.xml:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<SpikeArrest async="false" continueOnError="false" enabled="true" name="RateLimit">
    <DisplayName>RateLimit</DisplayName>
    <Properties/>
    <Identifier ref="request.header.some-header-name"/>
    <MessageWeight ref="request.header.weight"/>
    <Rate>10pm</Rate>
</SpikeArrest>

This policy has to be declared in the API definition and added to the proxy flow. This is simple enough: it is just a new step in the default proxy definition, placed before Authorization.
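
Put together, the two changes could look like the following sketch. It assumes the Authorization and FormatResponse policies from the previous sections are still in place and that the step is added to the request PreFlow of the proxy endpoint.

```xml
<!-- twitter/apiproxy/twitter.xml: declare the new policy next to the existing ones -->
<Policies>
    <Policy>Authorization</Policy>
    <Policy>FormatResponse</Policy>
    <Policy>RateLimit</Policy>
</Policies>

<!-- twitter/apiproxy/proxies/default.xml: RateLimit runs before Authorization -->
<PreFlow name="PreFlow">
    <Request>
        <Step>
            <Name>RateLimit</Name>
        </Step>
        <Step>
            <Name>Authorization</Name>
        </Step>
    </Request>
    <Response>
        <Step>
            <Name>FormatResponse</Name>
        </Step>
    </Response>
</PreFlow>
```

Placing RateLimit first means a throttled request is rejected before any header manipulation takes place.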

Time to test

To test this feature I will write a few lines of bash:

for i in {0..200}
do
    date +"[%H:%M:%S] > " | tr -d "\n"
    curl -I -s "https://runtime.arctiqgreg.team.arctiq.ca/twitter?q=test$i" | head -n 1
    sleep 1
done

And here are the results before and after the rate limiter:

[09:23:51] > HTTP/2 200
[...]
[09:24:18] > HTTP/2 200
[09:24:19] > HTTP/2 429
[09:24:19] > HTTP/2 429
[09:24:19] > HTTP/2 429
[...]

Without the rate limiter, Twitter starts answering HTTP 429 Too Many Requests once the 180 allowed requests are used up. Now let’s deploy the rate limiter, let the quota cool down, and retry.

[09:34:41] > HTTP/2 200
[09:34:42] > HTTP/2 200
[09:34:44] > HTTP/2 500
[09:34:45] > HTTP/2 500
[09:34:46] > HTTP/2 500
[09:34:48] > HTTP/2 500
[09:34:49] > HTTP/2 200
[09:34:51] > HTTP/2 500
[09:34:52] > HTTP/2 500
[09:34:53] > HTTP/2 500
[09:34:55] > HTTP/2 200
[09:34:56] > HTTP/2 500
[09:34:57] > HTTP/2 500
[09:34:59] > HTTP/2 500
[09:35:00] > HTTP/2 200

As we can see, Apigee now throttles the requests itself and yields 500 errors when the limit is reached: the 10pm rate is effectively enforced as 1 request every 6 seconds. By using a Conditional Flow, we can rewrite this error into a more meaningful message.
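
One possible way to do that rewrite is sketched below. Note that it uses a FaultRule with a condition rather than a regular conditional flow, and it assumes the SpikeArrest policy raises a fault named SpikeArrestViolation. The policy name TooManyRequests, the rule name, and the payload are hypothetical, and the policy would also have to be declared in twitter.xml.

```xml
<!-- twitter/apiproxy/policies/TooManyRequests.xml (hypothetical policy) -->
<AssignMessage async="false" continueOnError="false" enabled="true" name="TooManyRequests">
    <DisplayName>TooManyRequests</DisplayName>
    <Set>
        <!-- replace the generic 500 with a status that tells the client to slow down -->
        <StatusCode>429</StatusCode>
        <Payload contentType="text/plain">Rate limit reached, please retry in a few seconds.</Payload>
    </Set>
    <IgnoreUnresolvedVariables>true</IgnoreUnresolvedVariables>
</AssignMessage>

<!-- twitter/apiproxy/proxies/default.xml: inside <ProxyEndpoint name="default"> -->
<FaultRules>
    <FaultRule name="rate-limit-exceeded">
        <!-- only handle faults raised by the SpikeArrest policy -->
        <Condition>(fault.name = "SpikeArrestViolation")</Condition>
        <Step>
            <Name>TooManyRequests</Name>
        </Step>
    </FaultRule>
</FaultRules>
```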

You can see here the graph during my dev and testing:

Apigee Perf Dashboard

If you would like to know more about API development, Apigee will release a new version of their Coursera training this summer. The existing one is still good but a bit dated (2018); some features have changed since, mostly around scripts and deployments. Stay tuned!

First impressions during this journey

What I have liked so far is apigeecli, which makes iterations fast and allows us to use a source code management (SCM) system to version and peer-review changes. The fact that you can always go back to the UI and take advantage of the graphical representation of the flow is a big plus: at a glance, you can spot policies that are not where they should be. It also helps to make sure the naming is consistent, for example between the DisplayName and the name attribute of the root element.

Apigee Trace Tool

Speaking of consistency, I consider it important, and the wording is not always what you would expect. Take Set/Remove versus Add/Delete, for example. Flows use Steps identified by Names, but the API definition uses Policies for the same thing. The naming convention is a mix of CamelCase, hyphenated, and lowercase names.

Something I disliked, as a Python aficionado, is the limited support for embedded scripts. You are limited to what Jython 2.5.2 offers, which is far from ideal given that Python’s value resides mostly in its plethora of available modules.

At first, I struggled a bit to find the relevant error messages, but I ended up using the output of apigeecli apis import and the logs from the runtime pod. Once you know how to find them, the experience becomes much better.

I almost forgot the cherry on the cake: the Trace tool. It is, IMHO, the best feature: you can trace your flows step by step and see how the payload gets transformed along the way.

Apigee Trace Tool

What’s to come?

As we have seen, Apigee allows us to limit and secure calls to our API backends. Next time, we will see how we can use conditional flows to adapt and format responses, dynamically fetch an OAuth token, and cache it in the integrated Key-Value store.
