Chiwai Chan's Blog

These words are my own

Rockit Apple payslip Analyzer with GenAI Chatbot using Bedrock and Streamlit

Problem

It's the time of year where I normally have to start doing taxes, not for myself but for my parents. Mum works at various fruit picking / packing places in Hawkes Bay throughout the year, so that means there are all sorts of Payslips from different employers for the last financial year. Occasionally mum would ask me specific details about her weekly payslips, and that usually means: download a PDF from and email -> open up the PDF -> find what's she asking for -> look at the PDF -> can't find it so ask what mum meant -> find the answer -> explain it to her.

Solution & Goal

The usual format, challenge: create a Generative AI conversational chatbot to enable mum to ask in her natural language specific details of her Payslips without me

And the goal: outsource the work to AI = more time to play. 🙂

Success Criterias

  • Automatically extract details from Payslips - I've only tested it on Payslips from Rockit Apple.
  • Enable mum to ask in Cantonese details of Payslips
  • Retrieve data from an Athena Table where the Payslip detail will be stored after they are extract from the PDFs
  • Create a Chatbot to receive questions in Cantonese around the user's Payslips stored in the Athena Table, and generate a response back to the user in Cantonese

So what's the Architecture?

Architecture

Note

I've only tried it for Payslips generated by this employer: Rockit Apple

Deploy it for yourself to try out

Prerequisites

  • Python 3.12 installed - the only version I've validated
  • Pip installed
  • Node.js and npm Installed
  • CDK installed - using npm install -g aws-cdk
  • AWS CLI Profile configured
  • Bedrock Models enabled - specifically amazon.titan-text-express-v1

    Deployment CLI Commands

  • Open up a terminal
  • And run the following commands

    git clone git@github.com:chiwaichan/rockitapple-payslip-analyzer-with-genai-chatbot-using-bedrock-streamlit.git 
    cd rockitapple-payslip-analyzer-with-genai-chatbot-using-bedrock-streamlit
    python3 -m venv venv
    source venv/bin/activate
    pip install -r requirements.txt
    cdk deploy

If all goes well

You should see this as a result of calling the cdk deploy command

CDK Deploy

Check that the CloudFormation Stack is being created in the AWS Console

CloudFormation Create

Click on it to see the Events, Resources and Output for the Stack

CloudFormation Create Events

Find the link to the S3 Bucket to upload Payslip PDFs into, in the Stack's Resources, find the S3 Bucket with a with a Logical ID that starts with "sourcepayslips" and click on its Physical ID link.

S3 Buckets

Upload your PDF Payslips into here
S3 Source Payslip PDFs

Find the link to the S3 Bucket where the extracted Data will be stored into for the Athena Table, in the Stack's Resources, find the S3 Bucket with a with a Logical ID that starts with "PayslipAthenaDataBucket" and click on its Physical ID link.

CloudFormation S3 Buckets

There you can find a JSON file, it should take about a few minutes to appear after you upload the PDF.

Athena Table JSON file in S3 Bucket

It was created by the Lambda shown in the architecture diagram we saw earlier, it uses Amazon Textract to extract the data from each Payslip using OCR, using the Queries based feature to extract the Text from a PDF by enabling us to use queries in natural language to configure what we want to extract out from a PDF.
Find the "app.py" file shown in the folder structure in the screenshot below, you can modify the wording of the Questions the Lambda function uses to extract the details from the Payslip, to suit the specific needs based on the wording of your Payslip; the result of each Question extracted is saved to the Athena table using the column name shown next to the Question.

Textract Queries

What it looks like in action

Go to the CloudFormation Stack's Outputs to get the URL to open the Streamlit Application's frontend.

Click the value for the Key "StreamlitFargateServiceServiceURL"

Streamlit URL

That will take you to a Streamlit App hosted in the Fargate Container shown in the architecture diagram, use "cats" as the username and "cats" as the password - make sure you modify the code with your own authentication/authorisation if you are building on top of this.

Streamlit App

Lets try out some examples

Example 1
Example 2
Example 3
Example 4
1 payslip

Things don't always go well

Error

You can tweak the Athena Queries generated by the LLM by providing specific examples tailoured to your Athena Table and its column names and values - known as a Few-Shot Learning. Modify this file to tweak the Queries feed into the Few-shot examples used by Bedrock and the Streamlit app.

Error

Thanks to this repo

I was able to learn and build my first GenAI app: AWS Samples - genai-quickstart-pocs

I based my app on the example for Athena, I wrapped the Streamlit app into a Fargate Container and added Textract to extract Payslips details from PDFs and this app was the output of that.

What's Next

Mum is going to test this app over the next few weeks

Rockit Apple payslip Analyzer with GenAI Chatbot using Bedrock and Streamlit
Scroll to top