Language Evaluation & Cultural QA for AI Safety

AI Singapore | Feb 2025 – Present


Context

As part of AI Singapore’s efforts to develop large language models (LLMs) that are contextually fluent, culturally grounded, and safe for multilingual deployment, I worked on a specialized team evaluating AI-generated responses in Javanese and Indonesian. The goal was to ensure naturalness, factual alignment, and cultural sensitivity—especially in high-stakes domains such as labor, domestic work, and identity representation.

 


1. Javanese Dialect & Register Evaluation

Goal

To assess whether AI-generated Javanese dialogue demonstrated accurate use of speech levels (e.g., krama vs. ngoko) and regional dialects (e.g., Solo-Yogyakarta vs. Surabaya), while preserving conversational coherence and cultural appropriateness.

 

What I Did

  • Reviewed hundreds of model-generated user–assistant dialogues
  • Evaluated linguistic accuracy, coherence, tone, and dialect alignment
  • Flagged literal translations from English that resulted in unnatural phrasing or mismatched social register
  • Helped resolve internal confusion between regional-dialect labels and sociolinguistic register (speech-level) distinctions
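
The kind of per-dialogue judgment described above can be sketched as a small annotation record. This is purely illustrative: the field names, label sets, and `DialogueEvaluation` class are my own assumptions for this write-up, not the actual (confidential) AI Singapore annotation schema.

```python
from dataclasses import dataclass

# Hypothetical label sets for illustration only.
SPEECH_LEVELS = {"ngoko", "madya", "krama"}
REGIONAL_VARIETIES = {"solo-yogyakarta", "surabaya", "banyumasan"}

@dataclass
class DialogueEvaluation:
    """One reviewer judgment on a model-generated Javanese turn.

    Illustrative sketch; not the project's real schema.
    """
    dialogue_id: str
    expected_level: str   # speech level the social context calls for
    observed_level: str   # speech level the model actually produced
    variety: str          # regional variety of the output
    natural: bool         # free of literal-translation artifacts
    notes: str = ""

    def register_mismatch(self) -> bool:
        # Flag outputs whose speech level violates the expected register,
        # e.g. ngoko used where the context requires krama.
        return self.expected_level != self.observed_level

ev = DialogueEvaluation(
    dialogue_id="dlg-0042",
    expected_level="krama",
    observed_level="ngoko",
    variety="solo-yogyakarta",
    natural=False,
    notes="Literal English phrasing; register too informal for elder addressee.",
)
assert ev.register_mismatch()
```

Separating `expected_level` from `observed_level` makes the dialect-vs-register distinction explicit in the data itself, which is the confusion the review work helped resolve.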

 

Skills & Knowledge Applied

  • Sociolinguistics: Applied theoretical understanding from undergraduate coursework and thesis research
  • Dialectology & Speech Level Analysis: Differentiated between krama/ngoko and regional varieties like Surabaya vs. Solo-Yogyakarta
  • Language-Culture Mapping: Assessed linguistic expressions for alignment with Javanese cultural nuance

 

2. Indonesian Cultural & Labor Ethics QA

Goal

To evaluate AI responses to Indonesian-language prompts involving domestic work, employment norms, and migrant labor—with an emphasis on fairness, empathy, and cultural realism.

 

What I Did

  • Assessed AI outputs for factual accuracy, ethical framing, and tone appropriateness
  • Reviewed prompt–response pairs dealing with holiday labor, wages, and employer–employee dynamics
  • Identified assumptions that risked reproducing biases or harmful norms in labor discourse
  • Submitted structured feedback to guide dataset refinement and alignment protocols
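
As a rough sketch of how structured feedback can guide dataset refinement, the snippet below aggregates per-response issue tags into counts. The taxonomy (`ISSUE_TAGS`) and `summarize_feedback` helper are hypothetical, since the project's actual categories and data are confidential.

```python
from collections import Counter

# Hypothetical issue taxonomy for labor-ethics QA; illustrative only.
ISSUE_TAGS = ("factual_error", "biased_framing", "tone_mismatch", "unidiomatic")

def summarize_feedback(records: list[dict]) -> Counter:
    """Aggregate per-response issue tags into counts that can guide
    dataset refinement (e.g., which failure mode to target first)."""
    counts = Counter()
    for rec in records:
        counts.update(tag for tag in rec.get("issues", []) if tag in ISSUE_TAGS)
    return counts

feedback = [
    {"prompt_id": "p1", "issues": ["biased_framing", "tone_mismatch"]},
    {"prompt_id": "p2", "issues": ["biased_framing"]},
    {"prompt_id": "p3", "issues": []},
]
print(summarize_feedback(feedback).most_common(1))  # → [('biased_framing', 2)]
```

Counting failure modes rather than individual errors is what turns reviewer flags into actionable alignment priorities.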

 

Skills & Knowledge Applied

  • Public Policy & Labor Ethics: Used policy background to assess fairness and social alignment
  • Linguistic Naturalness & Fluency: Evaluated tone and idiomatic clarity in Bahasa Indonesia
  • Cross-cultural Sensitivity: Identified responses that could be misinterpreted across Southeast Asian contexts

 

Confidentiality Note

Due to my contractual obligations with AI Singapore, I am unable to share direct examples or datasets. However, this work involved the evaluation of pre-release LLM behavior in linguistically and ethically complex scenarios, contributing to ongoing efforts in safe, inclusive AI development for multilingual Southeast Asia.


 

What I Learned

This project strengthened my ability to evaluate AI systems not just for accuracy, but for ethical coherence and cultural integrity. It deepened my understanding of language as infrastructure—and the risks and responsibilities of embedding it into AI systems that interact with real people in real-world contexts.