TensorFlow Extended (TFX): Production ML Pipelines for Python

Your model works perfectly in your Jupyter notebook. Then your manager asks you to put it in production, retrain it monthly, monitor for data drift, and handle edge cases gracefully. Suddenly you’re drowning in infrastructure code — data validation, preprocessing pipelines, model versioning, serving infrastructure, monitoring systems. Six months later, you’re maintaining more plumbing code than actual ML code, and you’re wondering if there’s a better way.

I learned TFX the hard way on a project that went from “prototype” to “critical production system” in three months. We spent weeks building custom pipelines, validation, and monitoring before discovering TFX does all of it — and does it better. TFX is Google’s answer to production ML, battle-tested on systems serving billions of predictions. It’s overcomplicated for prototypes but invaluable for real production systems.

Let me show you what TFX actually does and when it’s worth the steep learning curve.

What Is TFX and Why It Exists

TensorFlow Extended (TFX) is an end-to-end platform for deploying production ML pipelines. It’s not a training library — it’s infrastructure for everything around training.

What TFX handles:

  • Data validation and statistics
  • Feature engineering at scale
  • Training and hyperparameter tuning
  • Model analysis and validation
  • Model serving and deployment
  • Pipeline orchestration
  • Metadata tracking

What TFX is NOT:

  • A replacement for TensorFlow/Keras
  • Easy to learn (it’s complex)
  • Necessary for small projects
  • The only way to do production ML

Think of TFX as Kubernetes for ML pipelines — powerful, scalable, and way more complicated than you need until you really need it.

When You Actually Need TFX

Before diving in, understand when TFX makes sense:

Use TFX when:

  • Models retrain regularly (weekly/monthly)
  • Multiple models in production
  • Team size > 5 ML engineers
  • Handling terabytes of data
  • Strict validation requirements
  • Need reproducible pipelines
  • Scale matters (millions of predictions)

Skip TFX when:

  • One-off model deployment
  • Team size < 3
  • Prototyping or experimenting
  • Simple batch predictions
  • A simpler tool already covers your needs

I’ve seen teams waste months implementing TFX for a single model that retrains quarterly. That’s overkill. TFX pays off at scale, not for toy projects.

Core TFX Components (The Building Blocks)

TFX is modular. You use the components you need:

ExampleGen: Data Ingestion

Ingests and splits data into train/eval sets:

python

from tfx.components import CsvExampleGen
# Ingest CSV data
examples = CsvExampleGen(input_base='data/')

Supports CSV, TFRecord, BigQuery, and custom formats. Handles data splitting automatically.
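By default the split is roughly 2:1 train/eval, but you can override it with an output config. A sketch using the `tfx.proto` helpers — the 80/20 ratio here is illustrative:

```python
from tfx import v1 as tfx

# Override the default split with an explicit 80/20 train/eval split.
# hash_buckets set the relative proportions of each split.
output_config = tfx.proto.Output(
    split_config=tfx.proto.SplitConfig(splits=[
        tfx.proto.SplitConfig.Split(name='train', hash_buckets=8),
        tfx.proto.SplitConfig.Split(name='eval', hash_buckets=2),
    ])
)

examples = tfx.components.CsvExampleGen(
    input_base='data/',
    output_config=output_config
)
```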

StatisticsGen: Data Analysis

Generates statistics about your data:

python

from tfx.components import StatisticsGen
# Generate statistics
statistics = StatisticsGen(examples=examples.outputs['examples'])

Computes distributions, missing values, correlations — everything you’d do in exploratory analysis, but automated and scalable.
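Conceptually, StatisticsGen automates per-feature summaries like this toy pure-Python sketch (not the TFX API — just the idea, scaled down to one column):

```python
import math

def column_stats(values):
    """Toy version of what StatisticsGen computes per feature."""
    present = [v for v in values if v is not None]
    n = len(present)
    mean = sum(present) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in present) / n)
    return {
        'count': len(values),
        'missing': len(values) - n,  # missing-value count, like TFX reports
        'mean': mean,
        'std': std,
        'min': min(present),
        'max': max(present),
    }

stats = column_stats([35, 42, None, 28, 51])
# stats['missing'] == 1, stats['mean'] == 39.0
```

TFX does this for every feature, over terabytes, on a distributed backend — that's the part worth paying for.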

SchemaGen: Data Schema

Infers schema from statistics:

python

from tfx.components import SchemaGen
# Infer schema
schema = SchemaGen(statistics=statistics.outputs['statistics'])

Defines expected data types, ranges, and distributions. This becomes your data contract.
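In practice you rarely ship the auto-inferred schema as-is: you review it with TensorFlow Data Validation (TFDV), tighten it, and check the curated version into source control. A sketch — the file names are placeholders:

```python
import tensorflow_data_validation as tfdv

# Load the schema SchemaGen inferred, then curate it by hand.
schema = tfdv.load_schema_text('schema.pbtxt')

# Tighten the contract: require 'age' to be present in every example.
tfdv.get_feature(schema, 'age').presence.min_fraction = 1.0

# Check the curated schema into version control as your data contract.
tfdv.write_schema_text(schema, 'schema_curated.pbtxt')
```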

ExampleValidator: Data Validation

Validates new data against schema:

python

from tfx.components import ExampleValidator
# Validate data
validator = ExampleValidator(
    statistics=statistics.outputs['statistics'],
    schema=schema.outputs['schema']
)

Catches data drift, anomalies, and schema violations before training. This alone justifies TFX for critical systems.
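Drift detection is opt-in per feature: you annotate the schema with a comparator, and validation then flags the feature when successive statistics diverge. A hedged sketch, where `new_stats` and `old_stats` stand in for StatisticsGen outputs from two pipeline runs:

```python
import tensorflow_data_validation as tfdv

schema = tfdv.load_schema_text('schema.pbtxt')

# Flag 'income' if its distribution moves too far from the previous run
# (L-infinity distance above 0.01 between the two histograms).
tfdv.get_feature(schema, 'income').drift_comparator.infinity_norm.threshold = 0.01

# new_stats / old_stats: statistics protos from the current and prior runs.
anomalies = tfdv.validate_statistics(
    statistics=new_stats,
    schema=schema,
    previous_statistics=old_stats
)
tfdv.display_anomalies(anomalies)
```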

Transform: Feature Engineering

Applies transformations consistently:

python

from tfx.components import Transform
# Define preprocessing
transform = Transform(
    examples=examples.outputs['examples'],
    schema=schema.outputs['schema'],
    module_file='preprocessing.py'
)

The preprocessing logic applies identically during training and serving — no train/serve skew. This is huge.

Trainer: Model Training

Trains your TensorFlow model:

python

from tfx.components import Trainer
from tfx.proto import trainer_pb2
# Train model
trainer = Trainer(
    module_file='model.py',
    examples=transform.outputs['transformed_examples'],
    schema=schema.outputs['schema'],
    transform_graph=transform.outputs['transform_graph'],
    train_args=trainer_pb2.TrainArgs(num_steps=1000),
    eval_args=trainer_pb2.EvalArgs(num_steps=100)
)

Integrates with TensorBoard, supports distributed training, handles checkpointing.

Evaluator: Model Validation

Validates trained model performance:

python

from tfx.components import Evaluator
# Evaluate model
evaluator = Evaluator(
    examples=examples.outputs['examples'],
    model=trainer.outputs['model'],
    baseline_model=previous_model  # Compare to the current production model
)

Prevents bad models from reaching production. Compares new models to baselines automatically.
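In practice you make "passes validation" explicit with a TensorFlow Model Analysis `EvalConfig`: the Evaluator only "blesses" the candidate if it clears your thresholds. A sketch — the metric choice and thresholds below are illustrative, not defaults:

```python
import tensorflow_model_analysis as tfma

# Bless the candidate only if binary accuracy is at least 0.7 AND it does
# not regress by more than 0.001 against the production baseline model.
eval_config = tfma.EvalConfig(
    model_specs=[tfma.ModelSpec(label_key='label')],
    slicing_specs=[tfma.SlicingSpec()],  # overall metrics; add keys to slice
    metrics_specs=[
        tfma.MetricsSpec(metrics=[
            tfma.MetricConfig(
                class_name='BinaryAccuracy',
                threshold=tfma.MetricThreshold(
                    value_threshold=tfma.GenericValueThreshold(
                        lower_bound={'value': 0.7}
                    ),
                    change_threshold=tfma.GenericChangeThreshold(
                        direction=tfma.MetricDirection.HIGHER_IS_BETTER,
                        absolute={'value': -1e-3}
                    ),
                ),
            )
        ])
    ],
)
# Pass this as Evaluator(..., eval_config=eval_config).
```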

Pusher: Model Deployment

Deploys validated models:

python

from tfx.components import Pusher
from tfx.proto import pusher_pb2
# Deploy model
pusher = Pusher(
    model=trainer.outputs['model'],
    model_blessing=evaluator.outputs['blessing'],  # Only deploys if blessed
    push_destination=pusher_pb2.PushDestination(
        filesystem=pusher_pb2.PushDestination.Filesystem(
            base_directory='/serving'
        )
    )
)

Only deploys if model passes validation. Supports TensorFlow Serving, AI Platform, and custom deployment targets.

Building Your First TFX Pipeline

Let’s build a complete pipeline:

Step 1: Define Preprocessing

python

# preprocessing.py
import tensorflow as tf
import tensorflow_transform as tft


def preprocessing_fn(inputs):
    """Preprocessing function for the Transform component."""
    outputs = {}

    # Numerical features
    outputs['age'] = tft.scale_to_z_score(inputs['age'])
    outputs['income'] = tft.scale_to_z_score(inputs['income'])

    # Categorical features
    outputs['category_idx'] = tft.compute_and_apply_vocabulary(
        inputs['category'],
        vocab_filename='category_vocab'
    )

    # Feature crosses
    outputs['age_income_cross'] = tft.hash_strings(
        tf.strings.join([
            tf.strings.as_string(inputs['age']),
            tf.strings.as_string(inputs['income'])
        ], separator='_'),
        hash_buckets=1000  # required: number of buckets to hash into
    )

    # Label
    outputs['label'] = inputs['label']

    return outputs

This preprocessing applies identically during training and serving.

Step 2: Define Model

python

# model.py
import tensorflow as tf
import tensorflow_transform as tft


def _build_keras_model():
    """Build Keras model."""
    inputs = {
        'age': tf.keras.Input(shape=(1,), name='age'),
        'income': tf.keras.Input(shape=(1,), name='income'),
        'category_idx': tf.keras.Input(shape=(1,), name='category_idx', dtype=tf.int64)
    }

    # Embed the category index, then flatten (batch, 1, 8) -> (batch, 8)
    # so it can be concatenated with the scalar features.
    category_embedding = tf.keras.layers.Flatten()(
        tf.keras.layers.Embedding(input_dim=1000, output_dim=8)(inputs['category_idx'])
    )

    # Concatenate inputs
    x = tf.keras.layers.concatenate([
        inputs['age'],
        inputs['income'],
        category_embedding
    ])

    # Hidden layers
    x = tf.keras.layers.Dense(64, activation='relu')(x)
    x = tf.keras.layers.Dropout(0.5)(x)
    x = tf.keras.layers.Dense(32, activation='relu')(x)

    # Output
    outputs = tf.keras.layers.Dense(1, activation='sigmoid')(x)

    model = tf.keras.Model(inputs=inputs, outputs=outputs)

    model.compile(
        optimizer='adam',
        loss='binary_crossentropy',
        metrics=['accuracy']
    )

    return model


def _input_fn(file_pattern, tf_transform_output, batch_size=32):
    """Create input dataset."""
    return tf.data.experimental.make_batched_features_dataset(
        file_pattern=file_pattern,
        batch_size=batch_size,
        features=tf_transform_output.transformed_feature_spec(),
        label_key='label'
    )


def _get_serve_tf_examples_fn(model, tf_transform_output):
    """Create serving signature that accepts serialized tf.Examples."""

    @tf.function(input_signature=[
        tf.TensorSpec(shape=[None], dtype=tf.string, name='examples')
    ])
    def serve_tf_examples_fn(serialized_tf_examples):
        feature_spec = tf_transform_output.raw_feature_spec()
        feature_spec.pop('label')  # the label is not available at serving time
        parsed_features = tf.io.parse_example(serialized_tf_examples, feature_spec)
        transformed_features = tf_transform_output.transform_raw_features(parsed_features)
        return model(transformed_features)

    return serve_tf_examples_fn


def run_fn(fn_args):
    """Training function called by the Trainer component."""
    # Load transform output
    tf_transform_output = tft.TFTransformOutput(fn_args.transform_output)

    # Create train/eval datasets. These repeat indefinitely, so bound
    # training with step counts rather than epochs.
    train_dataset = _input_fn(fn_args.train_files, tf_transform_output, batch_size=32)
    eval_dataset = _input_fn(fn_args.eval_files, tf_transform_output, batch_size=32)

    # Build and train model
    model = _build_keras_model()

    model.fit(
        train_dataset,
        steps_per_epoch=fn_args.train_steps,
        validation_data=eval_dataset,
        validation_steps=fn_args.eval_steps,
        callbacks=[
            tf.keras.callbacks.TensorBoard(log_dir=fn_args.model_run_dir)
        ]
    )

    # Save the model with a serving signature that applies the same transforms
    signatures = {
        'serving_default': _get_serve_tf_examples_fn(model, tf_transform_output)
    }

    model.save(fn_args.serving_model_dir, save_format='tf', signatures=signatures)

Step 3: Create Pipeline

python

# pipeline.py
from tfx import v1 as tfx
from tfx.orchestration import metadata, pipeline
from tfx.orchestration.local.local_dag_runner import LocalDagRunner


def create_pipeline(
    pipeline_name: str,
    pipeline_root: str,
    data_root: str,
    preprocessing_module: str,
    trainer_module: str,
    serving_model_dir: str,
    metadata_path: str
):
    """Creates the TFX pipeline."""

    # Data ingestion
    example_gen = tfx.components.CsvExampleGen(input_base=data_root)

    # Generate statistics
    statistics_gen = tfx.components.StatisticsGen(
        examples=example_gen.outputs['examples']
    )

    # Infer schema
    schema_gen = tfx.components.SchemaGen(
        statistics=statistics_gen.outputs['statistics']
    )

    # Validate data
    example_validator = tfx.components.ExampleValidator(
        statistics=statistics_gen.outputs['statistics'],
        schema=schema_gen.outputs['schema']
    )

    # Transform features (module must define preprocessing_fn)
    transform = tfx.components.Transform(
        examples=example_gen.outputs['examples'],
        schema=schema_gen.outputs['schema'],
        module_file=preprocessing_module
    )

    # Train model (module must define run_fn)
    trainer = tfx.components.Trainer(
        module_file=trainer_module,
        examples=transform.outputs['transformed_examples'],
        schema=schema_gen.outputs['schema'],
        transform_graph=transform.outputs['transform_graph'],
        train_args=tfx.proto.TrainArgs(num_steps=1000),
        eval_args=tfx.proto.EvalArgs(num_steps=100)
    )

    # Evaluate model
    evaluator = tfx.components.Evaluator(
        examples=example_gen.outputs['examples'],
        model=trainer.outputs['model']
    )

    # Deploy model
    pusher = tfx.components.Pusher(
        model=trainer.outputs['model'],
        model_blessing=evaluator.outputs['blessing'],
        push_destination=tfx.proto.PushDestination(
            filesystem=tfx.proto.PushDestination.Filesystem(
                base_directory=serving_model_dir
            )
        )
    )

    components = [
        example_gen,
        statistics_gen,
        schema_gen,
        example_validator,
        transform,
        trainer,
        evaluator,
        pusher
    ]

    return pipeline.Pipeline(
        pipeline_name=pipeline_name,
        pipeline_root=pipeline_root,
        components=components,
        metadata_connection_config=metadata.sqlite_metadata_connection_config(
            metadata_path
        )
    )


if __name__ == '__main__':
    # Create and run the pipeline locally
    tfx_pipeline = create_pipeline(
        pipeline_name='my_pipeline',
        pipeline_root='./pipeline_output',
        data_root='./data',
        preprocessing_module='preprocessing.py',
        trainer_module='model.py',
        serving_model_dir='./serving_model',
        metadata_path='./metadata.db'
    )

    LocalDagRunner().run(tfx_pipeline)

This creates a complete production pipeline with validation, training, and deployment.
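Every run is also recorded in the ML Metadata (MLMD) store configured above, so you can trace lineage after the fact: which data produced which model, under which schema. A minimal sketch against the SQLite store (path matches the pipeline above):

```python
from ml_metadata import metadata_store
from ml_metadata.proto import metadata_store_pb2

# Connect to the same SQLite metadata store the pipeline writes to.
config = metadata_store_pb2.ConnectionConfig()
config.sqlite.filename_uri = './metadata.db'
store = metadata_store.MetadataStore(config)

# List every artifact the pipeline has produced (examples, schemas,
# models, blessings), with the on-disk location of each.
for artifact in store.get_artifacts():
    print(artifact.type_id, artifact.uri)
```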

TFX with Apache Beam (Distributed Processing)

For large-scale data processing, use Apache Beam:

python

from tfx.orchestration.beam.beam_dag_runner import BeamDagRunner
# Run pipeline with Beam
BeamDagRunner().run(tfx_pipeline)

This scales preprocessing and validation to massive datasets using distributed computing.
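The scaling knob is `beam_pipeline_args` on the pipeline: the same flags you'd hand any Beam job. A sketch with placeholder Google Cloud Dataflow settings (project, region, and bucket names are made up):

```python
# Placeholder project/bucket names -- substitute your own.
beam_pipeline_args = [
    '--runner=DataflowRunner',        # run Beam stages on Cloud Dataflow
    '--project=my-gcp-project',
    '--region=us-central1',
    '--temp_location=gs://my-bucket/tmp',
]

# Passed as pipeline.Pipeline(..., beam_pipeline_args=beam_pipeline_args),
# these flags apply to every Beam-powered component (ExampleGen,
# StatisticsGen, Transform, ...).
```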

TFX with Kubeflow (Kubernetes Orchestration)

For Kubernetes environments:

python

from tfx.orchestration.kubeflow import kubeflow_dag_runner
# Configure Kubeflow
runner_config = kubeflow_dag_runner.KubeflowDagRunnerConfig(
    kubeflow_metadata_config=kubeflow_dag_runner.get_default_kubeflow_metadata_config(),
    tfx_image='tensorflow/tfx:latest'
)
# Run on Kubeflow
kubeflow_dag_runner.KubeflowDagRunner(config=runner_config).run(tfx_pipeline)

The Harsh Reality of TFX

Let me be brutally honest about TFX’s downsides:

TFX is complicated:

  • Steep learning curve
  • Lots of boilerplate code
  • Debugging is painful
  • Documentation assumes expertise
  • Many moving parts

TFX is opinionated:

  • Forces specific patterns
  • TensorFlow-centric (duh)
  • Limited flexibility
  • Not ideal for research

TFX has overhead:

  • Setup takes days/weeks
  • More code than simple alternatives
  • Requires infrastructure knowledge
  • Overkill for simple projects

I’ve seen teams spend three months implementing TFX for a model that could have been deployed with Flask in a week. That’s wasteful. IMO, TFX makes sense only when you have multiple production models or complex pipelines requiring strong validation and monitoring.

Alternatives to TFX

Before committing to TFX, consider simpler alternatives:

MLflow:

  • Lighter weight
  • Works with any framework
  • Good for small teams
  • Less comprehensive

Kubeflow Pipelines:

  • More flexible
  • Framework-agnostic
  • Kubernetes-native
  • Less opinionated

Airflow + Custom:

  • Maximum flexibility
  • Use existing tools
  • Build what you need
  • More maintenance

ZenML, Metaflow, Kedro:

  • Modern alternatives
  • Better developer experience
  • Less Google-specific
  • Worth evaluating

For many teams, MLflow + simple deployment handles 90% of production needs with 10% of TFX’s complexity.
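For comparison, here's roughly what the MLflow path looks like — a hedged sketch, where `X_train`, `y_train`, `X_test`, and `y_test` are hypothetical arrays you'd already have:

```python
import mlflow
import mlflow.sklearn
from sklearn.linear_model import LogisticRegression

# X_train / y_train / X_test / y_test are assumed to exist.
# Train, log a metric, and version the model in a handful of lines --
# no pipeline DAG, no schema protos, no blessing artifacts.
with mlflow.start_run():
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    mlflow.log_metric('accuracy', model.score(X_test, y_test))
    mlflow.sklearn.log_model(model, artifact_path='model')
```

You give up TFX's data validation and automated gating, but for one or two models that trade is often worth it.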

When TFX Actually Shines

Despite the complexity, TFX excels in specific scenarios:

TFX is worth it when:

  • Running 10+ production models
  • Data validation is critical
  • Team > 10 ML engineers
  • Processing terabytes of data
  • Need reproducible pipelines
  • Already using TensorFlow heavily
  • Scale justifies complexity

Real TFX success stories:

  • Google (obviously — they built it)
  • Twitter (recommendation systems)
  • Spotify (music recommendations)
  • Large enterprises with dedicated ML platform teams

Notice the pattern? Large scale, dedicated teams, critical systems. Not startups or small teams.

The Bottom Line

TFX is industrial-strength production ML infrastructure. It’s powerful, scalable, and battle-tested at Google scale. It’s also complex, opinionated, and overkill for most projects.

Use TFX when:

  • Scale demands it
  • Team size supports it
  • Validation is critical
  • Already invested in TensorFlow
  • Building ML platform for multiple teams

Skip TFX when:

  • Small team or project
  • Simpler tools work
  • Not using TensorFlow
  • Prototyping or experimenting
  • Don’t have ML platform expertise

Most teams should start simpler (MLflow, basic deployment) and graduate to TFX only when hitting real limitations. The complexity overhead is only justified at scale.

Installation:

bash

pip install tfx

But honestly, if you’re just trying TFX out of curiosity, you’ll probably abandon it after a week. TFX requires commitment — architectural decisions, team buy-in, infrastructure support. It’s not a library you casually try on a weekend project.

If you genuinely need production ML pipelines at scale with strong validation and monitoring, TFX is excellent. For everything else, simpler alternatives work better. Don’t let Google’s marketing convince you that TFX is necessary for production ML. It’s one option — a powerful one at the right scale, but often overkill.

Now stop over-engineering your ML deployment and start with whatever gets your model serving predictions reliably. You can always graduate to TFX later when you actually need it. Most teams never do. :)
