Joule: Decentralized Data Processing

Joule is a framework for decentralized data processing. It distributes computation across independent executable modules connected by timestamped data flows called streams. Streams can connect modules running on any device in the network, enabling complex pipelines that distribute computation from edge nodes all the way to the data center.
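The module-and-stream idea can be sketched in a few lines of Python. This is an illustrative model only, not Joule's actual API: the `Stream` class and `moving_mean` module are hypothetical names, and a stream is modeled simply as a queue of `(timestamp, value)` rows.

```python
# Illustrative sketch of modules connected by timestamped streams.
# These names (Stream, moving_mean) are hypothetical, not Joule's API.
from collections import deque

class Stream:
    """A timestamped data flow: rows of (timestamp, value)."""
    def __init__(self):
        self.rows = deque()

    def write(self, timestamp, value):
        self.rows.append((timestamp, value))

    def read(self):
        while self.rows:
            yield self.rows.popleft()

def moving_mean(input_stream, output_stream, window=3):
    """A toy 'module': consumes one stream, produces another."""
    buf = deque(maxlen=window)
    for ts, value in input_stream.read():
        buf.append(value)
        output_stream.write(ts, sum(buf) / len(buf))

# Wire two streams through the module, like a minimal pipeline.
raw = Stream()
features = Stream()
for t in range(5):
    raw.write(t, float(t))   # timestamps 0..4, values 0.0..4.0
moving_mean(raw, features)
print(list(features.rows))   # smoothed, still timestamped
```

Because each module only reads and writes streams, the two `Stream` endpoints could just as easily live on different devices, which is the property Joule exploits to distribute pipelines across a network.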

_images/module_stream.png

A typical deployment is shown below. Embedded sensors collect high-bandwidth data (module 1) and perform feature extraction locally (module 2). The lower-bandwidth feature data is transmitted to local nodes, which convert it into actionable information by applying machine learning (ML) models (module 3). Aggregation nodes at the data center collect streams from a variety of local nodes and perform more computationally intensive tasks such as training new ML models (module 4).
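One stage of such a pipeline might be declared with an INI-style module configuration like the sketch below. The section layout follows the general shape of Joule module configuration files, but every name and stream path here is hypothetical, chosen only to mirror modules 1 and 2 in the deployment above.

```ini
; Hypothetical module config: feature extraction on an edge node.
; Names and paths are illustrative, not from a real deployment.
[Main]
name = Feature Extractor
exec_cmd = /usr/local/bin/feature_extractor.py

[Inputs]
; high-bandwidth stream produced by the sensor reader (module 1)
raw = /sensors/accelerometer

[Outputs]
; low-bandwidth feature stream sent on to the local node (module 3)
features = /sensors/accel_features
```

Keeping each stage in its own configured module is what lets the same pipeline be rearranged across edge, local, and data-center nodes without changing the module code itself.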

_images/pipeline_example.png

See Getting Started for a quick introduction, then read Using Joule for an overview of how the system works.