Thursday, March 23, 2017

Bringing Rust's Result type to Java

The Rust programming language was designed without exceptions to handle errors. Instead, the concept of errors is addressed with the generic Result<T, E> enum type. In this post I will compare Rust's and Java's error handling mechanisms and discuss if and how Rust's way of doing it can be applied to Java.

Java exceptions

Java's error handling mechanism is built around the Throwable class. Every class that is a subclass of Throwable can be thrown and caught in a try-catch block. The classes Error and Exception are direct subclasses of Throwable, and whether a type inherits from Error or Exception is the first main distinction: Errors are really exceptional (no pun intended) situations that shouldn't be caught or handled by the overwhelming majority of programs. Exceptions on the other hand are the bread and butter for Java developers.

  +-----------+
  | Throwable |-------+
  +-----------+       |
        |             |
        V             V
    +-------+   +-----------+
    | Error |   | Exception |-----------+
    +-------+   +-----------+           |
                      |                 |
                      V                 V
          +------------------+   +-------------+
          | RuntimeException |   | IOException |
          +------------------+   +-------------+

When Java was designed, the decision was made to add the concept of checked exceptions. A checked exception is any class that inherits from Exception without having RuntimeException (itself a direct subclass of Exception) as an ancestor. Whenever your code calls a method that may throw a checked exception, you have to handle it, either by adding a compatible throws declaration to your method or by catching it in a try-catch block.

Rust Results

Rust's error handling mechanism is built around the generic Result<T, E> enum type. The enum is defined as follows:

pub enum Result<T, E> {
    Ok(T),
    Err(E),
}

A function that returns a string but can fail (recoverably) will define its return type as Result<String, E> where E is the type that is returned in the error case. Rust's type system is designed in such a way that the simplest definition of a Result is actually Result<(), ()>, so both the happy and the error case contain the unit type () (an empty tuple) as the payload. Even this case conveys a minimal amount of information, namely whether the operation succeeded or failed, and it does so in a more explicit way than a boolean return type could achieve.
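To make the two shapes of Result mentioned above concrete, here is a minimal sketch. The functions first_word and check_not_empty are hypothetical helpers invented for this illustration; they use only the standard library.

```rust
// Hypothetical helper: returns the first word of a line,
// or an error message if the line contains no words.
fn first_word(line: &str) -> Result<String, String> {
    match line.split_whitespace().next() {
        Some(word) => Ok(word.to_string()),
        None => Err("line contains no words".to_string()),
    }
}

// Hypothetical helper: Result<(), ()> carries no payload at all,
// only the information whether the check succeeded or failed.
fn check_not_empty(line: &str) -> Result<(), ()> {
    if line.trim().is_empty() {
        Err(())
    } else {
        Ok(())
    }
}

fn main() {
    println!("{:?}", first_word("hello world"));
    println!("{:?}", first_word("   "));
    println!("{:?}", check_not_empty("x"));
}
```

Compared to a bool, the caller cannot mistake which value means success: the variants are named Ok and Err.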

Similar to Java's Error and Exception types, Rust has a second way of 'handling' errors: the thread panic. When a thread panics it means something has gone horribly wrong. Thread panics are not meant to be caught. Even though panics are a comparatively extreme measure, they're not without use.

When you design your API you have to think about what kinds of errors deserve to be treated as recoverable and what kinds are for example the result of false usage. An example from the Java world would be the well-known NullPointerException: You usually wouldn't define a catch block that handles an NPE, because it's usually the developer's fault. A NumberFormatException on the other hand could very well be the result of a user entering an invalid value.

An example of this kind of consideration in API design is Rust's Vec type: the get method returns an Option<&T>, so trying to get an index that doesn't exist in the data structure will never panic; instead it tells you that there's nothing under that index by returning None. However, Vec also allows you to access its elements by index, using the square bracket syntax (foo[5]), and it's important to know that this will make the thread panic if the index is out of bounds. It's important to get this design right because it greatly influences the usability of your API: panic too often and the users of your API need to do a lot of verification up front; overuse the Result type and developers need to handle Results all over the place. In both cases the usability of your API suffers.
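The two ways of accessing a Vec described above can be tried out with a short sketch. The describe function is a hypothetical helper made up for this example.

```rust
// Hypothetical helper: looks up an index without risking a panic.
fn describe(values: &[i32], index: usize) -> String {
    match values.get(index) {
        Some(value) => format!("found {}", value),
        None => format!("nothing at index {}", index),
    }
}

fn main() {
    let values = vec![10, 20, 30];

    // get returns an Option, so a missing index is just None.
    println!("{}", describe(&values, 1));
    println!("{}", describe(&values, 5));

    // Indexing is fine for an index that is known to be valid...
    println!("{}", values[1]);
    // ...but `values[5]` would make the thread panic here.
}
```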

A Result type for Java

I've created the result-flow library, which brings a Result interface and the implementing classes Ok and Err. It's located in the Nexus repository here.

Consider the following example:

public class Numbers {
    public static void main(final String[] args) {
        final Result<Integer, String> result = readLine()
            .andThen(Numbers::parseInt)
            .map(Numbers::doubleUp);
        System.out.println(result);
    }
 
    private static Integer doubleUp(final Integer value) {
        return value * 2;
    }
 
    private static Result<Integer, String> parseInt(final String input) {
        try {
            return Result.ok(Integer.parseInt(input));
        } catch (final NumberFormatException e) {
            return Result.err(e.getMessage());
        }
    }
 
    private static Result<String, String> readLine() {
        try {
            final InputStreamReader in = new InputStreamReader(System.in);
            final BufferedReader buf = new BufferedReader(in);
            return Result.ok(buf.readLine());
        } catch (final IOException e) {
            return Result.err(e.getMessage());
        }
    }
}

The main method reads a line from stdin, then parses that line to an Integer and finally doubles the value. As you can see, this code does not handle errors at all; it simply prints the result at the end. If the user enters a valid integer the output will be something like Ok(14). Should the user input something like 'a', the output will be Err(For input string: "a"), so the Err wraps the message of the NumberFormatException.

Notice the difference between andThen and map: The former is used when the method to be called returns a Result, whereas the latter is used when that method does not fail with a Result itself.

Notice also that an IOException that occurs when we try to read from the InputStream will also be wrapped in an Err. This obviously doesn't make a lot of sense in production code. Depending on the context an IOException would rather be treated as an exceptional or unrecoverable error.

Hence, my advice would be to keep any truly exceptional and/or unrecoverable errors like the aforementioned panics in Rust and use the conventional try-catch block on some level in the call stack. For errors of the application domain however, I think the pattern could be applicable on the JVM.
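For comparison, the andThen/map distinction from the Java example mirrors the and_then and map methods of Rust's own Result type. The following is a rough Rust counterpart of the chain in the Java main method, not a port of the library; the process function is a hypothetical stand-in that takes the line as a parameter instead of reading it from stdin.

```rust
// Parsing can fail, so this returns a Result and is chained with and_then.
fn parse_int(input: &str) -> Result<i32, String> {
    input.trim().parse::<i32>().map_err(|e| e.to_string())
}

// Doubling is treated as infallible here, so it returns a plain value
// and is chained with map.
fn double_up(value: i32) -> i32 {
    value * 2
}

// Hypothetical stand-in for the chain in the Java main method.
fn process(line: &str) -> Result<i32, String> {
    let read: Result<&str, String> = Ok(line);
    read.and_then(parse_int).map(double_up)
}

fn main() {
    println!("{:?}", process("7"));
    println!("{:?}", process("a"));
}
```

If the first step already produced an Err, neither parse_int nor double_up is called; the Err is simply passed through to the end of the chain.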

Error types

The Result type is generic, so any type of error (or ok value of course) is possible. In the Rust world a common pattern is to use enums as error types, but depending on the necessary information structs are not unheard of in this role either. When you use a library (or crate for Rustaceans) that returns Results it is typical to either wrap or translate the erroneous values into a type of the domain of the application, typically an enum.

pub enum ApplicationError {
  AppError,         // some meaningful error in the application
  DbError(PgError), // wraps an error of the database connector
}

Rust enums are more powerful than Java's in the sense that they can wrap values, whereas Java's enum instances are static. This is easily overcome in Java by using actual classes and their instances, but that cannot help with the language-specific problems.
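To illustrate the wrapping of a library error into a domain enum described above, here is a minimal sketch using only the standard library. AppError and parse_port are hypothetical names chosen for this example; the wrapped error is the standard ParseIntError rather than a database error.

```rust
use std::num::ParseIntError;

// Hypothetical application error type, for illustration only.
#[derive(Debug)]
enum AppError {
    EmptyInput,                   // a meaningful error of the application
    InvalidNumber(ParseIntError), // wraps an error from the standard library
}

// Implementing From lets the ? operator convert the wrapped error.
impl From<ParseIntError> for AppError {
    fn from(e: ParseIntError) -> Self {
        AppError::InvalidNumber(e)
    }
}

fn parse_port(input: &str) -> Result<u16, AppError> {
    let trimmed = input.trim();
    if trimmed.is_empty() {
        return Err(AppError::EmptyInput);
    }
    // A ParseIntError is translated into AppError via the From impl.
    let port = trimmed.parse::<u16>()?;
    Ok(port)
}

fn main() {
    println!("{:?}", parse_port("8080"));
    println!("{:?}", parse_port(""));
    println!("{:?}", parse_port("abc"));
}
```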

Match expressions

Rust's match can be compared to Java's switch, but it is much more powerful. For instance the Java compiler will not complain about a switch statement over an enum that is not exhaustive, whereas Rust will fail the compilation if not all enum variants have been addressed. Furthermore, Rust's match can actually look into the provided enum and bind the contained value to variables. This is one shortcoming that cannot easily be helped in Java. Less important in this context but nonetheless worth mentioning: Rust's match is an expression and can return a value, whereas Java's switch is a statement.

let foo: Result<String, i8> = Ok("Hi!".to_string());
match foo {
  Ok(x) => println!("Got Ok: {}", x),
  Err(f) => println!("Got error: {}", f),
}
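The points about exhaustiveness, binding and match being an expression can be seen together in a small sketch. The Shape enum and the area function are hypothetical and exist only for this illustration.

```rust
// Hypothetical enum whose variants wrap values.
enum Shape {
    Circle(f64),         // radius
    Rectangle(f64, f64), // width, height
}

fn area(shape: &Shape) -> f64 {
    // match is an expression: its value becomes the return value.
    // Removing one of the arms is a compile error, because the
    // match would no longer cover every variant.
    match shape {
        // The wrapped values are bound to r, w and h.
        Shape::Circle(r) => std::f64::consts::PI * r * r,
        Shape::Rectangle(w, h) => w * h,
    }
}

fn main() {
    println!("{}", area(&Shape::Rectangle(2.0, 3.0)));
    println!("{}", area(&Shape::Circle(1.0)));
}
```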

Macros

Rust's support for macros adds greatly to the usefulness of the Result enum, because it enables a function to not explicitly handle an error but to stop the execution and return the error. This is closely related to Java's throws declaration.

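fn foo() -> Result<String, ()> {
  let b = bar()?;      // returns early with the Err if bar fails
  let c = try!(bar()); // older macro form of the same thing
 
  // do something with b and c
  Ok(b + &c)
}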
 
fn bar() -> Result<String, ()> {
  Err(())
}

In the example above, function foo calls function bar. Both functions have the same return type. Rust's compiler warns about unused results, but foo doesn't want to handle any errors itself. Instead it uses the try!(<expr>) macro (which can also be written as <expr>?), which generates the code needed to return an error early from the function. A Result type in Java cannot mimic this feature; the closest Java equivalent relies on exceptions and a throws declaration, as the next code sample shows.

public String foo() throws MyError {
  final String b = bar();
  final String c = bar();
 
  // do something with b and c
  return b + c;
}
 
public String bar() throws MyError {
  throw new MyError();
}
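To show what the ? operator in the Rust sample saves you from writing, here is a rough hand-written equivalent. This is a simplified sketch of the early return, not the exact code the compiler generates; bar takes a hypothetical succeed flag here so that both outcomes can be tried.

```rust
// Variant of bar with a flag, so both outcomes can be exercised.
fn bar(succeed: bool) -> Result<String, ()> {
    if succeed {
        Ok("bar".to_string())
    } else {
        Err(())
    }
}

// Uses the ? operator to return early on an error.
fn foo_with_operator(succeed: bool) -> Result<String, ()> {
    let b = bar(succeed)?;
    Ok(b)
}

// Roughly the same logic written out by hand.
fn foo_expanded(succeed: bool) -> Result<String, ()> {
    let b = match bar(succeed) {
        Ok(value) => value,
        // The error is converted with From and returned immediately.
        Err(e) => return Err(From::from(e)),
    };
    Ok(b)
}

fn main() {
    println!("{:?}", foo_with_operator(true));
    println!("{:?}", foo_expanded(false));
}
```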

Conclusion

The biggest disadvantage that I see with Rust's Result type in Java is that it breaks with the idiomatic way to code in Java and that the developer has to think very carefully about which errors they encode in a Result and which of them as RuntimeExceptions (or panics in Rust). Third-party libraries, as well as some parts of the standard library that rely on checked exceptions, pose a great difficulty. Those will most probably have to be wrapped with try-catch and converted to either RuntimeExceptions or Results.

The great advantage of the approach is the way it enables a more functional type of programming, like version 8 of Java did with the Optional type. I have yet to try the library in any type of project apart from small experiments. Should you try it out I'll be glad to have your feedback and thoughts about it.

Monday, March 6, 2017

My shot at RESTful Microservices in Rust - Part 3

Part 3 - Linking REST endpoint and db layer

Welcome to part 3 of my Rust microservices series! If you haven't read parts 1 or 2, here are the respective links: part 1 and part 2. In this installment I'm going to connect the REST endpoint with the database layer and take care of serialization and deserialization of the Rust structs.

JSON serialization

There are several crates that give you automatic serialization and deserialization of structs to JSON strings. I'm going to use Serde in this PoC. Serde is divided into a core crate and one additional crate per source/target format. So I'm going to use the crates serde, serde_derive and serde_json. The crate serde_derive contains the Serialize and Deserialize derive macros that implement the traits of the same names. This enables us to serialize a struct by calling serde_json::to_string.

src/models/game.rs:

#[derive(Debug, Serialize)]
pub struct DbGame { /* omitted. */ }
 
#[derive(Debug, Serialize)]
pub struct Dimensions { /* omitted. */ }

src/main.rs:

#[macro_use] extern crate serde_derive;
extern crate serde_json;
 
fn main() {
    for game in dao::get_games() {
        println!("{:?}", serde_json::to_string(&game).unwrap());
    }
}

Unsurprisingly, deserialization works the same way.

Connecting the REST endpoint to the database

I'm gonna create a simple endpoint listening on GET /games that will return a list of all games. src/main.rs:

fn main() {
    let mut server = Nickel::new();
    server.get("/games", middleware! {|_req, mut resp|
        resp.set(MediaType::Json);
        let games = dao::get_games();
        serde_json::to_string(&games).unwrap()
    });
    server.listen("0.0.0.0:8080")
        .expect("Error starting server");
}

When I cURL this endpoint I get

< HTTP/1.1 200 OK
< Content-Type: application/json
< Date: Sun, 05 Mar 2017 18:26:36 GMT
< Server: Nickel
< Transfer-Encoding: chunked
<
* Connection #0 to host localhost left intact
[{"id":1,"dimensions":{"x":3,"y":3}},{"id":2,"dimensions":{"x":4,"y":5}}]

Deserializing JSON

So now that we've got a working endpoint that lists all the games, let's add one that actually creates a game. I'm gonna keep things simple here and let the caller choose the id of the game and not care about key uniqueness issues for this PoC. The first step is to add the Deserialize macro to the entity structs. After that it's mostly about the dao and the controller code.

src/main.rs:

// ...
server.post("/games", middleware! {|req, mut resp|
    match get_game_from_request(req) {
        Ok(game) => {
            resp.set(StatusCode::Created);
            dao::create_game(game);
            "Ok!".to_string()
        },
        Err(e) => {
            resp.set(StatusCode::BadRequest);
            e
        }
    }
});
// ...
fn get_game_from_request(
    req: &mut nickel::Request,
) -> Result<DbGame, String> {
    let mut body = String::new();
    req.origin.read_to_string(&mut body).unwrap();
    serde_json::from_str::<DbGame>(&body)
        .map_err(|e| e.description().to_string() )
}

Nickel provides built-in JSON deserialization, but this feature relies on the rustc_serialize crate, which I'm not using. Serde is a newer and more modular implementation for serialization and deserialization. The get_game_from_request function extracts the body from the request and then tries to deserialize it. The database access code is straightforward:

game_dao.rs:

pub fn create_game(game: DbGame) {
    let conn = connect();
    conn.execute(r#"
        INSERT INTO games (id, dimension_x, dimension_y)
        VALUES ($1, $2, $3)"#,
        &[&game.id, &game.dimensions.x, &game.dimensions.y]
    ).expect("Error inserting into database");
}

As promised at the beginning, I don't care about primary key uniqueness in this PoC, so if you try to POST a game with an id that's already there, the thread is going to panic.
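If the panic ever became a problem, one option would be to let create_game return a Result so that the endpoint can answer with an error status instead. The following is only a sketch of that idea under loud assumptions: GameStore is a hypothetical in-memory stand-in for the games table built on a HashMap, not the postgres driver, and none of it is part of the PoC.

```rust
use std::collections::HashMap;

// Hypothetical in-memory stand-in for the games table.
struct GameStore {
    games: HashMap<i64, (i32, i32)>, // id -> (dimension_x, dimension_y)
}

impl GameStore {
    fn new() -> Self {
        GameStore { games: HashMap::new() }
    }

    // Reports a duplicate id as an Err instead of panicking.
    fn create_game(&mut self, id: i64, x: i32, y: i32) -> Result<(), String> {
        if self.games.contains_key(&id) {
            return Err(format!("game {} already exists", id));
        }
        self.games.insert(id, (x, y));
        Ok(())
    }
}

fn main() {
    let mut store = GameStore::new();
    println!("{:?}", store.create_game(1, 3, 3));
    println!("{:?}", store.create_game(1, 4, 5));
}
```

The caller can then match on the returned value, much like the endpoint already does for get_game_from_request.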

Conclusions

We've seen that it is possible to create microservices in Rust with little effort, even though compared to older languages there's more boilerplate code that you have to write yourself. Nickel in particular seems to have a lot of room for improvement. I don't like that you seem to have to return a String from every endpoint definition in the middleware! macro, but then I'm not very good at reading macro definitions in Rust yet.

One could think that interacting with postgres directly and not using an OR-Mapper is a bad idea, but I think that especially in microservices, the number of entities is usually small enough for that not to matter too much.

This concludes the third and last part of this proof of concept. You can find the source code here. Thanks for reading and please feel free to comment.

Wednesday, March 1, 2017

My shot at RESTful Microservices in Rust - Part 2

Part 2 - Database interaction

Welcome back! If you haven't read part 1 yet: this series of blog posts is about creating a simple RESTful service in Rust. After setting up the project in part 1, I'm gonna set up a basic database interaction, to make the scenario more realistic.

I initially wanted to use a full-fledged ORM solution for this PoC but then decided it's better to concentrate on a few things at a time. To put it in a nutshell, for this project I use Diesel's migration features without the actual OR-mapping.

Diesel setup

Diesel comes as a library and additionally as a tool for the command line, called diesel_cli. I install the command line tool with cargo install diesel_cli.

For diesel to know how to connect to the database I add a .env file to the project:

DATABASE_URL=postgres://postgres@localhost/battleship

The .env file is just a means of collecting environment variables and it can easily be incorporated into your program with the dotenv library.

Now I need to create a database. I chose to just spin up a dockerized postgres server for development purposes like so:

docker run \
  -d --name battleship_db \
  -p 5432:5432 -e POSTGRES_PASSWORD='' \
  postgres

When I now run diesel setup two things happen:

  1. a migrations directory is created
  2. the battleship database is created inside the postgres container

A database migration

Now that there is a database, I'll create a migration to initialize the database with a table. I run diesel migration generate create_games, which creates two files in migrations/20170301195954_create_games/: up.sql and down.sql. Unsurprisingly, one of them is used to make a change in the database, whereas the other reverts the change.

up.sql:

CREATE TABLE games (
  id BIGSERIAL NOT NULL PRIMARY KEY,
  dimension_x INTEGER NOT NULL,
  dimension_y INTEGER NOT NULL
);

down.sql:

DROP TABLE games;

I now run diesel migration run and up.sql is executed in the dockerized database.

The model

I need a representation of a game in Rust, so I create the following structs:

src/models/game.rs:

#[derive(Debug)]
pub struct DbGame {
    pub id: i64,
    pub dimensions: Dimensions,
}
 
#[derive(Debug)]
pub struct Dimensions {
    pub x: i32,
    pub y: i32,
}

Interacting with the db

Since I'm not using an OR-Mapper, I'm gonna query the database through plain SQL, using the native postgres driver (added to Cargo.toml). Furthermore, I'll use dotenv to get the database connection URL from .env.

I create a method that establishes a database connection and another one that queries the games table for all entries. The latter iterates over the results and maps each row to a DbGame using one of the standard type conversion mechanisms in Rust, the From trait. For this to work, there must be an implementation of From<Row> for DbGame, which is listed below.

src/dao/game_dao.rs:

use dotenv::dotenv;
use models::DbGame;
use postgres::{Connection, TlsMode};
use std::env;
 
fn connect() -> Connection {
    dotenv().ok();
    let database_url = env::var("DATABASE_URL")
        .expect("DATABASE_URL must be set");
    Connection::connect(&*database_url, TlsMode::None)
        .expect(&format!("Error connecting to {}", &database_url))
}
 
pub fn get_games() -> Vec<DbGame> {
    let conn = connect();
    let rows = conn.query("SELECT * FROM games", &[])
        .expect("Error querying database");
 
    rows.iter()
        .map(DbGame::from)
        .collect()
}

src/models/game.rs:

impl<'a> From<Row<'a>> for DbGame {
    fn from(row: Row) -> Self {
        DbGame {
            id: row.get("id"),
            dimensions: Dimensions {
                x: row.get("dimension_x"),
                y: row.get("dimension_y"),
            },
        }
    }
}

I can then list the database entries in main.rs:

fn main() {
    for game in dao::get_games() {
        println!("{:?}", game);
    }
}

Which yields the following output for me, after I've manually inserted some data:

DbGame { id: 1, dimensions: Dimensions { x: 3, y: 3 } }
DbGame { id: 2, dimensions: Dimensions { x: 4, y: 5 } }

This concludes part 2 of the PoC. In part 3 I will show how I connected the database layer with the REST endpoint and how to convert the Rust structs into JSON.