Parsing Nested JSON in Haskell with Aeson

Parsing JSON in Haskell is one of the most painless parsing experiences one can have... except for one little thing. Aeson, one of the best JSON libraries for Haskell, is so under-documented that it is almost impossible to use for all but the most trivial cases.

My case was that I needed to parse the Yammer JSON feed, which has quite a few nested fields. The schema looked basically like this:

 
{
  "messages": [
    {
      "client_url": "https://www.yammer.com/",
      "created_at": "2011/03/28 20:39:12 +0000",
      "system_message": false,
      "body": {
        "parsed": "message with photo attachment.",
        "plain": "message with photo attachment."
      },
      "sender_type": "user",
      "network_id": 104604,
      "thread_id": 84402777,
      "web_url": "https://www.yammer.com/yammerdeveloperstestcommunity/messages/84402777",
      "direct_message": false,
      "id": 84402777,
      "url": "https://www.yammer.com/api/v1/messages/84402777",
      "client_type": "Web",
      "message_type": "update",
      "sender_id": 4022984,
      "replied_to_id": null,
      "attachments": [
        {
          "type": "image",
          "content_type": "",
          "uuid": null,
          "web_url": "https://www.yammer.com/yammerdeveloperstestcommunity/uploads/857663/Firefly.jpg",
          "y_id": 857663,
          "image": {
            "thumbnail_url": "https://www.yammer.com/api/v1/file/857663/Firefly.jpg?view=thumbnail",
            "url": "https://www.yammer.com/api/v1/file/857663/Firefly.jpg",
            "size": 0
          },
          "name": "Firefly.jpg",
          "id": 974915
        }
      ],
      "liked_by": {
        "count": 0,
        "names": []
      },
      "privacy": "public"
    },
    {
      "client_url": "http://www.yammer.com",
      "created_at": "2011/03/25 00:49:29 +0000",
      "system_message": false,
      "body": {
        "parsed": "new test message 1",
        "plain": "new test message 1"
      },
      "network_id": 104604,
      "thread_id": 83957686,
      "sender_type": "user",
      "direct_message": false,
      "web_url": "https://www.yammer.com/yammerdeveloperstestcommunity/messages/83957686",
      "id": 83957686,
      "url": "https://www.yammer.com/api/v1/messages/83957686",
      "client_type": "testingtest",
      "sender_id": 4022984,
      "replied_to_id": null,
      "message_type": "update",
      "liked_by": {
        "count": 0,
        "names": []
      },
      "attachments": [],
      "privacy": "public"
    }
  ],
  "references": [
    {
      "type": "user",
      "stats": {
        "followers": 1,
        "updates": 14,
        "following": 2
      },
      "web_url": "https://www.yammer.com/yammerdeveloperstestcommunity/users/mikealrogers-guest",
      "mugshot_url": "https://assets3.yammer.com/images/no_photo_small.gif",
      "url": "https://www.yammer.com/api/v1/users/4022984",
      "full_name": "mikeal",
      "name": "mikealrogers-guest",
      "state": "active",
      "job_title": "Test Title",
      "id": 4022984
    },
    {
      "type": "user",
      "stats": {
        "followers": 1,
        "updates": 4,
        "following": 2
      },
      "web_url": "https://www.yammer.com/yammerdeveloperstestcommunity/users/mknopp",
      "mugshot_url": "https://assets1.yammer.com/user_uploaded/photos/p1/0141/2640/n1644278019_46479_62_small.jpg",
      "url": "https://www.yammer.com/api/v1/users/1452329",
      "full_name": "Matt Knopp",
      "name": "mknopp",
      "state": "active",
      "job_title": null,
      "id": 1452329
    }
 
  ]
}

Not so nice. In particular, I was interested in extracting each message, the list of users, and the data on likes. It turns out this isn't too hard in Haskell.

First, we define three types to hold this data.

{-# LANGUAGE OverloadedStrings #-}
 
module Yamulator where
 
import           Control.Applicative
import           Control.Monad
import           Data.Aeson
import           Data.Aeson.Types
import qualified Data.HashMap.Strict  as HM
import qualified Data.ByteString.Lazy.Char8 as C
import qualified Data.Text as T
 
 
data Message = Message {
      mid       :: Integer,
      plainText :: T.Text,
      byUserId  :: Integer,
      likes     :: Integer,
      inReplyTo :: Maybe Integer,
      createdAt :: T.Text
} deriving (Eq, Show)

data User = User {
      name   :: T.Text,
      userId :: Integer
} deriving (Eq, Show)

data Yammers = Yammers {
      messages :: [Message],
      users    :: [User]
} deriving (Eq, Show)

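This part is important. Note that Yammers contains [Message] and [User]: Aeson already knows how to parse a list of any type that has a FromJSON instance, so we only need to write instances for the element types.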

Next, we define the instances.

instance FromJSON Yammers where
  parseJSON (Object o) = do
      msgs <- o .: "messages"
      refs <- o .: "references"
      us   <- mapM parseJSON (filter isUser refs)
      return $ Yammers msgs us
    where
      -- HM.lookup assumes aeson < 2.0, where Object is a HashMap.
      isUser (Object ref) = HM.lookup "type" ref == Just (String "user")
      isUser _            = False  -- skip non-objects instead of crashing
  parseJSON _ = mzero
 
instance FromJSON Message where
    parseJSON (Object v) = Message <$>
                          v .: "id" <*>
                          ((v .: "body") >>= (.: "plain")) <*>
                          v .: "sender_id" <*>
                          ((v .: "liked_by") >>= (.: "count")) <*> -- note how we can keep drilling into nested structures like this.
                          v .:? "replied_to_id"  <*>
                          v .: "created_at"
 
    parseJSON _ = mzero
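
The instances above lean on a FromJSON instance for User, which isn't shown in this post. Here is a minimal sketch of what it might look like, assuming the "name" and "id" keys of each reference object are the ones that map to name and userId. The names sampleUser and decodedUser are made up for illustration, and the input is a tiny hand-written object rather than real Yammer output:

```haskell
{-# LANGUAGE OverloadedStrings #-}

import           Data.Aeson
import qualified Data.ByteString.Lazy.Char8 as C
import qualified Data.Text as T

-- Same shape as the User type defined earlier in the post.
data User = User
  { name   :: T.Text
  , userId :: Integer
  } deriving (Eq, Show)

-- Assumption: "name" and "id" are the reference fields that
-- correspond to name and userId.
instance FromJSON User where
  parseJSON = withObject "User" $ \v ->
    User <$> v .: "name"
         <*> v .: "id"

-- A small hand-written input, for illustration only.
sampleUser :: C.ByteString
sampleUser = "{\"type\":\"user\",\"name\":\"mknopp\",\"id\":1452329}"

-- Decodes the sample above into a User (Nothing if parsing fails).
decodedUser :: Maybe User
decodedUser = decode sampleUser
```

Using withObject instead of matching on Object by hand gives the same behaviour for valid input, and fails the parse (rather than needing a catch-all mzero case) when the value is not an object.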

Then, reading it in is a piece of cake. We can define functions like the following to deserialize the response:

 
decodeYammers :: C.ByteString -> Maybe Yammers
decodeYammers response = decode response
 
decodeUsers :: C.ByteString -> Maybe [User]
decodeUsers response = decode response
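
One thing to keep in mind: decode only gives you Nothing when parsing fails, which makes a mistyped key hard to track down. As a rough sketch, Aeson's eitherDecode and parseEither hand back an error message instead. The helper names likeCount and decodeLikeCount are hypothetical and not part of the code above; they just reuse the same "drill into a nested object" trick on the liked_by field:

```haskell
{-# LANGUAGE OverloadedStrings #-}

import           Data.Aeson
import           Data.Aeson.Types (Parser, parseEither)
import qualified Data.ByteString.Lazy.Char8 as C

-- Hypothetical helper: pull liked_by.count out of a message object.
likeCount :: Value -> Parser Integer
likeCount = withObject "message" $ \v ->
  v .: "liked_by" >>= (.: "count")

-- Left carries a description of what went wrong; Right carries the count.
decodeLikeCount :: C.ByteString -> Either String Integer
decodeLikeCount bytes = eitherDecode bytes >>= parseEither likeCount
```

The same idea should carry over to decodeYammers: swapping decode for eitherDecode changes the result type from Maybe Yammers to Either String Yammers, at which point a missing key shows up as a Left with a message rather than a silent Nothing.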

It doesn't get any easier. I hope that helps some other Haskellers out there. Many thanks to #haskell for all the help while I was putting this together!