Common issues and resolutions for Azure IoT Edge

Use this article to find steps to resolve common problems that you may experience when deploying IoT Edge solutions. If you need to learn how to find logs and errors from your IoT Edge device, see Troubleshoot your IoT Edge device.

IoT Edge agent stops after about a minute

Observed behavior:

The edgeAgent module starts and runs successfully for about a minute, and then stops. The logs indicate that the IoT Edge agent attempts to connect to IoT Hub over AMQP, and then attempts to connect using AMQP over WebSocket. When that fails, the IoT Edge agent exits.

Example edgeAgent logs:

    2017-11-28 18:46:19 [INF] - Starting module management agent.
    2017-11-28 18:46:19 [INF] - Version - 1.0.7516610 (03c94f85d0833a861a43c669842f0817924911d5)
    2017-11-28 18:46:19 [INF] - Edge agent attempting to connect to IoT Hub via AMQP...
    2017-11-28 18:46:49 [INF] - Edge agent attempting to connect to IoT Hub via AMQP over WebSocket...

Root cause:

A networking configuration on the host network is preventing the IoT Edge agent from reaching the network. The agent attempts to connect over AMQP (port 5671) first. If the connection fails, it tries WebSockets (port 443).

The IoT Edge runtime sets up a network for each of the modules to communicate on. On Linux, this network is a bridge network. On Windows, it uses NAT. This issue is more common on Windows devices using Windows containers that use the NAT network.

Resolution:

Ensure that there is a route to the internet for the IP addresses assigned to this bridge/NAT network. Sometimes a VPN configuration on the host overrides the IoT Edge network.
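
To confirm whether the two ports named in the root cause are reachable, a small TCP probe can help. The following is a minimal sketch, not part of the IoT Edge tooling: the function names are hypothetical, and it only checks that a TCP connection can be opened, not that AMQP or WebSocket traffic is accepted. To reflect what the modules see, run it from a container attached to the same bridge/NAT network rather than from the host.

```python
import socket

# Ports taken from the root cause above: AMQP first, then AMQP over WebSocket.
IOT_HUB_PORTS = (5671, 443)


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


def probe_hub(hostname: str) -> dict:
    """Map each IoT Hub port to whether a TCP connection could be opened."""
    return {port: can_connect(hostname, port) for port in IOT_HUB_PORTS}
```

Calling probe_hub with the hostname of your own IoT hub returns a dictionary keyed by port. If both values are False inside the container but True on the host, the container network most likely has no route out.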

IoT Edge agent can't access a module's image (403)

Observed behavior:

A container fails to run, and the edgeAgent logs show a 403 error.

Root cause:

The IoT Edge agent doesn't have permissions to access a module's image.

Resolution:

Make sure that your registry credentials are correctly specified in your deployment manifest.

Edge Agent module reports 'empty config file' and no modules start on the device

Observed behavior:

The device has trouble starting modules defined in the deployment. Only the edgeAgent is running, but it continually reports 'empty config file...'.

Root cause:

By default, IoT Edge starts modules in their own isolated container network. The device may be having trouble with DNS name resolution within this private network.

Resolution:

Option 1: Set DNS server in container engine settings

Specify the DNS server for your environment in the container engine settings, which will apply to all container modules started by the engine. Create a file named daemon.json specifying the DNS server to use. For example:

    {
        "dns": ["1.1.1.1"]
    }

The above example sets the DNS server to a publicly accessible DNS service. If the edge device can't access this IP from its environment, replace it with a DNS server address that is accessible.

Place daemon.json in the right location for your platform:

Platform                                Location
Linux                                   /etc/docker
Windows host with Windows containers    C:\ProgramData\iotedge-moby\config

If the location already contains a daemon.json file, add the dns key to it and save the file.

Restart the container engine for the updates to take effect.

Platform                      Command
Linux                         sudo systemctl restart docker
Windows (Admin PowerShell)    Restart-Service iotedge-moby -Force
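
When daemon.json already exists, the dns key has to be added without disturbing the other settings. The following is a minimal sketch of that merge; the function name is hypothetical, and reading or writing the file (which usually needs elevated permissions) is left to the caller.

```python
import json


def add_dns_servers(daemon_json_text: str, servers: list) -> str:
    """Return daemon.json content with the "dns" key set, keeping other keys."""
    # An empty or whitespace-only file is treated as an empty configuration.
    config = json.loads(daemon_json_text) if daemon_json_text.strip() else {}
    config["dns"] = list(servers)
    return json.dumps(config, indent=4)
```

Pass the current file content and the list of DNS server addresses, write the returned text back to daemon.json, and then restart the container engine.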

Option 2: Set DNS server in IoT Edge deployment per module

You can set the DNS server for each module's createOptions in the IoT Edge deployment. For example:

    "createOptions": {
      "HostConfig": {
        "Dns": [
          "x.x.x.x"
        ]
      }
    }

Warning

If you use this method and specify the wrong DNS address, edgeAgent loses connection with IoT Hub and can't receive new deployments to fix the issue. To resolve this issue, you can reinstall the IoT Edge runtime. Before you install a new instance of IoT Edge, be sure to remove any edgeAgent containers from the previous installation.

Be sure to set this configuration for the edgeAgent and edgeHub modules as well.

IoT Edge hub fails to start

Observed behavior:

The edgeHub module fails to start. You may see a message like one of the following errors in the logs:

    One or more errors occurred. (Docker API responded with status code=InternalServerError, response= {\"message\":\"driver failed programming external connectivity on endpoint edgeHub (6a82e5e994bab5187939049684fb64efe07606d2bb8a4cc5655b2a9bad5f8c80): Error starting userland proxy: Bind for 0.0.0.0:443 failed: port is already allocated\"}\n)

Or

    info: edgelet_docker::runtime -- Starting module edgeHub...
    warn: edgelet_utils::logging -- Could not start module edgeHub
    warn: edgelet_utils::logging --     caused by: failed to create endpoint edgeHub on network nat: hnsCall failed in Win32:
            The process cannot access the file because it is being used by another process. (0x20)

Root cause:

Another process on the host machine has bound a port that the edgeHub module is trying to bind. The IoT Edge hub maps ports 443, 5671, and 8883 for use in gateway scenarios. The module fails to start if another process has already bound one of those ports.
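
One way to see which of the three ports is already taken is to try binding each of them briefly. The following is a minimal sketch with hypothetical names. It assumes edgeHub itself is stopped while the check runs, since a running edgeHub would hold the ports, and on Linux it needs root to bind ports below 1024.

```python
import errno
import socket

# Ports that the IoT Edge hub maps on the host, as listed above.
EDGE_HUB_PORTS = (443, 5671, 8883)


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return False if another process has already bound host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return False
            # Anything else (for example, missing privileges) is not a
            # port conflict, so surface it instead of guessing.
            raise
    return True


def busy_edge_hub_ports() -> list:
    """Return the edgeHub ports that are currently bound by something else."""
    return [port for port in EDGE_HUB_PORTS if not port_is_free(port)]
```

The check reports only that a port is in use, not which process owns it. Finding the owning process still requires an operating system tool.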

Resolution:

You can resolve this issue two ways:

If the IoT Edge device is functioning as a gateway device, then you need to find and stop the process that is using port 443, 5671, or 8883. An error for port 443 usually means that the other process is a web server.

If you don't need to use the IoT Edge device as a gateway, then you can remove the port bindings from edgeHub's module create options. You can change the create options in the Azure portal or directly in the deployment.json file.

In the Azure portal:

  1. Navigate to your IoT hub and select IoT Edge.

  2. Select the IoT Edge device that you want to update.

  3. Select Set Modules.

  4. Select Runtime Settings.

  5. In the Edge Hub module settings, delete everything from the Create Options text box.

  6. Save your changes and create the deployment.

In the deployment.json file:

  1. Open the deployment.json file that you applied to your IoT Edge device.

  2. Find the edgeHub settings in the edgeAgent desired properties section:

      "edgeHub": {
          "settings": {
              "image": "mcr.microsoft.com/azureiotedge-hub:1.1",
              "createOptions": "{\"HostConfig\":{\"PortBindings\":{\"8883/tcp\":[{\"HostPort\":\"8883\"}],\"443/tcp\":[{\"HostPort\":\"443\"}]}}}"
          },
          "type": "docker",
          "status": "running",
          "restartPolicy": "always"
      }
  3. Remove the createOptions line, and the trailing comma at the end of the image line before it:

      "edgeHub": {
          "settings": {
              "image": "mcr.microsoft.com/azureiotedge-hub:1.1"
          },
          "type": "docker",
          "status": "running",
          "restartPolicy": "always"
      }
  4. Save the file and apply it to your IoT Edge device again.
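
If you edit many manifests, the same change can be scripted instead of made by hand. The following is a minimal sketch, assuming the edgeHub entry has already been loaded from deployment.json into a dictionary. The function name is hypothetical, and locating the entry inside the full manifest is left to the caller.

```python
import copy


def drop_create_options(edge_hub: dict) -> dict:
    """Return a copy of the edgeHub entry without settings.createOptions."""
    updated = copy.deepcopy(edge_hub)
    # pop with a default keeps this safe when createOptions is already absent.
    updated.get("settings", {}).pop("createOptions", None)
    return updated
```

Working on the parsed structure and writing it back with a JSON serializer avoids the trailing-comma mistake that is easy to make when editing the text directly.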

IoT Edge security daemon fails with an invalid hostname

Observed behavior:

Attempting to check the IoT Edge security manager logs fails and prints the following message:

    Error parsing user input data: invalid hostname. Hostname cannot be empty or greater than 64 characters

Root cause:

The IoT Edge runtime can only support hostnames that are shorter than 64 characters. Physical machines usually don't have long hostnames, but the issue is more common on a virtual machine. The automatically generated hostnames for Windows virtual machines hosted in Azure, in particular, tend to be long.
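
You can check a hostname against this limit before configuring the device. The following is a minimal sketch with a hypothetical function name. It takes the boundary from the error message (empty or more than 64 characters is rejected), which is an assumption: the paragraph above says "shorter than 64", so treat a hostname of exactly 64 characters as borderline.

```python
# Limit quoted in the error message above; the exact boundary is an assumption.
MAX_HOSTNAME_LENGTH = 64


def hostname_is_acceptable(hostname: str) -> bool:
    """Return True if the hostname is non-empty and within the length limit."""
    return 0 < len(hostname) <= MAX_HOSTNAME_LENGTH
```

Passing the value returned by Python's socket.gethostname() checks the current machine.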

Resolution:

When you come across this error, you can resolve it by configuring the DNS name of your virtual machine, and then setting the DNS name as the hostname in the setup command.

  1. In the Azure portal, navigate to the overview page of your virtual machine.

  2. Select configure under DNS name. If your virtual machine already has a DNS name configured, you don't need to configure a new one.

  3. Provide a value for DNS name label and select Save.

  4. Copy the new DNS name, which should be in the format <DNSnamelabel>.<vmlocation>.cloudapp.azure.com.

  5. On the IoT Edge device, open the config file. The path depends on the platform and on whether the device uses config.yaml or config.toml:

    • On Linux with config.yaml:

          sudo nano /etc/iotedge/config.yaml
    • On Windows:

          notepad C:\ProgramData\iotedge\config.yaml
    • On Linux with config.toml:

          sudo nano /etc/aziot/config.toml
  6. Replace the value of hostname with your DNS name.

  7. Save and close the file. If the device uses config.toml, apply the changes to IoT Edge:

          sudo iotedge config apply

Can't get the IoT Edge daemon logs on Windows

Observed behavior:

You get an EventLogException when using Get-WinEvent on Windows.

Root cause:

The Get-WinEvent PowerShell command relies on a registry entry to be present to find logs by a specific ProviderName.

Resolution:

Set a registry entry for the IoT Edge daemon. Create an iotedge.reg file with the following content, and import it into the Windows Registry by double-clicking it or using the reg import iotedge.reg command:

    Windows Registry Editor Version 5.00

    [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\EventLog\Application\iotedged]
    "CustomSource"=dword:00000001
    "EventMessageFile"="C:\\ProgramData\\iotedge\\iotedged.exe"
    "TypesSupported"=dword:00000007

DPS client error

Observed behavior:

IoT Edge fails to start with error message failed to provision with IoT Hub, and no valid device backup was found dps client error.

Root cause:

A group enrollment is used to provision an IoT Edge device to an IoT Hub. The IoT Edge device is moved to a different hub. The registration is deleted in DPS. A new registration is created in DPS for the new hub. The device is not reprovisioned.

Resolution:

  1. Verify your DPS credentials are correct.
  2. Apply your configuration using sudo iotedge config apply.
  3. If the device isn't reprovisioned, restart the device using sudo iotedge system restart.
  4. If the device isn't reprovisioned, force reprovisioning using sudo iotedge system reprovision.

To automatically reprovision, set dynamic_reprovisioning: true in the device configuration file. Setting this flag to true opts in to the dynamic re-provisioning feature. IoT Edge detects situations where the device appears to have been reprovisioned in the cloud by monitoring its own IoT Hub connection for certain errors. IoT Edge responds by shutting itself and all Edge modules down. The next time the daemon starts up, it will attempt to reprovision this device with Azure to receive the new IoT Hub provisioning data.

When using external provisioning, the daemon will also notify the external provisioning endpoint about the re-provisioning event before shutting down. For more information, see IoT Hub device reprovisioning concepts.

Stability issues on smaller devices

Observed behavior:

You may experience stability problems on resource-constrained devices like the Raspberry Pi, particularly when used as a gateway. Symptoms include out of memory exceptions in the IoT Edge hub module, downstream devices failing to connect, or the device failing to send telemetry messages after a few hours.

Root cause:

The IoT Edge hub, which is part of the IoT Edge runtime, is optimized for performance by default and attempts to allocate large chunks of memory. This optimization is not ideal for constrained edge devices and can cause stability problems.

Resolution:

For the IoT Edge hub, set an environment variable OptimizeForPerformance to false. There are two ways to set environment variables:

In the Azure portal:

In your IoT Hub, select your IoT Edge device and, from the device details page, select Set Modules > Runtime Settings. Create an environment variable for the IoT Edge hub module called OptimizeForPerformance that is set to false.

In the deployment manifest:

    "edgeHub": {
      "type": "docker",
      "settings": {
        "image": "mcr.microsoft.com/azureiotedge-hub:1.1",
        "createOptions": <snipped>
      },
      "env": {
        "OptimizeForPerformance": {
          "value": "false"
        }
      },

IoT Edge module fails to send a message to edgeHub with 404 error

Observed behavior:

A custom IoT Edge module fails to send a message to the IoT Edge hub with a 404 Module not found error. The IoT Edge daemon prints the following message to the logs:

    Error: Time:Thu Jun  4 19:44:58 2018 File:/usr/sdk/src/c/provisioning_client/adapters/hsm_client_http_edge.c Func:on_edge_hsm_http_recv Line:364 executing HTTP request fails, status=404, response_buffer={"message":"Module not found"}u, 04 )

Root cause:

The IoT Edge daemon enforces process identification for all modules connecting to the edgeHub for security reasons. It verifies that all messages being sent by a module come from the main process ID of the module. If a message is being sent by a module from a different process ID than initially established, it will reject the message with a 404 error message.

Resolution:

As of version 1.0.7, all module processes are authorized to connect. For more information, see the 1.0.7 release changelog.

If upgrading to 1.0.7 isn't possible, complete the following steps. Make sure that the same process ID is always used by the custom IoT Edge module to send messages to the edgeHub. For instance, make sure to use ENTRYPOINT instead of the CMD command in your Dockerfile. The CMD command leads to one process ID for the module and another process ID for the bash command running the main program, but ENTRYPOINT leads to a single process ID.

IoT Edge module deploys successfully then disappears from device

Observed behavior:

After setting modules for an IoT Edge device, the modules are deployed successfully but after a few minutes they disappear from the device and from the device details in the Azure portal. Other modules than the ones defined might also appear on the device.

Root cause:

If an automatic deployment targets a device, it takes priority over manually setting the modules for a single device. The Set modules functionality in the Azure portal or the Create deployment for single device functionality in Visual Studio Code will take effect for a moment. You see the modules that you defined start on the device. Then the automatic deployment's priority kicks in and overwrites the device's desired properties.

Resolution:

Only use one type of deployment mechanism per device, either an automatic deployment or individual device deployments. If you have multiple automatic deployments targeting a device, you can change priority or target descriptions to make sure the correct one applies to a given device. You can also update the device twin to no longer match the target description of the automatic deployment.
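
The selection behavior described here can be pictured with a small model. The following is a simplified sketch, not the service's implementation: the Deployment class and its matches callable are hypothetical stand-ins for an automatic deployment and its target condition, and tie-breaking between equal priorities is not modeled.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Deployment:
    name: str
    priority: int
    matches: Callable[[dict], bool]  # stand-in for the target condition


def winning_deployment(device_tags: dict, deployments: list) -> Optional[Deployment]:
    """Return the matching deployment with the highest priority, if any."""
    candidates = [d for d in deployments if d.matches(device_tags)]
    return max(candidates, key=lambda d: d.priority, default=None)
```

In this model, raising a deployment's priority or changing the device's tags so that a condition no longer matches changes which deployment wins. Those are the same two levers the resolution describes.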

For more information, see Understand IoT Edge automatic deployments for single devices or at scale.

IoT Edge module reports connectivity errors

Observed behavior:

IoT Edge modules that connect directly to cloud services, including the runtime modules, stop working as expected and return errors around connection or networking failures.

Root cause:

Containers rely on IP packet forwarding in order to connect to the internet so that they can communicate with cloud services. IP packet forwarding is enabled by default in Docker, but if it gets disabled then any modules that connect to cloud services will not work as expected. For more information, see Understand container communication in the Docker documentation.

Resolution:

Use the following steps to enable IP packet forwarding.

On Windows:

  1. Open the Run application.

  2. Enter regedit in the text box and select Ok.

  3. In the Registry Editor window, browse to HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters.

  4. Look for the IPEnableRouter parameter.

    1. If the parameter exists, set the value of the parameter to 1.

    2. If the parameter doesn't exist, add it as a new parameter with the following settings:

      Setting    Value
      Name       IPEnableRouter
      Type       REG_DWORD
      Value      1
  5. Close the registry editor window.

  6. Restart your system to apply the changes.

On Linux:

  1. Open the sysctl.conf file.

                      sudo nano /etc/sysctl.conf                                  
  2. Add the following line to the file.

                      net.ipv4.ip_forward=1                                  
  3. Save and close the file.

  4. Restart the network service and docker service to apply the changes.
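
After the restart, you can confirm on Linux that the setting took effect by reading the kernel's current value. The following is a minimal sketch with hypothetical function names. It reads /proc/sys/net/ipv4/ip_forward, which holds the live value, rather than sysctl.conf, which holds only the configured one.

```python
from pathlib import Path

# Live kernel setting on Linux; contains "1" when forwarding is enabled.
IP_FORWARD_PATH = Path("/proc/sys/net/ipv4/ip_forward")


def parse_ip_forward(raw: str) -> bool:
    """Interpret the file content: "1" means forwarding is on."""
    return raw.strip() == "1"


def ip_forwarding_enabled(path: Path = IP_FORWARD_PATH) -> bool:
    """Return True if IPv4 packet forwarding is currently enabled."""
    return parse_ip_forward(path.read_text())
```

If ip_forwarding_enabled() still returns False after the restart, the change in sysctl.conf was not applied or another configuration is overriding it.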

IoT Edge behind a gateway cannot perform HTTP requests and start edgeAgent module

Observed behavior:

The IoT Edge daemon is active with a valid configuration file, but it cannot start the edgeAgent module. The command iotedge list returns an empty list. The IoT Edge daemon logs report Could not perform HTTP request.

Root cause:

IoT Edge devices behind a gateway get their module images from the parent IoT Edge device specified in the parent_hostname field of the config file. The Could not perform HTTP request error means that the child device isn't able to reach its parent device via HTTP.

Resolution:

Make sure the parent IoT Edge device can receive incoming requests from the child IoT Edge device. Open network traffic on ports 443 and 6617 for requests coming from the child device.

IoT Edge behind a gateway cannot connect when migrating from one IoT hub to another

Observed behavior:

When attempting to migrate a hierarchy of IoT Edge devices from one IoT hub to another, the top-level parent IoT Edge device can connect to IoT Hub, but downstream IoT Edge devices cannot. The logs report Unable to authenticate client downstream-device/$edgeAgent with module credentials.

Root cause:

The credentials for the downstream devices were not updated properly when the migration to the new IoT hub happened. Because of this, edgeAgent and edgeHub modules were set to have authentication type of none (default if not set explicitly). During connection, the modules on the downstream devices use old credentials, causing the authentication to fail.

Resolution:

When migrating to the new IoT hub (assuming not using DPS), follow these steps in order:

  1. Follow this guide to export and then import device identities from the old IoT hub to the new one
  2. Reconfigure all IoT Edge deployments and configurations in the new IoT hub
  3. Reconfigure all parent-child device relationships in the new IoT hub
  4. Update each device to point to the new IoT hub hostname (iothub_hostname under [provisioning] in config.toml)
  5. If you chose to exclude authentication keys during the device export, reconfigure each device with the new keys given by the new IoT hub (device_id_pk under [provisioning.authentication] in config.toml)
  6. Restart the top-level parent Edge device first, and make sure it's up and running
  7. Restart each device in the hierarchy level by level from top to bottom
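
The restart order in the last two steps is a breadth-first walk of the device hierarchy. The following is a minimal sketch that only computes the order. The function name and the shape of the children mapping (parent device ID to list of child device IDs) are assumptions for illustration, and the restarts themselves still have to be performed on each device.

```python
from collections import deque


def restart_order(children: dict, root: str) -> list:
    """Return device IDs level by level, starting from the top-level parent."""
    order, queue = [], deque([root])
    while queue:
        device = queue.popleft()
        order.append(device)
        # Children are queued behind every device of the current level,
        # so a whole level is restarted before the next one begins.
        queue.extend(children.get(device, []))
    return order
```

For a hierarchy where "top" has children "mid-a" and "mid-b", each with one leaf device, the function returns the top device, then both mid-level devices, then the leaves.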

Security daemon couldn't start successfully

Observed behavior:

The security daemon fails to start and module containers aren't created. The edgeAgent, edgeHub and other custom modules aren't started by the IoT Edge service. In aziot-edged logs, you see this error:

  • The daemon could not start up successfully: Could not start management service
  • caused by: An error occurred for path /var/run/iotedge/mgmt.sock
  • caused by: Permission denied (os error 13)

Root cause:

For all Linux distros except CentOS 7, IoT Edge's default configuration is to use systemd socket activation. A permission error happens if you change the configuration file to not use socket activation but leave the URLs as /var/run/iotedge/*.sock, since the iotedge user can't write to /var/run/iotedge, meaning it can't unlock and mount the sockets itself.

Resolution:

You do not need to disable socket activation on a distro where socket activation is supported. However, if you prefer not to use socket activation at all, put the sockets in /var/lib/iotedge/. To do this:

  1. Run systemctl disable iotedge.socket iotedge.mgmt.socket to disable the socket units so that systemd doesn't start them unnecessarily
  2. Modify the iotedge config to use /var/lib/iotedge/*.sock in both connect and listen sections
  3. If you already have modules, they have the old /var/run/iotedge/*.sock mounts, so docker rm -f them.

Next steps

Do you think that you found a bug in the IoT Edge platform? Submit an issue so that we can continue to improve.

If you have more questions, create a Support request for help.