Cloud-init bi-weekly status
Posted on Mon 24 June 2019 in status-meeting-minutes • 10 min read
Meeting information
- #cloud-init: Cloud-init bi-weekly status, 24 Jun at 16:18 — 17:33 UTC
- Full logs at http://ubottu.com/meetingology/logs/cloud-init/2019/cloud-init.2019-06-24-16.18.log.html
Meeting summary
Previous Actions
The discussion about "Previous Actions" started at 16:23.
- LINK: https://cloud-init.github.io/status-2019-06-10.html#status-2019-06-10
- LINK: https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1832381
- ACTION: Touch base with AnhVoMSFT by next status on priority of https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1832381
Recent Changes
The discussion about "Recent Changes" started at 16:28.
In Progress Development
The discussion about "In Progress Development" started at 16:31.
- LINK: https://trello.com/b/hFtWKUn3/daily-cloud-init-curtin
- LINK: https://code.launchpad.net/~raharper/cloud-init/+git/cloud-init/+ref/feature/disk_setup_async is the WIP branch
- LINK: https://trello.com/c/TMK5ZDMf/1108-azure-async-disk-mounts
- LINK: https://cloud-init.github.io
Vote results
Action items, by person
- AnhVoMSFT
  - Touch base with AnhVoMSFT by next status on priority of https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1832381
Done items
- (none)
People present (lines said)
- blackboxsw (73)
- rharper (23)
- AnhVoMSFT (18)
- nik736 (12)
- ubot5 (4)
- meetingology (4)
Full Log
16:18 <blackboxsw>
#startmeeting Cloud-init bi-weekly status
16:18 <meetingology>
Meeting started Mon Jun 24 16:18:34 2019 UTC. The chair is blackboxsw. Information about MeetBot at http://wiki.ubuntu.com/meetingology.
16:18 <meetingology>
Available commands: action commands idea info link nick
16:19 <blackboxsw>
welcome to another episode of cloud-init status updates.
16:20 <blackboxsw>
Cloud-init upstream uses this meeting as a platform for community updates, feature/bug discussions, and an opportunity to get some extra input on current development.
16:21 <blackboxsw>
our format is the following topics: Previous Actions, Recent Changes, In-progress Development, Office Hours
16:21 <blackboxsw>
anyone is welcome to participate, interject, make suggestions or ask questions
16:22 <blackboxsw>
generally we try to host this meeting every two weeks on the day listed in the channel topic
16:23 <blackboxsw>
#topic Previous Actions
16:23 <blackboxsw>
last meeting
16:23 <blackboxsw>
#link https://cloud-init.github.io/status-2019-06-10.html#status-2019-06-10
16:24 <blackboxsw>
we had an action to follow up on any bugs related to installing ifupdown on a system that had netplan installed by default.
16:24 <blackboxsw>
I believe we did see a bug come in from Azure about that.... checking for that bug id now
16:25 <blackboxsw>
#1832381
16:25 <rharper>
bug #1832381
16:25 <ubot5>
bug 1832381 in cloud-init (Ubuntu) "vm fails to boot due to conflicting network configuration when user switches from netplan to eni" [Undecided,Incomplete] https://launchpad.net/bugs/1832381
16:25 <blackboxsw>
#link https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1832381
16:25 <AnhVoMSFT>
There is an action item on me to attach a log to that bug. Since the incident created by the customer was closed and we did not have permission to share his log, I will need to get a repro and retrieve the log. It's not very easy to trigger a mac address change in Azure these days
16:25 <blackboxsw>
thanks AnhVoMSFT for this bug
16:27 <blackboxsw>
ok if we carry over that action item then for next status meeting AnhVoMSFT (just to close the loop if it's important)
16:27 <AnhVoMSFT>
yep - once I get some help from our networking folks to trigger a mac address change I'll update the bug with more logs
16:27 <blackboxsw>
#action Touch base with AnhVoMSFT by next status on priority of https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1832381
16:27 * meetingology Touch base with AnhVoMSFT by next status on priority of https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1832381
16:27 <ubot5>
Ubuntu bug 1832381 in cloud-init (Ubuntu) "vm fails to boot due to conflicting network configuration when user switches from netplan to eni" [Undecided,Incomplete]
16:28 <blackboxsw>
good deal. that's all we had for actions from last meeting
16:28 <blackboxsw>
#topic Recent Changes
16:29 <blackboxsw>
the following items have landed on tip of cloud-init's master branch
16:30 <blackboxsw>
- sysconfig: support more bonding options [Penghui Liao]
16:30 <blackboxsw>
- cloud-init-generator: use libexec path to ds-identify on redhat systems
16:30 <blackboxsw>
[Ryan Harper] (LP: #1833264)
16:30 <blackboxsw>
- tools/build-on-freebsd: update to python3 [Gonéri Le Bouder]
16:30 <ubot5>
Ubuntu bug 1833264 in cloud-init "cloud-init-generator hardcodes path to ds-identify" [Undecided,Fix committed]
16:30 <blackboxsw>
thanks to Penghui and Gonéri for driving additional changes for cloud-init in this last sessions
16:30 <blackboxsw>
session*
16:31 <blackboxsw>
#topic In Progress Development
16:32 <blackboxsw>
there are a number of longer items for feature work in progress that should see some light soon
16:33 <blackboxsw>
We track these features in trello as always
16:33 <blackboxsw>
#link https://trello.com/b/hFtWKUn3/daily-cloud-init-curtin
16:33 <blackboxsw>
minor fixup for Azure instance-data.json (cloud-init query) for region and availability zone should land today
16:34 <blackboxsw>
rharper and blackboxsw are working on Azure-related route tables and async disk mount features
16:36 <AnhVoMSFT>
is there any bug/discussion item for the async disk mount?
16:37 <blackboxsw>
AnhVoMSFT: rharper has been testing out systemd unit magic for setting up disk mounts async and initial numbers look good. How to bake that work into cloud-init is the next small hurdle I think. (I thought he mentioned today in our standup a 50% speed increase due to async mounts instead of sync waits)
16:38 <rharper>
https://code.launchpad.net/~raharper/cloud-init/+git/cloud-init/+ref/feature/disk_setup_async is the WIP branch
16:39 <blackboxsw>
AnhVoMSFT: I expect we'll have something in the next couple of days.
16:39 <blackboxsw>
orrr right now. thanx rharper
16:39 <AnhVoMSFT>
that sounds really cool. I'll check it out
16:39 * blackboxsw creates a trello card that can be watched for this feature
16:40 <blackboxsw>
#link https://trello.com/c/TMK5ZDMf/1108-azure-async-disk-mounts
16:41 <blackboxsw>
feel free to subscribe to any trello cards folks see that are of interest. you will get an email if the card changes state, like from Doing to Done or if new links are added
16:42 <blackboxsw>
Odd_Bloke: rharper process question
16:42 <blackboxsw>
what do you guys think about us turning on voting on trello cards
16:42 <blackboxsw>
people with interest on a feature/card in our backlog could upvote it and that could help drive what features we grab over time
16:43 <blackboxsw>
dunno, thought it might be something we could toss around to see if that would make sense. the board it public after all
16:43 <blackboxsw>
is public rather
16:43 <rharper>
maybe; I worry about random +1 without any more context. Platform developers already work with us; and community folks file bugs/merge proposals
16:44 <blackboxsw>
good point.
16:44 <rharper>
I'm open to the idea
16:45 <blackboxsw>
for sure, if it gets interest, we can think about adding that feature. can't hurt to have some additional input, unfounded though it may be.
16:45 <AnhVoMSFT>
agreed that the usefulness might be limited. You guys are already talking to each other. Platform developers either engage directly on this board or through out-of-band channels (sync meetings with Canonical product groups, etc.)
16:46 <AnhVoMSFT>
Perhaps you can try it out for a couple release periods and see how it works out
16:46 <blackboxsw>
yeah, /me just likes all the shiny objects pretty icons ;) ... need to control myself
16:47 <blackboxsw>
thx AnhVoMSFT +1.
16:48 <blackboxsw>
so I think that about wraps in-progress development. I know paride has been tirelessly working on our CI infrastructure to improve quality of CI and false positives for failures due to resource constraints. So big thanks for paride working on our jenkins workers
16:48 <blackboxsw>
#topic Office Hours (next ~30 mins)
16:49 <blackboxsw>
This is an open topic to bring any cloud-init discussions, bugs, concerns or feature requests folks have.
16:49 <blackboxsw>
In the absence of such topics we spend part of this time grooming the review queue to get back to dev
16:50 <blackboxsw>
contributors so that they don't have stale branches waiting for input
16:50 <AnhVoMSFT>
We had a review sent out to add some boot time telemetry collection as part of cloud-init analyze: https://code.launchpad.net/~samgilson/cloud-init/+git/cloud-init/+merge/368943
16:50 <blackboxsw>
thanks AnhVoMSFT I'll grab a review slot on that one now
16:51 <AnhVoMSFT>
would appreciate some reviews there and also on ideas on how to retrieve similar timestamps for FreeBSD
16:51 <rharper>
AnhVoMSFT: yes, will review
16:51 <rharper>
AnhVoMSFT: also, I filed a bug related to the azure telemetry, lemme get it
16:51 <blackboxsw>
I'll kick off a CI run on that now
16:51 <blackboxsw>
rharper: ^
16:52 <rharper>
Bug 1833731
16:52 <ubot5>
bug 1833731 in cloud-init "cloud-init analyze output not formatted cleanly on Azure" [Undecided,New] https://launchpad.net/bugs/1833731
16:52 <rharper>
AnhVoMSFT: not sure if the branch for review addresses the formatting of the output, but we should take a look to clean it up
16:52 <AnhVoMSFT>
is there a good way to subscribe to new bugs with a certain keyword/tag? I.e., I would like to auto-subscribe to all bugs that have "Azure" in the bug title
16:52 <blackboxsw>
rharper: if you get a chance to double check https://code.launchpad.net/~chad.smith/cloud-init/+git/cloud-init/+merge/369199 we might be able to land that too
16:53 <rharper>
blackboxsw: I asked you some questions, if you've replied, I'll look again
16:53 <blackboxsw>
rharper: nevermind, I see you already looked at it
16:53 <rharper>
ah
16:53 <rharper>
perfect
16:53 <blackboxsw>
thanks
16:53 <rharper>
I think we're mostly fine; just a question on return values
16:53 * blackboxsw needed to refresh
16:54 <AnhVoMSFT>
rharper I will take a look at the analyze output and see how we can improve it. If it is a minor change we can add it to the existing review
16:54 <rharper>
AnhVoMSFT: no need to pull it into the existing stuff
16:55 <rharper>
I'd prefer a separate targeted fix; which may land independently from the boot stage (which is super interesting on its own)
16:55 <AnhVoMSFT>
cool - we will do a separate fix then
16:55 <blackboxsw>
rharper: yeah that concern is clear, we are safe on the processing of region/az in absence of 'imds' key
16:55 <blackboxsw>
it'll return none
16:57 <blackboxsw>
by virtue of that last get('location|platformFaultDomain')
16:57 <blackboxsw>
if either is absent due to any key above being absent, you'll get None as default value
16:58 <rharper>
blackboxsw: ack
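The None-by-default behaviour blackboxsw describes can be sketched in a few lines of Python. This is only an illustration, not the code under review: the function name `region_and_zone` is hypothetical and the `imds` -> `compute` nesting of the metadata dictionary is an assumption. Only the `imds`, `location` and `platformFaultDomain` key names come from the discussion above.

```python
def region_and_zone(metadata):
    """Return (location, platformFaultDomain), each None when unavailable.

    Hypothetical helper: the 'imds' -> 'compute' layout is an assumption.
    """
    # Every intermediate .get() falls back to an empty dict, so a missing
    # 'imds' or 'compute' key never raises KeyError.
    compute = metadata.get("imds", {}).get("compute", {})
    # The last .get() has no explicit default, so it yields None when the
    # key (or any key above it) is absent.
    return compute.get("location"), compute.get("platformFaultDomain")
```

Called with an empty dictionary, the helper returns a pair of None values rather than raising, which matches the "it'll return none" behaviour described above.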
17:00 <nik736>
Hi there, not sure if this is the right place to ask, but I have problems when creating a new VM, it only happens with the debian cloud image, ubuntu is fine. Booting is stuck at the drm line, the exact line is dependent on the video model type in my libvirt xml but it is basically stuck for 20-30sec and won't continue. It will boot eventually after that time. Thanks so much for any hints. Happy to provide further details.
17:02 <blackboxsw>
hrm, video model timeouts are a bit out of my wheelhouse :/
17:02 * blackboxsw pokes around a bit in google
17:02 <nik736>
it seems to be that the lines after it would be about resizing the file system. I am not really sure if this is cloud-init related at all and I am not sure if it actually is caused by the video model or is just taking a bit to get to the next steps
17:03 <blackboxsw>
nik736: you can run cloud-init analyze show or cloud-init analyze blame to see what cloud-init says it is spending a lot of time on
17:03 <nik736>
I tried different host systems, Ubuntu 18.04, 19.04, Debian 9, different libvirt versions, different qemu versions, nothing seems to be helping lol
17:03 <blackboxsw>
(If you have cloud-init v 18 or later in your image I think)
17:04 <nik736>
ah, ok, thanks, I will look into that
17:04 <blackboxsw>
nik736: also systemd-analyze blame is a good helper for what is killing boot time
17:04 <AnhVoMSFT>
do you see any timestamp gap that reflects the 20-30s in cloud-init.log ?
17:08 <rharper>
nik736: feel free to file a bug and attach logs from the 'cloud-init collect-logs' output (or serial console if available) and /var/log/cloud-init.log if you can get into the instance afterwards
17:08 <nik736>
thanks for the help, currently looking into it
17:11 <AnhVoMSFT>
rharper blackboxsw we have some instance deployment where cloud-init is hanging at the command ip route add - any idea how to look further?
17:12 <AnhVoMSFT>
this does look like a platform problem, so it is more of a question related to networking, rather than cloud-init itself
17:14 <AnhVoMSFT>
it's super hard to reproduce so the only thing we have so far to work with is logs. I thought the call to ip route add basically adds an entry to the kernel routing table. Is there an interaction with networking involved which might cause it to hang?
17:14 <rharper>
AnhVoMSFT: I wonder if it's creating a route that breaks connection to IMDS or something else that cloud-init would then do an HTTP get on ?
17:17 <nik736>
I saw in the log that 2 entries are around 1 minute apart "SUCCESS: searching for local datasources" and "Cloud-init v. 0.7.9 running 'init' at Mon, 24 Jun 2019 17:13:41 +0000. Up 73.67 seconds." I am not sure if this could be it or if this looks fine
17:18 <rharper>
0.7.9 is quite old, seeing the full cloud-init.log will be most useful for us to understand what's happening
17:19 <nik736>
okay, sec
17:19 <AnhVoMSFT>
rharper that is a good theory. I do see in a good case there's a call to IMDS immediately after that, although that call has a timeout. If it fails we should see more logs coming out of cloud-init. I'll look further into that today
17:20 <nik736>
@rharper https://pastebin.com/fzCSH5kC
17:20 <rharper>
AnhVoMSFT: the retry logic in DataSourceAzure is quite long IIRC, so it's quite possible this is the very issue that blackboxsw is working w.r.t ensuring the instances always have a source-ip route to the IMDS
17:21 <AnhVoMSFT>
rharper indeed it is long, and the log was overly suppressed to avoid log from growing too large while VM was waiting in pre-provisioning state. We are adding back some of the logs (in a smarter way to get enough details while avoiding huge log size)
17:22 <rharper>
nik736: so, between line 260 and 261 there's a large timedelta; that's outside of cloud-init; cloud-init is executed separately 4 times (cloud-init init --local, cloud-init init, cloud-init modules --mode=config, cloud-init modules --mode=final)
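A large time delta between two adjacent log lines, like the one rharper points to here, can also be found mechanically. The following is a minimal sketch, assuming each log line starts with a timestamp of the form `2019-06-24 17:13:41,123`. The function name `find_gaps` and the 10-second threshold are illustrative choices and are not part of cloud-init.

```python
import re
from datetime import datetime

# Assumed line prefix: "YYYY-MM-DD HH:MM:SS,mmm"
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3})")


def find_gaps(lines, threshold_seconds=10.0):
    """Return (previous_line_no, line_no, seconds) for each large gap."""
    gaps = []
    previous = None  # (line number, timestamp) of the last stamped line
    for number, line in enumerate(lines, start=1):
        match = TIMESTAMP.match(line)
        if not match:
            continue  # skip lines without a leading timestamp
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        stamp = stamp.replace(microsecond=int(match.group(2)) * 1000)
        if previous is not None:
            delta = (stamp - previous[1]).total_seconds()
            if delta >= threshold_seconds:
                gaps.append((previous[0], number, delta))
        previous = (number, stamp)
    return gaps
```

Passing an open file object for /var/log/cloud-init.log as `lines` would list every pair of consecutive timestamped lines that are at least `threshold_seconds` apart.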
17:23 <rharper>
nik736: so if you have a systemd journal, we could see what happens between the end of cloud-init-local.service and cloud-init.service (stage1 and 2);
17:23 <nik736>
ah, okay, interesting
17:23 <nik736>
will check
17:23 <rharper>
or syslog might see stuff between those two time points
16:23 * rharper steps away for a bit, please keep sending info here; I'll respond when I'm back
17:24 <AnhVoMSFT>
nik736 systemd-analyze critical-chain cloud-init.service might help here - I think some systemd service is running right after init-local and just before init and that service is taking time
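The read-only diagnostics suggested in this discussion can be gathered in one pass. This is a rough sketch only: the helper name `run_diagnostics` and the 60-second timeout are assumptions, while the commands themselves are the ones named by the participants above.

```python
import shutil
import subprocess

# Commands named in the discussion; all only read boot timing data.
BOOT_DIAGNOSTICS = [
    ["cloud-init", "analyze", "blame"],
    ["systemd-analyze", "blame"],
    ["systemd-analyze", "critical-chain", "cloud-init.service"],
]


def run_diagnostics(commands=BOOT_DIAGNOSTICS, timeout=60):
    """Map each command line to its output, or None if the tool is missing."""
    results = {}
    for cmd in commands:
        key = " ".join(cmd)
        if shutil.which(cmd[0]) is None:
            results[key] = None  # tool not installed on this system
            continue
        proc = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout
        )
        # Keep stderr too, since failures are often explained there.
        results[key] = proc.stdout + proc.stderr
    return results
```

Run inside the affected instance, the returned dictionary could be printed or attached to a bug report alongside /var/log/cloud-init.log.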
17:25 <nik736>
will check, thanks for your help, really appreciate it.
17:32 <blackboxsw>
I think I'll wrap the meeting here, but we can continue the conversation. Thanks again folks for the discussions
17:33 <blackboxsw>
next meeting will be July 8th
17:33 <blackboxsw>
as updated in the topic
17:33 <blackboxsw>
meeting minutes will be posted to
17:33 <blackboxsw>
#link https://cloud-init.github.io
17:33 <blackboxsw>
#endmeeting
Generated by MeetBot 0.1.5 (http://wiki.ubuntu.com/meetingology)